Posted by Timo Kohlberger and Yuan Liu, Software Engineers, Google Health
The progress in machine learning (ML) for medical imaging that helps doctors provide better diagnoses has partially been driven by the use of large, meticulously labeled datasets. However, dataset size can be limited in real life due to privacy concerns, low patient volume at partner institutions, or by virtue of studying rare diseases. Moreover, to ensure that ML models generalize well, they need training data that span a range of subgroups, such as skin type, demographics, and imaging devices. Requiring that the size of each combinatorial subgroup (e.g., skin type A with skin condition B, taken by camera C) is also sufficiently large can quickly become impractical.
Today we are happy to share two projects aimed at both improving the diversity of ML training data, and increasing the effective amount of available training data for medical applications. The first project is a configurable method for generation of synthetic skin lesion images in order to improve coverage of rarer skin types and conditions. The second project uses synthetic images as training data to develop an ML model that can better interpret different biological tissue types across a range of imaging devices.
Generating Diverse Images of Skin Conditions
In “DermGAN: Synthetic Generation of Clinical Skin Images with Pathology”, published in the Machine Learning for Health (ML4H) workshop at NeurIPS 2019, we address problems associated with data diversity in de-identified dermatology images taken by consumer-grade cameras. This work addresses (1) the scarcity of imaging data representative of rare skin conditions, and (2) the lower frequency of data covering certain Fitzpatrick skin types. Fitzpatrick skin types range from Type I (“pale white, always burns, never tans”) to Type VI (“darkest brown, never burns”), with datasets generally containing relatively few cases at the “boundaries”. In both cases, data scarcity problems are exacerbated by the low signal-to-noise ratio common in the target images, due to the lack of standardized lighting, contrast and field-of-view; variability of the background, such as furniture and clothing; and the fine details of the skin, like hair and wrinkles.
To improve diversity in the skin images, we developed a model, called DermGAN, which generates skin images that exhibit the characteristics of a given pre-specified skin condition, location, and underlying skin color. DermGAN uses an image-to-image translation approach, based on the pix2pix generative adversarial network (GAN) architecture, to learn the underlying mapping from one type of image to another.
DermGAN takes as input a real image and its corresponding, pre-generated semantic map representing the underlying characteristics of the real image (e.g., the skin condition, location of the lesion, and skin type), from which it will generate a new synthetic example with the requested characteristics. The generator is based on the U-Net architecture, but in order to mitigate checkerboard artifacts, the deconvolution layers are replaced with a resizing layer, followed by a convolution. A few customized losses are introduced to improve the quality of the synthetic images, especially within the pathological region. The discriminator component of DermGAN is solely used for training, whereas the generator is evaluated both visually and for use in augmenting the training dataset for a skin condition classifier.
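The resize-then-convolve idea can be illustrated with a minimal sketch in plain Python. This is not DermGAN's code: the function names, the nearest-neighbor resize, and the fixed 3x3 averaging kernel are all assumptions chosen for illustration, and a real generator would learn its kernels during training.

```python
def resize_nearest(image, factor=2):
    """Upsample a 2D list of numbers by repeating each pixel `factor` times."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def conv3x3_same(image, kernel):
    """Apply a 3x3 kernel with zero padding so the output keeps the input size."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < height and 0 <= xx < width:
                        total += image[yy][xx] * kernel[ky][kx]
            out[y][x] = total
    return out


def resize_conv_upsample(image, kernel, factor=2):
    """Resize first, then convolve: every output pixel is built the same way."""
    return conv3x3_same(resize_nearest(image, factor), kernel)


# Hypothetical fixed kernel; a trained network would learn these weights.
AVERAGE_KERNEL = [[1.0 / 9.0] * 3 for _ in range(3)]
small = [[1.0, 2.0], [3.0, 4.0]]
upsampled = resize_conv_upsample(small, AVERAGE_KERNEL)
```

Because the resize step gives every output pixel the same number of contributing inputs before the convolution runs, the uneven kernel overlap that strided deconvolutions can produce (the usual source of checkerboard patterns) does not arise.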
Overview of the generator component of DermGAN. The model takes an RGB semantic map (red box) annotated with the skin condition’s size and location (smaller orange rectangle), and outputs a realistic skin image. Colored boxes represent various neural network layers, such as convolutions and ReLU; the skip connections resemble the U-Net and enable information to be propagated at the appropriate scales.
The top row shows generated synthetic examples and the bottom row illustrates real images of basal cell carcinoma (left) and melanocytic nevus (right). More examples can be found in the paper.
In addition to generating visually realistic images, our method enables generation of images of skin conditions or skin types that are more rare and that suffer from a paucity of dermatologic images.
DermGAN can be used to generate skin images (all with melanocytic nevus in this case) with different background skin types (top, by changing the input skin color) and different-sized lesions (bottom, by changing the input lesion size). As the input skin color changes, the lesion changes appearance to match what the lesion would look like on different skin types.
Early results indicated that using the generated images as additional data to train a skin condition classifier may improve performance at detecting rare malignant conditions, such as melanoma. However, more work is needed to explore how best to utilize such generated images to improve accuracy more generally across rarer skin types and conditions.
Generating Pathology Images with Different Labels Across Diverse Scanners
The focus quality of medical images is important for accurate diagnoses. Poor focus quality can trigger both false positives and false negatives, even in otherwise accurate ML-based metastatic breast cancer detection algorithms. Determining whether or not pathology images are in-focus is difficult due to factors such as the complexity of the image acquisition process. Digitized whole-slide images could have poor focus across the entire image, but since they are essentially stitched together from thousands of smaller fields of view, they could also have subregions with different focus properties than the rest of the image. This makes manual screening for focus quality impractical and motivates the desire for an automated approach to detect poorly-focused slides and locate out-of-focus regions. Identifying regions with poor focus might enable re-scanning, or yield opportunities to improve the focusing algorithms used during the scanning process.
In our second project, presented in “Whole-slide image focus quality: Automatic assessment and impact on AI cancer detection”, published in the Journal of Pathology Informatics, we develop a method of evaluating de-identified, large gigapixel pathology images for focus quality issues. This involved training a convolutional neural network on semi-synthetic training data that represent different tissue types and slide scanner optical properties. However, a key barrier towards developing such an ML-based system was the lack of labeled data — focus quality is difficult to grade reliably and labeled datasets were not available. To exacerbate the problem, because focus quality affects minute details of the image, any data collected for a specific scanner may not be representative of other scanners, which may have differences in the physical optical systems, the stitching procedure used to recreate a large pathology image from captured image tiles, white-balance and post-processing algorithms, and more. This led us to develop a novel multi-step system for generating synthetic images that exhibit realistic out-of-focus characteristics.
We deconstructed the process of collecting training data into multiple steps. The first step was to collect images from various scanners and to label in-focus regions. This task is substantially easier than trying to determine the degree to which an image is out of focus, and can be completed by non-experts. Next, we generated synthetic out-of-focus images, inspired by the sequence of events that happen before a real out-of-focus image is captured: the optical blurring effect happens first, followed by those photons being collected by a sensor (a process that adds sensor noise), and finally software compression adds noise.
A sequence of images showing step-wise out-of-focus image generation. Images are shown in grayscale to accentuate the difference between steps. First, an in-focus image is collected (a) and a bokeh effect is added to produce a blurry image (b). Next, sensor noise is added to simulate a real image sensor (c), and finally JPEG compression is added to simulate the sharp edges introduced by post-acquisition software processing (d). A real out-of-focus image is shown for comparison (e).
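The blur, noise, and compression sequence can be sketched in a few lines of plain Python. This is an illustrative approximation, not the pipeline from the paper: the disk-shaped kernel, the Gaussian noise model, the intensity quantization used as a crude placeholder for JPEG compression, and all function names and parameter values are assumptions.

```python
import random


def disk_kernel(radius):
    """Normalized disk-shaped kernel, a simple model of a bokeh blur."""
    size = 2 * radius + 1
    kernel = [[1.0 if (y - radius) ** 2 + (x - radius) ** 2 <= radius ** 2 else 0.0
               for x in range(size)] for y in range(size)]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]


def blur(image, kernel):
    """Convolve with edge clamping so brightness is preserved at the borders."""
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    yy = min(max(y + ky - radius, 0), height - 1)
                    xx = min(max(x + kx - radius, 0), width - 1)
                    acc += weight * image[yy][xx]
            out[y][x] = acc
    return out


def add_sensor_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (an assumed model) and clip to [0, 255]."""
    return [[min(255.0, max(0.0, value + rng.gauss(0.0, sigma))) for value in row]
            for row in image]


def quantize(image, step):
    """Placeholder for lossy compression: round intensities to multiples of `step`."""
    return [[min(255.0, round(value / step) * step) for value in row] for row in image]


def synthesize_out_of_focus(image, radius=2, sigma=4.0, step=8, seed=0):
    """Apply the three degradations in the order described: blur, noise, compression."""
    rng = random.Random(seed)
    blurred = blur(image, disk_kernel(radius))
    noisy = add_sensor_noise(blurred, sigma, rng)
    return quantize(noisy, step)


# Tiny in-focus test pattern: a sharp vertical edge.
sharp = [[0.0] * 4 + [255.0] * 4 for _ in range(8)]
defocused = synthesize_out_of_focus(sharp)
```

Varying `radius` would give a range of blur strengths from a single in-focus patch, which is the property that makes labeling only in-focus regions sufficient.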
Our study shows that modeling each step is essential for optimal results across multiple scanner types and, remarkably, that doing so enabled the detection of spectacular out-of-focus patterns in real data:
An example of a particularly interesting out-of-focus pattern across a biological tissue slice. Areas in blue were recognized by the model to be in-focus, whereas areas highlighted in yellow, orange, or red were more out of focus. The gradation in focus here (represented by concentric circles: a red/orange out-of-focus center surrounded by green/cyan mildly out-of-focus, and then a blue in-focus ring) was caused by a hard “stone” in the center that lifted the surrounding biological tissue.
Implications and Future Outlook
Though the volume of data used to develop ML systems is seen as a fundamental bottleneck, we have presented techniques for generating synthetic data that can be used to improve the diversity of training data for ML models and thereby improve the ability of ML to work well on more diverse datasets. We should caution though that these methods are not appropriate for validation data, so as to avoid bias such as an ML model performing well only on synthetic data. To ensure unbiased, statistically-rigorous evaluation, real data of sufficient volume and diversity will still be needed, though techniques such as inverse probability weighting (for example, as leveraged in our work on ML for chest X-rays) may be useful there. We continue to explore other approaches to more efficiently leverage de-identified data to improve data diversity and reduce the need for large datasets in the development of ML models for healthcare.
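As a rough illustration of inverse probability weighting, the sketch below reweights each evaluated case by the inverse of the probability that it was selected for evaluation, so that under-sampled kinds of cases count for more. The function name, the toy data, and the self-normalized form of the estimator are assumptions made for this example and are not taken from the chest X-ray work.

```python
def ipw_mean(values, sampling_probabilities):
    """Self-normalized inverse-probability-weighted mean of `values`."""
    if len(values) != len(sampling_probabilities):
        raise ValueError("values and sampling_probabilities must have equal length")
    if any(not 0.0 < p <= 1.0 for p in sampling_probabilities):
        raise ValueError("sampling probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in sampling_probabilities]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Hypothetical evaluation set: 1 = model was correct, 0 = model was wrong.
correct = [1, 1, 0, 1]
# Hypothetical probability that each case was selected into the evaluation set.
sampling_prob = [0.8, 0.8, 0.2, 0.2]

naive_accuracy = sum(correct) / len(correct)
weighted_accuracy = ipw_mean(correct, sampling_prob)
```

In this toy data the two rarely sampled cases receive four times the weight of the others, so the weighted accuracy falls below the naive average.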
Acknowledgements
These projects involved the efforts of multidisciplinary teams of software engineers, researchers, clinicians and cross-functional contributors. Key contributors to these projects include Timo Kohlberger, Yun Liu, Melissa Moran, Po-Hsuan Cameron Chen, Trissia Brown, Jason Hipp, Craig Mermel, Martin Stumpe, Amirata Ghorbani, Vivek Natarajan, David Coz, and Yuan Liu. The authors would also like to acknowledge Daniel Fenner, Samuel Yang, Susan Huang, Kimberly Kanada, Greg Corrado and Erica Brand for their advice, members of the Google Health dermatology and pathology teams for their support, and Ashwin Kakarla and Shivamohan Reddy Garlapati and their team for image labeling.
Or, suppose instead I said to you that you should “Drive Safely: It’s the Law” – how would you react?
Perhaps I might say “Drive Safely or Get a Ticket.”
I could be even more succinct and simply say: Drive Safely.
These are all ways to generally say the same thing.
Yet, how you react to them can differ quite a bit.
Why would you react differently to these messages that all seem to be saying the same thing?
Because how the message is phrased will create a different kind of social context that your underlying social norms will react to.
If I simply say “Drive Safely”, it’s a rather perfunctory form of wording the message.
It’s quick, consisting of only two words. You likely would barely notice the message and you might also think that of course it’s important to drive safely. You might ignore the message due to it seemingly being obvious, or you might notice it and think to yourself that it’s kind of a handy reminder but that in the grand scheme of things it wasn’t that necessary, at least not for you (maybe it was intended for riskier drivers, you assume).
Consider next the version that says “Thank You for Driving Safely.”
This message is somewhat longer, having now five words, and takes more effort to read. As you parse the words of the message, the opening element is that you are being thanked for something. We all like being thanked. What is it that you are being thanked for, you might wonder. You then get to the ending of the message and realize you are being thanked for driving safely.
Most people would then maybe get a small smile on their face and think that this was a mildly clever way to urge people to drive safely. By thanking people, it gets them to consider that they need to do something to get the thanks, and the thing they need to do is drive safely. In essence, the message tries to create a reciprocity with the person – you are getting a thank you handed to you, and you in return are supposed to do something, namely you are supposed to drive safely.
Suppose you opt to not drive safely?
You’ve broken the convention of having been given something, the thanks, when it really was undeserved. In theory, you’ll not want to break such a convention and therefore will be motivated to drive safely. I’d say that none of us will necessarily go out of our way to drive safely merely due to the aspect that you need to repay the thank-you. On the other hand, maybe it will be enough of a social nudge that it puts you into a mental mindset of continuing to drive safely. It’s not enough to force you into driving safely, but it might keep you going along as a safe driver.
What about the version that says “Drive Safely: It’s the Law” and your reaction to it?
In this version, you are being reminded to drive safely and then you are being forewarned that it is something you are supposed to do. You are told that the law requires you to drive safely. It’s not really a choice per se, and instead it is the law. If you don’t drive safely, you are a lawbreaker. You might get into legal trouble.
The version that says “Drive Safely or Get a Ticket” is similar to the version warning you about the law, and steps things up a further notch.
If I tell you that something isn’t lawful, you need to make a mental leap that if you break the law there are potentially adverse consequences. In the case of the version telling you straight out that you’ll get a ticket, there’s no ambiguity about the aspect that not only must you drive safely but indeed there is a distinct penalty for not doing so.
None of us likes getting a ticket.
We’ve all had to deal with traffic tickets and the trauma of getting points dinged on our driving records, possibly having our car insurance rates hiked, and maybe needing to go to traffic school and suffer through boring hours of re-learning about driving. Yuk, nobody wants that. This version that mentions the ticket provides a specific adverse consequence if you don’t comply with driving safely.
The word-for-word wording of the drive safely message is actually quite significant as to how the message will be received by others and whether they will be prompted to do anything because of the message.
I realize that some of you might say that it doesn’t matter which of those wordings is used.
Aren’t we being rather tedious in parsing each such word?
Seems like a lot of focus on something that otherwise doesn’t need any attention. Well, you’d actually be somewhat mistaken in the assumption that those variants of wording do not make a difference. There are numerous psychology and cognition studies that show that the wording of a message can make an at times dramatic difference as to whether people notice the message and whether they take it to heart.
I’ll concentrate herein on one such element that makes those messages so different in terms of impact, namely the use of reciprocity.
Importance Of Reciprocity
Reciprocity is a social norm.
Cultural anthropologists suggest that it is a social norm that cuts across all cultures and all of time.
In essence, we seem to have always believed in and accepted reciprocity in our dealings with others, whether we explicitly knew it or not.
I tell you that I’m going to help you with putting up a painting on your wall. You now feel as though you owe me something in return. It might be that you would pay me for helping you. Or, it could be something else such as you might do something for me, such as you offer to help me cook a meal. We’re then balanced. I helped you with the painting, you helped me with the meal. In this case, we traded with each other, me giving you one type of service, and you providing in return to me some kind of service.
Of course, the trades could have been something other than a service.
I help you put up the painting (I’m providing a service to you), and you then hand me a six-pack of beer. In that case, I did a service for you, and you gave me a product in return (the beers). Maybe instead things started out that you gave me a six-pack of beer (product) and I then offered to help put up your painting (a service). Or, it could be that you hand me the six-pack of beer (product), and I hand you a pair of shoes (product).
In each case, something is given to the other person, and the other person provides something in return. We seem to just know that this is the way the world works.
Is it in our DNA?
Is it something that we learn as children? Is it both?
There are arguments to be made about how it has come to be.
Regardless of how it came to be, it exists and actually is a rather strong characteristic of our behavior.
Let’s further unpack the nature of reciprocity.
I had mentioned that you gave me a six-pack of beer and I then handed you a pair of shoes. Is that a fair trade? Maybe those shoes are old, worn out, and have holes in them. You might not need them and even if you needed them you might not want that particular pair of shoes. Seems like an uneven trade. You are likely to feel cheated and regret the trade. You might harbor a belief that I was not fair in my dealings with you. You might expect that I will give you something else of greater value to make up for the lousy shoes.
On the other hand, maybe I’m not a beer drinker and so your having given me beers seemed like an odd item to give to me. I might have thought that I’d give you an odd item in return. Perhaps in my mind, the trade was even. Meanwhile, in your mind, the trade was uneven.
There’s another angle too as to whether the trade was intended as a positive one or something that is a negative one. We both are giving each other things of value and presumably done in a positive way. It could be a negative action kind of trade instead. I hit you in the head with my fist, and so you then kick me in the shin. Negative actions as a reciprocity. It’s the old eye-for-an-eye kind of notion.
Time is a factor in reciprocity too. I will help you put up your painting. Perhaps the meal you are going to help me cook is not going to take place until several months from today. That’s going to be satisfactory in that we both at least know that there is a reciprocal arrangement underway.
If I help you with the painting, and there’s no discussion about what you’ll do for me, I’d walk away thinking that you owe me. You might also be thinking the same. Or, you could create an imbalance by not realizing you owe me, or maybe you are thinking that last year you helped me put oil into my car and so that’s what makes us even now on this most current trade.
Difficulties Of Getting Reciprocity Right
Reciprocity can be dicey.
There are ample ways that the whole thing can get discombobulated.
I do something for you, you don’t do anything in return.
I do something for you of value N, and you provide in return something of perceived value Y that is substantively less than N. I do something for you, and you pledge to do something for me that’s a year from now, meanwhile I maybe feel cheated because I didn’t get more immediate value and also if you forget a year from now to make up the trade then I might forever be upset. And so on.
I am assuming that you’ve encountered many of these kinds of reciprocity circumstances in your lifetime. You might not have realized at the time they were reciprocity situations. We often fall into them and aren’t overtly aware of it.
One of the favorite examples about reciprocity in our daily lives involves the seemingly simple act of a waiter or waitress getting a tip after having served a meal. Studies show that if the server brings out the check and includes a mint on the tray holding the check, this has a tendency to increase the amount of the tip. The people that have eaten the meal and are getting ready to pay will feel as though they owe some kind of reciprocity due to the mint being there on the tray. Research indicates that the tip tends to go up by a modest amount as a result of the act of providing the mint.
A savvy waiter or waitress can further exploit this reciprocity effect. If they look you in the eye and say that the mint was brought out just for you and your guests, this boosts the tip even more. The rule of reciprocity comes into play since the value of the aspect being given has gone up, namely it was at first just any old mint and now it is a special mint just for you all, and thus the trade in kind by you is going to increase to somewhat match the increase in value of the offering. The timing involved is crucial too, in that if the mint was given earlier in the meal, it would not have as great an impact as coming just at the time that the payment is going to be made.
As mentioned, reciprocity doesn’t work on everyone in the same way.
The mint trick might not work on you, supposing you hate mints or you like them but perceive the mint as having little value. Or, if the waiter or waitress has irked you the entire meal, it is unlikely that the mint at the end is going to dig them out of a hole. In fact, sometimes when someone tries the reciprocity trick, it can backfire on them. Upon seeing the mint and the server smiling at you, if you are already ticked-off about the meal and the service, it could actually cause you to go ballistic and decide to leave no tip or maybe ask for the manager and complain.
Here’s a recap then about the reciprocity notion:
Reciprocity is a social norm of tremendous power that seems to universally exist
We often fall into a reciprocity and don’t know it
Usually a positive action needs to be traded for another in kind
Usually a negative action needs to be traded for another in kind
An imbalance in the perceived trades can mar the arrangement
Trades can be services or products or combinations thereof
Time can be a factor as to immediate, short-term, or long-term
AI Autonomous Cars And Social Reciprocity
What does this have to do with AI self-driving driverless autonomous cars?
At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One crucial aspect of the AI will be the interaction with the human occupants of the self-driving car, and as such, the AI should be crafted to leverage reciprocity.
One of the areas of open research and discussion involves the nature of the interaction between the AI of a self-driving car and the human occupants that will be using the self-driving car. Some AI developers with a narrow view seem to think that all that the interaction consists of would be the human occupants saying to drive them to the store or to home, and that’s it.
This is a naive view.
The human occupants are going to want the AI to be far more able to carry on a conversation.
For my article about natural language processing and AI self-driving cars, see: https://aitrends.com/features/socio-behavioral-computing-for-ai-self-driving-cars/
For the explanation capabilities of AI for the human occupants, see my article: https://aitrends.com/selfdrivingcars/explanation-ai-machine-learning-for-ai-self-driving-cars/
For a kind of reciprocity of the behavior of AI self-driving cars toward other cars and car drivers, see my article about the Prisoner’s Dilemma and the tit-for-tat strategy: https://aitrends.com/selfdrivingcars/tit-for-tat-and-ai-self-driving-cars/
Let’s consider an example of social reciprocity involving a passenger and driver dialogue.
You get into the AI self-driving car and decide you want to go to work.
Almost as though you are giving an instruction to a GPS, you merely indicate the address of work, and the self-driving car then proceeds to drive you there. Suppose that during the journey, you want to get some coffee, doing so before you get to work. You would want to convey this interim destination to the AI.
Before the days of Alexa and Siri, we might have all accepted the idea that we might need to enter such instructions into an on-board computer tablet, but I think we are now expecting to be able to have a more natural audio interaction with the AI system. If you’ve gone to the trouble to buy an expensive and shiny AI self-driving car, you certainly would hope that some effort went into making the system interactive with the human occupants.
A barebones “conversational” AI might be this:
You: Take me to work.
AI self-driving car: Okay
What does the word “Okay” mean in this situation? Does it mean that the AI is acknowledging that you have said you want to go to work, but not necessarily that the AI will immediately do so? Or, maybe it means that the AI is going to immediately start the self-driving car and drive you to work. Or, perhaps the AI misunderstood what you said and thought you said “I hate work” and therefore the AI is merely saying “Okay” as a placeholder word to seem responsive.
Let’s try this again:
You: Take me to work.
AI self-driving car: Starting to your work now.
This is a little better since the AI has echoed back to you that you want to go to work, and furthermore it is stating when this action will occur, specifically that it will occur right away or now. This seems somewhat flat though. If you were conversing with a human chauffeur, they probably would be more embellishing in what they might say.
Here’s another try:
You: Take me to work.
AI self-driving car: Good morning, I will head to your work at 1234 Lauren Street. Hope you will have a pleasant day at work!
This provides more specific feedback, including the work address, which helps to confirm that you really intended to go to work. It might be that you said the word “work” out of habit and meant to say “home”; hearing the address might cause you to realize that you had said the wrong word. The part of the message that says “good morning” and the other part that says “have a pleasant day” are added embellishments that presumably give a softer feeling to the messaging and make things seem less robotic.
One criticism of having the AI utter “good morning” and “have a pleasant day” is that it implies perhaps that the AI actually means those things.
When I speak those words to you, you assume that I as a human have measured out those words and that I presumably know what it means to have a good morning, and so with my knowledge about the nature of mornings, I am genuinely hoping that you have a good one. If you see the words “good morning” written on a poster, you don’t consider that the poster knows anything about the meaning of those words. When the AI system speaks those words, you are likely to be “fooled” into thinking that the AI system “understands” the nature of mornings and is basing those words on a sense of the world.
But, the AI of today is more akin to the poster, it is merely showcasing those words and does not yet (at least) comprehend the true meaning of those words.
Do we want the AI to seem to be more aware than it really is?
That’s an important question. If the human occupants believe that the AI has some form of human awareness and knowledge, the human occupant might get themselves into a pickle by trying to converse with the AI system. Suppose the human starts to suffer a heart attack, and believes that the AI has human-like understanding, and so the human says “help, I’m having a heart attack” – if you said this to another adult, the human adult would likely realize that you are in trouble, they might call 911, they might try to aid you, etc.
The AI of today would not know what you’ve said per se. You might have been misled into thinking that it would. If you believed that the AI was not so capable as a human, you might instead have uttered “take me to the nearest hospital” which then hopefully is similar to “take me to work” in that the system can parse the words and realize that it is a driving instruction. The AI would presumably then alter the driving path and instead drive the self-driving car to a nearby hospital.
This argument also comes up when AI developers put other human-like speaking mannerisms into their systems. Humans for example will often say “you know” or have pauses in their speaking or say “uh” while speaking. To date, most of the AI natural language processing systems have generated spoken words that are streamlined and omit those human mannerisms. Lately, some AI developers have been including them. This gives the AI spoken words a closer to human-like feel.
Here’s an example:
You: Take me to work.
AI self-driving car: Uh, Okay, sounds good… will do so.
The added elements to the response are going to suggest to the human listener that the AI is composing speech in a manner similar to a human. This anthropomorphizes the AI system, which as I’ve mentioned before can be good if you believe that’s a desirable way to bond with the human occupants, but it can also be bad in that it can mislead as to what the AI system can and cannot really do and understand.
The joint interaction in a self-driving car between the human occupants and the AI is potentially going to involve life and death matters.
This is an important point to keep in mind. If you are simply developing an online search engine that verbally interacts with humans or a system that can aid them in ordering toothbrushes and towels, whether the human believes that the AI is actually comprehending them is not particularly life threatening. In the case of a self-driving car, the human might need to offer insight to the AI about the driving task, or the AI might need to get input from the human about the driving task, any of which if improperly undertaken could have dire consequences.
For my article about the framework for AI self-driving cars, see: https://aitrends.com/selfdrivingcars/framework-ai-self-driving-driverless-cars-big-picture/
For my article about common sense reasoning and AI self-driving cars, see: https://aitrends.com/selfdrivingcars/common-sense-reasoning-and-ai-self-driving-cars/
Digging Into The AI-Human Social Reciprocity
This then takes us to the reciprocity aspects.
Should the AI system of the self-driving car carry on conversations that also invoke the use of reciprocity?
If you are of a mind that the AI should not over-inflate what it can actually comprehend, you’d probably say that it should not also invoke reciprocity.
If you are seeking to have the AI appear as sentient as it can, the use of reciprocity would be a core element to achieve that sense, since it is so common among humans and part-and-parcel of our daily lives.
Here’s an example:
You: Take me to work.
AI self-driving car: Happy to take you to work. Please make sure to keep your seat belts on.
You likely can detect the subtle reciprocity involved in the dialogue. The AI implies that it is doing something for the human occupant, taking them to work, and the punch-up is that the AI is “happy” to do so. This creates a human-like aspect as though the AI was doing this as a service for you. The trade is that in return, you are being asked to keep your seat belts on.
If we strip away the reciprocity, we’d get something like this instead:
You: Take me to work.
AI self-driving car: Okay. Keep your seat belts on.
This version is similar to the earlier example about the drive safely message. We’ve now got a plainer and straightforward kind of instruction or maybe even an edict, which was the same with the “drive safely” message. The “happy to take you to work” was more akin to the “thank you for driving safely” message that created a kind of quid-pro-quo element to the dialogue.
If we make the messaging more along the negative side, it might be something like this:
You: Take me to work.
AI self-driving car: Okay. Keep your seat belts on or I’ll stop the car and you won’t get to work on time.
Whoa! This sounds like some kind of fierce AI that is threatening you.
There are AI developers that would argue that this message is actually better than the others because it makes abundantly clear the adverse consequence if the human does not wear their seat belts.
Yes, it’s true that it does spell out the consequences, but it also perhaps sets up a “relationship” with the human occupant that’s going to be an angry one. It sets the tone in a manner that might cause the human to consider in what manner they want to respond back to the AI (angrily!).
If the AI system is intended to interact with the human occupants in a near-natural way, the role of reciprocity needs to be considered.
It is a common means of human to human interaction. Likewise, the AI self-driving car will be undertaking the driving task and some kind of give-and-take with the human occupants is likely to occur.
We believe that as AI Natural Language Processing (NLP) capabilities get better, incorporating reciprocity will further enhance the seemingly natural part of natural language processing.
It is prudent, though, to be cautious about overstepping what can be achieved, and the life-and-death consequences of human and AI interaction in a self-driving car context need to be kept in mind.
Copyright 2020 Dr. Lance Eliot
This content is originally posted on AI Trends.
[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column: https://forbes.com/sites/lanceeliot/]
Posted by Nathan Frey, Senior Software Engineer, Google Research, Los Angeles and Zheng Sun, Senior Software Engineer, Google Research, Mountain View
Videos filmed and edited for television and desktop are typically created and viewed in landscape aspect ratios (16:9 or 4:3). However, with an increasing number of users creating and consuming content on mobile devices, historical aspect ratios don’t always fit the display being used for viewing. Traditional approaches for reframing video to different aspect ratios usually involve static cropping, i.e., specifying a camera viewport, then cropping visual contents that are outside. Unfortunately, these static cropping approaches often lead to unsatisfactory results due to the variety of composition and camera motion styles. More bespoke approaches, however, typically require video curators to manually identify salient contents on each frame, track their transitions from frame-to-frame, and adjust crop regions accordingly throughout the video. This process is often tedious, time-consuming, and error-prone.
To address this problem, we are happy to announce AutoFlip, an open source framework for intelligent video reframing. AutoFlip is built on top of the MediaPipe framework, which enables the development of pipelines for processing time-series multimodal data. Taking a video (casually shot or professionally edited) and a target dimension (landscape, square, portrait, etc.) as inputs, AutoFlip analyzes the video content, develops optimal tracking and cropping strategies, and produces an output video with the same duration in the desired aspect ratio.
Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.
AutoFlip Overview AutoFlip provides a fully automatic solution to smart video reframing, making use of state-of-the-art ML-enabled object detection and tracking technologies to intelligently understand video content. AutoFlip detects changes in the composition that signify scene changes in order to isolate scenes for processing. Within each shot, video analysis is used to identify salient content before the scene is reframed by selecting a camera mode and path optimized for the contents.
Shot (Scene) Detection A scene or shot is a continuous sequence of video without cuts (or jumps). To detect the occurrence of a shot change, AutoFlip computes the color histogram of each frame and compares this with prior frames. If the distribution of frame colors changes at a different rate than a sliding historical window, a shot change is signaled. AutoFlip buffers the video until the scene is complete before making reframing decisions, in order to optimize the reframing for the entire scene.
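The histogram comparison described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not AutoFlip's actual implementation: the function names, the bin count, the window length, and the thresholds (`factor`, `min_diff`) are all assumptions chosen for the example.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Normalized per-channel color histogram of an HxWx3 uint8 frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(hists).astype(np.float64)
    return hist / hist.sum()

def detect_shot_changes(frames, window=5, factor=3.0, min_diff=0.1):
    """Return indices of frames that appear to start a new shot.

    A change is flagged when the histogram difference to the previous frame
    is both above min_diff and much larger than the mean difference seen
    over a sliding window of recent frames (hypothetical rule).
    """
    changes = []
    recent = []   # recent frame-to-frame histogram differences
    prev = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame)
        if prev is not None:
            diff = np.abs(hist - prev).sum()  # L1 distance, in [0, 2]
            baseline = np.mean(recent) if recent else 0.0
            if diff > min_diff and diff > factor * baseline:
                changes.append(i)
                recent = []  # start a fresh history for the new shot
            else:
                recent.append(diff)
                recent = recent[-window:]
        prev = hist
    return changes
```

Buffering the frames between two flagged indices would then give one scene to reframe as a unit.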
Video Content Analysis We utilize deep learning-based object detection models to find interesting, salient content in the frame. This content typically includes people and animals, but other elements may be identified, depending on the application, including text overlays and logos for commercials, or motion and ball detection for sports.
The face and object detection models are integrated into AutoFlip through MediaPipe, which uses TensorFlow Lite on CPU. This structure allows AutoFlip to be extensible, so developers may conveniently add new detection algorithms for different use cases and video content. Each object type is associated with a weight value, which defines its relative importance — the higher the weight, the more influence the feature will have when computing the camera path.
Left: People detection on sports footage. Right: Two face boxes (‘core’ and ‘all’ face landmarks). In narrow portrait crop cases, often only the core landmark box can fit.
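To make the role of the per-type weights concrete, here is a small sketch that combines detections in one frame into a single point of interest using a weighted average. The averaging rule, the function name, and the example weights are assumptions for illustration; the post only states that higher weights give a feature more influence on the camera path.

```python
def weighted_focus_point(detections, type_weights):
    """Weighted average of detection centers in one frame.

    detections: list of (object_type, center_x, center_y) tuples.
    type_weights: dict mapping object_type to its relative importance
                  (hypothetical values supplied by the caller).
    """
    total = sum(type_weights[kind] for kind, _, _ in detections)
    if total == 0:
        raise ValueError("at least one detection needs a non-zero weight")
    x = sum(type_weights[kind] * cx for kind, cx, _ in detections) / total
    y = sum(type_weights[kind] * cy for kind, _, cy in detections) / total
    return x, y
```

With a face weighted three times as heavily as a logo, the resulting point sits three quarters of the way from the logo toward the face.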
Reframing After identifying the subjects of interest on each frame, logical decisions about how to reframe the content for a new view can be made. AutoFlip automatically chooses an optimal reframing strategy — stationary, panning or tracking — depending on the way objects behave during the scene (e.g., moving around or stationary). In stationary mode, the reframed camera viewport is fixed in a position where important content can be viewed throughout the majority of the scene. This mode can effectively mimic professional cinematography in which a camera is mounted on a stationary tripod or where post-processing stabilization is applied. In other cases, it is best to pan the camera, moving the viewport at a constant velocity. The tracking mode provides continuous and steady tracking of interesting objects as they move around within the frame.
Based on which of these three reframing strategies the algorithm selects, AutoFlip then determines an optimal cropping window for each frame, while best preserving the content of interest. While the bounding boxes track the objects of focus in the scene, they typically exhibit considerable jitter from frame-to-frame and, consequently, are not sufficient to define the cropping window. Instead, we adjust the viewport on each frame through the process of Euclidean-norm optimization, in which we minimize the residuals between a smooth (low-degree polynomial) camera path and the bounding boxes.
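The smoothing idea can be illustrated with an ordinary least-squares polynomial fit. The sketch below assumes one crop-center coordinate per frame and uses NumPy's `polyfit`, which minimizes the sum of squared residuals; the function name and the default degree are assumptions, and AutoFlip's own solver may differ.

```python
import numpy as np

def smooth_camera_path(centers, degree=2):
    """Fit a low-degree polynomial to noisy per-frame crop centers.

    np.polyfit minimizes the squared (Euclidean-norm) residuals between
    the polynomial path and the observed bounding-box centers.
    """
    centers = np.asarray(centers, dtype=np.float64)
    t = np.arange(len(centers), dtype=np.float64)  # frame index as time
    coeffs = np.polyfit(t, centers, deg=degree)
    return np.polyval(coeffs, t)
```

One way to read the three modes in these terms: a degree-0 fit yields a fixed viewport (stationary), a degree-1 fit yields constant-velocity motion (panning), and a slightly higher degree follows moving subjects while still suppressing frame-to-frame jitter.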
Top: Camera paths resulting from following the bounding boxes from frame-to-frame. Bottom: Final smoothed camera paths generated using Euclidean-norm path formation. Left: Scene in which objects are moving around, requiring a tracking camera path. Right: Scene where objects stay close to the same position; a stationary camera covers the content for the full duration of the scene.
AutoFlip’s configuration graph provides settings for either best-effort or required reframing. If it becomes infeasible to cover all the required regions (for example, when they are too spread out on the frame), the pipeline will automatically switch to a less aggressive strategy by applying a letterbox effect, padding the image to fill the frame. For cases where the background is detected as being a solid color, this color is used to create seamless padding; otherwise a blurred version of the original frame is used.
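As a rough illustration of the letterbox fallback, the following sketch computes how a source frame would be scaled and padded to fit a target size without cropping. The function name and return layout are assumptions for the example and are not taken from AutoFlip's configuration.

```python
def letterbox_padding(src_w, src_h, target_w, target_h):
    """Scale a frame to fit inside the target and compute the padding.

    Returns (new_w, new_h, pad_x, pad_y), where pad_x and pad_y are the
    bars added on each side horizontally and vertically.
    """
    scale = min(target_w / src_w, target_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (target_w - new_w) // 2
    pad_y = (target_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

The padded bars would then be filled with the detected solid background color or with a blurred copy of the frame, as described above.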
AutoFlip Use Cases We are excited to release this tool directly to developers and filmmakers, reducing the barriers to their design creativity and reach through the automation of video editing. The ability to adapt any video format to various aspect ratios is becoming increasingly important as the diversity of devices for video content consumption continues to rapidly increase. Whether your use case is portrait to landscape, landscape to portrait, or even small adjustments like 4:3 to 16:9, AutoFlip provides a solution for intelligent, automated and adaptive video reframing.
What’s Next? Like any machine learning algorithm, AutoFlip can benefit from an improved ability to detect objects relevant to the intent of the video, such as speaker detection for interviews or animated face detection on cartoons. Additionally, a common issue arises when input video has important overlays on the edges of the screen (such as text or logos) as they will often be cropped from the view. By combining text/logo detection and image inpainting technology, we hope that future versions of AutoFlip can reposition foreground objects to better fit the new aspect ratios. Lastly, in situations where padding is required, deep uncrop technology could provide improved ability to expand beyond the original viewable area.
While we work to improve AutoFlip internally at Google, we encourage contributions from developers and filmmakers in the open source communities.
Acknowledgments We would like to thank our colleagues who contributed to AutoFlip, Alexander Panagopoulos, Jenny Jin, Brian Mulford, Yuan Zhang, Alex Chen, Xue Yang, Mickey Wang, Justin Parra, Hartwig Adam, Jingbin Wang, and Weilong Yang; and the MediaPipe team who helped with open sourcing, Jiuqiang Tang, Tyler Mullen, Mogan Shieh, Ming Guang Yong, and Chuo-Ling Chang.
Posted by Shreeyak Sajjan, Research Engineer, Synthesis AI and Andy Zeng, Research Scientist, Robotics at Google
Optical 3D range sensors, like RGB-D cameras and LIDAR, have found widespread use in robotics to generate rich and accurate 3D maps of the environment, from self-driving cars to autonomous manipulators. However, despite the ubiquity of these complex robotic systems, transparent objects (like a glass container) can confound even a suite of expensive sensors that are commonly used. This is because optical 3D sensors are driven by algorithms that assume all surfaces are Lambertian, i.e., they reflect light evenly in all directions, resulting in a uniform surface brightness from all viewing angles. However, transparent objects violate this assumption, since their surfaces both refract and reflect light. Hence, most of the depth data from transparent objects are invalid or contain unpredictable noise.
Transparent objects often fail to be detected by optical 3D sensors. Top, Right: For instance, glass bottles do not show up in the 3D depth imagery captured from an Intel® RealSense™ D415 RGB-D camera. Bottom: A 3D visualization via point clouds constructed from the depth image.
Enabling machines to better sense transparent surfaces would not only improve safety, but could also open up a range of new interactions in unstructured applications — from robots handling kitchenware or sorting plastics for recycling, to navigating indoor environments or generating AR visualizations on glass tabletops.
To address this problem, we teamed up with researchers from Synthesis AI and Columbia University to develop ClearGrasp, a machine learning algorithm that is capable of estimating accurate 3D data of transparent objects from RGB-D images. This is made possible by a large-scale synthetic dataset that we are also releasing publicly today. ClearGrasp can work with inputs from any standard RGB-D camera, using deep learning to accurately reconstruct the depth of transparent objects and generalize to completely new objects unseen during training. This is in contrast to previous methods, which required prior knowledge of the transparent objects (e.g., their 3D models), often combined with maps of background lighting and camera positions. In this work, we also demonstrate that ClearGrasp can benefit robotic manipulation by incorporating it into our pick-and-place robot’s control system, where we observe significant improvements in the grasping success rate of transparent plastic objects.
ClearGrasp uses deep learning to recover accurate 3D depth data of transparent surfaces.
A Visual Dataset of Transparent Objects Massive quantities of data are required to train any effective deep learning model (e.g., ImageNet for vision or Wikipedia for BERT), and ClearGrasp is no exception. Unfortunately, no datasets are available with 3D data of transparent objects. Existing 3D datasets like Matterport3D or ScanNet overlook transparent surfaces, because they require expensive and time-consuming labeling processes.
To overcome this issue, we created our own large-scale dataset of transparent objects that contains more than 50,000 photorealistic renders with corresponding surface normals (representing the surface curvature), segmentation masks, edges, and depth, useful for training a variety of 2D and 3D detection tasks. Each image contains up to five transparent objects, either on a flat ground plane or inside a tote, with various backgrounds and lighting.
Some example data of transparent objects from the ClearGrasp synthetic dataset.
We also include a test set of 286 real-world images with corresponding ground truth depth. The real-world images were taken by a painstaking process of replacing each transparent object in the scene with a painted one in the same pose. The images are captured under a number of different indoor lighting conditions, using various cloth and veneer backgrounds and containing random opaque objects scattered around the scene. They contain both known objects, present in the synthetic training set, and novel objects.
Left: The real-world image capturing setup. Middle: A custom user interface enables precisely replacing each transparent object with a spray-painted duplicate. Right: Example of captured data.
The Challenge While the distorted view of the background seen through transparent objects confounds typical depth estimation approaches, there are clues that hint at the objects’ shape. Transparent surfaces exhibit specular reflections, which are mirror-like reflections that show up as bright spots in a well-lit environment. Since these visual cues are prominent in RGB images and are influenced primarily by the shape of the objects, convolutional neural networks can use these reflections to infer accurate surface normals, which then can be used for depth estimation.
Specular reflections on transparent objects create distinct features that vary based on the object shape and provide strong visual cues for estimating surface normals.
Most machine learning algorithms try to directly estimate depth from a monocular RGB image. However, monocular depth estimation is an ill-posed task, even for humans. We observed large errors in estimating the depth of flat background surfaces, which compounds the error in depth estimates for the transparent objects resting atop them. Therefore, rather than directly estimating the depth of all geometry, we conjectured that correcting the initial depth estimates from an RGB-D 3D camera is more practical — it would enable us to use the depth from the non-transparent surfaces to inform the depth of transparent surfaces.
The ClearGrasp Algorithm ClearGrasp uses 3 neural networks: a network to estimate surface normals, one for occlusion boundaries (depth discontinuities), and one that masks transparent objects. The mask is used to remove all pixels belonging to transparent objects, so that the correct depths can be filled in. We then use a global optimization module that starts extending the depth from known surfaces, using the predicted surface normals to guide the shape of the reconstruction, and the predicted occlusion boundaries to maintain the separation between distinct objects.
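The masking step is simple enough to sketch. The snippet below assumes the depth image and the predicted mask are NumPy arrays of the same shape and that a depth of 0 marks a missing reading; the function name and the invalid-value convention are assumptions, and the global optimization that fills the removed pixels back in is not shown.

```python
import numpy as np

def remove_transparent_depth(depth, mask, invalid=0.0):
    """Return a copy of the depth map with transparent pixels invalidated.

    depth: HxW array of sensor depth values.
    mask:  HxW boolean array, True where a transparent object was predicted.
    """
    depth = np.asarray(depth, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if depth.shape != mask.shape:
        raise ValueError("depth and mask must have the same shape")
    cleaned = depth.copy()
    cleaned[mask] = invalid  # unreliable readings are dropped, not trusted
    return cleaned
```

Discarding these readings first means that the later reconstruction starts only from depth values on non-transparent surfaces.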
Overview of our method. The point cloud was generated using the output depth and is colored with its surface normals.
Each of the neural networks was trained on our synthetic dataset and they performed well on real-world transparent objects. However, the surface normal estimations for other surfaces, like walls or fruits, were poor. This is because of the limitations of our synthetic dataset, which contains only transparent objects on a ground plane. To alleviate this issue, we included some real indoor scenes from the Matterport3D and ScanNet datasets in the surface normals training loop. By training on both the in-domain synthetic dataset and out-of-domain real-world dataset, the model performed well on all surfaces in our test set.
Surface Normal estimation on real images when trained on a) Matterport3D and ScanNet only (MP+SN), b) our synthetic dataset only, and c) MP+SN as well as our synthetic dataset. Note how the model trained on MP+SN fails to detect the transparent objects. The model trained on only synthetic data picks up the real plastic bottles remarkably well, but fails for other objects and surfaces. When trained on both, our model gets the best of both worlds.
Results Overall, our quantitative experiments show that ClearGrasp is able to reconstruct depth for transparent objects with much higher fidelity than alternative methods. Despite being trained on only synthetic transparent objects, we find our models are able to adapt well to the real-world domain — achieving very similar quantitative reconstruction performance on known objects across domains. Our models also generalize well to novel objects with complex shapes never seen before.
To check the qualitative performance of ClearGrasp, we construct 3D point clouds from the input and output depth images, as shown below (additional examples available on the project webpage). The resulting estimated 3D surfaces have clean and coherent reconstructed shapes — important for applications, such as 3D mapping and 3D object detection — without the jagged noise seen in monocular depth estimation methods. Our models are robust and perform well in challenging conditions, such as identifying transparent objects situated in a patterned background or differentiating between transparent objects partially occluding one another.
Qualitative results on real images. Top two rows: results on known objects. Bottom two rows: results on novel objects. The point clouds, colored with their surface normals, are generated from the corresponding depth images.
Most importantly, the output depth from ClearGrasp can be directly used as input to state-of-the-art manipulation algorithms that use RGB-D images. By using ClearGrasp’s output depth estimates instead of the raw sensor data, our grasping algorithm on a UR5 robot arm saw significant improvements in the grasping success rates of transparent objects. When using the parallel-jaw gripper, the success rate improved from a baseline of 12% to 74%, and from 64% to 86% with suction.
Manipulation of novel transparent objects using ClearGrasp. Note the challenging conditions: textureless background, complex object shapes and the directional light causing confusing shadows and caustics (the patterns of light that occur when light rays are reflected or refracted from a surface).
Limitations & Future Work A limitation of our synthetic dataset is that it does not represent accurate caustics, due to the limitations of rendering with traditional path-tracing algorithms. As a result, our models mistake bright caustics coupled with shadows for independent transparent objects. Despite these drawbacks, our work with ClearGrasp shows that synthetic data remains a viable approach to achieve competent results for learning-based depth reconstruction methods. A promising direction for future work is improving the domain transfer to real-world images by generating renders with physically-correct caustics and surface imperfections such as fingerprints.
With ClearGrasp, we demonstrate that high-quality renders can be used to successfully train models that perform well in the real world. We hope that our dataset will drive further research on data-driven perception algorithms for transparent objects. Download links and more example images can be found on our project website and our GitHub repository.
Acknowledgements This research was done by Shreeyak Sajjan (Synthesis.ai), Matthew Moore (Synthesis.ai), Mike Pan (Synthesis.ai), Ganesh Nagaraja (Synthesis.ai), Johnny Lee, Andy Zeng, and Shuran Song (Columbia University). We would like to thank Ryan Hickman for managerial support, Ivan Krasin and Stefan Welker for fruitful technical discussions, Cameron (@camfoxmusic) for sharing 3D models of his potion bottles and Sharat Sajjan for helping with web design.
The countryside can be a breathtaking relief from the confines of the big city and the suburbs.
In the United States, any geographical area that is not considered within an urban area is generally considered the countryside, often referred to simply as a rural area.
Typically, a rural area in the United States is sparsely populated. The population density is relatively low and the landscape is rather large. You can often drive across a rural area for miles upon miles and see nothing other than rolling hills, majestic mountains, open flat lands, and oftentimes large-scale farms.
Whenever I drive from the bustling and freeway-clogged environ of Southern California up to the Silicon Valley area in Northern California, it is a splendor to witness the Central California portion of the state. The inland and non-coastal route consists of around 450 miles that provides more than half of the vegetables, nuts, and fruits that are grown in the United States.
Many tourists are surprised to discover that California has an agricultural belt all of its own.
If you come to do touristy kinds of activities, you’d likely go to see Hollywood and Disneyland in Southern California, and perhaps go up to San Francisco to see the Golden Gate Bridge and ride the famous trolleys, but otherwise would not consider spending much time in the central rural area unless you had keen interest in farms and ranches.
Next time that you find yourself munching on almonds, apricots, tomatoes, grapes, asparagus, and other such delicious items, please make sure to thank California since the odds are high that those items were grown in our central inland areas. Driving on the main route from Los Angeles to San Francisco, consisting of either highway 5 or the CA 99, you can pretty much expect to see farms that appear to stretch to the horizon. When it is growing season, there are zillions of rows of crops being grown. When the crops have been harvested, it becomes an endless dirt patch that awaits being planted for the next iteration of the agricultural cycle.
Rural Areas And Driving Time Aspects
I remember one time that I opted to visit one of the farms while making my way on Interstate 5. It was going to be an interesting visit since I hadn’t been on a working farm for many ages (when my children were young, I often took them to visit a farm, so they could see what goes on in the rural areas and learn how our food is grown). For this visit, I had prearranged to meet with a farmer to discuss some of the advances taking place in AgriTech, which is the term used to refer to the advent of high-tech being infused into agriculture.
Side note, there’s ample opportunity to combine AI with AgriTech and doing so is considered a next wave of high-tech for the agricultural realm. For those of you looking for fertile ground to use AI, consider agriculture.
I admit that I was expecting to see a farmhouse that did not have indoor plumbing, barely had electricity, and the work on the farm was being done by horse and plows.
I subsequently realized that I’d been to too many old-time farms that are more Disneyland-like than the real thing.
When I got to the farm on this more recent trip, I was impressed at the high-tech aspects involved in contemporary farming. They had a satellite dish to make sure they could keep tabs on the prices of commodities and were quite sophisticated in their crop management and forecasting. Much of the farming equipment was high-tech equipped and it was apparent that I needed to update my mental model about what happens on a farm.
It was also fascinating to realize that when the families that lived in these rural areas often drove to the nearest town to get supplies or get their children to school, they lamented that it took maybe thirty to forty-five minutes each way to do so.
I say this is fascinating because my daily commute for work in traffic frenzied Los Angeles is more than an hour each way, and yet the distance I travel is a fraction of the distance they needed to go.
If they were complaining about a 30-minute to 45-minute drive, it made me shrug and stifle a mild laugh, since I endure an hour or more drive. Plus, I might add forlornly, my hour drive is not nearly as pretty and serene (my freeway driving consists of looking at the backs of cars and seeing garish billboards, rather than admiring stately looking cows in pastures and seeing budding tomatoes on the vine).
In other words, though some might mistake the distance as being a huge factor while driving in a rural area, it could amount to the same amount of driving time as driving while in the suburbs and big cities.
My commute is bogged down by lots of traffic and the speed I can go is maybe an average of 15-20 miles per hour. For rural driving, there is usually much less traffic and the average speed can be more akin to 40 to 60 miles per hour. Ironically, it seems, their driving time and my driving time are about the same, even though the distance covered is quite different.
In Los Angeles, I am confronted with cars that want to play bumper bashing games, along with pedestrians that dart across busy streets and cause the drivers to radically hit their brakes, playing a kind of Frogger game. You probably would at first assume that driving in rural areas would be a grand relief since there would presumably not be the aspects of cars within inches of each other and nutty pedestrians that are willing to risk their lives to get across the street like a chicken with its head cut-off.
Driving Fatalities Statistics Count
Surprisingly, according to stats provided by governmental highway agencies, car-related fatalities in rural areas accounted for nearly 50% of the traffic deaths in the United States, and yet the percentage of the U.S. population in rural areas is only around 20%. Thus, driving in the rural areas is actually a lot more dangerous than you might imagine.
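To see what those two percentages imply per person, here is a quick back-of-the-envelope calculation. It assumes the rounded figures quoted above (about 50% of deaths, about 20% of population) and treats everyone outside rural areas as urban; the function name is just for illustration.

```python
def relative_per_capita_risk(rural_death_share, rural_pop_share):
    """Ratio of rural to non-rural traffic deaths per capita.

    Both arguments are fractions between 0 and 1 (exclusive).
    """
    rural_rate = rural_death_share / rural_pop_share
    urban_rate = (1 - rural_death_share) / (1 - rural_pop_share)
    return rural_rate / urban_rate

# Rounded figures from the text: ~50% of deaths, ~20% of population.
ratio = relative_per_capita_risk(0.5, 0.2)
print(f"Rural per-capita fatality rate is about {ratio:.1f}x the urban rate")
```

Under those rounded inputs, the rural per-capita rate comes out to roughly four times the urban one.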
There are various theories about why the driving fatalities rate per capita is so high in the rural areas in comparison to the urban and city areas.
Some say that it is due to the curved roads in rural areas, preventing drivers from seeing around a bend, or perhaps taking curves too fast and skidding into an accident.
Another guess is that the lack of street lighting at night in many rural areas makes it more likely that drivers will not see objects or the roadway or other cars, and therefore the drivers are more apt to hit something than in an urban area that is replete with street lighting.
A somewhat popular theory is that the drivers go very fast in rural areas, being unencumbered by other traffic, and they get themselves into driving troubles that they cannot readily get out of, because they have less time to respond than they would if they had been going slower.
Highway Hypnosis Complications
There is also the vaunted “highway hypnosis” that can cause a driver to get into a car accident.
I remember when learning to drive that my driving instructor warned us about the dangers of highway hypnosis. If you aren’t familiar with the phrase itself, I’m sure you are familiar with what it consists of. Mainly, it has to do with becoming zombie-like as a driver when you are driving over large distances in a monotonous landscape and with little or no traffic.
What seems to happen is that your mind becomes dulled, perhaps doing so due to the lack of any changing scenery and the non-use of your thinking processes to handle the driving task. One might say that you are mentally on autopilot.
I remember one terrifying time that I was driving on a country road and doing so for hours on end, and all of a sudden, a deer darted across the road. This was by far worse than any pedestrian in the city darting across the road because I was completely mentally ill-prepared for the deer. Sure, there were lots of deer crossing warning signs, but when you don’t actually see any deer for hours at a time, you mentally begin to disregard the signs. Maybe the signs are only meant to scare you into going slower, you perhaps begin to think, or the deer only cross at a certain time of the year and by luck you aren’t driving on the roads at that time of the year (so your mind blanks out the possibility of a deer appearing any time soon).
When the deer leaped onto the road ahead of me, I even thought it was either a mirage or a gag. It could be that all the deer roadway warning signs had planted the idea of a deer into my brain, and so I was imagining that a deer was suddenly in the roadway. Or, I figured it was maybe a fake deer, a mannequin deer, which had fallen off the back of a truck that was on its way to set up a Christmas display showcasing Santa and his reindeer.
All in all, it took me a solid several seconds to register in my mind that it was an actual deer, and it was actually in my way, and I was actually going to hit it. Thankfully, I swerved, and it moved, so we missed hitting each other, though this took maybe a year or two off my lifetime due to the scare and panic that struck me when it happened. I guess you could say that I was in the grip of highway hypnosis that led to my dulled response (that’s what I was going to have my attorney allege at trial, if I got busted for hitting a deer, if I had struck it!).
Besides the trance or zombie kind of mental state, there’s another kind of mental trickery that can befall you while driving in a rural area. It is called velocitation.
This consists of getting used to going at a high speed and causing you to gradually lose awareness of how fast you are really going. You’ve certainly experienced this. The most likely scenario involves coming off a freeway where you had been going 65 miles per hour and driving onto an off-ramp that is rated at perhaps 30 miles per hour.
When you get onto the off-ramp, you might not realize you are going over twice the speed recommended for it. If you start to brake to ease off the 65 miles per hour, going at say 50 miles per hour might seem like you are going at 30 miles per hour. In essence, going even just slightly slower seems like you are going a lot slower. Your mind gets messed-up about being able to gauge your true speed.
Let’s then add to our list of reasons why rural driving is dangerous by including the potential for getting your mind immersed into highway hypnosis, and also that you might become mentally stagnant about your speed and suffer from velocitation.
Of course, these same kinds of mental maladies can occur for drivers in urban areas too. I mention this to emphasize that it is not something that only occurs in rural driving. I’d say it is more prone to occur and more likely to happen with greater frequency for rural drivers, which arises because of the prevalent driving landscape involved in rural areas.
Lack Of Street Markings
Here’s another hazard that is generally more prevalent in rural areas than in other areas, namely the classic unmarked driveways, entrances, exits, and crossroads.
In normal city and suburban driving, the odds are high that any driveway into or out of a house or property is going to be well-marked and readily seen. The same is the case for entrances into a mall or exits from a school ground. Sure, there might be the occasional exceptions, but I dare say it is usually painted or posted and made apparent by local transportation authorities because of the volume of traffic that goes nearby.
When I drove out to the farm to visit the modern-day farmer, I ended-up on some back-roads by mistake.
There were roads that did not appear on my GPS mapping system. There were hardly any posted signs. The entrance into some of the roads was hidden by trees and other items. I also nearly got banged into by a pick-up truck that sprung from a driveway that I did not see. The pick-up truck was akin to the deer that I had encountered. Yes, I realize that I should be expecting to see pick-up trucks while driving around farms, but having one just dart out from an unmarked driveway caught me off-guard (we didn’t collide, thankfully).
Some call a point at which an unmarked passageway intersects with a fast-moving road an “instant intersection.” It usually is not on a map and is just something that locals know to be watchful about. Locals keep a keen eye out for those notorious intersections. An outsider such as me, not being familiar with the roads that I was driving on, could not even predict when those instant intersections were going to “instantaneously” rear their ugly heads. Obviously, if another car wasn’t going to come along at those points, it made little difference to me that they existed, and it was only when another vehicle might magically appear that I was then at risk of collision.
Some of you are maybe saying that I was driving too fast. Slow down, Lance! If you don’t know where those hidden intersections are, you just need to watch your speed and go slow enough to deal with them when they occur.
I found that trying to go slow in some of these rural locations was perhaps as dangerous as going fast.
When I was going slow, there was bound to be a local that was driving fast (just my luck, I guess). My slowness, when combined with their fastness, was often a recipe for disaster. They were barreling down a road that they drive every day and came upon my slow-moving car. At times, besides getting a hefty dose of horn honking from them, it would tend toward a dangerous moment of my either getting hit by the faster moving car, or the faster moving car going around me and potentially putting us both in danger if another car was coming toward us.
I certainly did see the need to watch my speed and knew that there might be slow-moving tractors or other slow-moving vehicles from time-to-time. I’ve never had a herd of sheep or cows block me while on a rural road, though I did one time have an entire family of ducks. It was one of those memorable driving moments. Up ahead were some ducks, waddling across the street. I slowed down to a crawl and did not want to scare them. I came to a stop some distance from them and watched in amazement as they took their time, waddle, waddle, waddle.
Believe it or not, a friend of mine later told me that I should have driven right up to them and honked my horn.
Why, you might ask?
He said that by my being quiet, I was deluding them into thinking that going across a car-driven street was safe to do. It would get them into thorny trouble down-the-road, so to speak. If instead I had given them a really big scare, it would have convinced them to never try crossing a street again and presumably someday save their lives. What do you think, did I do a disservice to those cute ducks?
In any case, another factor in rural driving is that the roads at times might not be well maintained.
The roads might suffer from heavy vehicles tearing up the asphalt surface. Rains and cold weather can beat-up the road pavement. There can be roads that are merely packed dirt. In foul weather, some roads can become muddy messes, or on a paved road the potholes are hidden by a layer of rain water.
Road signs might not exist or might be torn and worn. I’ve seen instances of road signs that do exist but are no longer relevant. One said that a gas station was a quarter mile ahead. At the quarter mile mark, there was nothing left but an abandoned set of gas pumps. One sign was a street sign that seemed to mark a street that did not exist and perhaps never did exist, since there wasn’t anything that suggested a road had once been where the sign sat. Maybe the sign maker was hopeful that a street would one day be put there, and in a self-fulfilling prophecy kind of way had put up the sign.
In short, rural areas are their own kind of driving realm with a pronounced kind of landscape and driving challenges.
You can certainly encounter many of the same kinds of driving challenges in any suburban or urban area. Rural areas, though, tend to have more of these specialty driving challenges, and on a larger scale.
There’s no question that someone that can drive in an urban setting will likely be able to drive in a rural setting, which is important to keep in mind.
I highlight that you can drive in an urban setting and likewise be able to drive in a rural setting because I am trying to indicate that the driving skills are roughly the same. When I helped my children learn to drive, I didn’t particularly have to take them to a rural area so that someday they could drive in a rural area. They learned enough about driving in an urban area that they could readily translate their driving skills into being useful for rural driving.
That being the case, there are subtleties that can make a difference when driving in a rural area. As mentioned, you might need to be wary of highway hypnosis, velocitation, roads that are in bad shape, hidden entrances and driveways, and high speeds over lengthy stretches, and you must deal with other drivers that take their rural roadways for granted and aren’t on the lookout for drivers less familiar with the landscape. There are also the jaywalking ducks and sheep to be dealt with, which I must say are easier on the eyes than those human pedestrians that give you the death-to-all-drivers stare when they are illegally crossing a city street.
Rural Driving And AI Self-Driving Autonomous Cars
What does this have to do with AI self-driving driverless autonomous cars?
At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One crucial aspect, we believe, involves having the AI be able to drive a self-driving car in rural areas, in addition to being able to drive in the city and urban areas.
Allow me to elaborate.
I’d like to first clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved. For the design of Level 5 self-driving cars, the automakers are even removing the gas pedal, brake pedal, and steering wheel, since those are contraptions used by human drivers. The Level 5 self-driving car is not being driven by a human and nor is there an expectation that a human driver will be present in the self-driving car. It’s all on the shoulders of the AI to drive the car.
For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car. The AI and the human driver are co-sharing the driving task. In spite of this co-sharing, the human is supposed to remain fully immersed into the driving task and be ready at all times to perform the driving task. I’ve repeatedly warned about the dangers of this co-sharing arrangement and predicted it will produce many untoward results.
For my overall framework about AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/framework-ai-self-driving-driverless-cars-big-picture/
For the levels of self-driving cars, see my article: https://aitrends.com/selfdrivingcars/richter-scale-levels-self-driving-cars/
For why AI Level 5 self-driving cars are like a moonshot, see my article: https://aitrends.com/selfdrivingcars/self-driving-car-mother-ai-projects-moonshot/
For the dangers of co-sharing the driving task, see my article: https://aitrends.com/selfdrivingcars/human-back-up-drivers-for-ai-self-driving-cars/
Let’s focus herein on the true Level 5 self-driving car. Many of the comments apply to the less than Level 5 self-driving cars too, but the fully autonomous AI self-driving car will receive the most attention in this discussion.
Here are the usual steps involved in the AI driving task:
Sensor data collection and interpretation
Virtual world model updating
AI action planning
Car controls command issuance
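To make those four steps a bit more concrete, here is a minimal sketch of one pass through such a cycle. It is purely illustrative: the names (WorldModel, drive_cycle, and so on), the dictionary-based sensor readings, and the 30-meter braking threshold are all hypothetical assumptions of mine, and a real self-driving stack is vastly more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Hypothetical virtual world model: tracked objects keyed by an id."""
    objects: dict = field(default_factory=dict)


def interpret_sensors(raw_readings):
    """Step 1: keep only readings that carry a usable distance estimate."""
    return [r for r in raw_readings if r.get("distance_m") is not None]


def update_world_model(model, detections):
    """Step 2: merge the latest detections into the world model."""
    for d in detections:
        model.objects[d["id"]] = d
    return model


def plan_action(model, stop_distance_m=30.0):
    """Step 3: brake if any tracked object is closer than the (assumed) threshold."""
    nearest = min(
        (o["distance_m"] for o in model.objects.values()),
        default=float("inf"),
    )
    return "brake" if nearest < stop_distance_m else "cruise"


def issue_controls(action):
    """Step 4: translate the planned action into (made-up) control commands."""
    commands = {
        "brake": {"throttle": 0.0, "brake": 1.0},
        "cruise": {"throttle": 0.3, "brake": 0.0},
    }
    return commands[action]


def drive_cycle(model, raw_readings):
    """Run the four steps once, in order."""
    detections = interpret_sensors(raw_readings)
    model = update_world_model(model, detections)
    action = plan_action(model)
    return issue_controls(action)
```

For instance, calling drive_cycle(WorldModel(), [{"id": "duck-1", "distance_m": 12.0}]) would yield a full-brake command under these assumed thresholds, while an empty list of readings would yield the cruise command.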
Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too. There are some pundits of AI self-driving cars that continually refer to a utopian world in which there are only AI self-driving cars on the public roads. Currently there are more than 250 million conventional cars in the United States alone, and those cars are not going to magically disappear or become true Level 5 AI self-driving cars overnight.
Indeed, the use of human driven cars will last for many years, likely many decades, and the advent of AI self-driving cars will occur while there are still human driven cars on the roads. This is a crucial point since this means that the AI of self-driving cars needs to be able to contend with not just other AI self-driving cars, but also contend with human driven cars. It is easy to envision a simplistic and rather unrealistic world in which all AI self-driving cars are politely interacting with each other and being civil about roadway interactions. That’s not what is going to be happening for the foreseeable future. AI self-driving cars and human driven cars will need to be able to cope with each other.
For my article about the grand convergence that has led us to this moment in time, see: https://aitrends.com/selfdrivingcars/grand-convergence-explains-rise-self-driving-cars/
See my article about the ethical dilemmas facing AI self-driving cars: https://aitrends.com/selfdrivingcars/ethically-ambiguous-self-driving-cars/
For potential regulations about AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/assessing-federal-regulations-self-driving-cars-house-bill-passed/
For my predictions about AI self-driving cars for the 2020s, 2030s, and 2040s, see my article: https://aitrends.com/selfdrivingcars/gen-z-and-the-fate-of-ai-self-driving-cars/
Returning to the rural area aspects, there are a number of AI driving elements that come to play when an AI self-driving car encounters a rural landscape.
Debunking False Notions About Rural Areas
I’d like to first tackle a misconception that seems to be spreading about the notion of AI self-driving cars being deployed in rural areas, namely that it won’t be worthwhile to have AI self-driving cars in rural areas.
This notion is exemplified by an article entitled “Autonomous State” that appeared in a major automotive magazine last year, in which an alleged expert on self-driving cars indicated that there is no benefit to having an AI self-driving car in rural areas, asserting that the rural population is so minuscule that it isn’t worth having a ride-sharing self-driving car situated in those parts of the country.
I would certainly argue that claiming there is no benefit to having an AI self-driving car in a rural area is absolutely wrong.
There are in fact many benefits.
If the word “benefit” means that there must be one or more advantages to having an AI self-driving car over a conventional or legacy car, and if the suggestion is that there is no such advantage in rural areas, I believe we can easily poke a hole in that balloon.
When I met with the farmer at his modern-day farmhouse, he indicated that each morning and afternoon he or his wife drove their children to and from the school, taking about 30 to 40 minutes each way to make the drive. This meant that either he or his wife had to leave the farm to simply drive the children to school. It also meant that one of the two (the husband or the wife) was unavailable to work the farm during that driving task.
The farmer also indicated that each day they typically would need to go get supplies from various supply depots that were in various areas of the rural community. Once again, either he or his wife made those drives. And, once again, the driving task kept one of them from actually working the farm since they were only acting as a driver during those supply runs.
I realize you might want to counter-argue that it would be “only” maybe an hour or two of their day to do the driving, but I’d like to point out that this is still nonetheless a drain on their available time to work the farm. Furthermore, here’s an added twist that is not simply a labor oriented time-based factor per se.
He mentioned that there have been occasions when their daughter or son was at school and became sick and wanted to come home right away. Usually, unfortunately, he and his wife were both in a remote spot of the farm, each working the land, the cattle, the crops, etc. Upon getting a call from the school, one of them had to quickly get from the remote part of the farm and back to the farmhouse, and then drive from the farmhouse over to the school. This was being done under duress in that they would naturally be concerned about getting to their child as rapidly as they could.
If they had available an AI self-driving car, the self-driving car could routinely take the children back-and-forth to school.
This would relieve the farmer and his wife from making the drive and thus add time to their labor efforts towards the farm itself.
The AI self-driving car would also be ready for any emergency situation such as the children getting sick while at school, and could be remotely dispatched by the farmer, electronically commanding it from afar while in the remote areas of the farm. The AI self-driving car would then drive to the school, doing so from the farmhouse, pick-up the child, and whisk the child back to the farmhouse. While inside the AI self-driving car, the camera inside the self-driving car would allow the parents to remotely interact with the child and see how the child was doing.
In short, an AI self-driving car would absolutely aid the farmer and his family, doing so by acting as an automated chauffeur for the children and for making supply runs.
I’m sure there are lots of other uses they could come up with for an AI self-driving car.
I also have focused primarily on the rural aspects of a farm, but it should be hopefully self-evident that an AI self-driving car could be handy for other rural landscapes beyond just a farm.
With the spread-out nature of a rural area, any kind of human driving is going to likely be time consuming and there are bound to be many kinds of circumstances for which having an AI self-driving car would be highly prized.
Perhaps you live in a rural area and go to work each day, leaving at home your elderly grandma. She is too old to drive a car and cannot get around on her own. With an AI self-driving car, she would have greater mobility. This could come to play on everyday desires of going someplace, and it could also be especially helpful for moments when she might need to see a doctor or go get her medicines.
For more about the elderly and AI self-driving cars, see my article: https://aitrends.com/ethics-and-social-issues/elderly-boon-bust-self-driving-cars/
For family trips in AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/family-road-trip-and-ai-self-driving-cars/
For the aspects of ridesharing and AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/ridesharing-services-and-ai-self-driving-cars-notably-uber-in-or-uber-out/
For the affordability of AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/affordability-of-ai-self-driving-cars/
Considering The ROI Matter
Okay, I believe I’ve well-expunged the idea that there is no benefit of an AI self-driving car for rural areas. That was easy. I’ll consider that perhaps the notion of “no benefit” was actually meant to be more akin to the idea that there is not a viable ROI (Return on Investment) related to having an AI self-driving car in a rural area.
In other words, you cannot usually look only at benefits when weighing the value of something, but also need to look at the costs too. You then compare the benefits to the costs and try to calculate whether the benefits end-up outweighing the costs. If there is a suitable ROI, you could assert that the benefits are outweighing the costs and therefore the item is worth investing in. If there is not a suitable ROI, you would likely assert that the costs outweigh the benefits and therefore the matter is not likely sensible to invest in.
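As a rough illustration of that comparison, here is a minimal sketch of an ROI calculation using the common definition of net benefit divided by cost. The function name and all of the dollar figures are hypothetical assumptions for illustration only; they are not actual data about farms or about the cost of AI self-driving cars.

```python
def roi(total_benefits, total_costs):
    """Return ROI as a fraction: (benefits - costs) / costs."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return (total_benefits - total_costs) / total_costs


# Illustrative, made-up numbers: 1.5 hours of driving avoided per day,
# valued at an assumed $20 per hour, over an assumed 300 days per year.
hours_saved_per_day = 1.5
value_per_hour = 20.0
days_per_year = 300
annual_benefit = hours_saved_per_day * value_per_hour * days_per_year

# Assumed annual cost of owning and operating the self-driving car.
annual_cost = 12000.0

annual_roi = roi(annual_benefit, annual_cost)
```

A positive result would suggest the benefits outweigh the costs, while a negative result would suggest the opposite. The outcome swings entirely on the assumed inputs, which is why the particulars matter so much.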
Therefore, rather than suggesting there aren’t any benefits of having an AI self-driving car in a rural area, it would be more sensible for someone to try to argue that there isn’t a sufficient ROI. By recasting the argument into the use of an ROI, you can escape the rather obvious counter-pounding that there are indeed clear-cut benefits. The question becomes whether those benefits outweigh the costs or not.
I’ve previously tackled the topic of ridesharing and AI self-driving cars, and also assessed the affordability of AI self-driving cars for consumers. Allow me to quickly recap some key elements of those relevant topics herein.
Many believe that AI self-driving cars will predominantly be used for ride-sharing purposes. This makes sense: if you go to work and need a lift to get there, you might opt to use an AI self-driving car on a ride-sharing basis to do so. It is predicted by some self-driving car pundits that consumers will gradually eschew owning their own car and will mainly use ride-sharing AI self-driving cars as their mode of transportation.
I’ve broken that kind of thinking about car ownership by pointing out that consumers could presumably do likewise in terms of turning a self-driving car into a ride-sharing money maker for themselves. You use your AI self-driving car to get you to work, and then allow your AI self-driving car the rest of the day to earn money as a ride-sharing service. When you finish your work day, it picks you up and takes you home. While at home at night, you send out your AI self-driving car for further ride-sharing money-making activity.
In that case, your AI self-driving car is a money maker. This allows you to potentially afford the likely higher cost of an AI self-driving car over a conventional car. Owning an AI self-driving car could be a means to make money on-the-side, or it could even become your primary source of making money. Why should the auto makers or ride-sharing firms make that money when you could do so instead? Today, the tough thing is finding human drivers to drive cars, but with the AI self-driving car you have no need to deal with the human driver hiring aspects.
Let’s return to the rural setting.
I’ve earlier herein indicated that the population is usually sparsely distributed in a rural locale.
The question of making money off an AI self-driving car as a ride-sharing vehicle becomes whether the sparseness of the population defeats the potential of making money.
In a big city environment, an AI self-driving car as a ride-sharing vehicle is presumably going to readily have paying riders and do so back-to-back. There will be lots of short rides and many of them in a city or urban setting (hopefully; though this must be tempered by the amount of competition, since it could be that we end-up with zillions of AI self-driving cars all trying to grab the same ride-sharing requests!).
Suppose the farmer that I met had an AI self-driving car. He could use it for taking the kids to school and for doing the other supply depot errands for him. Could he also offer it up as a ride-sharing service to other people in the rural area? Yes, of course. The downside would be that it would likely be spending a lot of its time merely getting to wherever the next customer was and thus not earning money directly per se when it was merely in transit.
In the case of the urban or city setting, many pundits are assuming that an AI self-driving car won’t need to use a lot of time to get to its next customer and that customers will be aplenty in a limited geographical distance. I say this because money producing models about AI self-driving cars are often based on the belief that there will be little non-use time and that an AI self-driving car will pretty much continually be toting around paying riders.
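To show how sensitive such money producing models are to non-use time, here is a simplified sketch that treats each ride as a cycle of paid minutes plus unpaid minutes spent getting to the next customer. The function name, the per-minute fare, and the ride and transit durations are all hypothetical assumptions, not figures from any actual ride-sharing model.

```python
def hourly_earnings(fare_per_paid_minute, avg_ride_minutes, avg_transit_minutes):
    """Estimate earnings per hour when only ride minutes are paid.

    avg_transit_minutes is the unpaid time spent reaching the next customer.
    """
    cycle_minutes = avg_ride_minutes + avg_transit_minutes
    if cycle_minutes <= 0:
        raise ValueError("ride plus transit time must be positive")
    utilization = avg_ride_minutes / cycle_minutes  # fraction of time earning
    return 60 * fare_per_paid_minute * utilization


# Made-up scenarios at an assumed fare of $0.50 per paid minute.
urban = hourly_earnings(0.5, avg_ride_minutes=15, avg_transit_minutes=5)
rural = hourly_earnings(0.5, avg_ride_minutes=20, avg_transit_minutes=20)
```

Under these made-up inputs the rural vehicle earns less per hour even though its rides are longer, simply because half of its time is spent in unpaid transit. Different assumptions could readily reverse the comparison.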
I’m not so sure those models are right and are perhaps optimistically assuming that there is little or no competition. The other day, I took a ride-sharing service to the airport and the human driver told me that it was better for him to sit at the airport and wait for his next potential customer, even though there are lots of other ride-sharing vehicles also waiting, versus his getting back into the downtown area to find a customer.
He indicated that the downtown area was a worse random-chance of finding a paying customer and also that the short hops were killing him in that he would get a short hop that paid just a few bucks and then be idle for a long time. He said that by picking up someone from the airport, it would be a longer haul and more money than by simply rushing back into the downtown area.
In the case of rural areas, we cannot axiomatically assume that a ride-sharing use of an AI self-driving car is doomed to a poor or insufficient ROI. It certainly might seem that way and one’s intuition seems to suggest it. But, it also depends upon the competition. If every farmer opts to buy an AI self-driving car and do so while living in the same rural area, it would tend to imply that they aren’t going to be able to use their AI self-driving car as a money maker since everyone else nearby also has one anyway. On the other hand, if only some buy an AI self-driving car, there is a chance that it could be a money maker in that rural area.
Back to the aspect of whether there is any benefit of an AI self-driving car being in a rural area, I’d claim that there is absolutely a benefit. In terms of whether there is a sound ROI, I’d say that we’d need to consider the particulars of a rural area and know more about what the cost of the AI self-driving car will be, along with how much competition there will be. Some rural areas could offer a handsome ROI and others not.
I’d like to also re-think this benefits question by turning the question in a different way.
If you take the position that there is no beneficial basis for having an AI self-driving car in a rural area, regardless of how you come to that calculation, you are also then silently asserting that the rural area will continue to use conventional or legacy cars to get around.
Essentially, you are dooming the rural area to continuing with conventional cars, or at best with self-driving cars that are not fully autonomous.
Pretty quick of you to cast about 20% of the United States population into a bucket wherein they are not able to enjoy the use of AI self-driving cars. Even if those people are widely dispersed, it still seems like a hefty sized market and one that would be foolish to ignore.
Perhaps several farmers might band together to purchase an AI self-driving car and use it as a kind of community-oriented ride-sharing service. Maybe the local community bands together and gets a fleet of AI self-driving cars and uses local tax dollars to run the fleet. My point being that it does not necessarily need to be the case that an AI self-driving car is owned by a single individual or a family.
Another aspect is that we aren’t yet factoring into all of this the time lost to human driving: if the farmer or his wife were able to work the farm longer, what is the value of their time and how does it compare to the cost of the AI self-driving car?
There are also the safety aspects that we probably would be best to consider in all of this discussion about rural areas and AI self-driving cars.
I had mentioned that nearly half of the car fatalities in the United States occur in the rural areas. If that’s the case, and if you are suggesting that AI self-driving cars are not “worthwhile” having in rural areas, and if we are to assume that AI self-driving cars will dramatically curtail the death rate of car accidents, you are then condemning the rural areas to continue to be a slaughterhouse based on human driving foibles (that’s a bit of hyperbole, which I use only to help make the point herein).
Hopefully, the deploying of AI self-driving cars in rural areas would reduce the chances of getting into a car related fatality. An AI self-driving car needs to be able to avoid getting into any kind of highway hypnosis, which humans fall prey to. An AI self-driving car needs to be able to avoid getting into a velocitation mode, which humans do.
Can an AI self-driving car handle the long stretches of monotonous driving that occurs in rural areas?
That certainly ought to be the case.
Can AI self-driving cars cope with the winding curved roads and the unexpected “instant intersections” of poorly labeled driveways and entrances? That’s a tougher requirement, for sure. There is an added chance that the AI might do better by the use of Machine Learning (ML) and Deep Learning (DL): the more that AI self-driving cars cover the same landscape, the more that what they learn could be shared with other AI self-driving cars via cloud-based learning and OTA (Over-The-Air) updating, putting them on the same plane as a “local” that is familiar with the roads and their idiosyncratic elements.
See my article about plasticity and Deep Learning: https://aitrends.com/ai-insider/plasticity-in-deep-learning-dynamic-adaptations-for-ai-self-driving-cars/
See my article about Ensemble Machine Learning: https://aitrends.com/selfdrivingcars/ensemble-machine-learning-for-ai-self-driving-cars/
See my article about Machine Learning benchmarks: https://aitrends.com/selfdrivingcars/machine-learning-benchmarks-and-ai-self-driving-cars/
See my article about Federated Machine Learning: https://aitrends.com/selfdrivingcars/federated-machine-learning-for-ai-self-driving-cars/
For aspects about OTA, see my article: https://aitrends.com/selfdrivingcars/air-ota-updating-ai-self-driving-cars/
V2V And V2I Come To Play
Another potential advantage for AI self-driving cars being safer than human drivers could involve the use of V2V (vehicle-to-vehicle) electronic communications, along with V2I (vehicle-to-infrastructure) electronic communications.
I had earlier mentioned that when driving in a rural area, another car came upon me that was moving quite fast and I presumed it was a local that knew the roads well. The other driver was somewhat caught by surprise at my slower moving car. I was somewhat caught by surprise by the other driver that suddenly came upon me.
With AI self-driving cars, the AI of one car could electronically communicate with another AI self-driving car, using V2V, and forewarn the other one that they are both coming upon each other. This could be essential for also dealing with the “instant intersection” situations. Even though one AI self-driving car might not be able to visually or via radar detect another AI self-driving car that is coming out of a driveway that is blocked by a clump of trees, they might be able to communicate via V2V to let each other know of the other one’s presence. They would then adjust their driving accordingly.
I’ve coined the word “omnipresence” to refer to multiple AI self-driving cars that share with each other the status of a roadway. This could be handy for when several AI self-driving cars are in the vicinity of each other and driving on say a mountain road. One AI self-driving car up ahead might alert the others that a deer just darted across the road. Another AI self-driving car might have detected rock debris in the road and alerted the other AI self-driving cars to be wary of the blockage. And so on.
The roadway infrastructure might also communicate with the AI self-driving cars. Via V2I, a hidden driveway might beacon out a message to let any AI self-driving cars driving nearby know that there is a hidden driveway there. Thus, even if there had not yet been any other AI self-driving cars that went past that driveway and could alert others, the beacon itself would do so.
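Here is a minimal sketch of how an AI self-driving car might act on such V2V or V2I alerts. The message fields, the flat x/y coordinates, the 150-meter caution radius, and the cautionary speed are all hypothetical assumptions for illustration; this does not follow any actual V2V or V2I message standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardMessage:
    """Hypothetical alert broadcast by another car (V2V) or a beacon (V2I)."""
    source: str   # "V2V" or "V2I"
    kind: str     # e.g. "hidden_driveway", "deer", "rock_debris"
    x_m: float    # hazard position in a shared local frame, in meters
    y_m: float


def advised_speed(current_speed_mps, car_xy, messages,
                  caution_radius_m=150.0, caution_speed_mps=10.0):
    """Slow to the caution speed if any reported hazard is within the radius."""
    car_x, car_y = car_xy
    for msg in messages:
        distance = math.hypot(msg.x_m - car_x, msg.y_m - car_y)
        if distance <= caution_radius_m:
            # Never speed up because of an alert; only cap the speed.
            return min(current_speed_mps, caution_speed_mps)
    return current_speed_mps
```

For example, a car traveling at 25 meters per second that receives a HazardMessage("V2I", "hidden_driveway", 100.0, 0.0) while at position (0.0, 0.0) would be advised to slow to the assumed caution speed, whereas a hazard reported a kilometer away would leave its speed unchanged.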
For my article about omnipresence and AI self-driving cars, see: https://aitrends.com/selfdrivingcars/omnipresence-ai-self-driving-cars/
For the use of 5G and AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/5g-and-ai-self-driving-cars/
For the pranking of AI self-driving cars, see my article: https://aitrends.com/selfdrivingcars/pranking-of-ai-self-driving-cars/
For more about safety of AI self-driving cars, see my article: https://aitrends.com/ai-insider/safety-and-ai-self-driving-cars-world-safety-summit-on-autonomous-tech/
For the egocentric mindset of AI developers, see my article: https://aitrends.com/selfdrivingcars/egocentric-design-and-ai-self-driving-cars/
I’d vote that AI developers should be honing AI self-driving cars to be able to drive in rural areas.
In spite of some that are suggesting there won’t be any benefit of AI self-driving cars being used in rural areas, it is my view that they not only might have a viable ROI numerically, they could also save lives in rural areas, and that would presumably have some economic benefit too.
AI self-driving cars might involve individual ownership by rural residents, and/or it might involve collectives that jointly obtain an AI self-driving car and put it to use in their rural area.
For those AI developers that assume their AI self-driving car will readily work in a rural area if it already works in an urban setting, I’d advise that you reconsider that assumption. There are enough twists and turns to make it worthwhile to enhance the AI to cope specifically with the aspects of rural driving. I had earlier indicated that when I helped my children learn to drive that I had not needed to explicitly cover rural driving, but once they did do some rural driving on their own, they had to learn the nuances thereof. AI developers ought to bake those nuances directly into their AI self-driving car capabilities.
Will an AI self-driving car enjoy watching someone tip over a cow?
Will an AI self-driving car become captivated by the wide expanse of majestic crops that stretch to the horizon?
Would an AI self-driving car become an essential and valued element in rural living?
I’d bet so.
We went from horse and plow to conventional cars, and the transformation to AI self-driving cars is likely to be as dramatic and valuable.
Rural areas will welcome AI self-driving cars and the benefits will be substantive, you can mark my words on that.
Copyright 2020 Dr. Lance Eliot
This content is originally posted on AI Trends.
[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column: https://forbes.com/sites/lanceeliot/]