Posted by Andrew Zaldivar, Responsible AI Advocate, Google Research, on behalf of the TFCO Team
Many technologies that use supervised machine learning are having an increasingly positive impact on people’s day-to-day lives, from catching early signs of illnesses to filtering inappropriate content. There is, however, a growing concern that learned models, which generally satisfy the narrow requirement of minimizing a single loss function, may have difficulty addressing broader societal issues such as fairness, which generally requires trading off multiple competing considerations. Even when such factors are taken into account, these systems may still be incapable of satisfying such complex design requirements, for example that a false negative might be “worse” than a false positive, or that the model being trained should be “similar” to a pre-existing model.
The TensorFlow Constrained Optimization (TFCO) library makes it easy to configure and train machine learning problems based on multiple different metrics (e.g. the precisions on members of certain groups, the true positive rates on residents of certain countries, or the recall rates of cancer diagnoses depending on age and gender). While these metrics are simple conceptually, by offering a user the ability to minimize and constrain arbitrary combinations of them, TFCO makes it easy to formulate and solve many problems of interest to the fairness community in particular (such as equalized odds and predictive parity) and the machine learning community more generally.
How Does TFCO Relate to Our AI Principles?
The release of TFCO puts our AI Principles into action, further helping guide the ethical development and use of AI in research and in practice. By putting TFCO into the hands of developers, we aim to better equip them to identify where their models can be risky and harmful, and to set constraints that ensure their models achieve desirable outcomes.
What Are the Goals?
Borrowing an example from Hardt et al., consider the task of learning a classifier that decides whether a person should receive a loan (a positive prediction) or not (negative), based on a dataset of people who either are able to repay a loan (a positive label), or are not (negative). To set up this problem in TFCO, we would choose an objective function that rewards the model for granting loans to those people who will pay them back, and would also impose fairness constraints that prevent it from unfairly denying loans to certain protected groups of people. In TFCO, the objective to minimize, and the constraints to impose, are represented as algebraic expressions (using normal Python operators) of simple basic rates.
Instructing TFCO to minimize the overall error rate of the learned classifier for a linear model (with no fairness constraints), might yield a decision boundary that looks like this:
Illustration of a binary classification dataset with two protected groups: blue and orange. For ease of visualization, rather than plotting each individual data point, the densities are represented as ovals. The positive and negative signs denote the labels. The decision boundary, drawn as a black dashed line, separates positive predictions (regions above the line) from negative predictions (regions below the line) and is chosen to maximize accuracy.
This is a fine classifier, but in certain applications, one might consider it to be unfair. For example, positively-labeled blue examples are much more likely to receive negative predictions than positively-labeled orange examples, violating the “equal opportunity” principle. To correct this, one could add an equal opportunity constraint to the constraint list. The resulting classifier would now look something like this:
Here the decision boundary is chosen to maximize the accuracy, subject to an equal opportunity (or true positive rate) constraint.
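To make the equal opportunity idea concrete, here is a minimal sketch in plain Python of how one might measure the true positive rate gap between the two groups. It does not use the TFCO API; the toy data, the `true_positive_rate` helper and the 0.05 tolerance are all assumptions made for illustration.

```python
# Toy data: (group, label, prediction). All values are made up for illustration.
examples = [
    ("blue", 1, 1), ("blue", 1, 0), ("blue", 1, 0), ("blue", 0, 0),
    ("orange", 1, 1), ("orange", 1, 1), ("orange", 1, 0), ("orange", 0, 1),
]


def true_positive_rate(rows, group):
    """Fraction of positively-labeled members of `group` that are predicted positive."""
    predictions = [pred for g, label, pred in rows if g == group and label == 1]
    return sum(predictions) / len(predictions)


tpr_blue = true_positive_rate(examples, "blue")
tpr_orange = true_positive_rate(examples, "orange")

# Equal opportunity asks for the gap between the two rates to be small.
# The tolerance below is an arbitrary, hypothetical choice.
gap = abs(tpr_blue - tpr_orange)
satisfied = gap <= 0.05
print(tpr_blue, tpr_orange, satisfied)
```

In this toy data the blue group's true positive rate is half that of the orange group, so the check fails, which is the situation the constraint is meant to correct during training.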
How Do I Know What Constraints To Set?
Choosing the “right” constraints depends on the policy goals or requirements of your problem and your users. For this reason, we’ve striven to avoid forcing the user to choose from a curated list of “baked-in” problems. Instead, we’ve tried to maximize flexibility by enabling the user to define an extremely broad range of possible problems, by combining and manipulating simple basic rates.
This flexibility can have a downside: if one isn’t careful, one might attempt to impose contradictory constraints, resulting in a constrained problem with no good solutions. In the context of the above example, one could constrain the false positive rates (FPRs) to be equal, in addition to the true positive rates (TPRs) (i.e., “equalized odds”). However, the potentially contradictory nature of these two sets of constraints, coupled with our requirement for a linear model, could force us to find a solution with extremely low accuracy. For example:
Here the decision boundary is chosen to maximize the accuracy, subject to both the true positive rate and false positive rate constraints.
With an insufficiently-flexible model, either the FPRs of both groups would be equal, but very large (as in the case illustrated above), or the TPRs would be equal, but very small (not shown).
Can It Fail?
The ability to express many fairness goals as rate constraints can help drive progress in the responsible development of machine learning, but it also requires developers to carefully consider the problem they are trying to address. For example, suppose one constrains the training to give equal accuracy for four groups, but that one of those groups is much harder to classify. In this case, it could be that the only way to satisfy the constraints is by decreasing the accuracy of the three easier groups, so that they match the low accuracy of the fourth group. This probably isn’t the desired outcome.
A “safer” alternative is to constrain each group to independently satisfy some absolute metric, for example by requiring each group to achieve at least 75% accuracy. Using such absolute constraints rather than relative constraints will generally keep the groups from dragging each other down. Of course, it is possible to ask for a minimum accuracy that isn’t achievable, so some conscientiousness is still required.
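The difference between relative and absolute constraints can be illustrated with a small check, sketched here in plain Python rather than with TFCO itself. The group names, accuracy values, the 0.05 tolerance and the helper names are hypothetical; only the 75% floor comes from the example above.

```python
def relative_ok(accuracies, tolerance):
    """Relative constraint: all group accuracies lie within `tolerance` of each other."""
    return max(accuracies.values()) - min(accuracies.values()) <= tolerance


def absolute_ok(accuracies, floor):
    """Absolute constraint: every group independently reaches at least `floor`."""
    return all(acc >= floor for acc in accuracies.values())


# A model that does well on three groups and acceptably on a harder fourth group.
uneven = {"a": 0.92, "b": 0.90, "c": 0.91, "d": 0.76}
# A model whose easier groups were dragged down to match the hardest one.
dragged = {"a": 0.60, "b": 0.60, "c": 0.60, "d": 0.60}

print(relative_ok(uneven, 0.05), relative_ok(dragged, 0.05))
print(absolute_ok(uneven, 0.75), absolute_ok(dragged, 0.75))
```

The relative check accepts the dragged-down model and rejects the uneven one, while the absolute check does the opposite, which is why the absolute form is often the safer starting point.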
The Curse of Small Sample Sizes
Another common challenge with using constrained optimization is that the groups to which constraints are applied may be under-represented in the dataset. Consequently, the stochastic gradients we compute during training will be very noisy, resulting in slow convergence. In such a scenario, we recommend that users impose the constraints on a separate rebalanced dataset that contains higher proportions from each group, and use the original dataset only to minimize the objective.
For example, in the Wiki toxicity example we provide, we wish to predict if a discussion comment posted on a Wiki talk page is toxic (i.e., contains “rude, disrespectful or unreasonable” content). Only 1.3% of the comments mention a term related to “sexuality”, and a large fraction of these comments are labelled toxic. Hence, training a CNN model without constraints on this dataset leads to the model believing that “sexuality” is a strong indicator of toxicity and results in a high false positive rate for this group. We use TFCO to constrain the false positive rate for four sensitive topics (sexuality, gender identity, religion and race) to be within 2%. To better handle the small group sizes, we use a “re-balanced” dataset to enforce the constraints and the original dataset only to minimize the objective. As shown below, the constrained model is able to significantly lower the false positive rates on the four topic groups, while maintaining almost the same accuracy as the unconstrained model.
Comparison of unconstrained and constrained CNN models for classifying toxic comments on Wiki Talk pages.
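One simple way to build a rebalanced dataset for the constraints is to oversample each group to the same size. The sketch below does this with sampling with replacement; it is an assumption about how rebalancing could be done, not a description of how the provided example constructs its dataset, and the `rebalance` helper and the toy rows are hypothetical.

```python
import random


def rebalance(rows, group_of, per_group, seed=0):
    """Sample `per_group` rows (with replacement) from every group found in `rows`."""
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(group_of(row), []).append(row)
    balanced = []
    for members in by_group.values():
        balanced.extend(rng.choices(members, k=per_group))
    return balanced


# Toy dataset in which the "rare" group makes up 1.3% of the rows.
original = [("rare", i) for i in range(13)] + [("common", i) for i in range(987)]

# The constraints would be evaluated on this rebalanced copy,
# while the objective is still minimized on `original`.
constraint_data = rebalance(original, group_of=lambda row: row[0], per_group=200)
rare_count = sum(1 for row in constraint_data if row[0] == "rare")
print(len(constraint_data), rare_count)
```

Because each group now contributes the same number of rows, minibatches drawn from `constraint_data` give much less noisy estimates of the per-group rates than minibatches drawn from the original data.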
Intersectionality – The Challenge of Fine Grained Groups
Overlapping constraints can help create equitable experiences for multiple categories of historically marginalized and minority groups. Extending beyond the above example, we also provide a CelebA example that examines a computer vision model for detecting smiles in images that we wish to perform well across multiple non-mutually-exclusive protected groups. The false positive rate can be an appropriate metric here, since it measures the fraction of images not containing a smiling face that are incorrectly labeled as smiling. By comparing false positive rates based on available age group (young and old) or sex (male and female) categories, we can check for undesirable model bias (e.g., whether images of older people who are not smiling are disproportionately labeled as smiling).
Under the Hood
Correctly handling rate constraints is challenging because, being written in terms of counts (e.g., the accuracy rate is the number of correct predictions, divided by the number of examples), the constraint functions are non-differentiable. Algorithmically, TFCO converts a constrained problem into a non-zero-sum two-player game (ALT’19, JMLR’19). This framework can be extended to handle the ranking and regression settings (AAAI’20), more complex metrics such as the F-measure (NeurIPS’19a), or to improve generalization performance (ICML’19).
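For intuition about the two-player view, here is a deliberately simplified sketch on a differentiable toy problem: minimize (x - 2)^2 subject to x <= 1. One player descends on x, the other ascends on a Lagrange multiplier that penalizes constraint violation. This is the classical zero-sum Lagrangian game, not the non-zero-sum formulation with rate constraints that TFCO uses; the problem, step size and function name are assumptions chosen for illustration.

```python
def solve_toy_game(steps=2000, lr=0.05):
    """Gradient descent on x and projected gradient ascent on the multiplier `lam`.

    Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam  # dL/dx: the x-player wants to decrease L
        grad_lam = x - 1.0              # dL/dlam: the constraint violation
        # Simultaneous update; the multiplier is projected back onto lam >= 0.
        x, lam = x - lr * grad_x, max(0.0, lam + lr * grad_lam)
    return x, lam


x_star, lam_star = solve_toy_game()
print(x_star, lam_star)
```

The iterates settle at the constrained optimum x = 1 with multiplier 2. With rate constraints the multiplier player can still use the true counts, while the model player needs a differentiable stand-in, which is where the non-zero-sum formulation comes in.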
It is our belief that the TFCO library will be useful in training ML models that take into account the societal and cultural factors necessary to satisfy real-world requirements. Our provided examples (toxicity classification and smile detection) only scratch the surface. We hope that TFCO’s flexibility enables you to handle your problem’s unique requirements.
Acknowledgements
This work was a collaborative effort by the authors of TFCO and associated research papers, including Andrew Cotter, Maya R. Gupta, Heinrich Jiang, Harikrishna Narasimhan, Taman Narayan, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, and Seungil You.
Software developers are using AI to help write and review code, detect bugs, test software and optimize development projects. This assistance is helping companies to deploy new software more efficiently, and to allow a new generation of developers to learn to code more easily.
These are conclusions of a recent report on AI in software development published by Deloitte and summarized in a recent article in Forbes. Authors David Schatsky and Sourabh Bumb describe how a range of companies have launched dozens of AI-driven software development tools over the past 18 months. The market is growing, with startups raising $704 million in the year ending September 2019.
The new tools can be used to help reduce keystrokes, detect bugs as software is being written and automate many of the tests needed to confirm the quality of software. This is important in an era of increasing reliance on open source code, which can come with bugs.
While some fear automation may take jobs away from coders, the Deloitte authors see it as unlikely.
“For the most part, these AI tools are helping and augmenting humans, not replacing them,” Schatsky stated. “These tools are helping to democratize coding and software development, allowing individuals not necessarily trained in coding to fill talent gaps and learn new skills. There is also AI-driven code review, providing quality assurance before you even run the code.”
A study from Forrester in 2018 found that 37 percent of companies involved in software development were using coding tools powered by AI. The percentage is likely to be higher now, with companies such as Tara, DeepCode, Kite, Functionize, Deep TabNine and many others providing automated coding services.
Success seems to be accelerating the trend. “Many companies that have implemented these AI tools have seen improved quality in the end products, in addition to reducing both cost and time,” stated Schatsky.
The Deloitte study said AI can help alleviate a chronic shortage of talented developers. Poor software quality cost US organizations an estimated $319 billion last year. The application of AI has the potential to mitigate these challenges.
Deloitte sees AI helping in many stages of software development, including project requirements, coding review, bug detection and resolution, more thorough testing, deployment and project management.
IBM Engineer Learned AI Development Lessons from Watson Project
IBM Distinguished Engineer Bill Higgins, based in Raleigh, NC, who has spent 20 years in software development at the company, recently published an account of the impact of AI on software development on Medium.
Organizations need to “unlearn” the patterns for how they have developed software in the past. “If it’s difficult for an individual to adapt, it’s a million times harder for a company to adapt,” the author stated.
Higgins was the lead for IBM’s AI for developers mission within the Watson group. “It turned out my lack of personal experience with AI was an asset,” he stated. He had to go through his own learning journey and thus gained deeper understanding and empathy for developers needing to adapt.
To learn about AI in software development, Higgins said he studied how others have applied it (the problem space) and the cases in which using AI is superior to alternatives (the solution space). This was important to understanding what was possible and to avoid “magical thinking.”
The author said his journey was the most intense and difficult learning he had done since getting a computer science degree at Penn State. “It was so difficult to rewire my mind to think about software systems that improve from experience, vs. software systems that merely do the things you told them to do,” he stated.
IBM developed a conceptual model, called the AI Ladder, to help enterprises think about AI-based transformation. The ladder has four rungs: collect, organize, analyze and infuse. Most enterprises have lots of data, often held in siloed IT systems or inherited from acquisitions. A given enterprise may have 20 databases and three data warehouses with redundant and inconsistent information about customers. The same is true for other data types such as orders, employees and product information. “IBM promoted the AI Ladder to conceptually climb out of this morass,” Higgins stated.
In the infusion stage, the company works to integrate trained machine learning models into production systems, and design feedback loops so the models can continue to improve from experience. An example of infused AI is the Netflix recommendation system, powered by sophisticated machine learning models.
IBM determined that a combination of APIs, pre-built ML models and optional tooling could encapsulate the collect, organize and analyze rungs of the AI Ladder for common ML domains such as natural language understanding, conversations with virtual agents, visual recognition, speech and enterprise search.
For example, Watson’s Natural Language Understanding became rich and complex. Machine learning is now good at understanding many aspects of language including concepts, relationships between concepts and emotional content. Now the NLU service and the R&D on machine learning-based natural language processing can be made available to developers via an elegant API and supporting SDKs.
“Thus developers can today begin leveraging certain types of AI in their applications, even if they lack any formal training in data science or machine learning,” Higgins stated.
It does not eliminate the AI learning curve, but it makes it a more gentle curve.
Posted by Timo Kohlberger and Yuan Liu, Software Engineers, Google Health
The progress in machine learning (ML) for medical imaging that helps doctors provide better diagnoses has partially been driven by the use of large, meticulously labeled datasets. However, dataset size can be limited in real life due to privacy concerns, low patient volume at partner institutions, or by virtue of studying rare diseases. Moreover, to ensure that ML models generalize well, they need training data that span a range of subgroups, such as skin type, demographics, and imaging devices. Requiring that the size of each combinatorial subgroup (e.g., skin type A with skin condition B, taken by camera C) is also sufficiently large can quickly become impractical.
Today we are happy to share two projects aimed at both improving the diversity of ML training data and increasing the effective amount of available training data for medical applications. The first project is a configurable method for generation of synthetic skin lesion images in order to improve coverage of rarer skin types and conditions. The second project uses synthetic images as training data to develop an ML model that can better interpret different biological tissue types across a range of imaging devices.
Generating Diverse Images of Skin Conditions
In “DermGAN: Synthetic Generation of Clinical Skin Images with Pathology”, published in the Machine Learning for Health (ML4H) workshop at NeurIPS 2019, we address problems associated with data diversity in de-identified dermatology images taken by consumer grade cameras. This work addresses (1) the scarcity of imaging data representative of rare skin conditions, and (2) the lower frequency of data covering certain Fitzpatrick skin types. Fitzpatrick skin types range from Type I (“pale white, always burns, never tans”) to Type VI (“darkest brown, never burns”), with datasets generally containing relatively few cases at the “boundaries”. In both cases, data scarcity problems are exacerbated by the low signal-to-noise ratio common in the target images, due to the lack of standardized lighting, contrast and field-of-view; variability of the background, such as furniture and clothing; and the fine details of the skin, like hair and wrinkles.
To improve diversity in the skin images, we developed a model, called DermGAN, which generates skin images that exhibit the characteristics of a given pre-specified skin condition, location, and underlying skin color. DermGAN uses an image-to-image translation approach, based on the pix2pix generative adversarial network (GAN) architecture, to learn the underlying mapping from one type of image to another.
DermGAN takes as input a real image and its corresponding, pre-generated semantic map representing the underlying characteristics of the real image (e.g., the skin condition, location of the lesion, and skin type), from which it will generate a new synthetic example with the requested characteristics. The generator is based on the U-Net architecture, but in order to mitigate checkerboard artifacts, the deconvolution layers are replaced with a resizing layer, followed by a convolution. A few customized losses are introduced to improve the quality of the synthetic images, especially within the pathological region. The discriminator component of DermGAN is solely used for training, whereas the generator is evaluated both visually and for use in augmenting the training dataset for a skin condition classifier.
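The resize-then-convolve idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not DermGAN's actual layers: the nearest-neighbor resizing, the fixed 3x3 averaging kernel (a real model would learn its kernel weights) and the helper names are all assumptions.

```python
def upsample_nearest(image, factor=2):
    """Resize step: repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def conv2d_same(image, kernel):
    """Convolution step (cross-correlation, zero padding, same output size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x] * kernel[di][dj]
            out[i][j] = total
    return out


small = [[1.0, 2.0], [3.0, 4.0]]
averaging_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # stand-in for learned weights

upsampled = upsample_nearest(small, factor=2)
smoothed = conv2d_same(upsampled, averaging_kernel)
print(upsampled)
print(smoothed[1][1])
```

Because every output pixel is produced by the same kernel applied to a uniformly resized input, the uneven kernel overlap that is commonly blamed for checkerboard patterns in strided deconvolutions does not arise.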
Overview of the generator component of DermGAN. The model takes an RGB semantic map (red box) annotated with the skin condition’s size and location (smaller orange rectangle), and outputs a realistic skin image. Colored boxes represent various neural network layers, such as convolutions and ReLU; the skip connections resemble the U-Net and enable information to be propagated at the appropriate scales.
The top row shows generated synthetic examples and the bottom row illustrates real images of basal cell carcinoma (left) and melanocytic nevus (right). More examples can be found in the paper.
In addition to generating visually realistic images, our method enables generation of images of skin conditions or skin types that are more rare and that suffer from a paucity of dermatologic images.
DermGAN can be used to generate skin images (all with melanocytic nevus in this case) with different background skin types (top, by changing the input skin color) and different-sized lesions (bottom, by changing the input lesion size). As the input skin color changes, the lesion changes appearance to match what the lesion would look like on different skin types.
Early results indicated that using the generated images as additional data to train a skin condition classifier may improve performance at detecting rare malignant conditions, such as melanoma. However, more work is needed to explore how best to utilize such generated images to improve accuracy more generally across rarer skin types and conditions.
Generating Pathology Images with Different Labels Across Diverse Scanners
The focus quality of medical images is important for accurate diagnoses. Poor focus quality can trigger both false positives and false negatives, even in otherwise accurate ML-based metastatic breast cancer detection algorithms. Determining whether or not pathology images are in-focus is difficult due to factors such as the complexity of the image acquisition process. Digitized whole-slide images could have poor focus across the entire image, but since they are essentially stitched together from thousands of smaller fields of view, they could also have subregions with different focus properties than the rest of the image. This makes manual screening for focus quality impractical and motivates the desire for an automated approach to detect poorly-focused slides and locate out-of-focus regions. Identifying regions with poor focus might enable re-scanning, or yield opportunities to improve the focusing algorithms used during the scanning process.
In our second project, presented in “Whole-slide image focus quality: Automatic assessment and impact on AI cancer detection”, published in the Journal of Pathology Informatics, we develop a method of evaluating de-identified, large gigapixel pathology images for focus quality issues. This involved training a convolutional neural network on semi-synthetic training data that represent different tissue types and slide scanner optical properties. However, a key barrier towards developing such an ML-based system was the lack of labeled data — focus quality is difficult to grade reliably and labeled datasets were not available. To exacerbate the problem, because focus quality affects minute details of the image, any data collected for a specific scanner may not be representative of other scanners, which may have differences in the physical optical systems, the stitching procedure used to recreate a large pathology image from captured image tiles, white-balance and post-processing algorithms, and more. This led us to develop a novel multi-step system for generating synthetic images that exhibit realistic out-of-focus characteristics.
We deconstructed the process of collecting training data into multiple steps. The first step was to collect images from various scanners and to label in-focus regions. This task is substantially easier than trying to determine the degree to which an image is out of focus, and can be completed by non-experts. Next, we generated synthetic out-of-focus images, inspired by the sequence of events that happen before a real out-of-focus image is captured: the optical blurring effect happens first, followed by those photons being collected by a sensor (a process that adds sensor noise), and finally software compression adds noise.
A sequence of images showing step-wise out-of-focus image generation. Images are shown in grayscale to accentuate the difference between steps. First, an in-focus image is collected (a) and a bokeh effect is added to produce a blurry image (b). Next, sensor noise is added to simulate a real image sensor (c), and finally JPEG compression is added to simulate the sharp edges introduced by post-acquisition software processing (d). A real out-of-focus image is shown for comparison (e).
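The three-stage pipeline described above (optical blur, then sensor noise, then compression) can be mimicked with simple stand-ins. In the sketch below a box blur replaces the bokeh effect, Gaussian noise replaces the sensor model and coarse intensity quantization replaces JPEG compression; all three substitutions, the parameter values and the function names are assumptions for illustration and not the method used in the paper.

```python
import random


def box_blur(image, radius=1):
    """Stand-in for optical blur: average each pixel with its neighbors."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [image[y][x]
                      for y in range(max(0, i - radius), min(h, i + radius + 1))
                      for x in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(window) / len(window)
    return out


def add_sensor_noise(image, sigma, rng):
    """Stand-in for sensor noise: additive Gaussian noise, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in image]


def quantize(image, step=16):
    """Crude stand-in for lossy compression: snap intensities to a coarse grid."""
    return [[min(255.0, float(round(v / step) * step)) for v in row] for row in image]


def synthesize_out_of_focus(image, radius=1, sigma=2.0, step=16, seed=0):
    """Apply the stages in the order described: blur, sensor noise, compression."""
    rng = random.Random(seed)
    return quantize(add_sensor_noise(box_blur(image, radius), sigma, rng), step)


# An 8x8 checkerboard serves as a sharp, high-contrast "in-focus" input.
sharp = [[255.0 if (i + j) % 2 == 0 else 0.0 for j in range(8)] for i in range(8)]
blurry = synthesize_out_of_focus(sharp)
contrast = max(map(max, blurry)) - min(map(min, blurry))
print(contrast)
```

Keeping each stage as a separate function makes it easy to vary one stage at a time, for example to generate training images at several blur levels from the same in-focus input.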
Our study shows that modeling each step is essential for optimal results across multiple scanner types, and remarkably, enabled the detection of spectacular out-of-focus patterns in real data:
An example of a particularly interesting out-of-focus pattern across a biological tissue slice. Areas in blue were recognized by the model to be in-focus, whereas areas highlighted in yellow, orange, or red were more out of focus. The gradation in focus here (represented by concentric circles: a red/orange out-of-focus center surrounded by green/cyan mildly out-of-focus, and then a blue in-focus ring) was caused by a hard “stone” in the center that lifted the surrounding biological tissue.
Implications and Future Outlook
Though the volume of data used to develop ML systems is seen as a fundamental bottleneck, we have presented techniques for generating synthetic data that can be used to improve the diversity of training data for ML models and thereby improve the ability of ML to work well on more diverse datasets. We should caution though that these methods are not appropriate for validation data, so as to avoid bias such as an ML model performing well only on synthetic data. To ensure unbiased, statistically-rigorous evaluation, real data of sufficient volume and diversity will still be needed, though techniques such as inverse probability weighting (for example, as leveraged in our work on ML for chest X-rays) may be useful there. We continue to explore other approaches to more efficiently leverage de-identified data to improve data diversity and reduce the need for large datasets in the development of ML models for healthcare.
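As a rough illustration of inverse probability weighting, suppose an evaluation set deliberately over-samples hard cases, and each case's probability of being included is known. Weighting each case by the inverse of that probability recovers an estimate for the original population. The numbers, the `ipw_mean` helper and the sampling scheme below are hypothetical and are not taken from the chest X-ray work.

```python
def ipw_mean(values, inclusion_probs):
    """Weighted mean in which each value is weighted by 1 / (its inclusion probability)."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# 1 = model was correct, 0 = model was wrong.
# The first four cases are "easy" (sampled with probability 0.2),
# the last four are "hard" (sampled with probability 0.8).
correct = [1, 1, 1, 1, 0, 0, 1, 0]
probs = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]

naive_accuracy = sum(correct) / len(correct)
weighted_accuracy = ipw_mean(correct, probs)
print(naive_accuracy, weighted_accuracy)
```

The naive accuracy understates performance because hard cases are over-represented in the sample; the weighted estimate corrects for that by letting each easy case stand in for more of the population.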
Acknowledgements
These projects involved the efforts of multidisciplinary teams of software engineers, researchers, clinicians and cross functional contributors. Key contributors to these projects include Timo Kohlberger, Yun Liu, Melissa Moran, Po-Hsuan Cameron Chen, Trissia Brown, Jason Hipp, Craig Mermel, Martin Stumpe, Amirata Ghorbani, Vivek Natarajan, David Coz, and Yuan Liu. The authors would also like to acknowledge Daniel Fenner, Samuel Yang, Susan Huang, Kimberly Kanada, Greg Corrado and Erica Brand for their advice, members of the Google Health dermatology and pathology teams for their support, and Ashwin Kakarla and Shivamohan Reddy Garlapati and their team for image labeling.
Or, suppose instead I said to you that you should “Drive Safely: It’s the Law” – how would you react?
Perhaps I might say “Drive Safely or Get a Ticket.”
I could be even more succinct and simply say: Drive Safely.
These are all ways to generally say the same thing.
Yet, how you react to them can differ quite a bit.
Why would you react differently to these messages that all seem to be saying the same thing?
Because how the message is phrased will create a different kind of social context that your underlying social norms will react to.
If I simply say “Drive Safely”, it’s a rather perfunctory form of wording the message.
It’s quick, consisting of only two words. You likely would barely notice the message and you might also think that of course it’s important to drive safely. You might ignore the message due to it seemingly being obvious, or you might notice it and think to yourself that it’s kind of a handy reminder but that in the grand scheme of things it wasn’t that necessary, at least not for you (maybe it was intended for riskier drivers, you assume).
Consider next the version that says “Thank You for Driving Safely.”
This message is somewhat longer, having now five words, and takes more effort to read. As you parse the words of the message, the opening element is that you are being thanked for something. We all like being thanked. What is it that you are being thanked for, you might wonder. You then get to the ending of the message and realize you are being thanked for driving safely.
Most people would then maybe get a small smile on their face and think that this was a mildly clever way to urge people to drive safely. By thanking people, it gets them to consider that they need to do something to get the thanks, and the thing they need to do is drive safely. In essence, the message tries to create a reciprocity with the person – you are getting a thank you handed to you, and you in return are supposed to do something, namely you are supposed to drive safely.
Suppose you opt to not drive safely?
You’ve broken the convention of having been given something, the thanks, when it really was undeserved. In theory, you’ll not want to break such a convention and therefore will be motivated to drive safely. I’d say that none of us will necessarily go out of our way to drive safely merely due to the aspect that you need to repay the thank-you. On the other hand, maybe it will be enough of a social nudge that it puts you into a mental mindset of continuing to drive safely. It’s not enough to force you into driving safely, but it might keep you going along as a safe driver.
What about the version that says “Drive Safely: It’s the Law” and your reaction to it?
In this version, you are being reminded to drive safely and then you are being forewarned that it is something you are supposed to do. You are told that the law requires you to drive safely. It’s not really a choice per se, and instead it is the law. If you don’t drive safely, you are a lawbreaker. You might get into legal trouble.
The version that says “Drive Safely or Get a Ticket” is similar to the version warning you about the law, and steps things up a further notch.
If I tell you that something isn’t lawful, you need to make a mental leap that if you break the law there are potentially adverse consequences. In the case of the version telling you straight out that you’ll get a ticket, there’s no ambiguity about the aspect that not only must you drive safely but indeed there is a distinct penalty for not doing so.
None of us likes getting a ticket.
We’ve all had to deal with traffic tickets and the trauma of getting points dinged on our driving records, possibly having our car insurance rates hiked, and maybe needing to go to traffic school and suffer through boring hours of re-learning about driving. Yuk, nobody wants that. This version that mentions the ticket provides a specific adverse consequence if you don’t comply with driving safely.
The word-for-word wording of the drive safely message is actually quite significant as to how the message will be received by others and whether they will be prompted to do anything because of the message.
I realize that some of you might say that it doesn’t matter which of those wordings are used.
Aren’t we being rather tedious in parsing each such word?
Seems like a lot of focus on something that otherwise doesn’t need any attention. Well, you’d actually be somewhat mistaken in the assumption that those variants of wording do not make a difference. There are numerous psychology and cognition studies that show that the wording of a message can make an at times dramatic difference as to whether people notice the message and whether they take it to heart.
I’ll concentrate herein on one such element that makes those messages so different in terms of impact, namely due to the use of reciprocity.
Importance Of Reciprocity
Reciprocity is a social norm.
Cultural anthropologists suggest that it is a social norm that cuts across all cultures and all of time.
In essence, we seem to have always believed in and accepted reciprocity in our dealings with others, whether we explicitly knew it or not.
I tell you that I’m going to help you with putting up a painting on your wall. You now feel as though you owe me something in return. It might be that you would pay me for helping you. Or, it could be something else such as you might do something for me, such as you offer to help me cook a meal. We’re then balanced. I helped you with the painting, you helped me with the meal. In this case, we traded with each other, me giving you one type of service, and you providing in return to me some kind of service.
Of course, the trades could have been something other than a service.
I help you put up the painting (I’m providing a service to you), and you then hand me a six-pack of beer. In that case, I did a service for you, and you gave me a product in return (the beer). Maybe instead things started out with you giving me a six-pack of beer (a product), and I then offered to help put up your painting (a service). Or, it could be that you hand me the six-pack of beer (a product), and I hand you a pair of shoes (a product).
In each case, something is given to the other person, and the other person provides something in return. We seem to just know that this is the way the world works.
Is it in our DNA?
Is it something that we learn as children? Is it both?
There are arguments to be made about how it has come to be.
Regardless of how it came to be, it exists and actually is a rather strong characteristic of our behavior.
Let’s further unpack the nature of reciprocity.
I had mentioned that you gave me a six-pack of beer and I then handed you a pair of shoes. Is that a fair trade? Maybe those shoes are old, worn out, and have holes in them. You might not need them, and even if you needed them, you might not want that particular pair. It seems like an uneven trade. You are likely to feel cheated and regret the trade. You might harbor a belief that I was not fair in my dealings with you. You might expect that I will give you something else of greater value to make up for the lousy shoes.
On the other hand, maybe I’m not a beer drinker, and so your having given me beer seemed like an odd thing to give me. I might have thought that I’d give you an odd item in return. Perhaps in my mind, the trade was even. Meanwhile, in your mind, the trade was uneven.
There’s another angle too: whether the trade is intended as a positive one or a negative one. In the examples so far, we are both giving each other things of value, presumably in a positive way. It could instead be a negative kind of trade. I hit you in the head with my fist, and so you then kick me in the shin. Those are negative actions as reciprocity. It’s the old eye-for-an-eye notion.
Time is a factor in reciprocity too. I will help you put up your painting. Perhaps the meal you are going to help me cook is not going to take place until several months from today. That’s going to be satisfactory in that we both at least know that there is a reciprocal arrangement underway.
If I help you with the painting, and there’s no discussion about what you’ll do for me, I’d walk away thinking that you owe me. You might also be thinking the same. Or, you could create an imbalance by not realizing you owe me, or maybe you are thinking that last year you helped me put oil into my car and so that’s what makes us even now on this most current trade.
Difficulties Of Getting Reciprocity Right
Reciprocity can be dicey.
There are ample ways that the whole thing can get discombobulated.
I do something for you, you don’t do anything in return.
I do something for you of value N, and you provide in return something of perceived value Y that is substantively less than N. I do something for you, and you pledge to do something for me a year from now; meanwhile, I may feel cheated because I didn’t get more immediate value, and if you forget a year from now to make up the trade, I might stay upset forever. And so on.
I am assuming that you’ve encountered many of these kinds of reciprocity circumstances in your lifetime. You might not have realized at the time they were reciprocity situations. We often fall into them and aren’t overtly aware of it.
One of the favorite examples of reciprocity in our daily lives involves the seemingly simple act of a waiter or waitress getting a tip after having served a meal. Studies show that if the server brings out the check and includes a mint on the tray holding the check, this has a tendency to increase the amount of the tip. The people who have eaten the meal and are getting ready to pay will feel as though they owe some kind of reciprocity due to the mint being there on the tray. Research indicates that the tip tends to go up by a modest amount as a result of providing the mint.
A savvy waiter or waitress can further exploit this reciprocity effect. If they look you in the eye and say that the mint was brought out just for you and your guests, this boosts the tip even more. The rule of reciprocity comes into play since the value of what is being given has gone up: at first it was just any old mint, and now it is a special mint just for you all, and thus what you give in return is going to increase to somewhat match the increase in value of the offering. The timing involved is crucial too, in that if the mint were given earlier in the meal, it would not have as great an impact as when it comes just at the time that the payment is going to be made.
As mentioned, reciprocity doesn’t work on everyone in the same way.
The mint trick might not work on you, supposing you hate mints or you like them but perceive them as having little value. Or, if the waiter or waitress has irked you the entire meal, it is unlikely that the mint at the end is going to dig them out of a hole. In fact, sometimes when someone tries the reciprocity trick, it can backfire on them. Upon seeing the mint and the server smiling at you, if you are already ticked off about the meal and the service, it could actually cause you to go ballistic and decide to leave no tip or maybe ask for the manager and complain.
Here’s a recap then about the reciprocity notion:
Reciprocity is a social norm of tremendous power that seems to universally exist
We often fall into reciprocity and don’t know it
Usually a positive action needs to be traded for another in kind
Usually a negative action needs to be traded for another in kind
An imbalance in the perceived trades can mar the arrangement
Trades can be services or products or combinations thereof
Time can be a factor as to immediate, short-term, or long-term
AI Autonomous Cars And Social Reciprocity
What does this have to do with AI self-driving driverless autonomous cars?
At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. One crucial aspect of the AI will be the interaction with the human occupants of the self-driving car, and as such, the AI should be crafted to leverage reciprocity.
One of the areas of open research and discussion involves the nature of the interaction between the AI of a self-driving car and the human occupants that will be using the self-driving car. Some AI developers with a narrow view seem to think that all that the interaction consists of would be the human occupants saying to drive them to the store or to home, and that’s it.
This is a naive view.
The human occupants are going to want the AI to be much more able to carry on a conversation.
For my article about natural language processing and AI self-driving cars, see: https://aitrends.com/features/socio-behavioral-computing-for-ai-self-driving-cars/
For the explanation capabilities of AI for the human occupants, see my article: https://aitrends.com/selfdrivingcars/explanation-ai-machine-learning-for-ai-self-driving-cars/
For a kind of reciprocity in the behavior of AI self-driving cars toward other cars and car drivers, see my article about the Prisoner’s Dilemma and the tit-for-tat strategy: https://aitrends.com/selfdrivingcars/tit-for-tat-and-ai-self-driving-cars/
Let’s consider an example of social reciprocity involving a passenger and driver dialogue.
You get into the AI self-driving car and decide you want to go to work.
Almost as though you are giving an instruction to a GPS, you merely indicate the address of work, and the self-driving car then proceeds to drive you there. Suppose that during the journey, you want to get some coffee, doing so before you get to work. You would want to convey this interim destination to the AI.
Before the days of Alexa and Siri, we might have all accepted the idea that we might need to enter such instructions into an on-board computer tablet, but I think we are now expecting to be able to have a more natural audio interaction with the AI system. If you’ve gone to the trouble to buy an expensive and shiny AI self-driving car, you certainly would hope that some effort went into making the system interactive with the human occupants.
A barebones “conversational” AI might be this:
You: Take me to work.
AI self-driving car: Okay
What does the word “Okay” mean in this situation? Does it mean that the AI is acknowledging that you have said you want to go to work, but not necessarily that the AI will immediately do so? Or, maybe it means that the AI is going to immediately start the self-driving car and drive you to work. Or, perhaps the AI misunderstood what you said and thought you said “I hate work” and therefore the AI is merely saying “Okay” as a placeholder word to seem responsive.
Let’s try this again:
You: Take me to work.
AI self-driving car: Starting to your work now.
This is a little better since the AI has echoed back to you that you want to go to work, and furthermore it is stating when this action will occur, specifically that it will occur right away. This seems somewhat flat though. If you were conversing with a human chauffeur, they would probably embellish what they say.
Here’s another try:
You: Take me to work.
AI self-driving car: Good morning, I will head to your work at 1234 Lauren Street. Hope you will have a pleasant day at work!
This provides more specific feedback, including the work address, which helps to confirm that you really intended to go to work. It might be that you said the word “work” out of habit and meant to say “home”; hearing the address might cause you to realize that you had said the wrong word. The part of the message that says “good morning” and the other part that says “have a pleasant day” are added embellishments that presumably give a softer feeling to the messaging and make things seem less robotic.
One criticism of having the AI utter “good morning” and “have a pleasant day” is that it implies perhaps that the AI actually means those things.
When I speak those words to you, you assume that I as a human have measured out those words and that I presumably know what it means to have a good morning, and so with my knowledge about the nature of mornings, I am genuinely hoping that you have a good one. If you see the words “good morning” written on a poster, you don’t consider that the poster knows anything about the meaning of those words. When the AI system speaks those words, you are likely to be “fooled” into thinking that the AI system “understands” the nature of mornings and is basing those words on a sense of the world.
But the AI of today is more akin to the poster: it is merely showcasing those words and does not yet (at least) comprehend their true meaning.
Do we want the AI to seem to be more aware than it really is?
That’s an important question. If the human occupants believe that the AI has some form of human awareness and knowledge, the human occupant might get themselves into a pickle by trying to converse with the AI system. Suppose the human starts to suffer a heart attack, and believes that the AI has human-like understanding, and so the human says “help, I’m having a heart attack” – if you said this to another adult, the human adult would likely realize that you are in trouble, they might call 911, they might try to aid you, etc.
The AI of today would not know what you’ve said per se. You might have been misled into thinking that it would. If you believed that the AI was not so capable as a human, you might instead have uttered “take me to the nearest hospital” which then hopefully is similar to “take me to work” in that the system can parse the words and realize that it is a driving instruction. The AI would presumably then alter the driving path and instead drive the self-driving car to a nearby hospital.
This argument also comes up when AI developers put other human-like speaking mannerisms into their systems. Humans, for example, will often say “you know” or have pauses in their speaking or say “uh” while speaking. To date, most AI natural language processing systems have generated spoken words that are streamlined and omit those human mannerisms. Lately, some AI developers have been including them. This gives the AI’s spoken words a more human-like feel.
Here’s an example:
You: Take me to work.
AI self-driving car: Uh, Okay, sounds good… will do so.
The added elements to the response are going to suggest to the human listener that the AI is composing speech in a manner similar to a human. This anthropomorphizes the AI system, which as I’ve mentioned before can be good if you believe that’s a desirable way to bond with the human occupants, but it can also be bad in that it can mislead as to what the AI system can and cannot really do and understand.
The joint interaction in a self-driving car between the human occupants and the AI is potentially going to involve life-and-death matters.
This is an important point to keep in mind. If you are simply developing an online search engine that verbally interacts with humans or a system that can aid them in ordering toothbrushes and towels, whether the human believes that the AI is actually comprehending them is not particularly life threatening. In the case of a self-driving car, the human might need to offer insight to the AI about the driving task, or the AI might need to get input from the human about the driving task, any of which if improperly undertaken could have dire consequences.
For my article about the framework for AI self-driving cars, see: https://aitrends.com/selfdrivingcars/framework-ai-self-driving-driverless-cars-big-picture/
For my article about common sense reasoning and AI self-driving cars, see: https://aitrends.com/selfdrivingcars/common-sense-reasoning-and-ai-self-driving-cars/
Digging Into The AI-Human Social Reciprocity
This then takes us to the reciprocity aspects.
Should the AI system of the self-driving car carry on conversations that also invoke the use of reciprocity?
If you are of a mind that the AI should not over-inflate what it can actually comprehend, you’d probably say that it should not employ reciprocity either.
If you are seeking to have the AI appear as sentient as it can, the use of reciprocity would be a core element to achieve that sense, since it is so common among humans and part-and-parcel of our daily lives.
Here’s an example:
You: Take me to work.
AI self-driving car: Happy to take you to work. Please make sure to keep your seat belts on.
You likely can detect the subtle reciprocity involved in the dialogue. The AI implies that it is doing something for the human occupant, taking them to work, and the punch-up is that the AI is “happy” to do so. This creates a human-like aspect as though the AI was doing this as a service for you. The trade is that in return, you are being asked to keep your seat belts on.
If we strip away the reciprocity, we’d get something like this instead:
You: Take me to work.
AI self-driving car: Okay. Keep your seat belts on.
This version is similar to the earlier example about the drive safely message. We’ve now got a plainer and straightforward kind of instruction or maybe even an edict, which was the same with the “drive safely” message. The “happy to take you to work” was more akin to the “thank you for driving safely” message that created a kind of quid-pro-quo element to the dialogue.
If we make the messaging more along the negative side, it might be something like this:
You: Take me to work.
AI self-driving car: Okay. Keep your seat belts on or I’ll stop the car and you won’t get to work on time.
Whoa! This sounds like some kind of fierce AI that is threatening you.
There are AI developers that would argue that this message is actually better than the others because it makes abundantly clear the adverse consequence if the human does not wear their seat belts.
Yes, it’s true that it does spell out the consequences, but it also perhaps sets up a “relationship” with the human occupant that’s going to be an angry one. It sets the tone in a manner that might cause the human to consider in what manner they want to respond back to the AI (angrily!).
If the AI system is intended to interact with the human occupants in a near-natural way, the role of reciprocity needs to be considered.
It is a common means of human-to-human interaction. Likewise, the AI self-driving car will be undertaking the driving task and some kind of give-and-take with the human occupants is likely to occur.
We believe that as AI Natural Language Processing (NLP) capabilities get better, incorporating reciprocity will further enhance the seemingly natural part of natural language processing.
It is prudent though to be cautious about overstepping what can be achieved, and the life-and-death consequences of human and AI interaction in a self-driving car context need to be kept in mind.
Copyright 2020 Dr. Lance Eliot
This content is originally posted on AI Trends.
[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column: https://forbes.com/sites/lanceeliot/]
Posted by Nathan Frey, Senior Software Engineer, Google Research, Los Angeles and Zheng Sun, Senior Software Engineer, Google Research, Mountain View
Videos filmed and edited for television and desktop are typically created and viewed in landscape aspect ratios (16:9 or 4:3). However, with an increasing number of users creating and consuming content on mobile devices, historical aspect ratios don’t always fit the display being used for viewing. Traditional approaches for reframing video to different aspect ratios usually involve static cropping, i.e., specifying a camera viewport, then cropping visual contents that are outside. Unfortunately, these static cropping approaches often lead to unsatisfactory results due to the variety of composition and camera motion styles. More bespoke approaches, however, typically require video curators to manually identify salient contents on each frame, track their transitions from frame-to-frame, and adjust crop regions accordingly throughout the video. This process is often tedious, time-consuming, and error-prone.
To address this problem, we are happy to announce AutoFlip, an open source framework for intelligent video reframing. AutoFlip is built on top of the MediaPipe framework that enables the development of pipelines for processing time-series multimodal data. Taking a video (casually shot or professionally edited) and a target dimension (landscape, square, portrait, etc.) as inputs, AutoFlip analyzes the video content, develops optimal tracking and cropping strategies, and produces an output video with the same duration in the desired aspect ratio.
Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.
AutoFlip Overview AutoFlip provides a fully automatic solution to smart video reframing, making use of state-of-the-art ML-enabled object detection and tracking technologies to intelligently understand video content. AutoFlip detects changes in the composition that signify scene changes in order to isolate scenes for processing. Within each shot, video analysis is used to identify salient content before the scene is reframed by selecting a camera mode and path optimized for the contents.
Shot (Scene) Detection A scene or shot is a continuous sequence of video without cuts (or jumps). To detect the occurrence of a shot change, AutoFlip computes the color histogram of each frame and compares this with prior frames. If the distribution of frame colors changes at a different rate than a sliding historical window, a shot change is signaled. AutoFlip buffers the video until the scene is complete before making reframing decisions, in order to optimize the reframing for the entire scene.
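As a rough illustration of the histogram comparison described above, the following Python sketch flags a shot change when the frame-to-frame histogram difference jumps well above the average of a sliding window of recent differences. It is not AutoFlip’s actual implementation; the function names, bin count, window size, and thresholds are all assumptions chosen for illustration.

```python
import numpy as np


def color_histogram(frame, bins=8):
    """Normalized joint RGB histogram of an H x W x 3 uint8 frame."""
    pixels = frame.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    return hist.ravel() / pixels.shape[0]


def detect_shot_changes(frames, window=5, ratio=3.0, min_diff=0.2):
    """Return the indices of frames that appear to start a new shot.

    window, ratio and min_diff are illustrative values, not AutoFlip settings.
    """
    changes = []
    recent_diffs = []  # sliding window of recent frame-to-frame differences
    prev_hist = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame)
        if prev_hist is not None:
            # Total variation distance between histograms, in [0, 1].
            diff = 0.5 * np.abs(hist - prev_hist).sum()
            baseline = np.mean(recent_diffs) if recent_diffs else 0.0
            if diff > min_diff and diff > ratio * baseline:
                changes.append(i)
                recent_diffs = []  # start a fresh history for the new shot
            else:
                recent_diffs = (recent_diffs + [diff])[-window:]
        prev_hist = hist
    return changes
```

The frames between two consecutive indices returned here would form one scene, which could then be buffered and reframed as a unit.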
Video Content Analysis We utilize deep learning-based object detection models to find interesting, salient content in the frame. This content typically includes people and animals, but other elements may be identified, depending on the application, including text overlays and logos for commercials, or motion and ball detection for sports.
The face and object detection models are integrated into AutoFlip through MediaPipe, which uses TensorFlow Lite on CPU. This structure allows AutoFlip to be extensible, so developers may conveniently add new detection algorithms for different use cases and video content. Each object type is associated with a weight value, which defines its relative importance — the higher the weight, the more influence the feature will have when computing the camera path.
Left: People detection on sports footage. Right: Two face boxes (‘core’ and ‘all’ face landmarks). In narrow portrait crop cases, often only the core landmark box can fit.
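To make the role of the per-type weights concrete, here is a minimal Python sketch in which each detection pulls a single focus point toward itself in proportion to its weight. The `Detection` class, the weight table, and the weighted-average rule are hypothetical; the post states only that higher weights give a feature more influence on the camera path, not how that influence is computed.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    kind: str        # e.g. "face", "person", "logo" (hypothetical labels)
    x_center: float  # normalized horizontal position in [0, 1]
    y_center: float  # normalized vertical position in [0, 1]


# Hypothetical weights: a higher value means more influence on the focus point.
WEIGHTS = {"face": 1.0, "person": 0.8, "logo": 0.3}


def weighted_focus(detections, weights=WEIGHTS, default_weight=0.1):
    """Return the weight-averaged (x, y) focus point, or None if nothing counts."""
    total = sum(weights.get(d.kind, default_weight) for d in detections)
    if total == 0:
        return None
    x = sum(weights.get(d.kind, default_weight) * d.x_center for d in detections) / total
    y = sum(weights.get(d.kind, default_weight) * d.y_center for d in detections) / total
    return x, y
```

With a face at x = 0.2 and a logo at x = 0.8, the focus lands much closer to the face, because its assumed weight is more than three times the logo’s.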
Reframing After identifying the subjects of interest on each frame, logical decisions about how to reframe the content for a new view can be made. AutoFlip automatically chooses an optimal reframing strategy — stationary, panning or tracking — depending on the way objects behave during the scene (e.g., moving around or stationary). In stationary mode, the reframed camera viewport is fixed in a position where important content can be viewed throughout the majority of the scene. This mode can effectively mimic professional cinematography in which a camera is mounted on a stationary tripod or where post-processing stabilization is applied. In other cases, it is best to pan the camera, moving the viewport at a constant velocity. The tracking mode provides continuous and steady tracking of interesting objects as they move around within the frame.
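The post does not spell out how the choice among the three modes is made, so the following Python sketch is only one plausible heuristic, not AutoFlip’s logic. It assumes a one-dimensional track of subject centers (in pixels) for a scene and picks a mode from how far, and how linearly, the subject moves; the function name and both thresholds are invented for illustration.

```python
import numpy as np


def choose_strategy(centers, crop_width, stationary_frac=0.25, linear_tol=0.05):
    """Pick a reframing mode from a non-empty 1-D track of subject centers.

    stationary_frac and linear_tol are illustrative thresholds, expressed as
    fractions of the crop width.
    """
    centers = np.asarray(centers, dtype=float)
    spread = centers.max() - centers.min()
    if spread <= stationary_frac * crop_width:
        return "stationary"  # subject barely moves: a fixed viewport suffices

    # Fit a straight line (constant-velocity pan) and measure the worst miss.
    t = np.arange(len(centers))
    slope, intercept = np.polyfit(t, centers, 1)
    worst_residual = np.abs(centers - (slope * t + intercept)).max()
    if worst_residual <= linear_tol * crop_width:
        return "panning"  # motion is close to constant velocity
    return "tracking"     # irregular motion: follow the subject continuously
```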
Based on which of these three reframing strategies the algorithm selects, AutoFlip then determines an optimal cropping window for each frame, while best preserving the content of interest. While the bounding boxes track the objects of focus in the scene, they typically exhibit considerable jitter from frame-to-frame and, consequently, are not sufficient to define the cropping window. Instead, we adjust the viewport on each frame through the process of Euclidean-norm optimization, in which we minimize the residuals between a smooth (low-degree polynomial) camera path and the bounding boxes.
Top: Camera paths resulting from following the bounding boxes from frame-to-frame. Bottom: Final smoothed camera paths generated using Euclidean-norm path formation. Left: Scene in which objects are moving around, requiring a tracking camera path. Right: Scene where objects stay close to the same position; a stationary camera covers the content for the full duration of the scene.
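The smoothing step described above can be sketched as an ordinary least-squares polynomial fit, which minimizes the Euclidean norm of the residuals between the fitted path and the jittery per-frame centers. This is a simplified one-dimensional stand-in using NumPy’s `polyfit`, not AutoFlip’s own solver; the function name and the default degree are assumptions.

```python
import numpy as np


def smooth_camera_path(centers, degree=2):
    """Fit a low-degree polynomial to per-frame crop centers.

    centers: 1-D sequence of bounding-box centers, one per frame.
    Returns the smoothed center for each frame.
    """
    centers = np.asarray(centers, dtype=float)
    t = np.linspace(0.0, 1.0, len(centers))  # normalized frame time
    coeffs = np.polyfit(t, centers, degree)  # least-squares (L2) fit
    return np.polyval(coeffs, t)
```

Under this sketch, degree 0 corresponds to a stationary camera (a constant path at the mean position), degree 1 to a constant-velocity pan, and a higher degree to a path that follows the subject more closely.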
AutoFlip’s configuration graph provides settings for either best-effort or required reframing. If it becomes infeasible to cover all the required regions (for example, when they are too spread out on the frame), the pipeline will automatically switch to a less aggressive strategy by applying a letterbox effect, padding the image to fill the frame. For cases where the background is detected as being a solid color, this color is used to create seamless padding; otherwise a blurred version of the original frame is used.
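The fallback described above comes down to a geometric check: do all required regions fit inside one crop window of the target aspect ratio? The Python sketch below shows that check under simplifying assumptions (axis-aligned pixel boxes and a single frame); the function names and return values are hypothetical and do not mirror AutoFlip’s configuration graph.

```python
def crop_size(src_w, src_h, target_aspect):
    """Largest (width, height) window with target_aspect = w / h inside the frame."""
    if src_w / src_h > target_aspect:
        # Source is wider than the target: keep full height, trim the width.
        return int(src_h * target_aspect), src_h
    # Source is narrower than the target: keep full width, trim the height.
    return src_w, int(src_w / target_aspect)


def plan_reframe(src_w, src_h, target_aspect, required_boxes):
    """Return "crop" if every required box fits in one crop window, else "letterbox".

    required_boxes: list of (x0, y0, x1, y1) tuples in pixels.
    """
    if not required_boxes:
        return "crop"
    crop_w, crop_h = crop_size(src_w, src_h, target_aspect)
    span_w = max(b[2] for b in required_boxes) - min(b[0] for b in required_boxes)
    span_h = max(b[3] for b in required_boxes) - min(b[1] for b in required_boxes)
    if span_w <= crop_w and span_h <= crop_h:
        return "crop"
    return "letterbox"  # regions too spread out: pad instead of cropping
```

For a 1920x1080 source and a 9:16 target, the crop window is only about 607 pixels wide, so two required regions at opposite edges of the frame would force the letterbox path.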
AutoFlip Use Cases We are excited to release this tool directly to developers and filmmakers, reducing the barriers to their design creativity and reach through the automation of video editing. The ability to adapt any video format to various aspect ratios is becoming increasingly important as the diversity of devices for video content consumption continues to rapidly increase. Whether your use case is portrait to landscape, landscape to portrait, or even small adjustments like 4:3 to 16:9, AutoFlip provides a solution for intelligent, automated and adaptive video reframing.
What’s Next? Like any machine learning algorithm, AutoFlip can benefit from an improved ability to detect objects relevant to the intent of the video, such as speaker detection for interviews or animated face detection on cartoons. Additionally, a common issue arises when input video has important overlays on the edges of the screen (such as text or logos) as they will often be cropped from the view. By combining text/logo detection and image inpainting technology, we hope that future versions of AutoFlip can reposition foreground objects to better fit the new aspect ratios. Lastly, in situations where padding is required, deep uncrop technology could provide improved ability to expand beyond the original viewable area.
While we work to improve AutoFlip internally at Google, we encourage contributions from developers and filmmakers in the open source communities.
Acknowledgments We would like to thank our colleagues who contributed to AutoFlip, Alexander Panagopoulos, Jenny Jin, Brian Mulford, Yuan Zhang, Alex Chen, Xue Yang, Mickey Wang, Justin Parra, Hartwig Adam, Jingbin Wang, and Weilong Yang; and the MediaPipe team who helped with open sourcing, Jiuqiang Tang, Tyler Mullen, Mogan Shieh, Ming Guang Yong, and Chuo-Ling Chang.