AI Priorities for 2020 — Part 1: Going Beyond Example-Based Learning
Humans keep learning core concepts throughout childhood and adult life, and they continuously recombine those concepts into solutions for new problems. Modern AI in the form of convolutional neural networks (CNNs) does not have the luxury of going through a human childhood and adolescence to acquire this key capability of generalizing successful behavior while minimizing detrimental actions. In a nutshell, the CNN is at an 18-year disadvantage (assuming a ‘standard human childhood’) when it comes to learning modular and generalizable problem-solving capabilities. Instead, the CNN relies on us humans to provide examples of desired outcomes, along with the input variables that we suspect correlate with this outcome or its opposite.
The Limits of Learning by Example Only
Imagine if our parents and teachers had trained us exclusively by providing a vast number of examples for us to analyze and learn from. Things would have taken significantly longer to figure out, and the practice attempts would have led to a lot more injuries. For example, when you teach your kids how to ride a bike, you explain how the pedals, brakes, and handlebar work. You also explain basic physical principles, such as the need to stay above a certain speed to keep their balance and the danger of going too fast and falling. You then watch them as they try, and you provide situational instructions to make learning easier and safer for them. Unfortunately, when we teach autonomous vehicles we cannot provide this kind of natural-language instruction, and the vehicles cannot combine an understanding of our verbal instructions with their previous experience unless we provide hardcoded rules.
The Limits of Hardcoded Rules
Now imagine hardcoding the rules that teach your kids how to safely learn to ride their bikes. You could never be sufficiently precise, nor cover all possible dangerous situations, for them to be successful. Instead, your kids understand your language against a significant foundation of their own experience. This experience has already taught them a number of reusable lessons that help them overcome your imprecise and incomplete explanations: that it is bad to collide with a car, that it is painful to fall on their knees, that it takes longer to brake when they ride faster, that in the dark they need to turn on their lights to see and be seen, that it is not a great idea to ride with deflated tires, and so on. Later, when they learn how to drive a car, they will combine these principles with many other concepts they have learned throughout their lives, and consequently they will need far less training than if they had to learn driving from scratch.
In other words, relying on CNNs without providing them any kind of modular input structure comes close to having to build a new human brain for each and every use case.
But What About Transfer Learning?
To avoid training our learning models from scratch, we can leverage a modular set of pre-trained models that cover basic capabilities: voice and image recognition, entity extraction, sentiment analysis, text and image categorization and clustering, and other core requirements. Developers simply call these pre-trained models via an API and receive results without having to worry about training or hosting the models, as in the sketch below. Transfer learning is a step in the right direction, but in its current form it only addresses the “tip of the iceberg”.
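To make this concrete, here is a minimal sketch of what calling such a pre-trained model looks like, using AWS Rekognition’s label detection as an example. The snippet assumes boto3 is installed and AWS credentials are configured; the file name and thresholds are placeholders, not values from a real application.

```python
# Minimal sketch: calling a pre-trained image-recognition model via an API.
# Assumes boto3 is installed and AWS credentials are configured; the file name is a placeholder.
import boto3

rekognition = boto3.client("rekognition")

with open("kitchen_photo.jpg", "rb") as f:
    image_bytes = f.read()

# Training, hosting, and scaling are handled by the service; the application
# only sends an image and receives labels with confidence scores.
response = rekognition.detect_labels(
    Image={"Bytes": image_bytes},
    MaxLabels=10,
    MinConfidence=80.0,
)

for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')
```

The appeal is obvious: a few lines of application code stand in for what would otherwise be months of data collection, labeling, and model training.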
Transfer Learning Has No Situational Awareness
Modern transfer learning services are often truly revolutionary when it comes to solving hard problems in application programming, such as reliably reading handwriting, detecting an impending hard drive failure, or using biometric detection for authentication. But transfer learning cannot adapt to the individual application context. Image recognition APIs, for example, are typically trained on a vast library of objects they can reliably recognize. These recognition capabilities end when a specific subset of objects has to be recognized under very specific conditions. One of my first use cases for AWS Rekognition, almost three years back, immediately and entirely coincidentally exposed this weakness: Rekognition simply was not able to recognize brightly lit digits, as they appear on your microwave, oven, and other kitchen appliances. To fix this, we had to hardcode pre-processing steps into the application that were not based on deep learning or even basic machine learning, but relied entirely on good old contrast detection: first identify the lit-up digits, then draw a frame around them, and then reduce contrast and saturation to a degree where AWS Rekognition could read the digits (the sketch below illustrates the idea). There is no shortage of these very specific use cases that require significant hardcoding to arrive at a solution robust enough for production use.

But these are only the mass-produced, commercially available solutions. What about the Boston Dynamics robot that learns how to open doors? How do Tesla cars drive safely in all types of situations? And how could Google’s DeepMind teach itself to beat chess and Go world champions?
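For illustration, here is a rough sketch of the kind of contrast-based pre-processing described above, using OpenCV. The function name, thresholds, and exact steps are my own illustrative assumptions, not the original production code.

```python
# Rough sketch of contrast-based pre-processing for brightly lit digits, using OpenCV.
# Thresholds and steps are illustrative assumptions, not the original production pipeline.
import cv2
import numpy as np

def preprocess_lit_digits(image_path: str) -> bytes:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Step 1: find the brightly lit regions (e.g. an LED display) by thresholding.
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        raise ValueError("No brightly lit region found")

    # Step 2: draw a frame around the lit-up area and crop to it.
    x, y, w, h = cv2.boundingRect(np.vstack(contours))
    crop = img[y:y + h, x:x + w]

    # Step 3: reduce saturation and soften contrast so the recognition API can read the digits.
    hsv = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] *= 0.3                                        # desaturate
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * 0.8 + 30, 0, 255)    # compress the brightness range
    toned_down = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    _, encoded = cv2.imencode(".jpg", toned_down)
    return encoded.tobytes()  # ready to send to the recognition API
```

The returned image bytes could then be passed to a text-detection call such as Rekognition’s detect_text, in the same way as the label-detection sketch earlier. The point is not the specific pipeline but that none of it involves learning: it is hand-built glue code compensating for the pre-trained model’s lack of situational awareness.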
In part 2 of this series, scheduled for next Monday, I will answer these questions and discuss some of the most promising solutions toward achieving more generalizable AI capabilities.