The A.I. field had long been dominated by so-called Expert Systems. These systems translate human expertise into strictly defined rules in order to replicate reasonably intelligent behavior. However, their sheer inability to handle situations not covered by those rules made the A.I. field look for other, more scalable techniques that do provide that capability: techniques such as machine learning, which, mostly through deep learning, is revolutionizing almost everything around us, from consumer electronics to medical research to supply chain optimization. But what is actually driving its capability to understand the unknown more effectively than an Expert System?
The machine learning landscape can roughly be divided into supervised and unsupervised techniques. In the unsupervised case the inherent structure of a dataset (e.g. its density) is exploited to find patterns. Unfortunately, these kinds of techniques do not perform well in most real-world applications because they lack a clear optimization target. Put simply, the unsupervised machine learning algorithm does not know what specifically to look for.
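To make the missing optimization target concrete, here is a minimal sketch of an unsupervised technique. It uses plain k-means, a distance-based method chosen here only because it is short (the density-based methods mentioned above work differently). The feature values and the function name `k_means` are assumptions made up for this illustration.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Plain k-means: return a cluster index for every point."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment


# Hypothetical, unlabeled feature vectors: (elongation, yellowness).
points = [(0.9, 0.9), (0.2, 0.3), (0.8, 0.8), (0.3, 0.2)]
clusters = k_means(points, k=2)
print(clusters)
```

The algorithm groups similar points together, but the output is only a list of cluster indices. Nothing in it says which group is 'banana' and which is 'apple'; that meaning still has to come from a human.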
With supervised machine learning the optimization target should be clearly defined upfront. That means that each data instance, the smallest indivisible entity that in bulk makes up the dataset, requires a label stating what it is. For instance, the label 'pear' is given to an image (the data instance) that shows a basket of pears. That image is part of a dataset consisting of many more images of fruit, each carrying a label (‘pear’, ‘apple’, ‘banana’ etc.). The machine learning algorithm then calculates a function that best describes the relation between the images and their labels. So, if a newly obtained image containing a banana is given to the algorithm, it will tell us that a banana is present.
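The idea of calculating a function from labeled instances can be sketched in a few lines. The example below is a toy nearest-centroid classifier, not what a real image model does: it assumes each image has already been reduced to two hand-made features (elongation and yellowness), and the names `train` and `predict` are hypothetical helpers introduced for this illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features per image: (elongation, yellowness), both in [0, 1].
labeled_examples = [
    ((0.9, 0.9), "banana"),
    ((0.8, 0.8), "banana"),
    ((0.3, 0.2), "apple"),
    ((0.2, 0.3), "apple"),
    ((0.5, 0.6), "pear"),
    ((0.6, 0.5), "pear"),
]

model = train(labeled_examples)   # the "calculated function"
new_image = (0.85, 0.95)          # an unseen, banana-like instance
prediction = predict(model, new_image)
print(prediction)
```

Everything the model knows comes from the label attached to each example. Remove the labels and `train` has nothing to group by.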
To draw a parallel to the Expert System approach, the label is the human expertise and the calculated function is the set of rules derived from it. However, the supervised algorithm is able to deal with more variation in input than the more rigid, rules-based Expert System.
Supervised machine learning is driving the current explosion around us. Siri on your iPhone is able to understand your request quite well because many different pronunciations of words and sentences have been labeled by humans. The Tesla car is able to detect surrounding vehicles and act accordingly because of the many labeled examples it has been given in advance. The recommendations you see on Amazon are based on the past behavior of other users, which functions as labels. The list goes on and on.
It is easy to see that in order to do machine learning effectively one needs labels, and a lot of them. Acquiring these labels is, however, a laborious, difficult and costly undertaking. One needs to set up a pipeline in which a large pool of humans labels tens of thousands (often more) of data instances. On top of that, the pipeline has to deal with labeler fatigue and labeler disagreement in order to safeguard the quality of the labels themselves.
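One common way to handle labeler disagreement is to have several people label the same instance and keep the majority answer only when enough of them agree. The sketch below illustrates that idea under stated assumptions: the function name `aggregate` and the 0.6 agreement threshold are invented for this example and do not describe any particular platform's pipeline.

```python
from collections import Counter


def aggregate(votes, min_agreement=0.6):
    """Return (majority label or None, agreement rate) for one data instance.

    The label is None when agreement falls below the threshold, signalling
    that the instance should be sent back for review.
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical votes from three labelers per image.
clear_case = aggregate(["pear", "pear", "apple"])
disputed_case = aggregate(["pear", "apple", "banana"])
print(clear_case)
print(disputed_case)
```

Every instance that needs multiple votes, or a second round of review, multiplies the human effort involved, which is part of why label acquisition is so costly.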
There are many companies concerned with this label acquisition problem. Amazon Mechanical Turk is the most prominent of them. It provides a platform that gives you access to cheap (often overseas) labor that will label almost anything you throw at it. Others, such as Scale.ai, provide label acquisition pipelines for only a select few A.I. problems, like image classification, but with more streamlined processes. This market for label-acquisition platforms, and the access they provide to cheap labor, is becoming more important. It is expected to be a 5 billion dollar business by 2023 according to Redpoint Ventures.
Here is the irony in all this. Whilst A.I. is being held responsible for (future) job loss, it actually needs an army of human labelers in order to be practically viable. It is this silent workforce that is truly driving the A.I. boom right now.