Three Milestones in Machine Learning from 1958, 1986, and 1989

Torsten Volk
4 min read · Sep 28, 2021

Understanding the following three milestones in machine learning has proven very useful, at least for me, when defining and executing machine learning projects today.

Rosenblatt: The perceptron: a probabilistic model for information storage and organization in the brain, 1958

The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory. With regard to the second question, two alternative positions have been maintained. The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus and the stored pattern.

My quick illustration and calculation of why a single layer of neurons is not sufficient to distinguish between a Ford and a Ferrari

In 1958, Frank Rosenblatt created the perceptron, a network consisting of a single layer of neurons, and ran experiments that come remarkably close to today’s key neural network use cases in image recognition. For example, he attempted to train the perceptron to differentiate between photos of women and men, in exactly the same way we would train a TensorFlow model to achieve this goal today. The reason Rosenblatt did not achieve good results sounds trivial today: he simply did not have enough compute horsepower to implement multiple layers of neurons. A single layer cannot deconstruct the pixels of a photo in a sufficiently granular manner to capture the essence of what the photo shows; formally, it can only learn decision boundaries that are linearly separable. Rosenblatt and the rest of the AI world did not know how close they actually were to success, but without modern silicon chips, success simply was not possible in 1958.
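To make that limitation concrete, here is a minimal sketch of the classic perceptron learning rule in Python with NumPy. The AND and XOR toy datasets are my illustrative stand-ins, not anything from Rosenblatt’s paper: the single layer learns the linearly separable AND function perfectly, but no setting of its weights can ever get XOR fully right.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Single-layer perceptron: predict 1 if w.x + b > 0, else 0."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Perceptron update: nudge weights toward misclassified targets
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

def accuracy(w, b, X, y):
    preds = (X @ w + b > 0).astype(int)
    return (preds == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

w, b = train_perceptron(X, y_and)
print("AND accuracy:", accuracy(w, b, X, y_and))  # converges to 1.0

w, b = train_perceptron(X, y_xor)
print("XOR accuracy:", accuracy(w, b, X, y_xor))  # no linear boundary exists; stays at 0.75 or below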

F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6), 386–408 (1958). DOI: 10.1037/h0042519

Rumelhart, Hinton, Williams: Learning representations by back-propagating errors, 1986

We describe a new learning procedure, back-propagation, for networks of neuron-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.

Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0

Rumelhart, Hinton, and Williams took Rosenblatt’s concept of the perceptron to the next level by introducing hidden layers to the neural network and by defining and optimizing the connections within and between these hidden layers through backpropagation. Backpropagation computes how much each weight contributes to the error measured by a loss function, and an optimizer such as gradient descent then adjusts the weights of the neurons within each layer to minimize the overall loss. The principle of backpropagation has been crucial to the overall development of deep learning, as it now allows us to “throw hardware at the problem”: modern GPUs use their cores to run gradient descent and the other computations of backpropagation in parallel, optimally fitting the learning model to its training set. Unfortunately, there were no GPUs or even CPUs in 1986 that were powerful enough for today’s use cases. But that would change in 1993.
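As a rough sketch of the idea (not the paper’s exact notation), the following NumPy snippet trains a tiny two-layer network on XOR, the very task the single-layer perceptron above fails at. The forward pass computes the output, the backward pass applies the chain rule to a squared-error loss, and plain gradient descent updates the weights; the hidden-layer size and learning rate are assumptions chosen for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic task a single-layer perceptron cannot learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 sigmoid units feeding one sigmoid output
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: chain rule applied to the squared-error loss
    d_out = (out - y) * out * (1 - out)   # error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated back to the hidden layer

    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```

The hidden units are exactly the “internal ‘hidden’ units” of the abstract: neither inputs nor outputs, yet after training they encode the intermediate features that make XOR solvable.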

LeCun: Handwritten digit recognition: Applications of neural net chips and automatic learning. IEEE Communications Magazine, 1989

Two novel methods for achieving handwritten digit recognition are described. The first method is based on a neural network chip that performs line thinning and feature extraction using local template matching. The second method is implemented on a digital signal processor and makes extensive use of constrained automatic learning. Experimental results obtained using isolated handwritten digits taken from postal zip codes, a rather difficult data set, are reported and discussed.

Y. LeCun, L. D. Jackel, B. Boser, J. S. Denker, H. P. Graf, I. Guyon, D. Henderson, R. E. Howard, and W. Hubbard. Handwritten digit recognition: Applications of neural net chips and automatic learning. IEEE Communications Magazine, pages 41–46, November 1989. Invited paper.

Everyone with any interest in machine learning needs to take one minute to watch Yann LeCun’s demo of the first working convolutional neural network (CNN).

LeCun’s original animation to explain LeNet-5

LeCun’s own animation visualizes how his CNN recognizes handwritten digits from different writers by searching for basic patterns within these digits and then gradually combining the recognized patterns into the overall digit. The training process of the CNN was the key to recognizing digits without significant delay. Instead of matching patterns across the entire set of training images, LeCun used algorithms that automatically deconstruct each image into more and more basic patterns, ultimately arriving at the core building blocks of each digit. Comparing only these core building blocks against the actual handwritten digits allowed LeCun’s CNN to reduce compute requirements to a degree that made identifying handwriting possible in near real time.
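For readers who want a modern reference point, here is a rough LeNet-style sketch in TensorFlow/Keras. This is not LeCun’s original 1989 configuration: the layer sizes are illustrative, and MNIST stands in for the postal zip-code digits of the paper. The structure mirrors the idea in the animation: small convolutions match local stroke patterns, pooling discards exact positions, and dense layers combine the extracted building blocks into a digit classification.

```python
import tensorflow as tf

# A LeNet-style convolutional network for 28x28 grayscale digits.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(6, kernel_size=5, activation="tanh",
                           padding="same"),            # local stroke detectors
    tf.keras.layers.AveragePooling2D(pool_size=2),     # downsample, keep rough position
    tf.keras.layers.Conv2D(16, kernel_size=5,
                           activation="tanh"),         # combinations of strokes
    tf.keras.layers.AveragePooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),   # one class per digit
])

model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# MNIST as a stand-in for the zip-code digits of the original work
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add channel axis, scale to [0, 1]
model.fit(x_train, y_train, epochs=1, batch_size=64)
```

The key design choice, then as now, is that the convolutional filters are learned rather than hand-crafted, so the network discovers the basic patterns on its own during training.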


Torsten Volk

Artificial Intelligence, Cognitive Computing, and Automatic Machine Learning in DevOps, IT, and Business are at the center of my industry analyst practice at EMA.