In July 1958, the New York Times reported on the unveiling of a "machine that learns by doing." Eleven years later, a single book delivered its death sentence. Seventeen years after that, a single Nature paper brought it back.
July 8, 1958. The New York Times ran a prominent story on a US Navy demonstration in Washington, headlined "NEW NAVY DEVICE LEARNS BY DOING." The article carried this striking claim:
"The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."
— New York Times, 1958.07.08 (paraphrasing the Navy press conference)
The man behind the announcement was Frank Rosenblatt, a 30-year-old psychologist at the Cornell Aeronautical Laboratory. The machine he built was a roughly cabinet-sized device with about 400 photocells, called the Mark I Perceptron, one of the first hardware implementations of an artificial neural network.
Rosenblatt was among the first to take the hypothesis "a machine can recognize patterns like a human can" and build it in hardware. The Mark I could be shown letters and learn to classify them as 'A' or 'B'. The US Office of Naval Research funded the work.
The perceptron's operation is simple. Multiply each input by a weight and sum the results. If the sum crosses a threshold, output 1; otherwise, output 0. If the prediction is wrong, nudge the weights a little. Try again. Adjust again. That's it. But the moment that loop ran, "a machine learning from data" had become a working concept for the first time.
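To make that loop concrete, here is a minimal sketch in Python. It is not Rosenblatt's hardware or his exact procedure: the logical AND task stands in for the letter images, the threshold is folded into a `bias` term, and the names `predict`, `train`, and `AND_SAMPLES` as well as the learning rate of 0.5 are assumptions made for illustration.

```python
# Toy data (an assumption for illustration): logical AND of two binary inputs.
AND_SAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

def predict(weights, bias, inputs):
    """Weighted sum plus bias; output 1 if it is positive, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, lr=0.5):
    """Perceptron rule: after each wrong prediction, nudge the weights."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or +1
            # When error is 0 nothing changes; otherwise shift toward the target.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

weights, bias = train(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "target:", target, "predicted:", predict(weights, bias, inputs))
```

Treating the threshold as a learned bias is a common convenience: "sum crosses a threshold" becomes "sum plus bias is positive," so the threshold gets adjusted by the same nudge as the weights.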
Eleven years later, in 1969, two giants at MIT — Marvin Minsky and Seymour Papert — published a book simply titled Perceptrons. The cover featured two patterns and asked a simple question: "Are these the same?"
The book proved mathematically that a single-layer perceptron cannot solve problems that are not linearly separable, such as XOR. That is, it couldn't even learn the simple logic of "true when exactly one input is true."
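The XOR case can be checked by hand. The short derivation below is a sketch, not the book's own proof: it assumes a single unit with two inputs that outputs 1 exactly when the weighted sum exceeds a threshold, and the symbols $w_1$, $w_2$, and $\theta$ are notation introduced here.

```latex
\begin{aligned}
(x_1, x_2) = (0,0) \mapsto 0 &: \quad 0 \le \theta \\
(x_1, x_2) = (1,0) \mapsto 1 &: \quad w_1 > \theta \\
(x_1, x_2) = (0,1) \mapsto 1 &: \quad w_2 > \theta \\
(x_1, x_2) = (1,1) \mapsto 0 &: \quad w_1 + w_2 \le \theta \\[4pt]
\text{Adding the two middle lines:} &\quad w_1 + w_2 > 2\theta \ge \theta
  \quad (\text{since } \theta \ge 0),
\end{aligned}
```

The last line contradicts the requirement for the input (1,1), so no choice of weights and threshold satisfies all four rows at once.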
The fix, in principle, was clear: stack multiple layers. But that opened a harder question — "how do you train all the weights in a multi-layer network simultaneously?" Nobody had an answer.
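Why stacking layers helps can be seen in a small Python sketch. The weights below are picked by hand rather than learned, which is exactly the gap the paragraph above describes; the helper names `step`, `unit`, and `xor_two_layer` and the specific weight values are assumptions for illustration.

```python
def step(total):
    """Threshold unit: output 1 when the weighted sum is positive."""
    return 1 if total > 0 else 0

def unit(weights, bias, inputs):
    """One perceptron-style unit with a bias in place of a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_two_layer(x1, x2):
    # Hidden layer: two units that each draw one straight dividing line.
    h_or = unit([1, 1], -0.5, [x1, x2])      # 1 if at least one input is 1
    h_nand = unit([-1, -1], 1.5, [x1, x2])   # 1 unless both inputs are 1
    # Output layer: AND of the two hidden units gives "exactly one input is 1".
    return unit([1, 1], -1.5, [h_or, h_nand])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_two_layer(a, b))
```

Each hidden unit solves a linearly separable sub-problem, and the output unit combines them. Finding such weights automatically, instead of by inspection, was the open problem.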
October 1986. Nature, volume 323, pages 533–536. A four-page paper titled "Learning representations by back-propagating errors." Three authors: David Rumelhart, Geoffrey Hinton, and Ronald Williams.
The core idea, in one sentence — "apply the chain rule of differentiation in reverse, from output back to input, to compute gradients for every weight in one pass." This is what we now call backpropagation.
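The "chain rule in reverse" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's own formulation or code: the tiny 2-2-1 network, the sigmoid activation, the squared-error loss, the example numbers in `params`, and every function name here are choices made for this sketch. The backward pass is compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return hidden activations and the output of a 2-2-1 sigmoid network."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def loss(params, x, target):
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2

def backprop(params, x, target):
    """Gradients for every weight from one forward and one backward pass."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Output layer: error times the sigmoid's slope at the output.
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Hidden layer: pass delta_out backward through each outgoing weight.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out

def numeric_grad_hidden(params, x, target, j, i, eps=1e-5):
    """Central finite-difference estimate for hidden weight w_hidden[j][i]."""
    w_hidden, b_hidden, w_out, b_out = params

    def loss_with(value):
        shifted = [row[:] for row in w_hidden]
        shifted[j][i] = value
        return loss((shifted, b_hidden, w_out, b_out), x, target)

    w = w_hidden[j][i]
    return (loss_with(w + eps) - loss_with(w - eps)) / (2 * eps)

# Arbitrary example values (assumptions for illustration only).
params = ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [0.7, -0.6], 0.05)
x, target = [1.0, 0.5], 1.0

grad_w_hidden, _, _, _ = backprop(params, x, target)
print("backprop:", grad_w_hidden[0][0])
print("numeric: ", numeric_grad_hidden(params, x, target, 0, 0))
```

The point of the design is reuse: `delta_out` is computed once and then shared by every hidden-layer gradient, so the cost of one backward pass is comparable to one forward pass, whereas the finite-difference check needs two extra forward passes per weight.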
The implication was immediate. Multi-layer networks could now be trained. Minsky and Papert's 17-year-old objection, that a single-layer perceptron can't learn XOR, no longer stood in the way. The door to deep neural networks was finally open.
"The algorithm is so robust that 40 years later, every neural network still trains the same way."
— ChatGPT, GPT-4, Stable Diffusion, Claude: all trained with backprop
Backpropagation didn't trigger an immediate AI explosion. Throughout the 1990s, neural networks stayed at the margins. Two reasons:
So neural networks went through a second AI winter from the late 1990s through ~2010. Only a handful of researchers — Hinton, Yann LeCun, Yoshua Bengio — kept the embers alive. They would later be called the "godfathers of deep learning" and share the 2018 Turing Award.
Neural networks were written off twice: once in 1969, again in the 1990s. And each time, one person and one paper brought them back. In 1986 it was the backpropagation paper Hinton co-authored. In 2012 it was Krizhevsky's AlexNet.
The ChatGPT you use today, Stable Diffusion, autonomous driving, semiconductor fab AI: all of it traces back to Rosenblatt's cabinet-sized machine in 1958. And the way they all learn is exactly what Hinton and his co-authors wrote down in 1986.
In the next post (EP02), we step into the era that started in 1989 when Yann LeCun got a network to read handwritten ZIP codes at Bell Labs. How did his "LeNet" eventually become the camera in your phone, 30 years later?