April 1993, a Denny's in San Jose, California. Three founders sat down to start a graphics card company. The words "artificial intelligence" never came up. Thirty years later, that company's market cap passed Apple's and Microsoft's.
Jensen Huang, a 30-year-old Taiwanese-American, was then an executive at LSI Logic. His co-founders were Chris Malachowsky and Curtis Priem, two graphics engineers from Sun Microsystems. Their agreement, in one sentence: "Let's start a graphics card company." Six years later, they would make a word famous: GPU.
October 1999. NVIDIA released a new chip called GeForce 256, and its marketing copy called it a "Graphics Processing Unit (GPU)." Sony had used the term for the original PlayStation's graphics chip back in 1994, but the GeForce 256 launch is what made the word stick.
The problem they wanted to solve was simple: render 3D game frames fast. A frame holds hundreds of thousands to millions of pixels, and every pixel goes through the same kinds of calculations (transforms, lighting, texture mapping). CPUs worked through them essentially one at a time, which was too slow. "Run the same calculation on many pixels in parallel" was the core idea of the GPU.
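To make the "same calculation, many pixels" idea concrete, here is a minimal Python sketch. The `shade` function is a hypothetical, simplified diffuse-lighting calculation, not any real GPU's pipeline; the point is that each pixel's result depends only on that pixel's own inputs, so the work can be split across parallel workers without changing the answer. Python threads are used only to show the structure, not to get a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def shade(normal: Vec3, albedo: float, light_dir: Vec3) -> float:
    """Hypothetical per-pixel 'shader': simple diffuse (Lambertian) lighting."""
    d = sum(n * l for n, l in zip(normal, light_dir))
    return albedo * max(0.0, d)  # surfaces facing away from the light get 0

def render_one_at_a_time(normals: List[Vec3], albedos: List[float], light_dir: Vec3) -> List[float]:
    # CPU-style: walk the pixels in order, one after another.
    return [shade(n, a, light_dir) for n, a in zip(normals, albedos)]

def render_in_parallel(normals: List[Vec3], albedos: List[float], light_dir: Vec3,
                       workers: int = 4) -> List[float]:
    # GPU-style structure: the same function applied to every pixel independently.
    # pool.map returns results in input order, so the image comes out identical.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: shade(p[0], p[1], light_dir), zip(normals, albedos)))

normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]
albedos = [0.8, 0.5, 1.0]
light = (0.0, 0.0, 1.0)
print(render_one_at_a_time(normals, albedos, light))  # [0.8, 0.0, 0.0]
print(render_in_parallel(normals, albedos, light))    # [0.8, 0.0, 0.0]
```

Because no pixel reads another pixel's result, the order of evaluation does not matter, which is exactly the property that lets hardware run many of them at once.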
Early 2000s. A few academics started asking a strange question: "Could we run scientific computing on GPUs?" But the GPU APIs (OpenGL, DirectX) were graphics-only, so you had to disguise the math as rendering: store the matrices in textures, then "draw" the result with a pixel shader, as if matrix multiplication were texture compositing. It was so painful that almost nobody did it.
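Here is a minimal Python sketch of that workaround. The names `fragment_shader` and `draw_full_screen_quad` are hypothetical stand-ins, not OpenGL or DirectX calls: the matrices play the role of textures, and the output matrix is "rendered" pixel by pixel, with each pixel's shader computing one dot product.

```python
from typing import Callable, List

Texture = List[List[float]]  # a matrix stored as a 2D grid, like a texture

def fragment_shader(i: int, j: int, a: Texture, b: Texture) -> float:
    # One output "pixel": dot product of row i of A and column j of B,
    # done through texture-style lookups.
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def draw_full_screen_quad(rows: int, cols: int, shader: Callable[[int, int], float]) -> Texture:
    # The graphics pipeline runs the shader once per output pixel. Every pixel
    # is independent, so real hardware could process them in parallel.
    return [[shader(i, j) for j in range(cols)] for i in range(rows)]

def gpgpu_matmul(a: Texture, b: Texture) -> Texture:
    # C = A x B, expressed as "render a rows x cols image with this shader".
    return draw_full_screen_quad(len(a), len(b[0]), lambda i, j: fragment_shader(i, j, a, b))

print(gpgpu_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The arithmetic itself is ordinary; the pain came from having to phrase every step in this rendering vocabulary.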
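While doing his PhD at Stanford, Ian Buck built BrookGPU, a stream-programming language for GPUs. He joined NVIDIA in 2004 and re-engineered the same idea together with NVIDIA's hardware. The result, released in June 2007, was CUDA. You could now program a GPU in C with a few extensions instead of disguising your math as graphics. For academics, the barrier to entry largely disappeared.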
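CUDA's core idea is that you write one function (a kernel) and launch it across a grid of thread blocks, with each thread computing its own index from the built-in variables `blockIdx`, `blockDim`, and `threadIdx`. The sketch below simulates that launch model in plain Python for SAXPY (`y = a*x + y`). The CUDA C kernel appears in comments for comparison; the Python functions `saxpy_kernel` and `launch` are hypothetical illustrations, not part of any CUDA API.

```python
# In CUDA C, the kernel would look roughly like:
#   __global__ void saxpy(int n, float a, const float *x, float *y) {
#       int i = blockIdx.x * blockDim.x + threadIdx.x;
#       if (i < n) y[i] = a * x[i] + y[i];
#   }
# launched as: saxpy<<<num_blocks, threads_per_block>>>(n, a, x, y);
from typing import List

def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: List[float], y: List[float]) -> None:
    i = block_idx * block_dim + thread_idx  # this thread's global index
    if i < n:                               # surplus threads in the last block do nothing
        y[i] = a * x[i] + y[i]

def launch(num_blocks: int, threads_per_block: int, *args) -> None:
    # On a GPU all of these threads run concurrently; this simulation just loops.
    for b in range(num_blocks):
        for t in range(threads_per_block):
            saxpy_kernel(b, threads_per_block, t, *args)

n = 10
x = [float(k) for k in range(n)]
y = [1.0] * n
threads = 4
blocks = (n + threads - 1) // threads  # ceil(10 / 4) = 3 blocks, 12 threads in total
launch(blocks, threads, n, 2.0, x, y)
print(y)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```

The `if i < n` guard is the usual pattern when the data size is not a multiple of the block size.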
June 2009. Andrew Ng's group at Stanford published a paper at ICML, "Large-scale Deep Unsupervised Learning using Graphics Processors." Headline result: training large deep belief networks on a GPU ran up to 70× faster than on a dual-core CPU. The field paid attention.
Then came the moment from EP02: the fall 2012 ImageNet competition. Hinton's two students, Alex Krizhevsky and Ilya Sutskever, entered. They trained on two NVIDIA GTX 580s, ordinary consumer gaming cards. Their model, AlexNet, won. Soon, vision labs around the world started buying NVIDIA GPUs.
2013. An internal Google analysis: "If every user spends just 3 minutes a day on speech recognition, we'd need to double our datacenter footprint." The answer wasn't "buy more NVIDIA GPUs." It was "build our own chip."
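Norman Jouppi, a Stanford PhD, had worked on CPU design since the 1980s, including the MIPS project at Stanford and later work at DEC. At Google he was a lead architect of the TPU (Tensor Processing Unit). The key difference: GPUs are designed for "any kind of parallel compute," while TPUs are designed to do "neural network matrix multiply, very well." Specializing in one task paid off: Google's 2017 paper on TPU v1 reported 30-80× better performance per watt than the contemporary CPUs and GPUs it was compared against.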
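TPU v1's matrix unit multiplied 8-bit integers rather than 32-bit floats, one concrete form of the "do one thing very well" trade-off. The Python sketch below shows the general idea of quantized matrix multiply: scale floats down to int8, multiply and accumulate in integers, then rescale. The symmetric, one-scale-per-matrix scheme is an assumption chosen for simplicity, not a description of Google's actual quantization.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def quantize(m: Matrix) -> Tuple[List[List[int]], float]:
    # Assumed scheme: map floats into int8 range [-127, 127] with one scale per matrix.
    max_abs = max(abs(v) for row in m for v in row) or 1.0
    scale = max_abs / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale

def int_matmul(qa: List[List[int]], qb: List[List[int]]) -> List[List[int]]:
    # Multiply int8 values and accumulate the sums as integers
    # (TPU v1 used 32-bit accumulators; Python ints are unbounded here).
    k, p = len(qb), len(qb[0])
    return [[sum(row[t] * qb[t][j] for t in range(k)) for j in range(p)] for row in qa]

def quantized_matmul(a: Matrix, b: Matrix) -> Matrix:
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = int_matmul(qa, qb)
    return [[v * sa * sb for v in row] for row in acc]  # back to float

a = [[0.5, -1.0], [0.25, 0.75]]
b = [[1.0, 0.0], [0.5, -0.5]]
exact = [[0.0, 0.5], [0.625, -0.375]]
approx = quantized_matmul(a, b)
# Largest deviation from the float result: small, on the order of one int8 step.
print(max(abs(x - y) for ra, re in zip(approx, exact) for x, y in zip(ra, re)))
```

The answer is slightly off from the exact float result, but neural networks tolerate that error well, and 8-bit multipliers are far smaller and cheaper in energy than floating-point ones.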
TPU v1 was unveiled at Google I/O in May 2016. The AlphaGo vs. Lee Sedol matches that March had actually run on TPUs, and Google had been running TPUs in its datacenters since 2015, serving products such as Search, Translate, and Photos. NVIDIA now had a new competitor.
NVIDIA datacenter GPU lineage 2017–2026:
In 2024 NVIDIA's market cap crossed $3 trillion and briefly passed both Apple and Microsoft. The more striking number, though, is market share: NVIDIA holds an estimated 90% or so of the global datacenter GPU market. AMD's MI300, Google's TPU, Amazon's Trainium, and Microsoft's Maia are all challenging it, but the cost of leaving the CUDA ecosystem keeps customers in place.
2017. Apple's A11 Bionic chip, in the iPhone 8 and iPhone X, included a block called the Apple Neural Engine. It's an NPU (Neural Processing Unit): dedicated silicon for running AI models directly on the phone. Face ID, photo categorization, and later on-device speech recognition could run locally instead of depending on the cloud.
As of 2026, almost every phone SoC has an NPU: the Neural Engine in Apple's A18 Pro (35 TOPS), Samsung's Exynos NPU, Qualcomm's Hexagon, and Google's Tensor G4, among others. Small LLMs like Llama 3.2 1B now run directly on phones without ever touching the cloud. This is the next stage after the ChatGPT era from EP04: "the model came down to the phone."
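Why can a 1B-parameter model run on a phone at all? A rough back-of-envelope in Python helps. It assumes about 2 operations per parameter per generated token (a multiply and an add), 4-bit weights, and that every weight is read from memory once per token. The 50 GB/s memory bandwidth is a hypothetical round number, not a spec for any particular phone; the 35 TOPS figure is the A18 Pro number quoted above. These are idealized ceilings, not benchmarks.

```python
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    # Storage needed for the weights alone.
    return params * bits_per_weight / 8 / 1e9

def compute_bound_tokens_per_s(params: float, tops: float) -> float:
    # Assumption: ~2 operations (multiply + add) per parameter per token.
    ops_per_token = 2 * params
    return tops * 1e12 / ops_per_token

def bandwidth_bound_tokens_per_s(params: float, bits_per_weight: int, gb_per_s: float) -> float:
    # Assumption: all weights are streamed from memory once per generated token.
    return gb_per_s / weight_memory_gb(params, bits_per_weight)

params = 1e9  # roughly the size class of Llama 3.2 1B
print(weight_memory_gb(params, 4))                  # 0.5 (GB of weights at 4 bits)
print(compute_bound_tokens_per_s(params, 35))       # 17500.0 (tokens/s, peak-compute ceiling)
print(bandwidth_bound_tokens_per_s(params, 4, 50))  # 100.0 (tokens/s at a hypothetical 50 GB/s)
```

Under these assumptions the compute ceiling sits far above the memory-bandwidth ceiling, which is why on-device LLM speed tends to be limited by how fast weights can be streamed from memory rather than by peak TOPS.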
In EP01 we saw backpropagation, published by Hinton and his colleagues in 1986. The algorithm then sat largely dormant for about 25 years. EP02 named two limits, "not enough data" and "computers too slow"; GPUs finally solved the second.
Same algorithm. Same math. But between 1986 and 2012, compute got more than a million times faster. So the same backprop suddenly started working. That's why some people argue AI was a hardware revolution, not an algorithm revolution.
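A million-fold speedup is easier to feel as repeated doubling. The short Python calculation below takes the text's numbers at face value (a 10^6 speedup between 1986 and 2012) and works out how often compute would have had to double. It is arithmetic on the stated claim, not a measurement of any particular hardware; since the text says "more than" a million, the real doubling period would be somewhat shorter.

```python
import math

speedup = 1e6        # "more than a million times faster", used here as a lower bound
years = 2012 - 1986  # 26 years from the 1986 backprop paper to AlexNet
doublings = math.log2(speedup)
years_per_doubling = years / doublings
print(f"{doublings:.1f} doublings, about one every {years_per_doubling:.1f} years")
# prints: 19.9 doublings, about one every 1.3 years
```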
In the next post (EP07), we look at how all six episodes so far come together in one industry, and how AI is actually changing it: SK hynix Panoptes, NVIDIA cuLitho, Samsung Omniverse Twin. It's the on-the-ground story of how AI runs inside semiconductor fabs.