AI History · EP 06

The diner-booth company
that owns AI compute

April 1993, a Denny's in San Jose, California. Three founders sat down to start a graphics card company. The words "artificial intelligence" never came up. About thirty years later, that company's market cap passed Apple's and Microsoft's.

6 min read · 2026.05.05 · 1993 → 2026

01 · April 1993, a family restaurant

🍔
Jensen Huang · Chris Malachowsky · Curtis Priem
NVIDIA co-founders · 1993.04, San Jose Denny's · seed: $40,000

Jensen Huang, a 30-year-old Taiwanese-American, was then an executive at LSI Logic. He met Chris Malachowsky and Curtis Priem, two graphics engineers from Sun Microsystems, at a Denny's in San Jose. Their agreement fit in one sentence: "Let's start a graphics card company." Six years later, they would make a word famous: GPU.

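October 1999. NVIDIA released a new chip, the GeForce 256, and its marketing copy called it "the world's first GPU (Graphics Processing Unit)." The term was not brand new (Sony had used it for the PlayStation's graphics chip), but from that day on it became the name of the whole category.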

The problem they wanted to solve was simple: render 3D game frames fast. A frame holds hundreds of thousands to millions of pixels, and every vertex and every pixel goes through the same kind of calculation (transforms, lighting, texture mapping). A CPU worked through them largely one after another, which was too slow. That was the core idea of the GPU: run the same calculation on many pixels in parallel.
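To make the idea concrete, here is a minimal Python sketch. The shading formula (a simple Lambert-style brightness with a made-up surface normal and a hypothetical LIGHT direction) is an illustrative assumption, not how any NVIDIA chip computes pixels. The point is that shade(x, y) depends only on its own pixel, so the work can be split across workers in any order and still produce the same image. CPython threads will not actually make this faster; the sketch shows that the split is allowed, which is exactly what GPU hardware exploits.

```python
# Minimal sketch: the same per-pixel calculation, run sequentially and "in parallel".
# The shading math and all names here are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48
LIGHT = (0.0, 0.0, 1.0)  # hypothetical light direction (pointing at the viewer)


def shade(x: int, y: int) -> float:
    """Brightness of one pixel. Uses only (x, y): no pixel depends on another."""
    # Fake a surface normal that tilts across the image.
    nx = x / (WIDTH - 1) - 0.5
    ny = y / (HEIGHT - 1) - 0.5
    nz = 1.0
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    # Lambert-style term: cosine of the angle between normal and light, clamped at 0.
    cos_angle = (nx * LIGHT[0] + ny * LIGHT[1] + nz * LIGHT[2]) / length
    return max(0.0, cos_angle)


def render_sequential() -> list[float]:
    # CPU style: visit pixels one after another, row by row.
    return [shade(x, y) for y in range(HEIGHT) for x in range(WIDTH)]


def render_parallel(workers: int = 8) -> list[float]:
    # GPU style, conceptually: hand every pixel to a pool of workers.
    # pool.map returns results in input order, so the image layout is preserved.
    coords = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: shade(*c), coords))


if __name__ == "__main__":
    seq = render_sequential()
    par = render_parallel()
    print(len(seq), seq == par)  # 3072 True
```

Because no pixel reads another pixel's result, the order of execution cannot change the image; a GPU takes that guarantee and runs the pixels on many hardware lanes at once.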

02 · 2007, the man who pulled GPUs out of gaming

Early 2000s. A few academics tried something strange: could scientific computing run on GPUs? But the GPU APIs of the day (OpenGL, DirectX) were graphics-only, so you had to disguise work like matrix multiplication as texture compositing. It was so painful that only a small community of researchers did it.

⚙️
Ian Buck
Stanford PhD (2004) → NVIDIA · creator of CUDA · now NVIDIA VP

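During his PhD at Stanford, Buck built BrookGPU, a stream-programming language for GPUs. He joined NVIDIA in 2004 and rebuilt the same idea as a programming platform designed together with NVIDIA's next-generation hardware. The result, released in June 2007, was CUDA. You could now program a GPU in C with a handful of extensions, and the barrier to entry for academics dropped dramatically.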
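CUDA's core idea fits in one line of its C dialect: a kernel runs once per thread, and each thread computes its own element index as blockIdx.x * blockDim.x + threadIdx.x. The sketch below mimics that model in plain Python with SAXPY (y = a*x + y), a classic first GPU example. It is a simulation for illustration, not CUDA: launch is a hypothetical helper standing in for CUDA's kernel<<<grid, block>>>(...) launch syntax, and it runs the threads one by one where a GPU would run them concurrently.

```python
# Plain-Python simulation of CUDA's thread-indexing model (not real CUDA).
# In CUDA C each thread would compute: i = blockIdx.x * blockDim.x + threadIdx.x


def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: list[float], y: list[float]) -> None:
    """Work of ONE thread: update the single element it is responsible for."""
    i = block_idx * block_dim + thread_idx
    if i < n:  # guard: the last block may contain threads past the end of the data
        y[i] = a * x[i] + y[i]


def launch(kernel, grid_dim: int, block_dim: int, *args) -> None:
    """Hypothetical stand-in for kernel<<<grid_dim, block_dim>>>(args...).
    Runs every (block, thread) pair sequentially; a GPU would run them in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n
block_dim = 256                               # threads per block
grid_dim = (n + block_dim - 1) // block_dim   # round up: 4 blocks cover 1000 elements
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
print(y[:3], y[-1])  # [1.0, 3.0, 5.0] 1999.0
```

Each simulated thread touches only y[i], so the order in which threads run does not matter. That independence is what lets real GPU hardware schedule thousands of them at once.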

03 · 2009–2012, the secret academia found

June 2009. Andrew Ng's group at Stanford presented a paper at ICML: "Large-scale Deep Unsupervised Learning using Graphics Processors." The headline result: training on GPUs with CUDA ran up to 70× faster than on a dual-core CPU. The field paid attention.

Then came the moment from EP02. Fall 2012, the ImageNet competition. Two of Hinton's students, Alex Krizhevsky and Ilya Sutskever, entered. They trained their model on two NVIDIA GTX 580s, consumer gaming cards. That model, AlexNet, won with a top-5 error of 15.3%, against 26.2% for the runner-up. Soon vision labs around the world were buying NVIDIA GPUs.

📌 The day a gamer card became AI infrastructure
At the time, AI was at most a side category for NVIDIA: GeForce was for gamers, Quadro for workstations. After the AlexNet shock of 2012 the company began turning toward AI, and Jensen Huang has since said in multiple interviews and keynotes that "we did not expect, 25 years ago, that AI would become the heart of NVIDIA." The 2017 Volta architecture, the first with Tensor Cores, marked the formal pivot to AI-first silicon.

04 · 2016, Google built its own chip

2013. An internal Google analysis: "If every user spends just 3 minutes a day on speech recognition, we'd need to double our datacenter footprint." The answer wasn't "buy more NVIDIA GPUs." It was "build our own chip."

🔷
Norman Jouppi
Google · TPU project lead · ISCA 2017 paper · ex-Stanford MIPS, ex-DEC designer

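A Stanford PhD who helped design the MIPS processor at Stanford in the early 1980s and later designed processors at DEC. At Google he led work on the TPU (Tensor Processing Unit). The key difference: GPUs are built for "any kind of parallel compute," while TPUs are built to do one thing, "neural network matrix multiply, very well." The specialization paid off: in the 2017 paper, TPU v1 ran inference about 15–30× faster than the contemporary CPU and GPU it was compared against, with 30–80× better performance per watt.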
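What does "matrix multiply, very well" look like in practice? Per the 2017 paper, TPU v1's core was a 256×256 grid of 65,536 8-bit multiply-accumulate (MAC) units running inference on quantized integers. The Python sketch below shows that style of arithmetic in miniature: quantize float matrices to 8-bit integers, multiply with integer accumulation, then rescale. The symmetric per-matrix scaling and all helper names are illustrative assumptions, not Google's actual quantization scheme.

```python
# Minimal sketch of 8-bit quantized matrix multiplication (illustrative only).
# Scheme assumed here: symmetric, one scale per matrix, values in [-127, 127].


def quantize(m: list[list[float]]) -> tuple[list[list[int]], float]:
    """Map floats to small integers plus one float scale: v ~= q * scale."""
    max_abs = max(abs(v) for row in m for v in row) or 1.0  # avoid divide-by-zero
    scale = max_abs / 127.0
    q = [[round(v / scale) for v in row] for row in m]
    return q, scale


def int_matmul(qa: list[list[int]], qb: list[list[int]]) -> list[list[int]]:
    """Integer multiply-accumulate, the job of a MAC grid.
    Python ints never overflow; hardware keeps sums in wider (e.g. 32-bit) accumulators."""
    inner, cols = len(qb), len(qb[0])
    return [[sum(row[t] * qb[t][j] for t in range(inner)) for j in range(cols)]
            for row in qa]


def quantized_matmul(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = int_matmul(qa, qb)
    # Rescale: since a ~= qa * sa and b ~= qb * sb, a @ b ~= (qa @ qb) * sa * sb.
    return [[v * sa * sb for v in row] for row in acc]


def float_matmul(a, b):
    """Reference result in full floating point."""
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for row in a]


a = [[0.5, -1.2], [2.0, 0.3]]
b = [[1.0, 0.25], [-0.7, 1.5]]
approx = quantized_matmul(a, b)
exact = float_matmul(a, b)
max_err = max(abs(p - q) for ra, rb in zip(approx, exact) for p, q in zip(ra, rb))
print(f"max abs error: {max_err:.4f}")
```

Eight-bit integer multipliers are much smaller and cheaper in energy than floating-point ones, which is one reason a chip built around them can be so efficient; the price is a small rounding error, visible in max_err.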

TPU v1 was unveiled at Google I/O in May 2016. The AlphaGo vs. Lee Sedol matches that March had already run on TPUs, and by the time of the announcement TPUs had been working in Google's datacenters for more than a year, behind services such as Search, Translate, and Photos. NVIDIA now faced a new kind of competitor.

05 · 2024, every AI company lined up

NVIDIA datacenter GPU lineage, 2017–2025:

V100 (2017 · Volta): First Tensor Cores. Start of the AI-training silicon era; GPT-3 was trained on V100s.
A100 (2020 · Ampere): Workhorse of the pandemic-era cloud explosion.
H100 (2022 · Hopper): The default chip for frontier LLM training in 2023–2024. Reportedly $30,000+ per card.
H200 (2024 · Hopper): 141 GB of HBM3e memory, with SK hynix and Micron among the suppliers.
B200 (2024 · Blackwell): 208B transistors across two GPU dies in one package.
GB300 (2025 · Blackwell Ultra): Pairs a Grace CPU with Blackwell Ultra GPUs, the successor to B200. Tuned for inference efficiency.

In 2024 NVIDIA's market cap crossed $3 trillion, passing both Apple and Microsoft. The more striking number, though, is market share: by common estimates NVIDIA holds roughly 90% of the global datacenter GPU market. AMD's MI300, Google's TPU, Amazon's Trainium, and Microsoft's Maia are all challengers, but the cost of leaving the CUDA ecosystem keeps customers in place.

06 · And one more thing: the NPU in your phone

2017. Apple's A11 Bionic chip, in the iPhone 8 and iPhone X, included a block called the Neural Engine. It's an NPU (Neural Processing Unit): hardware that runs AI models directly on the phone. Features such as Face ID, photo categorization, and later speech recognition could run on the device instead of going to the cloud.

As of 2026, almost every phone SoC has an NPU. Apple A18 Pro Neural Engine (35 TOPS), Samsung Exynos NPU, Qualcomm Hexagon, Google Tensor G4. Small LLMs like Llama 3.2 1B now run directly on phones without ever touching the cloud. This is the next stage after the ChatGPT era from EP04 — "the model came down to the phone."
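Why can a 1B-parameter model fit on a phone at all? A rough, weights-only estimate is parameters × bits per parameter ÷ 8 bytes. The sketch below assumes a round 1 billion parameters (the real Llama 3.2 1B is slightly larger) and ignores the KV cache, activations, and runtime overhead, so treat the numbers as lower bounds.

```python
# Back-of-the-envelope weight memory for an on-device model.
# Assumptions: exactly 1e9 parameters; weights only (no KV cache, activations,
# or runtime overhead); decimal gigabytes (1 GB = 1e9 bytes).


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Bytes for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 1e9  # hypothetical round number for a "1B" model

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB")
# fp32: 4.0 GB / fp16: 2.0 GB / int8: 1.0 GB / int4: 0.5 GB
```

Going from 32-bit to 4-bit weights shrinks the model eightfold, from about 4 GB to about 0.5 GB, which is one reason on-device runtimes lean on low-precision formats.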

🔑 GPU vs TPU vs NPU
GPU (NVIDIA): most general-purpose. Both training and inference. Expensive and large. The datacenter standard.
TPU (Google): extreme matrix-multiply specialization. Dominant efficiency. Available only from Google, in its own services and through Google Cloud.
NPU (Apple/Samsung/...): small and efficient. Goes into phones, laptops, robots. Inference only.

07 · So what is the silicon story really about?

In EP01 we saw Hinton's 1986 backpropagation paper, written with Rumelhart and Williams. The algorithm then waited more than 25 years for its breakthrough. Of the two limits from EP02, "not enough data" and "computers too slow," GPUs finally solved the second.

Same algorithm. Same math. But between 1986 and 2012, compute got more than a million times faster, and the same backprop suddenly started working. That's why some people argue AI was a hardware revolution, not an algorithm revolution.
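A quick sanity check on the scale of that claim (simple arithmetic, assuming steady exponential growth): a millionfold is about twenty doublings, and 1986 to 2012 is 26 years.

```latex
2^{20} = 1{,}048{,}576 \approx 10^{6},
\qquad
\frac{2012 - 1986}{20} = \frac{26}{20} = 1.3\ \text{years per doubling}
```

So "more than a million times" means compute doubled roughly every 1.3 years or faster, sustained for a quarter century.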

In the next post (EP07), we look at where the six episodes so far converge: one industry that AI is actually changing. SK hynix Panoptes, NVIDIA cuLitho, Samsung Omniverse Twin: the on-the-ground story of how AI runs inside semiconductor fabs.

🧪
Try it · AI Lab
Compare CPU vs GPU matrix multiplication →
Run the same 8×8 matrix multiply on a CPU (sequential) and a GPU (parallel) and watch the difference. Crank up the matrix size to see the gap widen.
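The lab's comparison can be approximated with a simple count. The sketch below is not the AI Lab's code; it assumes idealized machines: a "CPU" that does one multiply-add per step, and a "GPU" with one worker per output element, all running at once, with no memory or scheduling costs. Each output element C[i][j] is an independent dot product, so n × n workers can each do their n multiply-adds simultaneously.

```python
# Idealized step counts for an n x n matrix multiply (not the AI Lab's actual code).
# Assumptions: one multiply-add per worker per step; the "GPU" has one worker per
# output element; memory traffic and scheduling are ignored.


def output_element(a, b, i, j):
    """What one GPU thread would compute: a single dot product, C[i][j]."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Full product. Every (i, j) call is independent of every other."""
    return [[output_element(a, b, i, j) for j in range(len(b[0]))]
            for i in range(len(a))]


def sequential_steps(n: int) -> int:
    # One core performs all n * n * n multiply-adds one after another.
    return n ** 3


def parallel_steps(n: int) -> int:
    # n * n workers each perform their own n multiply-adds at the same time.
    return n


for n in (8, 64, 512):
    seq, par = sequential_steps(n), parallel_steps(n)
    print(f"n={n}: sequential {seq:,} steps, parallel {par:,} steps, gap {seq // par:,}x")
```

The ideal gap is n², which is why it widens as the matrix grows. Real hardware is messier: a CPU has several cores and vector units, and a GPU has thousands of cores rather than unlimited ones, so measured speedups are far smaller than these ideal numbers.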