AI History · EP 07

The factories that make AI
already run on AI

The NVIDIA GPUs that train models like GPT-4 are fabricated at TSMC, and the high-bandwidth memory packaged beside them comes from suppliers such as SK hynix and Samsung. A single wafer takes 14 weeks and over 200 process steps. Does a human watch every step? No. AI is already running inside the fab too.

6 min read 2026.05.05 Industry · Applied AI

01 First: the real bottleneck inside a fab

Hundreds of layers stack on a single wafer. Each layer needs its thickness, CD (critical dimension), and resistance measured, plus a defect inspection. If you measured every wafer at every step, measurement alone would add 14 weeks, and throughput would drop by more than half.

So the practical reality is "sample measurement". Measure 1 wafer out of every 25. The downside — if there's a defect on one of the 24 unmeasured wafers, you find out much later, after they've already moved through the next steps. Losses pile up.

⚠️ Two losses from sample measurement
① Detection lag — between when a fail first happens and when you catch it, every wafer in flight is at risk. On average 12-25 wafers are "already gone" by the time you spot it.
② Blind spots — 96% of wafers are never measured. They're statistically assumed to be normal, but outliers in the tail of the distribution can ship as-is.
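Both numbers follow from the 1-in-25 sampling rule, and a few lines make the arithmetic concrete. This is a minimal sketch under stated assumptions: wafers are numbered from 0, only every 25th one is measured, and a fault affects every wafer from the moment it starts. The function name `detection_lag` is hypothetical, and the time a measurement spends in the queue is not modelled.

```python
def detection_lag(fault_start: int, sample_every: int = 25) -> int:
    """Faulty wafers that pass unmeasured before a sampled wafer exposes the fault.

    Assumption: wafers are numbered 0, 1, 2, ... and only wafers whose index
    is a multiple of `sample_every` are measured. The fault hits every wafer
    from `fault_start` onward.
    """
    # Ceiling division finds the first measured index at or after fault_start.
    next_measured = -(-fault_start // sample_every) * sample_every
    return next_measured - fault_start


sample_every = 25
# One lag value for each position inside a 25-wafer cycle where the fault could start.
lags = [detection_lag(start, sample_every) for start in range(sample_every)]

mean_lag = sum(lags) / len(lags)
worst_lag = max(lags)
unmeasured_share = (sample_every - 1) / sample_every  # wafers that are never measured

print(f"mean lag: {mean_lag} wafers, worst case: {worst_lag}, unmeasured: {unmeasured_share:.0%}")
```

In this toy model the average is 12 unmeasured faulty wafers and the worst case is 24, which sits at the low end of the 12-25 range quoted above. Wafers processed while the measurement result is still pending would push the real figure higher.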

This is where AI walked in. "Sensor data is already being collected on every wafer. Can we predict the measurement from that data?" That's the starting point of Virtual Metrology (VM). And one of the first VM systems to run in mass production came out of Korea.
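The core idea of VM can be sketched as plain regression: fit a model on the few wafers that were physically measured, then predict the rest from their sensor readings. The sketch below is a deliberately simple linear stand-in, not the algorithm any production system uses. The four sensors, the synthetic data, the coefficients in `true_w`, and the helper names are all assumptions made for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_linear(X, y):
    """Least squares through the normal equations, with a bias column appended."""
    Xb = [row + [1.0] for row in X]
    d = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(d)]
    return solve(XtX, Xty)


def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row + [1.0]))


# Synthetic fab data (assumption): 4 normalized sensor readings per wafer and a
# film thickness that depends on them linearly, plus small measurement noise.
rng = random.Random(0)
true_w = [0.8, -0.5, 0.3, 0.1]
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(500)]
y = [100.0 + sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.01) for row in X]

# Only every 25th wafer has a real measurement to train on.
measured = set(range(0, 500, 25))
w = fit_linear([X[i] for i in sorted(measured)], [y[i] for i in sorted(measured)])

# Predict thickness for the wafers that were never measured.
errors = [abs(predict(w, X[i]) - y[i]) for i in range(500) if i not in measured]
max_error = max(errors)
```

Real sensor traces are time series rather than four tidy numbers, and the true relationship is rarely linear, which is where the Transformer-based approach described in the next section comes in.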

02 SK hynix Panoptes: the first VM in mass production

🇰🇷
SK hynix × Gauss Labs · Panoptes
Gauss Labs founded 2020 · In production at SK hynix since late 2022 · DRAM/NAND fabs · SPIE 2024

Named after Argus Panoptes, the hundred-eyed giant of Greek myth. The name stands for "see all wafers, 100%." Built by SK hynix subsidiary Gauss Labs. The core algorithm is a Patch + Channel Independent Time-series Transformer (PatchTST pattern): each sensor's time series is cut into short patches, and each sensor is processed as its own independent channel.
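The patching step is easy to picture in code. Below is a minimal sketch of the input preparation only, assuming two made-up sensor traces and hypothetical values for `patch_len` and `stride`. The Transformer encoder that would consume these patches is omitted, and the end-of-series padding that PatchTST applies is left out for brevity.

```python
def make_patches(series, patch_len, stride):
    """Split one sensor trace into fixed-length, possibly overlapping patches."""
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


def channel_independent_patches(traces, patch_len=4, stride=2):
    """Patch every sensor separately, so no patch mixes values from two sensors."""
    return {name: make_patches(values, patch_len, stride) for name, values in traces.items()}


# Two hypothetical sensor traces, 10 time steps each.
traces = {
    "temperature": [float(t) for t in range(10)],
    "pressure": [0.5 * t for t in range(10)],
}
patches = channel_independent_patches(traces)

for name, sensor_patches in patches.items():
    print(name, len(sensor_patches), "patches, first:", sensor_patches[0])
```

Each patch becomes one input token, and the same encoder weights are shared across all sensors. Keeping channels separate means the model sees many short sequences per sensor instead of one long sequence that mixes every sensor at every time step.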

In production at SK hynix's Icheon and Cheongju fabs on selected layers. The 2024 SPIE follow-up paper introduced Cross-Tool Attention, which jointly learns the pattern common to same-type chambers and the specifics of each tool. As a result, a model trained on one tool can transfer to other tools.

Field-validated · Production deployed

03 NVIDIA cuLitho: why a GPU company walked into the fab

🇺🇸
NVIDIA cuLitho × TSMC · Samsung · ASML
Announced GTC 2023.03 · TSMC integrated 2024 · Samsung joined 2025

One of the heaviest steps in EUV lithography is OPC (Optical Proximity Correction) — drawing the mask pattern with corrections for light diffraction. On a CPU cluster, one mask takes 2 weeks. NVIDIA announced cuLitho at GTC 2023 — 500 NVIDIA DGX H100 systems replace the workload of 40,000 CPU servers, cutting one mask's OPC from 2 weeks to ~8 hours. About 40× speedup.
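The headline figures are worth checking by hand. The short calculation below uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the announcement, as cited above.
cpu_hours_per_mask = 14 * 24   # 2 weeks on a CPU cluster
gpu_hours_per_mask = 8         # roughly 8 hours with cuLitho
cpu_servers = 40_000
dgx_systems = 500

speedup = cpu_hours_per_mask / gpu_hours_per_mask
servers_replaced_per_dgx = cpu_servers / dgx_systems

print(f"speedup: {speedup:.0f}x, CPU servers replaced per DGX system: {servers_replaced_per_dgx:.0f}")
```

Two weeks is 336 hours, so 336 / 8 gives 42×, which rounds to the "about 40×" claim. The consolidation ratio works out to 80 CPU servers per DGX system.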

At the 2023 GTC announcement, TSMC, ASML, and Synopsys were named as collaboration partners. In 2024 TSMC began integrating cuLitho into its production OPC flow, and Samsung joined in 2025. After making the AI revolution possible, GPUs now also set the pace at which the chips themselves get manufactured.

2 weeks → 8 hours (40× faster)
📌 Why cuLitho became unavoidable
As nodes shrink (2nm, 1.4nm), OPC compute requirements grow steeply. On legacy CPU clusters, single-mask OPC has started to exceed 2 weeks, and that pushes the entire chip schedule back. Without GPU-accelerated OPC, the production timeline of next-gen nodes itself is at risk. That is the prevailing view in the industry.

04 Samsung Hyper-Auto Fab + Omniverse Twin

🇰🇷
Samsung Semiconductor × NVIDIA Omniverse
2023 NVIDIA-Samsung partnership · Digital twin fab simulation

The idea: turn a whole fab into a digital twin. Simulate equipment, piping, and robot routing in virtual space. When a new wafer arrives — simulate which tools it should pass through and in what order, pick the most efficient path, then push it back to the real fab. Samsung is publicly running this kind of work on NVIDIA Omniverse.

What-if simulation of disruptions also works. If one EUV scanner goes offline for service, what happens to fab-wide throughput? The twin can answer within seconds, so the AI proposes an alternative schedule before the human planner even starts.
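A what-if query of this kind can be sketched with a toy capacity model. This is far simpler than a real digital twin: it assumes the line runs at the pace of its slowest step, and the step names, tool counts, and wafers-per-hour figures are invented for illustration.

```python
def line_throughput(steps):
    """Wafers per hour for the whole line, limited by the slowest step."""
    return min(s["tools"] * s["wph_per_tool"] for s in steps.values())


def what_if_offline(steps, step_name, n_offline=1):
    """Throughput if `n_offline` tools at one step go down. The input is not modified."""
    scenario = {name: dict(spec) for name, spec in steps.items()}
    scenario[step_name]["tools"] -= n_offline
    return line_throughput(scenario)


# Hypothetical three-step line.
fab = {
    "deposition": {"tools": 6, "wph_per_tool": 30},  # 180 wafers/hour
    "euv_litho": {"tools": 4, "wph_per_tool": 40},   # 160 wafers/hour
    "etch": {"tools": 8, "wph_per_tool": 25},        # 200 wafers/hour
}

baseline = line_throughput(fab)
degraded = what_if_offline(fab, "euv_litho")
print(f"baseline: {baseline} wph, one EUV scanner offline: {degraded} wph")
```

Here lithography is already the bottleneck, so losing one of four scanners drops the line from 160 to 120 wafers per hour. A real twin would also model queues, transport, and re-routing, which is why it needs a simulator rather than a one-line `min`.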

Digital twin + real-time optimization

05 And other companies: everyone jumped in

🇺🇸
Applied Materials ExtractAI
2021~ · Optical ↔ SEM active learning

An active learning loop bridging optical inspection (fast but lower accuracy) and SEM (slow but precise). The model only escalates uncertain defects to SEM for verification, then retrains on those labels. The goal is close to SEM-level accuracy at optical-level throughput.
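One round of such a loop can be sketched as confidence-based routing. This is a generic uncertainty-sampling sketch, not Applied Materials' implementation: the function name, the 0.9 threshold, and the defect records are hypothetical, and the retraining step is left out.

```python
def active_learning_round(optical_predictions, sem_review, threshold=0.9):
    """Accept confident optical calls and send the rest to SEM for a verified label.

    optical_predictions: {defect_id: (label, confidence)}
    sem_review: callable returning the verified label for a defect_id
    Returns (accepted, new_labels). new_labels would feed the next retraining.
    """
    accepted, new_labels = {}, {}
    for defect_id, (label, confidence) in optical_predictions.items():
        if confidence >= threshold:
            accepted[defect_id] = label
        else:
            new_labels[defect_id] = sem_review(defect_id)
    return accepted, new_labels


# Hypothetical data: SEM ground truth and the optical classifier's guesses.
sem_truth = {"d1": "particle", "d2": "scratch", "d3": "nuisance"}
optical = {
    "d1": ("particle", 0.97),
    "d2": ("particle", 0.55),  # low confidence, and wrong
    "d3": ("nuisance", 0.92),
}

accepted, new_labels = active_learning_round(optical, sem_truth.__getitem__)
print("accepted:", accepted)
print("sent to SEM:", new_labels)
```

Only one of three defects reaches the slow instrument, and it is the one the optical model got wrong. Spending the SEM budget on uncertain cases is what keeps throughput high while the classifier improves.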

🇹🇼
TSMC · Heterogeneous Graph Yield RCA
Multiple academic publications · Graph Neural Networks for yield analysis

When a lot's yield drops, build a graph with multiple node types — chamber, tool, recipe, lot — and apply different attention/embedding per node type. The model can automatically pinpoint which chamber × lot combination is accumulating the defects. RCA (root cause analysis) that used to take humans days is being shortened — academia and industry are both very active here.
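To see why a graph helps here, consider a single unlearned message-passing step: push each lot's yield loss along its edges to the chambers and tools it touched, and average. This is a counting baseline rather than a graph neural network, since a heterogeneous GNN would replace the plain average with learned, per-node-type weights and attention. The lots, chambers, and loss values below are made up.

```python
from collections import defaultdict

# Hypothetical yield loss per lot (fraction of dies lost).
lot_yield_loss = {"L1": 0.5, "L2": 0.0, "L3": 0.25, "L4": 0.0}

# Typed edges: (lot, neighbour node type, neighbour id).
edges = [
    ("L1", "chamber", "C1"), ("L3", "chamber", "C1"),
    ("L2", "chamber", "C2"), ("L4", "chamber", "C2"),
    ("L1", "tool", "T1"), ("L2", "tool", "T1"),
    ("L3", "tool", "T1"), ("L4", "tool", "T1"),
]


def aggregate_to_neighbours(lot_signal, edges):
    """Mean lot signal received by every chamber and tool node."""
    totals, counts = defaultdict(float), defaultdict(int)
    for lot, node_type, node_id in edges:
        key = (node_type, node_id)
        totals[key] += lot_signal[lot]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


scores = aggregate_to_neighbours(lot_yield_loss, edges)
suspect = max(scores, key=scores.get)
print(scores)
print("most suspicious node:", suspect)
```

Tool T1 saw every lot, so its score is diluted, while chamber C1 saw only the two bad lots and stands out. Learned attention over typed edges aims to make the same distinction when the signal is far noisier.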

🇺🇸
KLA Multi-Perspective DL Inspection
2023~ · Diffraction · polarization · multi-angle channels

Instead of feeding a single RGB image, feed multiple diffraction and polarization channels at multiple angles simultaneously. A CNN processes 4–7 channels jointly. The detailed algorithm is proprietary, but the term "Multi-Perspective DL" appears in KLA's official marketing material.
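Joint processing of stacked channels can be illustrated with the simplest possible case, a 1×1 convolution that mixes all channels at each pixel. This is a sketch of the general idea only. KLA's network is proprietary, and the four 2×2 channel images and equal weights are assumptions.

```python
def conv1x1(channels, weights, bias=0.0):
    """Per-pixel weighted sum across channels (a 1x1 convolution, one output filter)."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [bias + sum(w * ch[r][c] for w, ch in zip(weights, channels)) for c in range(width)]
        for r in range(height)
    ]


# Four hypothetical 2x2 views of the same die area (angles and polarizations).
channels = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
weights = [0.25, 0.25, 0.25, 0.25]  # would be learned in a real CNN

fused = conv1x1(channels, weights)
print(fused)
```

The pixel at row 0, column 1 lights up in every view and keeps a full response of 1.0, while pixels that appear in only one view are damped to 0.25. A signal that is consistent across views is one plausible cue for separating a real defect from noise.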

06 So what does this all add up to

Anyone who's been with this series since EP01 will have noticed — every algorithm we've covered converges here. EP01 backprop, EP02 CNN (inspection), EP03 Transformer (VM, RCA), EP04 LLM (factory copilot), EP05 Diffusion (defect data synthesis), EP06 GPU/CUDA (cuLitho).

One more thing: none of these production deployments is at academic SOTA. From the 2017 Transformer paper to the Transformer-based VM presented at SPIE 2024 took 7 years, and the usual academia-to-production lag is put at 3–5 years. By that yardstick, the ideas the field is excited about now (Mamba, FlashAttention 3) should land in fabs around 2027–2030.

🔑 One-line summary
The fab that makes the GPUs that run AI is itself run by AI. From Hinton's EP01 paper, through EP02's AlexNet, into EP06's GPU silicon — the same algorithms are embedded in every step of manufacturing those very chips. A closed loop where the tool that makes itself runs on itself.

In the next post (EP08, the finale), we cover how any company applies all these models to its own data: RAG (Retrieval-Augmented Generation). It is the story of how a 2020 paper by Patrick Lewis and colleagues turned, by 2026, into the standard architecture for company-internal AI copilots.

🧪
Try it · AI Lab
Virtual metrology — predict thickness from sensors →
Adjust 4 sensor sliders (temperature, pressure, gas, RF) → real-time thickness prediction. Process 50 wafers and plot a prediction-accuracy scatter chart.