The NVIDIA H100 GPUs that train today's large language models are fabricated at TSMC. A single wafer takes 14 weeks and over 200 process steps. Does a human watch every step? No. AI is already running inside the fab too.
Hundreds of layers stack on a single wafer, and each layer needs thickness, CD (critical dimension), resistance, and defect inspection. Measuring every wafer at every step would add another 14 weeks of measurement time alone and cut throughput by more than half.
So the practical reality is sample measurement: measure 1 wafer out of every 25. The downside is that a defect on one of the 24 unmeasured wafers is found much later, after those wafers have already moved through the next steps, and the losses pile up.
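To put a number on that blind spot, the sketch below computes the chance that sampling catches at least one bad wafer. It assumes a 25-wafer lot, a uniformly random choice of which wafers get measured, and a measurement that always flags a defective wafer. The function name `detection_probability` is made up for this illustration.

```python
from math import comb


def detection_probability(lot_size: int, sampled: int, defective: int) -> float:
    """Chance that at least one defective wafer is among the sampled wafers."""
    if not 0 <= sampled <= lot_size:
        raise ValueError("sampled must be between 0 and lot_size")
    if not 0 <= defective <= lot_size:
        raise ValueError("defective must be between 0 and lot_size")
    # Hypergeometric model: probability that every sampled wafer is a good one.
    miss = comb(lot_size - defective, sampled) / comb(lot_size, sampled)
    return 1.0 - miss


if __name__ == "__main__":
    # Assumed scenario: 25 wafers per lot, 1 wafer measured.
    for bad in (1, 3, 5):
        p = detection_probability(lot_size=25, sampled=1, defective=bad)
        print(f"{bad} defective wafer(s) in the lot: {p:.0%} chance of catching one")
```

Under these assumptions, a single defective wafer in a lot is caught only 4% of the time with 1-in-25 sampling.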
This is where AI walked in. "Sensor data is already being collected on every wafer. Can we predict the measurement from that data?" That's the starting point of Virtual Metrology (VM). And the world's first production-grade VM system came out of Korea.
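The core idea of VM can be shown with the simplest possible model: fit a regression on the wafers that were physically measured, then predict the measurement for the ones that were not. The sketch below is only an illustration under loud assumptions: one summary sensor feature (mean chamber temperature), a linear relationship to film thickness, and invented numbers. Production VM systems use far richer models and many sensors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one sensor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical sampled wafers: (mean chamber temperature in C, measured thickness in nm).
measured = [(400.0, 50.0), (402.0, 51.0), (404.0, 52.0), (406.0, 53.0)]
model = fit_line([t for t, _ in measured], [th for _, th in measured])

# Sensor data exists for every wafer, so unmeasured wafers get a "virtual" measurement.
unmeasured_temps = [401.0, 405.0]
virtual_thickness = [predict(model, t) for t in unmeasured_temps]
print(virtual_thickness)
```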
The Korean system is called Panoptes VM, named after Argus Panoptes, the hundred-eyed giant of Greek myth. The name means "see all wafers, 100%." It was built by Gauss Labs, an SK hynix subsidiary. The core algorithm is a patch-based, channel-independent time-series Transformer (the PatchTST pattern): each sensor's signal is treated as its own channel and split into patches.
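The patching step itself is easy to picture. The sketch below shows only that preprocessing: each sensor trace is cut into overlapping windows, one channel at a time, with no mixing across sensors. The sensor names, patch length, and stride are illustrative assumptions, and the Transformer encoder that would consume the patches is omitted.

```python
def make_patches(series, patch_len, stride):
    """Split one sensor trace into windows of patch_len samples, stride apart."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


def patch_channels(sensors, patch_len, stride):
    """Patch every sensor independently; channels are never mixed at this stage."""
    return {name: make_patches(trace, patch_len, stride)
            for name, trace in sensors.items()}


# Hypothetical traces for one wafer run (8 samples per sensor).
sensors = {
    "pressure": [0, 1, 2, 3, 4, 5, 6, 7],
    "rf_power": [0, 10, 20, 30, 40, 50, 60, 70],
}
patches = patch_channels(sensors, patch_len=4, stride=2)
print(patches["pressure"])
```

In a PatchTST-style model, each channel's patches would then go through a shared encoder as a sequence of tokens, which keeps the sequence short compared with one token per time step.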
The system is in production at SK hynix's Icheon and Cheongju fabs on selected layers. The 2024 SPIE follow-up paper introduced Cross-Tool Attention, which jointly learns the pattern common to same-type chambers and the specifics of each tool, so that a model trained on one tool transfers to other tools.
Field-validated · Production deployed
One of the heaviest steps in EUV lithography is OPC (Optical Proximity Correction): drawing the mask pattern with corrections for light diffraction. On a CPU cluster, one mask takes 2 weeks. At GTC 2023, NVIDIA announced cuLitho, stating that 500 NVIDIA DGX H100 systems replace the workload of 40,000 CPU servers and cut one mask's OPC from 2 weeks to ~8 hours. That is 336 hours down to 8, roughly a 40× speedup.
At the 2023 GTC announcement, TSMC, Samsung, and ASML were named as collaboration partners. In 2024–25 TSMC began integrating cuLitho into its production OPC flow; Samsung and ASML are working on the same. After making the AI revolution possible, GPUs now also set the pace at which the chips themselves get manufactured.
2 weeks → 8 hours (40× faster)
The idea: turn a whole fab into a digital twin. Simulate equipment, piping, and robot routing in virtual space. When a new wafer arrives, simulate which tools it should pass through and in what order, pick the most efficient path, then push that plan back to the real fab. Samsung is publicly running this kind of work on NVIDIA Omniverse.
Anomaly simulation also works. If one EUV scanner goes offline for service, the twin answers what happens to fab-wide throughput in 1 second, and AI proposes the alternative schedule before the human planner even starts.
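A what-if question like the scanner outage can be illustrated with a toy capacity model. The sketch below is a deliberately crude stand-in for a digital twin: it assumes a serial line where every wafer visits each step once and the slowest step sets fab throughput. The step names, tool counts, and rates are invented for illustration.

```python
def step_capacity(spec):
    """Wafers per hour one step can sustain: tool count times per-tool rate."""
    tools, wafers_per_hour_per_tool = spec
    return tools * wafers_per_hour_per_tool


def fab_throughput(steps):
    """In a serial line, the slowest step sets the pace for the whole fab."""
    return min(step_capacity(spec) for spec in steps.values())


def bottleneck(steps):
    """Name of the step with the lowest capacity."""
    return min(steps, key=lambda name: step_capacity(steps[name]))


def take_offline(steps, step_name, count=1):
    """Return a copy of the fab model with `count` tools removed from one step."""
    tools, rate = steps[step_name]
    if count > tools:
        raise ValueError("cannot remove more tools than the step has")
    updated = dict(steps)
    updated[step_name] = (tools - count, rate)
    return updated


# Hypothetical fab: step name -> (number of tools, wafers per hour per tool).
baseline = {"euv_litho": (4, 30.0), "etch": (10, 11.0), "deposition": (8, 15.0)}
what_if = take_offline(baseline, "euv_litho")

print(fab_throughput(baseline), bottleneck(baseline))
print(fab_throughput(what_if), bottleneck(what_if))
```

With these made-up numbers, losing one scanner moves the bottleneck from etch to lithography. A real twin would also model queues, re-entrant flows, and transport, which this sketch ignores.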
Digital twin + real-time optimization
Another approach is an active learning loop that bridges optical inspection (fast but less accurate) and SEM (slow but precise). The model escalates only uncertain defects to SEM for verification, then retrains on those labels. The result is SEM-level accuracy at optical-level throughput.
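One round of such a loop might look like the sketch below. It assumes the optical classifier emits a label with a confidence score, uses a fixed confidence threshold as the uncertainty rule, and stands in for the SEM tool with a hypothetical `sem_review` callback. The retraining step is left out; the function only collects the new labels.

```python
def split_by_confidence(defects, threshold):
    """Separate optical predictions into confident and uncertain ones."""
    confident, uncertain = [], []
    for defect in defects:
        _, _, confidence = defect
        (confident if confidence >= threshold else uncertain).append(defect)
    return confident, uncertain


def active_learning_round(defects, threshold, sem_review, training_set):
    """Keep confident optical labels; send the rest to SEM and record those labels."""
    confident, uncertain = split_by_confidence(defects, threshold)
    final_labels = {defect_id: label for defect_id, label, _ in confident}
    for defect_id, _, _ in uncertain:
        sem_label = sem_review(defect_id)            # slow, precise verification
        final_labels[defect_id] = sem_label
        training_set.append((defect_id, sem_label))  # used for the next retraining
    return final_labels


# Hypothetical optical output: (defect id, predicted class, confidence).
optical = [("d1", "particle", 0.97), ("d2", "scratch", 0.55),
           ("d3", "particle", 0.62), ("d4", "bridge", 0.91)]
sem_truth = {"d2": "particle", "d3": "scratch"}  # stand-in for the SEM tool
new_training_data = []
labels = active_learning_round(optical, 0.9, sem_truth.__getitem__, new_training_data)
print(labels, new_training_data)
```

Only two of the four defects reach the slow instrument here, which is the point of the design: SEM time is spent where the optical model is least sure.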
When a lot's yield drops, build a graph with multiple node types (chamber, tool, recipe, lot) and apply a separate attention mechanism and embedding to each node type. The model can automatically pinpoint which chamber × lot combination is accumulating the defects. RCA (root cause analysis) that used to take humans days is being shortened, and both academia and industry are very active here.
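The typed-graph idea can be sketched without any learning at all. The example below is a simplified stand-in, not a heterogeneous GNN: it runs one aggregation round in which each chamber or recipe node averages the yield loss of the lots connected to it, scaled by a hand-set weight per edge type where a trained model would use learned attention. All node names, edge types, and numbers are invented.

```python
from collections import defaultdict


def propagate(lot_yield_loss, edges, edge_weight):
    """One message-passing round from lot nodes to their typed neighbours.

    edges: list of (lot_id, edge_type, target), target being (node_type, name).
    Returns the mean weighted yield loss arriving at each target node.
    """
    incoming = defaultdict(list)
    for lot_id, edge_type, target in edges:
        incoming[target].append(edge_weight[edge_type] * lot_yield_loss[lot_id])
    return {target: sum(msgs) / len(msgs) for target, msgs in incoming.items()}


def top_suspect(scores, node_type):
    """Highest-scoring node of the requested type."""
    candidates = {node: s for node, s in scores.items() if node[0] == node_type}
    return max(candidates, key=candidates.get)


# Hypothetical data: yield loss per lot, and which chamber and recipe each lot used.
yield_loss = {"lot1": 0.02, "lot2": 0.30, "lot3": 0.28, "lot4": 0.03}
edges = [
    ("lot1", "processed_in", ("chamber", "A")), ("lot4", "processed_in", ("chamber", "A")),
    ("lot2", "processed_in", ("chamber", "B")), ("lot3", "processed_in", ("chamber", "B")),
    ("lot1", "used_recipe", ("recipe", "R1")), ("lot2", "used_recipe", ("recipe", "R1")),
    ("lot3", "used_recipe", ("recipe", "R1")), ("lot4", "used_recipe", ("recipe", "R1")),
]
# Fixed per-edge-type weights (an assumption; a real model would learn these).
weights = {"processed_in": 1.0, "used_recipe": 0.5}

scores = propagate(yield_loss, edges, weights)
print(top_suspect(scores, "chamber"), scores)
```

Here chamber B stands out because both high-loss lots passed through it, while the shared recipe is diluted by the healthy lots.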
Instead of feeding a single RGB image, feed multiple diffraction and polarization channels at multiple angles simultaneously. A CNN processes 4–7 channels jointly. The detailed algorithm is proprietary, but the term "Multi-Perspective DL" appears in KLA's official marketing material.
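What processing channels jointly means can be shown with a single convolution filter that spans all input channels at once. The sketch below is a plain-Python illustration of that one operation, with an assumed 4-channel input and invented values; it is not KLA's method, whose details are proprietary.

```python
def conv2d_joint(image, kernel):
    """Valid cross-correlation of a C x H x W image with one C x kh x kw filter.

    Every output pixel sums over all channels, so evidence from different
    angles or polarizations is combined in a single response map.
    """
    channels = len(image)
    if channels != len(kernel):
        raise ValueError("image and kernel must have the same channel count")
    height, width = len(image[0]), len(image[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    output = []
    for r in range(height - kh + 1):
        row = []
        for c in range(width - kw + 1):
            total = 0.0
            for ch in range(channels):
                for i in range(kh):
                    for j in range(kw):
                        total += image[ch][r + i][c + j] * kernel[ch][i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4-channel 3x3 patch: channel k is filled with the value k + 1.
image = [[[float(k + 1)] * 3 for _ in range(3)] for k in range(4)]
# One 2x2 filter per channel, all ones, so each output sums a 2x2 window in every channel.
kernel = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]

response = conv2d_joint(image, kernel)
print(response)
```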
Anyone who has followed this series since EP01 will have noticed that every algorithm we've covered converges here: EP01 backprop, EP02 CNN (inspection), EP03 Transformer (VM, RCA), EP04 LLM (factory copilot), EP05 Diffusion (defect data synthesis), EP06 GPU/CUDA (cuLitho).
One more thing: none of these production deployments is at academic SOTA. Going from the 2017 Transformer paper to 2024 production at SK hynix took 7 years, and the average lag from academia to industry production is 3–5 years. The work the field is excited about now (Mamba, FlashAttention 3) will likely land in fabs around 2027–2030.
In the next post (EP08, the finale), we cover how any company applies all these models to its own data: RAG (Retrieval-Augmented Generation). It is the story of how a 2020 paper by Patrick Lewis and colleagues turned, by 2026, into the standard architecture for every company's internal AI copilot.