AI Model Scaling Timeline

Watch 50 notable AI models appear in chronological order, plotted by parameter count against MMLU benchmark score. Marker size encodes parameter count; color encodes the lab.
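The encoding described above can be sketched in a few lines of Python. This is a minimal illustration and not the page's actual implementation. The `ModelPoint` record, the lab names and colors, the radius range, the log-scaled parameter axis, and the choice of which quantity goes on which axis are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelPoint:
    name: str      # hypothetical label, not a real model
    lab: str
    year: int      # release year
    params: float  # parameter count
    mmlu: float    # MMLU score in percent


# Hypothetical lab-to-color palette; unknown labs fall back to grey.
LAB_COLORS = {"Lab A": "#1f77b4", "Lab B": "#ff7f0e"}
DEFAULT_COLOR = "#888888"


def encode(point, min_radius=3.0, max_radius=30.0,
           min_params=1e7, max_params=1e13):
    """Map one model to plot attributes (assumed encoding, see lead-in)."""
    log_p = math.log10(point.params)
    lo, hi = math.log10(min_params), math.log10(max_params)
    # Position within the log-scaled parameter range, clamped to [0, 1].
    t = min(1.0, max(0.0, (log_p - lo) / (hi - lo)))
    return {
        "x": log_p,                                        # log10(parameters)
        "y": point.mmlu,                                   # MMLU score
        "radius": min_radius + t * (max_radius - min_radius),
        "color": LAB_COLORS.get(point.lab, DEFAULT_COLOR),
    }


def visible_at(points, year):
    """Return the models released up to and including `year`, oldest first."""
    return sorted((p for p in points if p.year <= year), key=lambda p: p.year)
```

Stepping `year` forward and redrawing `visible_at(points, year)` reproduces the chronological reveal. A log scale is used here because parameter counts span several orders of magnitude.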

About this visualization: Parameter counts and MMLU benchmark scores are approximate, compiled from published papers, technical reports, and community estimates. Where exact parameter counts are undisclosed (e.g. GPT-4), widely cited estimates are used. MMLU (Massive Multitask Language Understanding) is a standardized benchmark measuring knowledge across 57 subjects. Some models report 5-shot MMLU; others report 0-shot or chain-of-thought (CoT) variants, so the scores are meant to illustrate the scaling trend rather than support exact comparisons.
Fields shown per model: Released, Parameters, MMLU Score