AI Model Scaling Timeline — Data Visualization Gallery

About this visualization: Parameter counts and MMLU benchmark scores are approximate, compiled from published papers, technical reports, and community estimates. Where exact parameter counts are undisclosed (e.g. GPT-4), widely cited estimates are used. MMLU (Massive Multitask Language Understanding) is a standardized benchmark measuring knowledge across 57 subjects. Reported scores are not all measured under the same setting: some are 5-shot, while others are 0-shot or chain-of-thought (CoT) variants. The scores are therefore meant to illustrate the scaling trend rather than to support exact model-to-model comparisons.