About this visualization — Parameter counts and MMLU benchmark scores are approximate, compiled from
published papers, technical reports, and community estimates. Where exact parameter counts are undisclosed (e.g. GPT-4),
widely cited estimates are used. MMLU (Massive Multitask Language Understanding) is a standardized benchmark measuring
knowledge across 57 subjects. Some models report 5-shot MMLU; others report 0-shot or chain-of-thought (CoT) variants, so the
scores are not strictly comparable and are meant to illustrate the scaling trend rather than provide exact comparisons.