7 entries with this tag
In 2020, OpenAI scaled GPT-2 by over 100×—to 175 billion parameters—and discovered something unexpected: the model could perform tasks it was never trained on, just by reading a few examples in its prompt. 'Language Models are Few-Shot Learners' didn't just set new benchmarks. It changed what we thought language models could do.
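To make the few-shot idea concrete, here is a toy prompt in the style the paper popularised (the English-to-French pairs mirror the example shown in the paper itself). The "training" happens entirely inside the prompt; no weights are updated.

```python
# A few-shot, in-context learning prompt: the worked examples live in the
# prompt, and the model is simply asked to continue the pattern.
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

# Sampling a continuation from a sufficiently large language model typically
# yields "fromage", even though the model was never fine-tuned for translation.
```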
What if you could build a model with 671 billion parameters but only pay to run 37 billion of them on any given token? Mixture of Experts is the architectural trick behind GPT-4, Mixtral, and DeepSeek, models that are simultaneously massive and efficient. Three landmark papers explain how.
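To see the trade-off mechanically, here is a minimal, illustrative top-k routing layer in PyTorch. It is not taken from any of those models; the class name, dimensions, and 8-expert/top-2 configuration are invented for the sketch. The point is that the router activates only 2 of the 8 expert networks per token, so most of the layer's parameters sit idle on any given input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy sparse mixture-of-experts feed-forward layer with top-k routing."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One independent feed-forward "expert" network per slot.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(n_experts)
        ])
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (n_tokens, d_model)
        logits = self.router(x)                  # (n_tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoELayer()(tokens).shape)  # torch.Size([16, 512]); only 2 of 8 experts ran per token
```

DeepSeek's 671B-total / 37B-active split comes from the same mechanism, just applied at far larger scale with many more, finer-grained experts.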
Two landmark papers revealed that AI model performance follows predictable mathematical laws—and that the industry was training models wrong. The Chinchilla paper showed that a 70B model trained on more data could outperform models 4× its size, reshaping how every major AI lab builds models today.
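Here is a back-of-the-envelope version of the Chinchilla argument, using two standard rules of thumb rather than the papers' exact fitted constants: training compute is roughly 6 × parameters × tokens, and the compute-optimal data budget is roughly 20 tokens per parameter.

```python
def training_flops(params, tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * params * tokens

def chinchilla_optimal_tokens(params):
    # Rule-of-thumb reading of the Chinchilla result: ~20 tokens per parameter.
    return 20 * params

gopher_params, gopher_tokens = 280e9, 300e9   # Gopher: big model, ~300B training tokens
chinchilla_params = 70e9                      # Chinchilla: 4x smaller
chinchilla_tokens = chinchilla_optimal_tokens(chinchilla_params)  # ~1.4T tokens

print(f"Gopher-style run:     {training_flops(gopher_params, gopher_tokens):.2e} FLOPs")
print(f"Chinchilla-style run: {training_flops(chinchilla_params, chinchilla_tokens):.2e} FLOPs")
# ~5.0e23 vs ~5.9e23 FLOPs: a comparable compute budget, but the smaller model
# sees ~1.4T tokens instead of 300B, which is how it ends up ahead of models
# 4x its size.
```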
A beginner-friendly explanation of GPT-2 (2019), the paper that showed AI could write coherent, creative text by simply predicting the next word. Part 3 of our AI Papers Explained series.
A beginner-friendly explanation of BERT (Bidirectional Encoder Representations from Transformers), the 2018 paper that taught AI to understand language by reading in both directions. Follow-up to our 'Attention Is All You Need' explainer.
A beginner-friendly explanation of the groundbreaking 'Attention Is All You Need' paper that introduced Transformers. Learn what attention mechanisms are, why they matter, and how they power modern AI like ChatGPT.
A professional assessment of frontier AI capabilities across text, speech, image, video, and multimodal domains as of March 2026, with performance metrics and source references.