
#scaling

3 entries with this tag

🔬 research · 2026-04-05

GPT-3: The Model That Proved Bigger Could Be Smarter

In 2020, OpenAI scaled GPT-2 by over 100×—to 175 billion parameters—and discovered something unexpected: the model could perform tasks it was never trained on, just by reading a few examples in its prompt. 'Language Models are Few-Shot Learners' didn't just set new benchmarks. It changed what we thought language models could do.

#ai #gpt #scaling #research #llm
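
A minimal sketch of the few-shot mechanism the paper describes, assuming any autoregressive completion model; the translation task and example pairs below are illustrative, in the style of the paper's own figures:

```python
# A few-shot prompt in the GPT-3 style: the "training" happens entirely
# in-context. The translation pairs are illustrative examples.
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe en peluche"),
]
query = "cheese"

prompt = "Translate English to French.\n\n"
prompt += "\n".join(f"{en} => {fr}" for en, fr in examples)
prompt += f"\n{query} =>"

# Sent to a large enough LM, the expected continuation is "fromage":
# the task is inferred from the in-context examples, with no weight updates.
print(prompt)
```

The surprise of the paper was that this works at all: the pattern is picked up at inference time, from the prompt alone.
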
🔬 research · 2026-04-01

Mixture of Experts: How AI Learned to Cheat the Scaling Laws

What if you could have a model with 671 billion parameters but only pay to run 37 billion? Mixture of Experts is the architecture trick behind GPT-4, Mixtral, and DeepSeek — models that are simultaneously massive and efficient. Three landmark papers explain how.

#ai #scaling #architecture #training #research
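
A minimal sketch of the trick, assuming the common top-k routing design used by models in the Switch Transformer and Mixtral lineage; all dimensions and weights here are toy values, not any real model's configuration:

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer for a single token. Sizes and
# weights are illustrative; real models use learned parameters and
# far larger dimensions.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

x = rng.standard_normal(d_model)                  # one token's hidden state
experts = rng.standard_normal((n_experts, d_model, d_model))  # all stored
router = rng.standard_normal((n_experts, d_model))

# The router scores every expert but activates only the top_k of them.
logits = router @ x
chosen = np.argsort(logits)[-top_k:]              # indices of the winners
weights = np.exp(logits[chosen])
gates = weights / weights.sum()                   # softmax over winners only

# Only the chosen experts actually run: per-token compute scales with
# top_k, while parameter count scales with n_experts.
y = sum(g * (experts[i] @ x) for g, i in zip(gates, chosen))
print(f"activated {top_k} of {n_experts} experts, output shape {y.shape}")
```

That asymmetry is the whole cheat: you can keep growing n_experts (total parameters) while holding top_k (per-token compute) fixed, which is how a 671B-parameter model can activate only ~37B parameters per token.
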
🔬 research · 2026-03-31

Scaling Laws: Why Bigger Isn't Always Better

Two landmark papers revealed that AI model performance follows predictable mathematical laws—and that the industry was training models wrong. The Chinchilla paper showed that a 70B model trained on more data could outperform models 4× its size, reshaping how every major AI lab builds models today.

#ai #scaling #training #compute #research
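
A minimal sketch of what "predictable mathematical laws" means here, assuming the parametric loss fit from the Chinchilla paper; the constants below are the paper's published estimates, quoted as approximate values:

```python
# Parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022):
# L(N, D) = E + A / N**alpha + B / D**beta, for N parameters and D tokens.
# Constants are the paper's published estimates, treated as approximate.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Chinchilla (70B params, 1.4T tokens) vs a Gopher-like budget
# (280B params, 300B tokens): the smaller, data-rich model wins.
print(predicted_loss(70e9, 1.4e12))   # ~1.94
print(predicted_loss(280e9, 300e9))   # ~1.99
```

Plugging in the two configurations shows the headline result directly: under the fit, the 70B model trained on 1.4T tokens lands at a lower predicted loss than a model 4× its size trained on far fewer tokens.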