What if you could have a model with 671 billion parameters but only pay to run 37 billion? Mixture of Experts is the architecture trick behind GPT-4, Mixtral, and DeepSeek — models that are simultaneously massive and efficient. Three landmark papers explain how.
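The trick behind that headline number is sparse expert routing: a small gating network scores a set of expert feed-forward blocks per token, and only the top-k are ever run. A toy NumPy sketch with made-up dimensions (not any real model's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, purely illustrative.
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))  # gating network

def moe_forward(x):
    """Route a token vector to its top-k experts; the rest stay idle."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]   # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()            # softmax over the chosen experts only
    # Only top_k of the n_experts matrices are multiplied per token --
    # that sparsity is why total and active parameter counts diverge.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

All parameters exist in memory, but each token pays compute for only top_k experts, which is how a model can be huge on disk yet cheap per forward pass.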
Two landmark papers revealed that AI model performance follows predictable mathematical laws—and that the industry was training models wrong. The Chinchilla paper showed that a 70B model trained on more data could outperform models 4× its size, reshaping how every major AI lab builds models today.
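The Chinchilla finding reduces to a rule of thumb, roughly 20 training tokens per parameter, paired with the standard 6·N·D estimate for training FLOPs. A quick arithmetic check:

```python
def chinchilla_optimal_tokens(n_params):
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20 * n_params

n_chinchilla = 70e9                       # Chinchilla's parameter count
tokens = chinchilla_optimal_tokens(n_chinchilla)
print(f"{tokens / 1e12:.1f}T tokens")     # 1.4T -- what Chinchilla trained on

# Standard approximation: training compute C = 6 * N * D FLOPs.
compute = 6 * n_chinchilla * tokens
print(f"{compute:.2e} FLOPs")
```

Spending that same compute budget on a 280B-parameter model leaves only ~0.35T tokens of training data, which is why the smaller, data-rich model wins.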
The paper behind ChatGPT. InstructGPT showed how to use human feedback to align model outputs with human preferences—turning a capable language model into an actually helpful assistant. This is reinforcement learning from human feedback (RLHF) made real.
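The reward-modeling step at the heart of RLHF boils down to a pairwise preference loss: given two responses, push the reward of the human-preferred one above the other. A minimal sketch of that Bradley-Terry-style loss (function name is illustrative):

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss used in RLHF:
    -log(sigmoid(r_chosen - r_rejected)).
    Small when the chosen response already scores higher,
    large when the ranking is violated."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(preference_loss(2.0, 0.5))  # ranking respected: small loss
print(preference_loss(0.5, 2.0))  # ranking violated: large loss
```

The reward model trained with this loss then serves as the optimization target for the policy, closing the loop from human preference labels to model behavior.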
The paper that bridged pretraining and ChatGPT. Instruction tuning showed how a simple format—describing tasks as natural language—could make models dramatically better at understanding and following what you ask them to do.
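The "simple format" is just a natural-language task description wrapped around the input. A hypothetical FLAN-style template (illustrative only; the actual papers use many template variants):

```python
def format_instruction(task_description, input_text):
    """Wrap a task description and input into one natural-language prompt.
    Hypothetical template for illustration; real instruction-tuning
    datasets mix many such templates."""
    return f"{task_description}\n\nInput: {input_text}\nOutput:"

prompt = format_instruction(
    "Translate the following sentence to French.",
    "The weather is nice today.",
)
print(prompt)
```

Fine-tuning on thousands of tasks phrased this way teaches the model to treat any instruction, even an unseen one, as just another task description to follow.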