Join us virtually this Wednesday at 6pm to kick off our monthly Paper Review series! We will be holding these sessions every 4 weeks on Zoom. The first one will be led by Josh Phillips:
This month we will explore the ongoing evolution of AI reasoning models. We will start with a short review of the history of combining reinforcement learning with tree search for reasoning (AlphaGo/MuZero). Then we will cover the real start of the reasoning race with the chain-of-thought innovations highlighted by “Let’s Verify Step by Step.” Finally, we’ll cover the most important insights from DeepSeek’s three recent papers, which build on this prior foundation:
- DeepSeek-V3 – a highly efficient Mixture-of-Experts (MoE) model (671B total parameters, only 37B active per token) powered by advanced CUDA/NCCL optimizations and low-precision (FP8) training, making huge models tractable on constrained hardware.
- DeepSeek-R1 – the first open demonstration that an LLM can learn complex reasoning purely through RL rewards, without any annotated examples (the R1-Zero variant).
- DeepSeekMath – the paper that introduces GRPO (Group Relative Policy Optimization), an alternative to PPO that simplifies training and removes the need for a separate value network; see the sketch after this list.
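If you want a quick preview of what GRPO changes relative to PPO before the session, here is a minimal Python sketch of its core idea (the helper name `group_relative_advantages` is hypothetical, and this is an illustrative simplification rather than the exact objective from the DeepSeekMath paper): rather than learning a separate value network to estimate baselines, each sampled answer's advantage is computed by normalizing its reward against the other answers drawn for the same prompt.

```python
import statistics

def group_relative_advantages(rewards):
    """Sketch of GRPO's group-relative baseline: advantages come from
    comparing each sample's reward to the mean/std of its own group,
    so no learned value network is needed (unlike PPO)."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean_r) / std_r for r in rewards]

# Example: rewards for 4 sampled answers to the same prompt,
# as scored by some reward function (values are made up).
rewards = [0.1, 0.9, 0.4, 0.6]
print(group_relative_advantages(rewards))
```

These per-sample advantages would then plug into a clipped policy-gradient update much like PPO's; we'll walk through the full objective on Wednesday.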