Paper Reading: DeepSeek R1 - Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Join us for an insightful session on the groundbreaking paper, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.

Session led by Mike A: / malik-mike-a-32b63818b

In this session, we will explore DeepSeek-R1, a reasoning model trained by applying reinforcement learning (RL) to large language models (LLMs). The paper introduces two models, DeepSeek-R1-Zero and DeepSeek-R1, whose reasoning capabilities improve through RL-driven self-evolution. Unlike models that rely heavily on supervised fine-tuning (SFT), DeepSeek-R1-Zero develops its reasoning abilities purely through RL, while DeepSeek-R1 combines RL with a multi-stage training pipeline for better performance.

The paper reports strong benchmark results for DeepSeek-R1 on math, coding, and STEM-related reasoning tasks, where its performance rivals that of leading closed-source models such as OpenAI's o1-1217. We'll also discuss how these capabilities are distilled into smaller, more efficient models to make advanced reasoning accessible for diverse applications.

Whether you're a researcher, developer, or enthusiast in AI and LLMs, this paper reading will provide an in-depth understanding of the novel reinforcement learning techniques driving DeepSeek-R1 and its implications for the future of AI-driven reasoning systems. Don't miss this opportunity to engage with cutting-edge advancements in the field!

[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

GitHub - deepseek-ai/DeepSeek-R1

Mark Chen X Post: https://x.com/markchen90/status/18843...
GRPO Diagram: https://www.reddit.com/r/LocalLLaMA/c...
The Bitter Lesson: http://www.incompleteideas.net/IncIde...
Gradient Descent Image: / 1*f9a162ghpmbitvtaua_llq.png
V3: https://arxiv.org/abs/2412.19437
DeepSeek Math: https://arxiv.org/abs/2402.03300
rStar-Math: https://arxiv.org/abs/2501.04519
Let's Verify Step by Step: https://arxiv.org/abs/2305.20050
DeepSeek catch-up chart: https://media.licdn.com/dms/image/v2/...[…]41824000&v=beta&t=ubyKQniaCJTL37PzIOJi9YZRo1AF8yipuauSioyn59U
UC Berkeley Student $30 replication of Aha: https://x.com/jiayi_pirate/status/188...
Deep Agent R1-V replication: https://x.com/liangchen5518/status/18...
s1: https://arxiv.org/abs/2501.19393
DeepSeek-R1 cost breakdown: https://semianalysis.com/2025/01/31/d...
R1 Deep Dive: https://fireworks.ai/blog/deepseek-r1...
All-In Podcast: https://podcasts.apple.com/us/podcast...
Peter Gostev LinkedIn: / peter-gostev
LLM Agents Learning: https://llmagents-learning.org/sp25
