Personal Wiki

Tag: grpo

20 items with this tag.

May 27, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
May 27, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
May 27, 2026
Ch. 1 — Introduction
May 27, 2026
Ch. 2 — DeepSeek-R1-Zero
May 27, 2026
Ch. 3 — DeepSeek-R1
May 27, 2026
Ch. 4 — Experiment
May 27, 2026
Ch. 5 — Ethics and Safety Statement
May 27, 2026
Ch. 6 — Conclusion, Limitation, and Future Work
May 27, 2026
Ch. 7 — Author List
May 27, 2026
Ch. 8 — Background
May 27, 2026
Ch. 9 — Training Details
May 27, 2026
Ch. 10 — Self-Evolution of DeepSeek-R1-Zero
May 27, 2026
Ch. 11 — Evaluation of DeepSeek-R1
May 27, 2026
Ch. 12 — More Analysis
May 27, 2026
Ch. 13 — DeepSeek-R1 Distillation
May 27, 2026
Ch. 14 — Discussion
May 27, 2026
Ch. 15 — Related Work
May 27, 2026
Ch. 16 — Open Weights, Code, and Data
May 27, 2026
Ch. 17 — Evaluation Prompts and Settings
May 27, 2026
Harness, Scaffold, and the AI Agent Terms Worth Getting Right

