Personal Wiki
Tag: reinforcement-learning
20 items with this tag.
May 27, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
reinforcement-learning
llm
deep-learning
chain-of-thought
grpo
reward-modeling
reasoning
fine-tuning
May 27, 2026
Ch. 1 — Introduction
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 2 — DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 3 — DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 4 — Experiment
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 5 — Ethics and Safety Statement
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 6 — Conclusion, Limitation, and Future Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 7 — Author List
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 8 — Background
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 9 — Training Details
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 10 — Self-Evolution of DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 11 — Evaluation of DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 12 — More Analysis
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 13 — DeepSeek-R1 Distillation
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 14 — Discussion
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 15 — Related Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 16 — Open Weights, Code, and Data
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 17 — Evaluation Prompts and Settings
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Harness, Scaffold, and the AI Agent Terms Worth Getting Right
agentic-ai
agent-architecture
multi-agent
tool-calling
react-loop
memory-augmentation
llm-orchestration
reinforcement-learning
reward-modeling
grpo
llm
perceive-reason-act