Personal Wiki
Tag: knowledge-distillation
18 items with this tag.
Apr 29, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 1 — Introduction
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 2 — DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 3 — DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 4 — Experiment
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 5 — Ethics and Safety Statement
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 6 — Conclusion, Limitation, and Future Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 7 — Author List
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 8 — Background
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 9 — Training Details
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 10 — Self-Evolution of DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 11 — Evaluation of DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 12 — More Analysis
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 13 — DeepSeek-R1 Distillation
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 14 — Discussion
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 15 — Related Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 16 — Open Weights, Code, and Data
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
Apr 29, 2026
Ch. 17 — Evaluation Prompts and Settings
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm