Personal Wiki
Tag: knowledge-distillation
19 items with this tag.
May 27, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 1 — Introduction
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 2 — DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 3 — DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 4 — Experiment
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 5 — Ethics and Safety Statement
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 6 — Conclusion, Limitation, and Future Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 7 — Author List
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 8 — Background
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 9 — Training Details
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 10 — Self-Evolution of DeepSeek-R1-Zero
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 11 — Evaluation of DeepSeek-R1
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 12 — More Analysis
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 13 — DeepSeek-R1 Distillation
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 14 — Discussion
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 15 — Related Work
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 16 — Open Weights, Code, and Data
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Ch. 17 — Evaluation Prompts and Settings
reinforcement-learning
grpo
knowledge-distillation
chain-of-thought
reasoning
llm
May 27, 2026
Data Flywheel: What It Is and How It Works
data-flywheel
fine-tuning
lora
peft
llm
rag
guardrails
nvidia-nemo
nvidia-nim
nvidia-blueprints
agent-evaluation
agentic-ai
human-in-the-loop
knowledge-distillation
llmops