Blog1

Tag: RL

17 notes have this tag.

May 1, 2026
Aesthetic Assessment and Reasoning
April 30, 2026
Competitive Programming with Large Reasoning Models
April 30, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi k1.5: Scaling Reinforcement Learning with LLMs
April 30, 2026
OneRec Technical Report
April 30, 2026
Comparison of Reasoning Model Training Methods: DeepSeek-R1 vs Kimi k1.5 vs Qwen3
April 30, 2026
GRPO: Group Relative Policy Optimization
April 30, 2026
RLHF
- RLHF
- alignment
- PPO
- DPO
- RL
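April 30, 2026
Reasoning Models and Reinforcement Learning
April 30, 2026
Knowledge Distillation vs RL: Which Is More Effective for Acquiring Reasoning Capability?
April 30, 2026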
Aes-R1: Unlocking the Essence of Beauty — Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
April 30, 2026
Competitive Programming with Large Reasoning Models
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
April 30, 2026
Kimi k1.5: Scaling Reinforcement Learning with LLMs
April 30, 2026
Reasoning Enhancement Methods
