Blog1

Tag: DPO

There are 5 notes under this tag.

May 11, 2026
The Llama 3 Herd of Models
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
April 30, 2026
DPO: Direct Preference Optimization
April 30, 2026
RLHF
- RLHF
- alignment
- PPO
- DPO
- RL
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
