Blog1

Tag: alignment

4 notes with this tag.

May 16, 2026
Training Language Models to Follow Instructions with Human Feedback
May 7, 2026
Qwen Technical Report
April 30, 2026
RLHF
- RLHF
- alignment
- PPO
- DPO
- RL
April 30, 2026
Fundamentals of Large Language Models

