Blog1

Tag: GRPO

12 notes under this tag.

May 7, 2026
Gen-Searcher
April 30, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
April 30, 2026
Emu3.5: Native Multimodal Models are World Learners
April 30, 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
April 30, 2026
OmniGen2: Towards Instruction-Aligned Multimodal Generation
April 30, 2026
OneRec Technical Report
April 30, 2026
OneRec-Think: In-Text Reasoning for Generative Recommendation
April 30, 2026
OneRec-V2 Technical Report
April 30, 2026
Comparison of Reasoning Model Training Methods: DeepSeek-R1 vs Kimi k1.5 vs Qwen3
April 30, 2026
GRPO: Group Relative Policy Optimization
April 30, 2026
Reasoning Models and Reinforcement Learning
April 30, 2026
OmniGen2: Towards Instruction-Aligned Multimodal Generation

