Blog1

Tag: MoE

26 notes under this tag.

April 30, 2026
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
April 30, 2026
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
April 30, 2026
DeepSeek-V3 Technical Report
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi-VL Technical Report
April 30, 2026
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
April 30, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
April 30, 2026
OneRec Technical Report
April 30, 2026
OneRec-V2 Technical Report
April 30, 2026
Qwen3 Technical Report
April 30, 2026
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
April 30, 2026
gpt-oss-120b & gpt-oss-20b Model Card
April 30, 2026
MLA: Multi-Head Latent Attention
April 30, 2026
MoE: Mixture-of-Experts Models
April 30, 2026
DeepSeek Model Series
April 30, 2026
Kimi Model Series
April 30, 2026
Qwen3
April 30, 2026
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
April 30, 2026
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi-VL Technical Report
April 30, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
April 30, 2026
Qwen3 Technical Report
April 30, 2026
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
