Blog1

Tag: Pre-training

This tag has 5 notes.

April 30, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
April 30, 2026
Training Compute-Optimal Large Language Models
April 30, 2026
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
April 30, 2026
RoFormer: Enhanced Transformer with Rotary Position Embedding
April 30, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Created with Quartz v4.5.2 © 2026
