Folder: AI Reading Notes

This folder contains 69 notes.

April 30, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
April 30, 2026
CLAP: Learning Audio Concepts From Natural Language Supervision
April 30, 2026
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
April 30, 2026
Training Compute-Optimal Large Language Models
April 30, 2026
Competitive Programming with Large Reasoning Models
April 30, 2026
Denoising Diffusion Probabilistic Models
April 30, 2026
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
April 30, 2026
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
April 30, 2026
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
April 30, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
April 30, 2026
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
April 30, 2026
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
April 30, 2026
DeepSeek-V3 Technical Report
April 30, 2026
Emu3.5: Native Multimodal Models are World Learners
April 30, 2026
FLUX.1 Kontext: Flow Matching Rectified Transformer for Unified Image Generation and Editing
April 30, 2026
Flow Matching for Generative Modeling
April 30, 2026
Language Models are Few-Shot Learners
April 30, 2026
GPT-4 Technical Report
April 30, 2026
GPT-4o System Card
April 30, 2026
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
April 30, 2026
HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction
April 30, 2026
Training language models to follow instructions with human feedback
April 30, 2026
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi k1.5: Scaling Reinforcement Learning with LLMs
April 30, 2026
Kimi-VL Technical Report
April 30, 2026
LLaMA: Open and Efficient Foundation Language Models
April 30, 2026
The Llama 3 Herd of Models
April 30, 2026
LoRA: Low-Rank Adaptation of Large Language Models
April 30, 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
April 30, 2026
Masked Autoencoders Are Scalable Vision Learners
April 30, 2026
MLP-Mixer: An all-MLP Architecture for Vision
April 30, 2026
Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs
April 30, 2026
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
April 30, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
April 30, 2026
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
April 30, 2026
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
April 30, 2026
ObjEmbed: Towards Universal Multimodal Object Embeddings
April 30, 2026
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
April 30, 2026
OmniGen2: Towards Instruction-Aligned Multimodal Generation
April 30, 2026
OneRec Technical Report
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
April 30, 2026
OneRec-Think: In-Text Reasoning for Generative Recommendation
April 30, 2026
OneRec-V2 Technical Report
April 30, 2026
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
April 30, 2026
OpenOneRec: An Open Foundation Model and Benchmark to Accelerate Generative Recommendation
April 30, 2026
PyTorch: An Imperative Style, High-Performance Deep Learning Library
April 30, 2026
Qwen3 Technical Report
April 30, 2026
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
April 30, 2026
RoFormer: Enhanced Transformer with Rotary Position Embedding
April 30, 2026
RzenEmbed: Towards Comprehensive Multimodal Retrieval
April 30, 2026
SAIL-Embedding: Omni-modal Embedding Foundation Model
April 30, 2026
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
April 30, 2026
Seedream 3.0 Technical Report
April 30, 2026
Seedream 4.0: Toward Next-generation Multimodal Image Generation
April 30, 2026
Show-o2: Improved Native Unified Multimodal Models
April 30, 2026
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
April 30, 2026
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
April 30, 2026
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
April 30, 2026
Attention Is All You Need
April 30, 2026
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
April 30, 2026
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
April 30, 2026
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
April 30, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
April 30, 2026
You Only Look Once: Unified, Real-Time Object Detection
April 30, 2026
gpt-oss-120b & gpt-oss-20b Model Card
April 30, 2026
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
April 30, 2026
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
