Blog1

Folder: Wiki/Sources

This folder contains 123 notes.

May 16, 2026
RoFormer: Enhanced Transformer with Rotary Position Embedding
May 16, 2026
Attention Is All You Need
May 16, 2026
Language Models are Few-Shot Learners
May 16, 2026
GPT-4 Technical Report
May 16, 2026
GPT-4o System Card
May 16, 2026
Training Language Models to Follow Instructions with Human Feedback
May 16, 2026
LLaMA: Open and Efficient Foundation Language Models
May 16, 2026
The Llama 3 Herd of Models
May 16, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
May 16, 2026
Training Compute-Optimal Large Language Models
May 7, 2026
AHE Agentic Harness Engineering
May 7, 2026
DeepSeek V4
May 7, 2026
EmoSet Visual Emotion Dataset
May 7, 2026
Emu3 Native Multimodal Model
May 7, 2026
GLM-5: From Vibe Coding to Agentic Engineering
May 7, 2026
GPT-2
May 7, 2026
Gen-Searcher
May 7, 2026
Kimi Linear: Efficient Attention Architecture
May 7, 2026
LLM Agent Survey 2023
May 7, 2026
Scalable Watermarking for LLMs
May 7, 2026
MiniMax-01 Lightning Attention
May 7, 2026
Normalizing Flows
May 7, 2026
OpenAI o1 System Card
May 7, 2026
PaperBanana
May 7, 2026
Qwen Technical Report
May 7, 2026
Qwen-Image Technical Report
May 7, 2026
Qwen2.5 Technical Report
May 7, 2026
Qwen2.5-VL Technical Report
May 7, 2026
Qwen3-VL Technical Report
May 7, 2026
Qwen3-VL-Embedding and Reranker
May 7, 2026
Seedance 2.0 Video Generation
May 7, 2026
Thinking with Visual Primitives
May 7, 2026
Unify-Agent
May 7, 2026
VLM2Vec-V2
May 7, 2026
Survey of Generative Recommendation
May 6, 2026
Self-Evolving Agents Survey
April 30, 2026
Aes-R1: Unlocking the Essence of Beauty — Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
April 30, 2026
AIEdiT: Affective Image Editing Shaping Emotional Factors via Text Descriptions
April 30, 2026
Agent AI: Surveying the Horizons of Multimodal Interaction
April 30, 2026
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
April 30, 2026
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
April 30, 2026
BAGEL: Emerging Properties in Unified Multimodal Pretraining
April 30, 2026
CLAP: Learning Audio Concepts From Natural Language Supervision
April 30, 2026
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
April 30, 2026
Competitive Programming with Large Reasoning Models
April 30, 2026
Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
April 30, 2026
Denoising Diffusion Probabilistic Models
April 30, 2026
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
April 30, 2026
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
April 30, 2026
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
April 30, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
April 30, 2026
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
April 30, 2026
DeepSeek-V3 Technical Report
April 30, 2026
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
April 30, 2026
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
April 30, 2026
EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation
April 30, 2026
EmoEdit: Evoking Emotions through Image Manipulation
April 30, 2026
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
April 30, 2026
Emu3.5: Native Multimodal Models are World Learners
April 30, 2026
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
April 30, 2026
Flow Matching for Generative Modeling
April 30, 2026
Generating Fearful Images: Investigating Potential Emotional Biases in Image-Generation Models
April 30, 2026
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
April 30, 2026
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
April 30, 2026
HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction
April 30, 2026
ImgEdit: A Unified Image Editing Dataset and Benchmark
April 30, 2026
InstructPix2Pix: Learning to Follow Image Editing Instructions
April 30, 2026
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
April 30, 2026
Kimi k1.5: Scaling Reinforcement Learning with LLMs
April 30, 2026
Kimi-VL Technical Report
April 30, 2026
A Survey on LLM-based Autonomous Agents
April 30, 2026
LoRA: Low-Rank Adaptation of Large Language Models
April 30, 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
April 30, 2026
Masked Autoencoders Are Scalable Vision Learners
April 30, 2026
MLP-Mixer: An all-MLP Architecture for Vision
April 30, 2026
Magic-MM-Embedding
April 30, 2026
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
April 30, 2026
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
April 30, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
April 30, 2026
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
April 30, 2026
OCRBench v2: An Improved Benchmark for Evaluating LMMs on Visual Text
April 30, 2026
ObjEmbed: Towards Universal Multimodal Object Embeddings
April 30, 2026
OminiControl: Minimal and Universal Control for Diffusion Transformer
April 30, 2026
OmniDocBench: Benchmarking Diverse PDF Document Parsing
April 30, 2026
OmniGen2: Towards Instruction-Aligned Multimodal Generation
April 30, 2026
OneRec Technical Report
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
April 30, 2026
OneRec-Think: In-Text Reasoning for Generative Recommendation
April 30, 2026
OneRec-V2 Technical Report
April 30, 2026
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
April 30, 2026
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
April 30, 2026
OpenOneRec Technical Report: An Open Foundation Model and Benchmark to Accelerate Generative Recommendation
April 30, 2026
PyTorch: An Imperative Style, High-Performance Deep Learning Library
April 30, 2026
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
April 30, 2026
Qwen3 Technical Report
April 30, 2026
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
April 30, 2026
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing (RISEBench)
April 30, 2026
RzenEmbed: Towards Comprehensive Multimodal Retrieval
April 30, 2026
SAIL-Embedding: Omni-modal Embedding Foundation Model
April 30, 2026
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
April 30, 2026
Seedream 3.0 Technical Report
April 30, 2026
Seedream 4.0: Toward Next-generation Multimodal Image Generation
April 30, 2026
Show-o2: Improved Native Unified Multimodal Models
April 30, 2026
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
April 30, 2026
Step1X-Edit: A Practical Framework for General Image Editing
April 30, 2026
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
April 30, 2026
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
April 30, 2026
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
April 30, 2026
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
April 30, 2026
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
April 30, 2026
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
April 30, 2026
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
April 30, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
April 30, 2026
VisionCreator: A Native Visual-Generation Agentic Model
April 30, 2026
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation
April 30, 2026
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
April 30, 2026
You Only Look Once: Unified, Real-Time Object Detection
April 30, 2026
gpt-oss-120b & gpt-oss-20b Model Card
April 30, 2026
olmOCR: Unlocking Trillions of Tokens in PDFs with VLMs
April 29, 2026
DreamOmni2: Multimodal Instruction-based Editing and Generation
April 29, 2026
llm-wiki