Blog1

Tag: clippings

There are 123 notes under this tag.

May 16, 2026
Qwen-Image-2.0 Technical Report
May 7, 2026
A Survey on Generative Recommendation: Data, Model, and Tasks
- clippings
May 7, 2026
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
- clippings
May 7, 2026
DeepSeek V4
- clippings
May 7, 2026
EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes
- clippings
May 7, 2026
Emu3: Next-Token Prediction is All You Need
- clippings
May 7, 2026
GLM-5: from Vibe Coding to Agentic Engineering
- clippings
May 7, 2026
Gen-Searcher: Reinforcing Agentic Search for Image Generation
- clippings
May 7, 2026
Kimi Linear: An Expressive, Efficient Attention Architecture
- clippings
May 7, 2026
Language Models are Unsupervised Multitask Learners
- clippings
May 7, 2026
MiniMax-01: Scaling Foundation Models with Lightning Attention
- clippings
May 7, 2026
Normalizing Flows: An Introduction and Review of Current Methods
- clippings
May 7, 2026
OpenAI o1 System Card
- clippings
May 7, 2026
PaperBanana: Automating Academic Illustration for AI Scientists
- clippings
May 7, 2026
Qwen Technical Report
- clippings
- InsTag
May 7, 2026
Qwen-Image Technical Report
- clippings
May 7, 2026
Qwen2.5 Technical Report
- clippings
May 7, 2026
Qwen2.5-VL Technical Report
- clippings
May 7, 2026
Qwen3-VL Technical Report
- clippings
- Tools
May 7, 2026
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for Multimodal Retrieval
- clippings
May 7, 2026
Scalable watermarking for identifying large language model outputs
- clippings
May 7, 2026
Seedance 2.0: Advancing Video Generation for World Complexity
- clippings
May 7, 2026
The Rise and Potential of Large Language Model Based Agents: A Survey
- clippings
May 7, 2026
Thinking with Visual Primitives
- clippings
May 7, 2026
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
- clippings
May 7, 2026
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
- clippings
May 5, 2026
A Systematic Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution
- clippings
May 4, 2026
GPT-4 Technical Report
- clippings
May 4, 2026
GPT-4o System Card
- clippings
April 30, 2026
A Survey on Large Language Model based Autonomous Agents
- clippings
April 30, 2026
Adding Conditional Control to Text-to-Image Diffusion Models
- clippings
April 30, 2026
Affective Image Editing: Shaping Emotional Factors via Text Descriptions
- clippings
April 30, 2026
Agent AI: Surveying the Horizons of Multimodal Interaction
- clippings
April 30, 2026
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
- clippings
- Turn
April 30, 2026
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- clippings
April 30, 2026
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
April 30, 2026
Attention Is All You Need
- clippings
April 30, 2026
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- clippings
- L
- H
- A
April 30, 2026
CLAP: Learning Audio Concepts From Natural Language Supervision
- clippings
April 30, 2026
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- clippings
April 30, 2026
Competitive Programming with Large Reasoning Models
April 30, 2026
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- clippings
- Row
April 30, 2026
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
April 30, 2026
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
April 30, 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- clippings
April 30, 2026
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- clippings
April 30, 2026
DeepSeek-V3 Technical Report
- clippings
April 30, 2026
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
- clippings
April 30, 2026
Denoising Diffusion Probabilistic Models
- clippings
April 30, 2026
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
- clippings
April 30, 2026
Emerging Properties in Unified Multimodal Pretraining
- clippings
April 30, 2026
EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation
- clippings
- Image
April 30, 2026
EmoEdit: Evoking Emotions through Image Manipulation
- clippings
April 30, 2026
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
- clippings
April 30, 2026
Emu3.5: Native Multimodal Models are World Learners
- clippings
April 30, 2026
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
- clippings
April 30, 2026
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
- clippings
April 30, 2026
Flow Matching for Generative Modeling
- clippings
April 30, 2026
Generating Fearful Images: Investigating Potential Emotional Biases in Image-Generation Models
- clippings
April 30, 2026
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
- clippings
April 30, 2026
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- clippings
April 30, 2026
HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction
- clippings
April 30, 2026
ImgEdit: A Unified Image Editing Dataset and Benchmark
April 30, 2026
InstructPix2Pix: Learning to Follow Image Editing Instructions
- clippings
April 30, 2026
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
April 30, 2026
Kimi K2: Open Agentic Intelligence
April 30, 2026
Kimi K2.5: Visual Agentic Intelligence
- clippings
April 30, 2026
Kimi k1.5: Scaling Reinforcement Learning with LLMs
- clippings
April 30, 2026
Kimi-VL Technical Report
- clippings
April 30, 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
- clippings
April 30, 2026
LLaMA: Open and Efficient Foundation Language Models
- clippings
- discriminant
April 30, 2026
Language Models are Few-Shot Learners
- clippings
April 30, 2026
LoRA: Low-Rank Adaptation of Large Language Models
- clippings
April 30, 2026
MLP-Mixer: An all-MLP Architecture for Vision
- clippings
April 30, 2026
Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs
- clippings
April 30, 2026
Masked Autoencoders Are Scalable Vision Learners
- clippings
April 30, 2026
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
- clippings
April 30, 2026
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
- clippings
April 30, 2026
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- clippings
April 30, 2026
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
April 30, 2026
ObjEmbed: Towards Universal Multimodal Object Embeddings
- clippings
- token
April 30, 2026
OminiControl: Minimal and Universal Control for Diffusion Transformer
- clippings
April 30, 2026
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
- clippings
April 30, 2026
OmniGen2: Towards Instruction-Aligned Multimodal Generation
- clippings
April 30, 2026
OneRec Technical Report
April 30, 2026
OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
- clippings
April 30, 2026
OneRec-Think: In-Text Reasoning for Generative Recommendation
- clippings
April 30, 2026
OneRec-V2 Technical Report
- clippings
April 30, 2026
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
- clippings
April 30, 2026
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
- clippings
April 30, 2026
OpenOneRec Technical Report: An Open Foundation Model and Benchmark to Accelerate Generative Recommendation
- clippings
April 30, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
April 30, 2026
PyTorch: An Imperative Style, High-Performance Deep Learning Library
- clippings
April 30, 2026
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
- clippings
April 30, 2026
Qwen3 Technical Report
- clippings
April 30, 2026
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- clippings
April 30, 2026
RoFormer: Enhanced Transformer with Rotary Position Embedding
- clippings
April 30, 2026
RzenEmbed: Towards Comprehensive Multimodal Retrieval
- clippings
April 30, 2026
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
- clippings
April 30, 2026
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
- clippings
April 30, 2026
Seedream 3.0 Technical Report
- clippings
April 30, 2026
Seedream 4.0: Toward Next-generation Multimodal Image Generation
- clippings
April 30, 2026
Show-o2: Improved Native Unified Multimodal Models
- clippings
April 30, 2026
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
- clippings
April 30, 2026
Step1X-Edit: A Practical Framework for General Image Editing
- clippings
- Sub-tasks
April 30, 2026
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- clippings
- param
April 30, 2026
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- clippings
April 30, 2026
The Llama 3 Herd of Models
- clippings
April 30, 2026
Training Compute-Optimal Large Language Models
- clippings
April 30, 2026
Training language models to follow instructions with human feedback
- clippings
April 30, 2026
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- clippings
- ToT
April 30, 2026
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
- clippings
April 30, 2026
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
- clippings
- Samples
April 30, 2026
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
- clippings
April 30, 2026
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
- clippings
April 30, 2026
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
- clippings
April 30, 2026
VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation
- clippings
April 30, 2026
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation
- clippings
- Chats
April 30, 2026
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
- clippings
April 30, 2026
You Only Look Once: Unified, Real-Time Object Detection
- clippings
April 30, 2026
gpt-oss-120b & gpt-oss-20b Model Card
- clippings
April 30, 2026
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
- clippings
April 29, 2026
DreamOmni2: Multimodal Instruction-based Editing and Generation
- clippings
