General
How to Synthesize Text Data without Model Collapse?
Paper
• 2412.14689
• Published • 53
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Paper
• 2412.12094
• Published • 11
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Paper
• 2306.07691
• Published • 13
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Paper
• 2203.02395
• Published • 1
Scaling Laws for Floating Point Quantization Training
Paper
• 2501.02423
• Published • 26
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published • 55
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published • 302
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published • 103
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
• 2501.06282
• Published • 53
An Empirical Study of Autoregressive Pre-training from Videos
Paper
• 2501.05453
• Published • 41
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
• 2501.07301
• Published • 100
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper
• 2501.09732
• Published • 72
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
• Published • 41
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper
• 2501.13106
• Published • 91
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
• Published • 125
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Paper
• 2501.18427
• Published • 25
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
• 2502.01534
• Published • 40
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
• 2501.18492
• Published • 88
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper
• 2502.03275
• Published • 18
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
Paper
• 2502.01105
• Published • 21
Paper
• 2502.06049
• Published • 31
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
• 2502.07737
• Published • 9
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published • 149
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
• 2502.05171
• Published • 154
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Paper
• 2502.12115
• Published • 46
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
Paper
• 2502.11098
• Published • 13
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper
• 2502.10458
• Published • 38
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
Paper
• 2502.12146
• Published • 16
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper
• 2502.14768
• Published • 47
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper
• 2502.18449
• Published • 75
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published • 69
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Paper
• 2503.07608
• Published • 23
Personalize Anything for Free with Diffusion Transformer
Paper
• 2503.12590
• Published • 44
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Paper
• 2503.12533
• Published • 68
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper
• 2503.21729
• Published • 29
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper
• 1910.10683
• Published • 17
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published • 11
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published • 63
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published • 12
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
Paper
• 2504.12395
• Published • 16
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published • 34
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
Paper
• 2504.12364
• Published • 22
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
Paper
• 2504.13816
• Published • 18
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published • 122
Kuwain 1.5B: An Arabic SLM via Language Injection
Paper
• 2504.15120
• Published • 121
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Paper
• 2504.15279
• Published • 78
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Paper
• 2504.15785
• Published • 22
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Paper
• 2504.17789
• Published • 23
Step1X-Edit: A Practical Framework for General Image Editing
Paper
• 2504.17761
• Published • 92
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Paper
• 2504.17502
• Published • 55
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
• 2504.17192
• Published • 124
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Paper
• 2504.17432
• Published • 40
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Paper
• 2504.17207
• Published • 30
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Paper
• 2504.16427
• Published • 18
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Paper
• 2504.18415
• Published • 49
DeepCritic: Deliberate Critique with Large Language Models
Paper
• 2505.00662
• Published • 54
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
Paper
• 2505.04622
• Published • 27
Unified Continuous Generative Models
Paper
• 2505.07447
• Published • 42
Learning Dynamics in Continual Pre-Training for Large Language Models
Paper
• 2505.07796
• Published • 19
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper
• 2505.09568
• Published • 99
Thinkless: LLM Learns When to Think
Paper
• 2505.13379
• Published • 50