HF Daily Papers

Introduces Brain-IT-VQA, a benchmark and model for answering visual questions grounded in brain-imaging data (fMRI/EEG), enabling neural-decoding-based VQA.

HF Daily Papers Jun 2

StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

StreamChar presents a decoupled orchestration framework for generating coherent long-horizon character audio-video streams, separating high-level planning from low-level synthesis.

HF Daily Papers Jun 2

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

Combines pipeline parallelism with speculative decoding to eliminate pipeline-bubble overhead and improve token acceptance rates in large-model inference.

HF Daily Papers Jun 2

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

Investigates whether foundation models can actively navigate to a specified viewpoint through visual exploration, revealing promising capabilities alongside systematic failure modes.

HF Daily Papers Jun 2

ESPO: Early-Stopping Proximal Policy Optimization

ESPO introduces an early-stopping criterion for PPO that prevents reward over-optimization during RLHF training, improving alignment stability across diverse preference datasets.

HF Daily Papers Jun 2

NITP: Next Implicit Token Prediction for LLM Pre-training

NITP proposes predicting latent implicit token representations rather than explicit tokens during pre-training, improving LLM reasoning quality without extra inference cost.

HF Daily Papers Jun 2

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

X-Stream frames multimodal large language models as multiplexers that simultaneously process and reason over multiple heterogeneous input streams, enabling richer multi-source understanding.

HF Daily Papers Jun 2

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Empirically compares discriminative vision-language and generative video diffusion pretraining for spatial intelligence tasks, finding generation-focused models often learn richer 3D representations.

HF Daily Papers Jun 2

Draft-OPD: On-Policy Distillation for Speculative Draft Models

Draft-OPD trains speculative decoding draft models via on-policy distillation from the target model, improving draft token quality and boosting overall inference throughput.

HF Daily Papers Jun 2

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

SkillAdaptor enables LLM agents to automatically refine reusable skills by learning from both successful and failed action trajectories, improving generalization to novel tasks.

HF Daily Papers Jun 2

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

K-BrowseComp benchmarks web browsing agents on tasks grounded in Korean contexts, testing navigation, search, and comprehension on Korean-language sites.

HF Daily Papers Jun 2

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

VideoMLA applies low-rank KV cache compression to autoregressive video diffusion transformers, enabling minute-scale video generation while keeping memory footprint tractable.

HF Daily Papers Jun 2

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Crafter introduces a multi-agent pipeline that generates and iteratively edits publication-quality scientific figures from diverse inputs including tables, captions, and reference images.

HF Daily Papers Jun 2

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Explores how parameter-efficient fine-tuning methods scale when simultaneously training millions of personal LoRA adapters on a single shared trillion-parameter base model.

HF Daily Papers Jun 1

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Research shows LLM agent populations spontaneously develop compressed private languages that improve token efficiency but risk evading human oversight.

HF Daily Papers Jun 1

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Proposes using on-policy feedback to iteratively self-improve reward models in RLHF, addressing distributional mismatch that degrades reward quality at inference time.

HF Daily Papers Jun 1

Linear Scaling Video VLMs for Long Video Understanding

Presents a vision-language model architecture that scales linearly with video length, enabling practical long-video understanding without quadratic attention cost.

HF Daily Papers Jun 1

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

Introduces a method for automatically discovering latent failure signals in vision-language-action agent trajectories to enable runtime safety monitoring.

HF Daily Papers Jun 1

How can embedding models bind concepts?

Studies how embedding models bind multiple concepts in shared vector spaces, finding that superposition is the key binding strategy.

HF Daily Papers Jun 1

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

An automated benchmark for systematically auditing the quality and diversity of skills available to LLM agents in open-domain skill ecosystems.

HF Daily Papers Jun 1

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

A benchmark revealing that current LLM agents systematically fail on long-horizon data analysis tasks requiring sustained multi-step planning and context management.

HF Daily Papers Jun 1

VLM3: Vision Language Models Are Native 3D Learners

Demonstrates that standard vision-language models can learn 3D spatial representations natively without specialized 3D architectures, given appropriate training signals.

HF Daily Papers Jun 1

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

Proposes a decoupled memory architecture that separates long-term world state from short-term rendering, enabling consistent video world generation at minute-long timescales.

HF Daily Papers Jun 1

PEEK: Picking Essential frames via Efficient Knowledge distillation

A knowledge distillation method for efficiently selecting essential keyframes from video streams, reducing compute while preserving task-critical visual information.

HF Daily Papers Jun 1

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Enables open-ended skill acquisition in AI agents through self-play with co-evolving policies, generating increasingly challenging tasks without a fixed reward function.

HF Daily Papers Jun 1

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

A self-aware RL framework that trains agentic search systems to recognize when to stop exploring, preventing over-search that wastes compute without improving answer quality.

HF Daily Papers Jun 1

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

Introduces a benchmark and trajectory synthesis pipeline targeting policy-induced errors in GUI agents, enabling training of more error-resilient automation systems.

HF Daily Papers Jun 1

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Analyzes trojan backdoor attacks on agentic AI harnesses spanning from prompt injection to persistent control hijacking, and proposes corresponding defense strategies.

HF Daily Papers Jun 1

Exploring Autonomous Agentic Data Engineering for Model Specialization

Investigates AI agents autonomously running data engineering pipelines to generate specialized training data for fine-tuning foundation models on new domains.

HF Daily Papers Jun 1

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

A large-scale evaluation of neural TTS systems across diverse real-world scenarios, exposing consistent quality gaps in long-form and spontaneous speech generation.

HF Daily Papers Jun 1

Task-Focused Memorization for Multimodal Agents

Proposes selective task-focused memorization for multimodal agents, retaining only relevant information while discarding noise across long interaction histories.

HF Daily Papers Jun 1

dMoE: dLLMs with Learnable Block Experts

Introduces a mixture-of-experts architecture for diffusion language models with learnable block-level experts that improve generation quality and computational efficiency.

HF Daily Papers Jun 1

Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

Shows that teacher-student disagreements vary in learnability during on-policy distillation, and proposes token teachability scores to focus training on tractable disagreements.

HF Daily Papers Jun 1

GrepSeek: Training Search Agents for Direct Corpus Interaction

Trains LLM agents to directly query text corpora via grep-style operations, improving information retrieval accuracy by bypassing dense vector indexing.

HF Daily Papers Jun 1

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Automates the generation of reusable AI agent skills by distilling domain expertise from human experts, enabling systematic skill library construction at scale.

HF Daily Papers Jun 1

Trust-Region Behavior Blending for On-Policy Distillation

Applies trust-region constraints to blend teacher and student behaviors during on-policy distillation, improving training stability and final model performance.

HF Daily Papers Jun 1

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Introduces representation forcing (aligning intermediate hidden states) to train unified multimodal models without the quality bottleneck imposed by discrete tokenization.

HF Daily Papers Jun 1

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

A zero-shot speech synthesis system that generates expressive long-form audio for both monologue and natural multi-speaker dialogue without speaker-specific fine-tuning.

HF Daily Papers Jun 1

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Trains LLMs to reason over long contexts by learning from search agent trajectories guided by rubric-based reward signals, improving structured long-context reasoning.

HF Daily Papers Jun 1

Mellum2 Technical Report

Technical report for Mellum2, JetBrains' updated code-focused language model optimized for developer tooling tasks including completion and code understanding.

HF Daily Papers Jun 1

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

A 100K-image dataset using generative models to produce high-quality ground truth for training generalizable image restoration networks on diverse real-world degradations.

HF Daily Papers Jun 1

Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Generates structured 3D indoor scene layouts from functional textual descriptions, bridging the gap between natural-language room requirements and geometric scene generation.

HF Daily Papers Jun 1

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Proposes a streaming spatial audio generation system synchronized to video in real time using an autoregressive diffusion transformer, enabling immersive on-the-fly audio synthesis.

HF Daily Papers Jun 1

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

SANA-Streaming enables real-time video editing using a hybrid diffusion transformer that balances generation quality and latency for streaming applications.