HF Daily Papers Jun 2 RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes
HF Daily Papers Jun 2 OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
HF Daily Papers Jun 2 Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
HF Daily Papers Jun 2 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation
HF Daily Papers Jun 2 When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
HF Daily Papers Jun 2 MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
HF Daily Papers Jun 2 Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism
HF Daily Papers Jun 2 Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs
HF Daily Papers Jun 2 Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
HF Daily Papers Jun 2 VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
HF Daily Papers Jun 2 StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration
HF Daily Papers Jun 2 Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism
HF Daily Papers Jun 2 Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
HF Daily Papers Jun 2 Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
HF Daily Papers Jun 2 VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
HF Daily Papers Jun 2 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
HF Daily Papers Jun 2 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
HF Daily Papers Jun 1 Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
HF Daily Papers Jun 1 The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement
HF Daily Papers Jun 1 Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring
HF Daily Papers Jun 1 OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents
HF Daily Papers Jun 1 DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
HF Daily Papers Jun 1 SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
HF Daily Papers Jun 1 Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents
HF Daily Papers Jun 1 From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
HF Daily Papers Jun 1 Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
HF Daily Papers Jun 1 Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation
HF Daily Papers Jun 1 COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
HF Daily Papers Jun 1 SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
HF Daily Papers Jun 1 LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
HF Daily Papers Jun 1 GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
HF Daily Papers Jun 1 Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
HF Daily Papers Jun 1 SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
HF Daily Papers May 31 Thinking Before Constraining: A Unified Decoding Framework for Large Language Models
HF Daily Papers May 31 Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
HF Daily Papers May 31 ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
HF Daily Papers May 31 Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments
HF Daily Papers May 31 CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval
HF Daily Papers May 31 SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
HF Daily Papers May 31 CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
HF Daily Papers May 31 Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation
HF Daily Papers May 31 Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering
HF Daily Papers May 31 Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation
HF Daily Papers May 31 UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents
HF Daily Papers May 31 RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains
HF Daily Papers May 31 PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions
HF Daily Papers May 31 DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
HF Daily Papers May 31 WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction
HF Daily Papers May 31 When Should Models Change Their Minds? Contextual Belief Management in Large Language Models
HF Daily Papers May 31 CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
HF Daily Papers May 31 When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems
HF Daily Papers May 31 LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents
HF Daily Papers May 31 AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios
HF Daily Papers May 31 Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
HF Daily Papers May 31 CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation
HF Daily Papers May 31 minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
HF Daily Papers May 31 YoCausal: How Far is Video Generation from World Model? A Causality Perspective
HF Daily Papers May 31 UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
HF Daily Papers May 31 LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
HF Daily Papers May 31 Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
HF Daily Papers May 31 AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
Berkeley AI Research May 8 Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling