by AK and the research community
Self-Supervised Prompt Optimization
A self-supervised framework optimizes prompts for both closed and open-ended tasks by evaluating LLM outputs without external references, reducing costs and required data.
· Published on Feb 7, 2025
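To make the reference-free idea in the summary concrete, here is a minimal sketch of selecting between candidate prompts by having the model compare its own outputs pairwise, with no ground-truth labels. The `llm` callable, the judging instruction, and the A/B verdict parsing are illustrative assumptions, not the paper's actual procedure.

```python
# Sketch: reference-free prompt selection via pairwise self-evaluation.
# `llm` is any text-completion callable (a hypothetical placeholder here).
def better_prompt(llm, task, prompt_a, prompt_b):
    out_a = llm(f"{prompt_a}\n\n{task}")
    out_b = llm(f"{prompt_b}\n\n{task}")
    verdict = llm(
        "Which answer solves the task better? Reply with 'A' or 'B' only.\n"
        f"Task: {task}\nAnswer A: {out_a}\nAnswer B: {out_b}"
    )
    return prompt_a if verdict.strip().upper().startswith("A") else prompt_b

def select_prompt(llm, task, candidates):
    """Single-elimination pass over candidate prompts; no labels required."""
    best = candidates[0]
    for cand in candidates[1:]:
        best = better_prompt(llm, task, best, cand)
    return best
```

The point of the sketch is only that the selection signal comes from output-vs-output comparison rather than an external reference or labeled data.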
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. Our content analysis of 16 instruction types shows that developers prioritize functional context, such as build and run commands (62.3%), implementation details (69.9%), and architecture (67.7%). We also identify a significant gap: non-functional requirements like security (14.5%) and performance (14.5%) are rarely specified. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.
- 11 authors
· Published on Nov 17, 2025
SpotEdit: Selective Region Editing in Diffusion Transformers
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
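As a rough illustration of the selective-update idea the abstract describes (stable regions detected by similarity, cached conditional features reused, edited tokens blended back in), here is a small sketch. The cosine-similarity criterion, the 0.95 threshold, and the blending weight are assumptions for illustration only, not SpotEdit's actual SpotSelector/SpotFusion implementation.

```python
# Toy version of "skip stable tokens, fuse the rest" for token feature maps.
import torch
import torch.nn.functional as F

def select_stable_tokens(cond_feats, edit_feats, threshold=0.95):
    """Mark tokens whose features barely change as 'stable' (skip candidates).

    cond_feats, edit_feats: (batch, num_tokens, dim) features of the
    conditional (source) image and the current edited estimate.
    """
    sim = F.cosine_similarity(cond_feats, edit_feats, dim=-1)  # (B, N)
    return sim > threshold                                     # boolean mask

def fuse_tokens(cond_feats, edit_feats, stable_mask, alpha=0.8):
    """Blend cached conditional features back into positions marked stable."""
    w = stable_mask.unsqueeze(-1).float() * alpha              # (B, N, 1)
    return w * cond_feats + (1.0 - w) * edit_feats

B, N, D = 1, 16, 8
cond = torch.randn(B, N, D)
edit = cond.clone()
edit[:, :4] += torch.randn(B, 4, D)        # pretend only 4 tokens were edited
mask = select_stable_tokens(cond, edit)
fused = fuse_tokens(cond, edit, mask)
print(int(mask.sum()), "of", N, "tokens treated as stable")
```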
Yume-1.5: A Text-Controlled Interactive World Generation Model
Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges such as excessively large parameter sizes, reliance on lengthy inference steps, and rapidly growing historical context, which severely limit real-time performance and lack text-controlled generation capabilities. To address these challenges, we propose Yume-1.5, a novel framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. Yume-1.5 achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds. The framework comprises three core components: (1) a long-video generation framework integrating unified context compression with linear attention; (2) a real-time streaming acceleration strategy powered by bidirectional attention distillation and an enhanced text embedding scheme; (3) a text-controlled method for generating world events. We have provided the codebase in the supplementary material.
- 9 authors
· Published on Dec 26, 2025
RAG-Anything: All-in-One RAG Framework
RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.
Very Large-Scale Multi-Agent Simulation in AgentScope
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.
· Published on Jul 25, 2024
VibeVoice Technical Report
VibeVoice synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer, achieving superior performance and fidelity.
LightRAG: Simple and Fast Retrieval-Augmented Generation
LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.
- 5 authors
· Published on Oct 8, 2024
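For readers unfamiliar with graph-augmented retrieval, the toy sketch below shows the general pattern the summary gestures at: chunks are linked to the entities they mention, and a query retrieves matching entities plus their graph neighborhood. Entity extraction, ranking, and generation are simplified away; this is an illustration of the idea, not LightRAG's API or pipeline.

```python
# Toy graph-augmented retrieval: entity nodes connect chunk nodes,
# and retrieval expands from matched entities to nearby chunks.
import networkx as nx

def build_graph(chunks, entities_per_chunk):
    """chunks: {chunk_id: text}; entities_per_chunk: {chunk_id: [entity, ...]}."""
    g = nx.Graph()
    for cid, ents in entities_per_chunk.items():
        g.add_node(cid, kind="chunk", text=chunks[cid])
        for e in ents:
            g.add_node(e, kind="entity")
            g.add_edge(cid, e)
    return g

def retrieve(g, query_entities, hops=1):
    """Return chunk texts reachable from the query entities within `hops`."""
    nodes = {e for e in query_entities if e in g}
    for _ in range(hops):
        nodes |= {nbr for n in list(nodes) for nbr in g.neighbors(n)}
    return [g.nodes[n]["text"] for n in nodes if g.nodes[n].get("kind") == "chunk"]

chunks = {"c1": "Graph indexing links chunks through shared entities.",
          "c2": "Neighborhood expansion supplies multi-hop context."}
g = build_graph(chunks, {"c1": ["graph index", "entities"],
                         "c2": ["graph index", "context"]})
print(retrieve(g, ["graph index"]))   # both chunks share the 'graph index' entity
```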
Step-DeepResearch Technical Report
Step-DeepResearch, an end-to-end agent enhanced with a data synthesis strategy and progressive training, achieves expert-level capabilities in deep research scenarios, outperforming established models.
StepFun
· Published on Dec 23, 2025
LeVo: High-Quality Song Generation with Multi-Preference Alignment
LeVo, a framework combining an LM and a music codec, improves lyrics-to-song generation by parallelly modeling mixed and dual-track tokens, using transformer decoders, and employing direct preference optimization to enhance musicality and instruction following.
· Published on Jun 9, 2025
SAM 3: Segment Anything with Concepts
Segment Anything Model 3 achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization.
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
A novel framework, Robust-R1, enhances multimodal large language models' robustness to visual degradations through explicit modeling, supervised fine-tuning, reward-driven alignment, and dynamic reasoning depth scaling, achieving state-of-the-art performance on real-world degradation benchmarks.
- 10 authors
· Published on Dec 19, 2025
HunyuanVideo 1.5 Technical Report
HunyuanVideo 1.5 is a lightweight video generation model with state-of-the-art visual quality and motion coherence, using a DiT architecture with SSTA and an efficient video super-resolution network.
· Published on Nov 24, 2025
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.
· Published on Apr 28, 2025
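The summary names three operations: extraction, consolidation, and retrieval. A toy version of the store-side loop is sketched below to make that flow concrete; the token-overlap scoring and in-memory list are stand-in assumptions (per the summary, Mem0 itself extracts memories with an LLM and uses graph-based storage).

```python
# Toy memory store illustrating consolidate (dedupe) and retrieve (rank).
class MemoryStore:
    def __init__(self):
        self.memories = []                        # short factual strings

    def consolidate(self, fact):
        """Add a fact unless a near-duplicate is already stored."""
        tokens = set(fact.lower().split())
        for m in self.memories:
            overlap = len(tokens & set(m.lower().split())) / max(len(tokens), 1)
            if overlap > 0.8:
                return                            # treat as duplicate, skip
        self.memories.append(fact)

    def retrieve(self, query, k=3):
        """Rank stored memories by simple token overlap with the query."""
        q = set(query.lower().split())
        return sorted(self.memories,
                      key=lambda m: len(q & set(m.lower().split())),
                      reverse=True)[:k]

store = MemoryStore()
store.consolidate("User prefers concise answers")
store.consolidate("User is based in Berlin")
print(store.retrieve("where is the user based"))
```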
Single-stream Policy Optimization
Single-stream Policy Optimization (SPO) improves policy-gradient training for Large Language Models by eliminating group-based issues and providing a stable, low-variance learning signal, leading to better performance and efficiency.
Tencent
· Published on Sep 16, 2025
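The summary contrasts SPO with group-based schemes (e.g., methods that sample several responses per prompt to form a relative baseline). One standard way to keep a single sampled response per prompt and still obtain a low-variance advantage is a per-prompt running baseline, sketched below; this is a generic illustration of that idea, not necessarily SPO's exact estimator.

```python
# Generic single-sample advantage with a per-prompt running baseline.
from collections import defaultdict

class RunningBaseline:
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.value = defaultdict(float)   # baseline tracked per prompt id
        self.seen = set()

    def advantage(self, prompt_id, reward):
        if prompt_id not in self.seen:    # initialise on first sight
            self.value[prompt_id] = reward
            self.seen.add(prompt_id)
            return 0.0
        adv = reward - self.value[prompt_id]
        # update the baseline after computing the advantage
        self.value[prompt_id] = (self.momentum * self.value[prompt_id]
                                 + (1 - self.momentum) * reward)
        return adv

b = RunningBaseline()
for r in (0.0, 1.0, 1.0, 0.0):
    print(round(b.advantage("prompt-42", r), 3))   # 0.0, 1.0, 0.9, -0.19
```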
SAM Audio: Segment Anything in Audio
SAM Audio, a diffusion transformer-based foundation model, achieves superior performance in general audio separation using unified text, visual, and temporal span prompts across various audio types.
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Multimodal large language models (MLLMs) have achieved remarkable progress in visual understanding tasks such as visual grounding, segmentation, and captioning. However, their ability to perceive perceptual-level image features remains limited. In this work, we present UniPercept-Bench, a unified framework for perceptual-level image understanding across three key domains: Aesthetics, Quality, Structure and Texture. We establish a hierarchical definition system and construct large-scale datasets to evaluate perceptual-level image understanding. Based on this foundation, we develop a strong baseline UniPercept trained via Domain-Adaptive Pre-Training and Task-Aligned RL, enabling robust generalization across both Visual Rating (VR) and Visual Question Answering (VQA) tasks. UniPercept outperforms existing MLLMs on perceptual-level image understanding and can serve as a plug-and-play reward model for text-to-image generation. This work defines Perceptual-Level Image Understanding in the era of MLLMs and, through the introduction of a comprehensive benchmark together with a strong baseline, provides a solid foundation for advancing perceptual-level multimodal image understanding.
- 15 authors
· Published on Dec 25, 2025
DeepCode: Open Agentic Coding
DeepCode, a fully autonomous framework, addresses the challenges of document-to-codebase synthesis by optimizing information flow through source compression, structured indexing, knowledge injection, and error correction, achieving state-of-the-art performance and surpassing human experts.
- 5 authors
· Published on Dec 8, 2025
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely discriminative monocular depth, causing holes and temporally unstable estimates. Our key observation is that modern video diffusion models already synthesize convincing transparent phenomena, suggesting they have internalized the optical rules. We build TransPhy3D, a synthetic video corpus of transparent/reflective scenes: 11k sequences rendered with Blender/Cycles. Scenes are assembled from a curated bank of category-rich static assets and shape-rich procedural assets paired with glass/plastic/metal materials. We render RGB + depth + normals with physically based ray tracing and OptiX denoising. Starting from a large video diffusion model, we learn a video-to-video translator for depth (and normals) via lightweight LoRA adapters. During training we concatenate RGB and (noisy) depth latents in the DiT backbone and co-train on TransPhy3D and existing frame-wise synthetic datasets, yielding temporally consistent predictions for arbitrary-length input videos. The resulting model, DKT, achieves zero-shot SOTA on real and synthetic video benchmarks involving transparency: ClearPose, DREDS (CatKnown/CatNovel), and TransPhy3D-Test. It improves accuracy and temporal consistency over strong image/video baselines, and a normal variant sets the best video normal estimation results on ClearPose. A compact 1.3B version runs at ~0.17 s/frame. Integrated into a grasping stack, DKT's depth boosts success rates across translucent, reflective and diffuse surfaces, outperforming prior estimators. Together, these results support a broader claim: "Diffusion knows transparency." Generative video priors can be repurposed, efficiently and label-free, into robust, temporally coherent perception for challenging real-world manipulation.
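The abstract states that RGB latents and noisy depth latents are concatenated inside the DiT backbone. The snippet below sketches that conditioning pattern in isolation; the tensor shapes, the single attention block, and the linear projections are illustrative assumptions and do not reflect DKT's architecture or its LoRA training setup.

```python
# Toy block: concatenate RGB and noisy depth latents along the channel axis,
# project, attend, and predict a denoised depth latent.
import torch
import torch.nn as nn

class TinyDiTBlock(nn.Module):
    def __init__(self, rgb_ch=16, depth_ch=16, dim=128, heads=4):
        super().__init__()
        self.proj = nn.Linear(rgb_ch + depth_ch, dim)   # fuse concatenated latents
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, depth_ch)             # denoised depth latent

    def forward(self, rgb_lat, noisy_depth_lat):
        # rgb_lat, noisy_depth_lat: (B, tokens, channels) flattened video latents
        x = torch.cat([rgb_lat, noisy_depth_lat], dim=-1)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)
        return self.out(h)

block = TinyDiTBlock()
rgb = torch.randn(2, 64, 16)          # batch of 2, 64 tokens, 16 latent channels
depth = torch.randn(2, 64, 16)
print(block(rgb, depth).shape)        # torch.Size([2, 64, 16])
```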