Trending Papers - Hugging Face

by AK and the research community

Submitted by

taesiri

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

SkillOpt introduces a systematic text-space optimizer for agent skills that trains skills as external agent state with stable updates and zero deployment inference overhead, achieving superior performance across multiple benchmarks and execution environments.

Submitted by

lhmd

Submitted by

wafer-bob

Submitted by

taesiri

Submitted by

akhaliq

Submitted by

Yuyang-z

Submitted by

RuofengYang

Submitted by

taesiri

LongCat-Video Technical Report

LongCat-Video, a 13.6B parameter video generation model based on the Diffusion Transformer framework, excels in efficient and high-quality long video generation across multiple tasks using a unified architecture, coarse-to-fine generation, and block sparse attention.

meituan-longcat LongCat

· Published on Oct 25, 2025

Submitted by

akhaliq

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.

· Published on Apr 28, 2025

Submitted by

filicos

Submitted by

AaronHuangWei

Submitted by

akhaliq

Submitted by

CoreloneH

Submitted by

zbhpku

Submitted by

Paranioar

Submitted by

andito

Submitted by

taesiri

Submitted by

Kaining

Submitted by

taesiri

LightRAG: Simple and Fast Retrieval-Augmented Generation

LightRAG improves Retrieval-Augmented Generation by integrating graph structures for enhanced contextual awareness and efficient information retrieval, achieving better accuracy and response times.

  • 5 authors

· Published on Oct 8, 2024

Submitted by

Jinjing713

Submitted by

taesiri

A Very Big Video Reasoning Suite

A large-scale video reasoning dataset and benchmark are introduced to study video intelligence capabilities beyond visual quality, enabling systematic analysis of spatiotemporal reasoning and generalization across diverse tasks.

Submitted by

akhaliq

Very Large-Scale Multi-Agent Simulation in AgentScope

Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendly tools.

· Published on Jul 25, 2024

Submitted by

taesiri

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

AutoResearchClaw is a multi-agent autonomous research system that improves scientific discovery through structured debate, self-healing execution, verifiable reporting, human collaboration, and evolutionary learning, outperforming previous systems on a benchmark while maintaining human oversight.

  • 35 authors

· Published on May 19, 2026

Submitted by

liangjiaqing

Submitted by

thuzhaowang

Pixal3D: Pixel-Aligned 3D Generation from Images

Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.

Submitted by

xcjthu

MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM4, a highly efficient large language model for end-side devices, achieves superior performance using innovations in sparse attention, pre-training datasets, training algorithms, and inference systems.

Submitted by

mwxely

Submitted by

taesiri

SAM 3: Segment Anything with Concepts

Segment Anything Model 3 achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization.

Submitted by

nielsr

Geometric Context Transformer for Streaming 3D Reconstruction

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from video streams using a geometric context transformer architecture with specialized attention mechanisms for coordinate grounding, dense geometric cues, and long-range drift correction, achieving stable real-time performance at 20 FPS.

Submitted by

nielsr

Stable Audio 3

Stable Audio 3 enables efficient variable-length audio generation and editing through latent diffusion models operating on a semantic-acoustic autoencoder, with adversarial post-training for improved speed and quality.

Submitted by

imone

HRM-Text: Efficient Pretraining Beyond Scaling

A Hierarchical Recurrent Model architecture with specialized training on instruction-response pairs achieves competitive language modeling performance with significantly reduced computational requirements compared to traditional Transformer-based approaches.

Submitted by

zhen-nan

L2P: Unlocking Latent Potential for Pixel Generation

Latent-to-Pixel transfer paradigm efficiently leverages pre-trained latent diffusion models to create pixel-space models with minimal training overhead and high-resolution generation capabilities.

· Published on May 12, 2026

Submitted by

BryanWangNLP

Submitted by

Ziqi

Submitted by

taesiri

Submitted by

Rbin

RAG-Anything: All-in-One RAG Framework

RAG-Anything is a unified framework that enhances multimodal knowledge retrieval by integrating cross-modal relationships and semantic matching, outperforming existing methods on complex benchmarks.

Submitted by

Zuica96

Submitted by

LakshyAAAgrawal

Submitted by

Yirany

Submitted by

taesiri
