GitHub - AMA-CMFAI/LAMBDA: This is the offical repository of paper "LAMBDA: A large Model Based Data Agent". https://www.polyu.edu.hk/ama/cmfai/lambda.html

We introduce LAMBDA, a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language.

News

LAMBDA App for macOS and Windows has been released. Details can be found in Released. (Hint: There are some problems with the kernel installation in the APP. You should run ipython kernel install --name lambda --user to install the kernel in advance.)
Docs site is available!

Key Features

Code-Free Data Analysis: Perform complex data analysis tasks through human language instruction.
Multi-Agent System: Utilizes two key agent roles, the programmer and the inspector, to generate and debug code seamlessly.
User Interface: This includes a robust user interface that allows direct user intervention in the operational loop.
Model Integration: Flexibly integrates external models and algorithms to cater to customized data analysis needs.
Automatic Report Generation: Concentrate on high-value tasks, rather than spending time and resources on report writing and formatting.
Jupyter Notebook Exporting: Export the code and the results to Jupyter Notebook for reproduction and further analysis flexibly.

Getting Started

Installation

First, clone the repository.

git clone https://github.com/AMA-CMFAI/LAMBDA.git
cd LAMBDA

Then, we recommend creating a Conda environment for this project and installing the dependencies by following the commands:

conda create -n lambda python=3.10
conda activate lambda

Then, install the required packages:

pip install -r requirements.txt

Next, you should install the Jupyter kernel to create a local Code Interpreter:

ipython kernel install --name lambda --user

Configuration to Easy Start

To use the Large Language Models, you should have an API key from OpenAI or other companies. Besides, we support OpenAI-Style interface for your local LLMs once deployed, available frameworks such as Ollama, LiteLLM, LLaMA-Factory.

Here are some products that offer free APIkeys for your reference: OpenRouter and SILICONFLOW

Set your API key, models and working path in the config.yaml:

#================================================================================================
#                                       Config of the LLMs
#================================================================================================
conv_model : "gpt-4.1-mini" # Choose the model you want to use. We highly recommned using the advanced model.
programmer_model : "gpt-4.1-mini" 
inspector_model : "gpt-4.1-mini"
api_key : "sk-xxxxxxx" # The API Keys you buy.
base_url_conv_model : 'https://api.openai.com/v1' # The base url from the provider.
base_url_programmer : 'https://api.openai.com/v1'
base_url_inspector : 'https://api.openai.com/v1'


#================================================================================================
#                                       Config of the system
#================================================================================================
streaming : True
project_cache_path : "cache/conv_cache/" # Local cache path
max_attempts : 5 # The max attempts of self-correcting
max_exe_time: 18000 # The maximum time for the execution

#knowledge integration
retrieval : False # Whether to start a knowledge retrieval. If you don't create your knowledge base, you should set it to False

Finally, run the following command to start the LAMBDA with GUI:

Demonstration Videos

The performance of LAMBDA in solving data science problems is demonstrated in several case studies, including:

Planning Works

Create a Logger for log.
Pre-installation of popular packages in the kernel.
Replace Gradio UI with OpenWebUI.
Refactor the Knowledge Integration and Knowledge base module by ChromaDB.
Add a Docker image for easier use.
Docsite.

Updating History

See Docs site.

Related Works

If you are interested in Data Agent, you can take a look at :

Our survey paper [A Survey on Large Language Model-based Agents for Statistics and Data Science]
and a reading list: [Paper List of LLM-based Data Science Agents]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Thank the contributors and the communities for their support and feedback.

If you find our work useful in your research, consider citing our paper by:

@article{sun2025lambda,
  title={Lambda: A large model based data agent},
  author={Sun, Maojun and Han, Ruijian and Jiang, Binyan and Qi, Houduo and Sun, Defeng and Yuan, Yancheng and Huang, Jian},
  journal={Journal of the American Statistical Association},
  pages={1--13},
  year={2025},
  publisher={Taylor \& Francis}
}

@article{sun2025survey,
  title={A survey on large language model-based agents for statistics and data science},
  author={Sun, Maojun and Han, Ruijian and Jiang, Binyan and Qi, Houduo and Sun, Defeng and Yuan, Yancheng and Huang, Jian},
  journal={The American Statistician},
  pages={1--14},
  year={2025},
  publisher={Taylor \& Francis}
}