estimate-train-time
Predict distributed LLM training time before you run. This tool estimates the wall-clock time for training large language models across multiple GPUs using 3D parallelism (pipeline, tensor, and data parallelism), helping you plan capacity and compare parallelization strategies without expensive trial runs.
Installation
```bash
pip install estimate-train-time  # Coming soon to PyPI
```
Note: The PyPI package is coming soon. For now, install directly from the repository:
```bash
git clone https://github.com/DebarghaG/estimate-train-time.git
cd estimate-train-time
pip install -e .
```
Quick Start
```bash
# List available example configurations
estimate-train-time list-examples

# Run prediction with a bundled example (Llama 7B on A100s)
estimate-train-time predict --example llemma_7b_4_2_2_P
```
Output:
```
Estimated time cost of current training config: 9480819.17 us
= 9480.82 ms
= 9.4808 s
```
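This figure appears to be a per-batch estimate (the Python API below exposes the same prediction as `one_batch_predict`), so extrapolating to a full training run means multiplying by the planned number of optimizer steps. A minimal sketch, with a hypothetical step count:
```python
# Scale the per-batch Quick Start estimate to a whole run.
# The 9,480,819.17 us figure is the Quick Start output above;
# the step count is a hypothetical assumption for illustration.
batch_time_us = 9_480_819.17
steps = 100_000  # hypothetical number of training steps
total_hours = batch_time_us * steps / 1e6 / 3600
print(f"Estimated full run: {total_hours:.1f} hours")  # -> ~263.4 hours
```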
Features
- 3D Parallelism Support: Pipeline, tensor (model), and data parallelism (see the GPU-count sketch after this list)
- Pre-trained Regressors: Bundled models for NVIDIA A100 and GH200 GPUs
- No GPU Required: Predictions run on CPU using trained regressors
- Extensible: Add your own GPU profiles and cluster configurations
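To make the 3D-parallelism item concrete: the pipeline, tensor, and data degrees multiply to give the total GPU count. This is standard 3D-parallelism arithmetic, not this tool's internals, and reading the bundled example name `llemma_7b_4_2_2_P` as degrees 4/2/2 is an assumption:
```python
# Standard 3D-parallelism accounting; not part of this tool's API.
# Reading "llemma_7b_4_2_2_P" as pp=4, tp=2, dp=2 is an assumption.
pp, tp, dp = 4, 2, 2           # pipeline, tensor, and data parallel degrees
world_size = pp * tp * dp      # total GPUs = product of the three degrees
print(f"GPUs required: {world_size}")  # -> 16
```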
Documentation
- Getting Started - Installation and first prediction
- Core Concepts - Understanding distributed training estimation
- Configuration Reference - Config file parameters
- CLI Reference - Command-line options
- Python API - Programmatic usage
- Examples - Usage examples and custom configurations
- Advanced - Kernel sampling and extending the tool
Python API
```python
from estimate_train_time import one_batch_predict

# Predict training time for one batch from a config file
time_us = one_batch_predict("path/to/config.yml")
print(f"One batch takes {time_us / 1e6:.2f} seconds")
```
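Because predictions run on CPU with trained regressors, sweeping candidate parallelization strategies is cheap. A hedged sketch: the config file paths below are hypothetical placeholders; only `one_batch_predict` comes from the package:
```python
from estimate_train_time import one_batch_predict

# Compare candidate 3D-parallelism layouts by predicted per-batch time.
# The config paths are hypothetical; write one config file per layout.
configs = {
    "pp=4, tp=2, dp=2": "configs/llemma_7b_4_2_2.yml",
    "pp=2, tp=4, dp=2": "configs/llemma_7b_2_4_2.yml",
}
for layout, path in configs.items():
    time_us = one_batch_predict(path)
    print(f"{layout}: {time_us / 1e6:.2f} s per batch")
```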
Requirements
- Python 3.8+
- pandas, numpy, scikit-learn, xgboost, pyyaml, ijson, joblib
For GPU sampling (optional): torch, flash-attn, deepspeed
Acknowledgements
This work was supported by the National Science Foundation (NSF)-funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE), grant OAC 2112606.
Citation
If you use this tool in your research, please cite our paper, accepted to HiPC 2025 (proceedings forthcoming):
```bibtex
@article{zhang2025efficient,
  title={Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM},
  author={Zhang, Biyao and Zheng, Mingkai and Ganguly, Debargha and Zhang, Xuecen and Singh, Vikash and Chaudhary, Vipin and Zhang, Zhao},
  journal={arXiv preprint arXiv:2509.22832},
  year={2025}
}
```
License
MIT License - see LICENSE for details.