estimate-train-time

Predict distributed LLM training time before you run. This tool estimates the wall-clock time for training large language models across multiple GPUs using 3D parallelism (pipeline, tensor, and data parallelism), helping you plan capacity and compare parallelization strategies without expensive trial runs.

Installation

pip install estimate-train-time  # Coming soon to PyPI

Note: the PyPI package is coming soon. Until then, install directly from the repository:

git clone https://github.com/DebarghaG/estimate-train-time.git
cd estimate-train-time
pip install -e .

Quick Start

# List available example configurations
estimate-train-time list-examples

# Run prediction with a bundled example (Llama 7B on A100s)
estimate-train-time predict --example llemma_7b_4_2_2_P

Output:

Estimated time cost of current training config: 9480819.17 us
                                               = 9480.82 ms
                                               = 9.4808 s

Features

  • 3D Parallelism Support: Pipeline, tensor (model), and data parallelism (see the sketch after this list)
  • Pre-trained Regressors: Bundled models for NVIDIA A100 and GH200 GPUs
  • No GPU Required: Predictions run on CPU using trained regressors
  • Extensible: Add your own GPU profiles and cluster configurations
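
For orientation on the first feature: in 3D parallelism, the number of GPUs a job occupies is the product of the three parallelism degrees. A minimal illustrative sketch in Python (not part of this package's API):

def required_gpus(pipeline: int, tensor: int, data: int) -> int:
    """Total GPUs for a 3D-parallel job: the product of the three degrees."""
    return pipeline * tensor * data

# e.g., a 4-stage pipeline with 2-way tensor and 2-way data parallelism
print(required_gpus(pipeline=4, tensor=2, data=2))  # 16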

Documentation

Python API

from estimate_train_time import one_batch_predict

# Predict training time from a config file
time_us = one_batch_predict("path/to/config.yml")
print(f"One batch takes {time_us / 1e6:.2f} seconds")

Requirements

  • Python 3.8+
  • pandas, numpy, scikit-learn, xgboost, pyyaml, ijson, joblib

For GPU sampling (optional): torch, flash-attn, deepspeed

Acknowledgements

This work was supported by the National Science Foundation (NSF)-funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE), grant OAC 2112606.

Citation

If you use this tool in your research, please cite our paper, accepted to HiPC 2025 (proceedings forthcoming):

@article{zhang2025efficient,
  title={Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM},
  author={Zhang, Biyao and Zheng, Mingkai and Ganguly, Debargha and Zhang, Xuecen and Singh, Vikash and Chaudhary, Vipin and Zhang, Zhao},
  journal={arXiv preprint arXiv:2509.22832},
  year={2025}
}

License

MIT License - see LICENSE for details.