SEMamba (Accepted to IEEE SLT 2024)
This is the official implementation of the SEMamba paper.
For more details, please refer to: An Investigation of Incorporating Mamba for Speech Enhancement
NeurIPS 2024 Competition: URGENT Challenge 2024 (oral presentation)
- A speech enhancement (SE) challenge aiming to build universal, robust, diverse, and generalizable SE models.
- The challenge covers diverse distortions, including additive noise, reverberation, clipping, and bandwidth limitations, with all sampling frequencies supported by a single model.
- Requires handling a large-scale dataset (~1.5 TB), with ranking based on 13 metrics spanning five classes: non-intrusive, intrusive, downstream-task-independent, downstream-task-dependent, and subjective SE metrics.
- Achieved 4th place among 70 participating teams (more than 20 teams advanced to the final stage).
- Delivered an oral presentation at the NeurIPS 2024 workshop in Vancouver, Canada.
-
Demo website: Live Demo Website
✅ (Updated June 3) Online Demo on HuggingFace
You can now upload or record audio directly on our Hugging Face demo (https://huggingface.co/spaces/rc19477/Speech_Enhancement_Mamba) for speech enhancement.
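If you prefer to call the demo programmatically, something like the sketch below should work with gradio_client; note that the endpoint name (`/predict`) and the argument layout are assumptions, so check the Space's "Use via API" page for the actual signature.

```python
# Hypothetical programmatic call to the Hugging Face Space.
# The api_name and argument order are assumptions, not confirmed by the repo.
from gradio_client import Client, handle_file

client = Client("rc19477/Speech_Enhancement_Mamba")
result = client.predict(handle_file("noisy.wav"), api_name="/predict")
print(result)  # typically a local path to the enhanced audio
```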
We now provide another model, ckpts/vd.pth, trained on both the VCTK-Demand and DNS-2020 corpora. The model achieves the following performance:
| Dataset | PESQ | CSIG | CBAK | COVL | STOI |
|---|---|---|---|---|---|
| DNS-2020 | 3.66 | 2.88 | 4.33 | 2.67 | 0.98 |
| DNS-2020 w/PCS | 3.70 | 2.87 | 3.45 | 2.67 | 0.98 |
| VCTK-Demand | 3.56 | 4.73 | 4.00 | 4.25 | 0.96 |
| VCTK-Demand w/PCS | 3.75 | 4.76 | 3.67 | 4.37 | 0.96 |
✅ (Updated May 14) Now supports mamba-ssm 2.2.x
If you already installed mamba-ssm 2.2.x, you can still run this repo with a few modifications:
Steps to run:

```bash
# Step 1: Copy the local override of mamba_ssm
cp -R mamba_install/mamba_ssm/ .

# Step 2: Launch training
sh run.sh
```
🐳 Docker Support
We provide pre-built Docker environments to simplify setup:
- x86 systems: tested on A100, RTX 4090, and RTX 3090. 👉 Visit the x86 Docker Repo
- ARM systems: tested on NVIDIA GH200. 👉 Visit the GH200 Docker Repo
The provided environments are designed to support both the Mamba-1 and Mamba-2 model structures.
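Once you have pulled an image from one of the repos above, a typical launch looks like the following; the image name here is a placeholder, so substitute the tag published in the Docker repo you are using.

```bash
# Generic invocation sketch; <image:tag> is a placeholder for the actual
# image name from the linked Docker repos.
docker run --gpus all -it --rm -v "$(pwd)":/workspace <image:tag>
```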
Requirements
* Python >= 3.9
* CUDA >= 12.0
* PyTorch == 2.2.2
Model
Speech Enhancement Results
ASR Word Error Rate
We evaluated the ASR word error rate using OpenAI Whisper on the VoiceBank-DEMAND test set.
The evaluation code will be released in the future.
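Until the official evaluation code is released, a minimal sketch of how such a WER check could be reproduced is shown below, assuming `pip install openai-whisper jiwer`; the file name and reference transcript are placeholders, and this is not the authors' evaluation script.

```python
# Minimal WER sketch (not the official evaluation code).
import whisper  # pip install openai-whisper
import jiwer    # pip install jiwer

model = whisper.load_model("base")  # any Whisper size works

def wer_for_pair(wav_path: str, reference_text: str) -> float:
    # Transcribe the enhanced audio and score it against the reference.
    hypothesis = model.transcribe(wav_path)["text"]
    return jiwer.wer(reference_text.lower().strip(), hypothesis.lower().strip())

# Placeholder file name and transcript, for illustration only.
print(wer_for_pair("enhanced/p232_001.wav", "example reference transcript"))
```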
Additional Notes
- Ensure that both the `nvidia-smi` and `nvcc -V` commands report CUDA version 12.0 or higher to verify proper installation and compatibility.
- Currently, only RTX-series and newer GPUs are supported. Older models, such as the GTX 1080 Ti or Tesla V100, may not be able to run the code due to hardware limitations.
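A quick way to sanity-check both notes from Python; the compute-capability threshold mentioned in the comments is an assumption (RTX 20-series and newer report 7.5+), so treat this as a heuristic rather than an authoritative requirement.

```python
# Heuristic environment check for the notes above.
import torch

print("PyTorch:", torch.__version__)              # expect 2.2.2
print("CUDA (torch build):", torch.version.cuda)  # expect >= 12.0
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")     # RTX 20xx and newer report 7.5+
```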
Installation
(Suggested:) Step 0 - Create a Python environment with Conda
It is highly recommended to create a separate Python environment to manage dependencies and avoid conflicts.
```bash
conda create --name mamba python=3.9
conda activate mamba
```
Step 1 - Install PyTorch
Install PyTorch 2.2.2 from the official website. Visit PyTorch Previous Versions for specific installation commands based on your system configuration (OS, CUDA version, etc.).
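For example, the command for PyTorch 2.2.2 built against CUDA 12.1 looks like the following; double-check it against the Previous Versions page for your CUDA version.

```bash
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
```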
Step 2 - Install Required Packages
After setting up the environment and installing PyTorch, install the required Python packages listed in requirements.txt.
```bash
pip install -r requirements.txt
```
Step 3 - Install the Mamba Package
Navigate to the mamba_install directory and install the package. This step ensures all necessary components are correctly installed.
```bash
cd mamba_install
pip install .
```
📌 Note: You may need to run `pip install numpy==1.26.4` afterwards to prevent version-related issues.
Installing the Mamba package from the provided source (mamba_install) can help prevent package issues and ensure compatibility between different dependencies. It is recommended to follow these steps carefully to avoid potential conflicts.

If you have CUDA >= 12.0 and PyTorch 2.2.2 installed, you can try mamba 1.2.0.post1 instead of mamba 1.2.0 as follows:

```bash
cd mamba-1_2_0_post1
pip install .
```
Training the Model
Step 1: Prepare Dataset JSON
Create the dataset JSON files using the script `sh make_dataset.sh`. You may need to modify `make_dataset.sh` and `data/make_dataset_json.py`.
Alternatively, you can directly modify the data paths in `data/train_clean.json`, `data/train_noisy.json`, etc., as sketched below.
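For reference, a minimal sketch of what these path lists might look like; the flat-list layout and the example paths are assumptions, so verify the exact schema against `data/make_dataset_json.py`.

```python
# Hypothetical layout for data/train_clean.json (data/train_noisy.json would
# hold the index-aligned noisy counterparts). Verify against
# data/make_dataset_json.py before relying on this.
import json

clean_paths = [
    "/data/vctk_demand/clean_trainset/p226_001.wav",
    "/data/vctk_demand/clean_trainset/p226_002.wav",
]
with open("data/train_clean.json", "w") as f:
    json.dump(clean_paths, f, indent=2)
```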
Step 2: Run the following script to train the model.
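```bash
sh run.sh
```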
📌 Note: You can use `tensorboard --logdir exp/path_to_your_exp/logs` to check your training logs.
📌 Note: If you would like to train SEMamba with the pretrained_pesq_discriminator weights, modify run.sh to use the configuration: `--config recipes/SEMamba_advanced/SEMamba_advanced_pretrainedD.yaml`.
Using the Pretrained Model
Modify the `--input_folder` and `--output_folder` parameters in pretrained.sh to point to your desired input and output directories. Then, run the script:
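```bash
sh pretrained.sh
```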
Implementing the PCS Method in SEMamba
(Updated May 14, 2025) Fixed an issue where inference.py did not parse boolean values correctly.

There are two methods to implement the PCS (Perceptual Contrast Stretching) method in SEMamba:
- Use PCS as the training target:
  - Run `sh runPCS.sh` with the YAML configuration `use_PCS400=True`.
  - Use the pretrained model via `sh pretrained.sh` without post-processing: `--post_processing_PCS False`.
- Use PCS as post-processing:
  - Run `sh run.sh` with the YAML configuration `use_PCS400=False`.
  - Use the pretrained model via `sh pretrained.sh` with post-processing: `--post_processing_PCS True` (see the sketch after this list).
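Putting the post-processing flags together, a hypothetical direct invocation of inference.py might look like the following; `--input_folder`, `--output_folder`, and `--post_processing_PCS` appear above, but any other flags (and whether pretrained.sh wraps this call for you) should be checked against the script itself.

```bash
# Hypothetical direct invocation; flag combinations beyond those documented
# above are assumptions.
python inference.py --input_folder noisy_wavs \
                    --output_folder enhanced_wavs \
                    --post_processing_PCS True
```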
Evaluation
The evaluation metrics are calculated using the scripts from CMGAN.
Perceptual Contrast Stretching
The implementation of Perceptual Contrast Stretching (PCS) as discussed in our paper can be found at PCS400.
References and Acknowledgements
We would like to express our gratitude to the authors of MP-SENet, CMGAN, HiFi-GAN, and NSPP.
Citation:
If you find the paper useful in your research, please cite:
```bibtex
@article{chao2024investigation,
  title={An Investigation of Incorporating Mamba for Speech Enhancement},
  author={Chao, Rong and Cheng, Wen-Huang and La Quatra, Moreno and Siniscalchi, Sabato Marco and Yang, Chao-Han Huck and Fu, Szu-Wei and Tsao, Yu},
  journal={arXiv preprint arXiv:2405.06573},
  year={2024}
}
```


