GitHub - Trint-ai/TrintAI: A powerful open-source tool for transcribing and understanding speech.


TrintAI is a powerful open source tool for converting speech into text. In addition to its transcription capabilities, it can generate summaries of the audio and detect sentiments and emotions. Using TrintAI you can power your apps with cutting-edge speech recognition.

More to come...

📣 We're currently seeking community maintainers. If you're interested, don't hesitate to get in touch, and check the contribution guidelines. 📣

If you find this project useful or interesting, please consider giving it a star on GitHub! 🌟 Your support helps us continue to improve and maintain the project.

Just click the star button at the top of the repository page. Your feedback and support mean a lot to us. Thank you! 😊

We believe in open source, and we believe we can take TrintAI to the next level. Here we provide a list of the most popular paid speech-to-text services on the market, which can be used for feature comparison.

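# Clone the repository and move into it
git clone https://github.com/Trint-ai/TrintAI.git
cd TrintAI
# Create the backend environment file from the example
cp backend/.env.example backend/.env
# Install the Python dependencies
cd backend
pip install -r requirements.txt
# Build the Docker image and run it, publishing port 8000
docker build -t trintai .
docker run -p 8000:8000 -t trintai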
curl --header "Content-Type: application/json" \
        --request POST \
        --data '{"file":"https://mycustomdomain/audio.mp3"}' \
        http://localhost:8000/api
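The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: the endpoint URL and the `file` field come from that example, while the names `build_request` and `transcribe` and the timeout value are illustrative assumptions, not part of TrintAI.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api"  # endpoint taken from the curl example


def build_request(audio_url: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build the POST request that carries the audio file URL as JSON."""
    body = json.dumps({"file": audio_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_url: str, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response (hypothetical helper).

    The generous timeout is an assumption: transcription may take a while.
    """
    with urllib.request.urlopen(build_request(audio_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the container running, `transcribe("https://mycustomdomain/audio.mp3")` should return the response as a Python dictionary.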
{
    'summary': str,
    'transcript': list
}
{
    'timestamps':
        {
            'from': str(timestamp),
            'to': str(timestamp)
        },
    'offsets':
        {
            'from': int,
            'to': int
        },
    'text': str,
    'speaker': str,
    'emotion': str,
    'emotion_score': float
}
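For client code, each transcript entry can be loaded into a small typed object. This is a minimal sketch, not TrintAI's own model: the `Segment` class, its field names, and `timestamp_to_ms` are hypothetical, and it assumes that offsets are milliseconds and that timestamps use the `HH:MM:SS,mmm` format seen in the sample response.

```python
from dataclasses import dataclass


def timestamp_to_ms(stamp: str) -> int:
    """Convert an "HH:MM:SS,mmm" timestamp string to milliseconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


@dataclass(frozen=True)
class Segment:
    """One transcript entry; the field names here are illustrative."""
    start_ms: int
    end_ms: int
    text: str
    speaker: str
    emotion: str
    emotion_score: float

    @classmethod
    def from_dict(cls, item: dict) -> "Segment":
        """Build a Segment from one element of the 'transcript' list."""
        return cls(
            start_ms=int(item["offsets"]["from"]),
            end_ms=int(item["offsets"]["to"]),
            text=item["text"],
            speaker=item["speaker"],
            emotion=item["emotion"],
            emotion_score=float(item["emotion_score"]),
        )

    @property
    def duration_ms(self) -> int:
        """Length of the segment in milliseconds."""
        return self.end_ms - self.start_ms
```

Keeping the offsets as integers makes durations and ordering simple to compute, and `timestamp_to_ms` can be used to cross-check them against the string timestamps.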
{
    "summary": {
        "summary": "Joanne Burns called ILTECA Telecom for assistance regarding her data service, which she believed should have been restored by now. Sam, the representative, asked for her name to check the status of her data."
    },
    "transcript": [
        {
            "timestamps": {
                "from": "00:00:00,000",
                "to": "00:00:03,120"
            },
            "offsets": {
                "from": 0,
                "to": 3120
            },
            "text": "Thank you for calling ILTECA Telecom.",
            "speaker": "1",
            "emotion": "joy",
            "emotion_score": 0.5524019002914429
        },
        {
            "timestamps": {
                "from": "00:00:03,120",
                "to": "00:00:04,080"
            },
            "offsets": {
                "from": 3120,
                "to": 4080
            },
            "text": "My name is Sam.",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.6922041177749634
        },
        {
            "timestamps": {
                "from": "00:00:04,080",
                "to": "00:00:05,260"
            },
            "offsets": {
                "from": 4080,
                "to": 5260
            },
            "text": "How may I assist you today?",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.43952763080596924
        },
        {
            "timestamps": {
                "from": "00:00:05,260",
                "to": "00:00:08,780"
            },
            "offsets": {
                "from": 5260,
                "to": 8780
            },
            "text": "Hi. My name is Joanne.",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.8426525592803955
        },
        {
            "timestamps": {
                "from": "00:00:08,780",
                "to": "00:00:14,840"
            },
            "offsets": {
                "from": 8780,
                "to": 14840
            },
            "text": "And I have your services that -- I said I was out of data in May.",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.5988990068435669
        },
        {
            "timestamps": {
                "from": "00:00:14,840",
                "to": "00:00:18,320"
            },
            "offsets": {
                "from": 14840,
                "to": 18320
            },
            "text": "But I think my data should be back on by now.",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.9454419016838074
        },
        {
            "timestamps": {
                "from": "00:00:18,320",
                "to": "00:00:19,220"
            },
            "offsets": {
                "from": 18320,
                "to": 19220
            },
            "text": "Can you check?",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.7124136090278625
        },
        {
            "timestamps": {
                "from": "00:00:19,220",
                "to": "00:00:20,540"
            },
            "offsets": {
                "from": 19220,
                "to": 20540
            },
            "text": "It doesn't seem like it.",
            "speaker": "0",
            "emotion": "surprise",
            "emotion_score": 0.5951151847839355
        },
        {
            "timestamps": {
                "from": "00:00:20,540",
                "to": "00:00:25,320"
            },
            "offsets": {
                "from": 20540,
                "to": 25320
            },
            "text": "All right.",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.6785580515861511
        },
        {
            "timestamps": {
                "from": "00:00:25,320",
                "to": "00:00:25,940"
            },
            "offsets": {
                "from": 25320,
                "to": 25940
            },
            "text": "Okay. Great.",
            "speaker": "1",
            "emotion": "joy",
            "emotion_score": 0.9347952008247375
        },
        {
            "timestamps": {
                "from": "00:00:25,940",
                "to": "00:00:27,900"
            },
            "offsets": {
                "from": 25940,
                "to": 27900
            },
            "text": "Now, thank you so much.",
            "speaker": "1",
            "emotion": "joy",
            "emotion_score": 0.7642761468887329
        },
        {
            "timestamps": {
                "from": "00:00:28,960",
                "to": "00:00:32,720"
            },
            "offsets": {
                "from": 28960,
                "to": 32720
            },
            "text": "All right.",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.6785580515861511
        },
        {
            "timestamps": {
                "from": "00:00:32,720",
                "to": "00:00:34,680"
            },
            "offsets": {
                "from": 32720,
                "to": 34680
            },
            "text": "Now, let me see.",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.44418302178382874
        },
        {
            "timestamps": {
                "from": "00:00:34,680",
                "to": "00:00:38,980"
            },
            "offsets": {
                "from": 34680,
                "to": 38980
            },
            "text": "Can you please provide me with your first and last name?",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.8994667530059814
        },
        {
            "timestamps": {
                "from": "00:00:38,980",
                "to": "00:00:42,140"
            },
            "offsets": {
                "from": 38980,
                "to": 42140
            },
            "text": "Joanne Burns.",
            "speaker": "0",
            "emotion": "neutral",
            "emotion_score": 0.7366818785667419
        },
        {
            "timestamps": {
                "from": "00:00:42,140",
                "to": "00:00:44,580"
            },
            "offsets": {
                "from": 42140,
                "to": 44580
            },
            "text": "All right.",
            "speaker": "1",
            "emotion": "neutral",
            "emotion_score": 0.6785580515861511
        }
    ]
}
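A response like the one above can be turned into simple per-speaker statistics. The helpers below are a sketch with hypothetical names, assuming the response has already been parsed into a dictionary and that offsets are milliseconds. Because the schema lists `summary` as a string while the sample nests it under a second `summary` key, `summary_text` accepts both shapes.

```python
from collections import Counter, defaultdict


def summary_text(response: dict) -> str:
    """Return the summary whether it is a plain string or nested in a dict."""
    summary = response["summary"]
    if isinstance(summary, dict):
        return summary["summary"]
    return summary


def talk_time_by_speaker(transcript: list) -> dict:
    """Sum segment durations (in ms) for each speaker label."""
    totals = defaultdict(int)
    for seg in transcript:
        totals[seg["speaker"]] += seg["offsets"]["to"] - seg["offsets"]["from"]
    return dict(totals)


def emotions_by_speaker(transcript: list) -> dict:
    """Count how often each emotion label appears for each speaker."""
    counts = defaultdict(Counter)
    for seg in transcript:
        counts[seg["speaker"]][seg["emotion"]] += 1
    return {speaker: dict(c) for speaker, c in counts.items()}
```

Note that segment durations include any silence inside a segment, so the talk-time totals are an approximation of how long each speaker actually spoke.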

Use the TrintAI speech-to-text application to analyze audio from call centers, meetings, and phone calls. Gain insights from conversations, improve customer interactions, and streamline decision-making with accurate transcriptions.