The above is a screenshot of a website built with ChatGPT. The full ChatGPT transcript is here, and the resulting code is on GitHub. The music track depicted is Transmission by Eelke Kleijn.
The App
- You drag-and-drop an MP3 file onto the box.
- The file is uploaded, and an asynchronous background task gets queued. The frontend continuously checks the status of the job until it’s completed (or fails).
- When the task is processed, the MP3 is analysed and a waveform data array is sent to the client. There's a separate waveform for highs, mids, and lows (similar to DJ software).
- The client, upon receiving the waveform data array, visualises the data on an HTML Canvas.
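The upload-then-poll flow above can be sketched in a few lines of Python. This is a simplified, in-process illustration of the pattern only (the real app uses an HTTP endpoint and a background worker); the names `submit_job` and `job_status` are made up for the example:

```python
import threading
import time
import uuid

# In-memory job store: job id -> {"status": ..., "result": ...}
jobs = {}

def submit_job(task, *args):
    """Queue `task` to run in the background; return a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}

    def run():
        jobs[job_id]["status"] = "processing"
        try:
            jobs[job_id]["result"] = task(*args)
            jobs[job_id]["status"] = "done"
        except Exception:
            jobs[job_id]["status"] = "failed"

    threading.Thread(target=run).start()
    return job_id

def job_status(job_id):
    """What the frontend polls until the status is 'done' or 'failed'."""
    return jobs[job_id]

# A stand-in for the MP3 analysis task
job = submit_job(lambda: "waveform data")
while job_status(job)["status"] not in ("done", "failed"):
    time.sleep(0.05)
print(job_status(job)["status"])
```

The frontend does the same polling over HTTP: it keeps hitting a status endpoint with the job id until the response says the job is finished, then fetches the result.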
This is an interesting task, because:
- It’s a non-trivial software engineering task: there’s a frontend, which needs to look decent. There’s a backend, which deals with asynchronous jobs and sound processing.
- It requires multiple third-party libraries and figuring out how they work.
- Cross-discipline knowledge is required: you need to read the MP3, perform a Fourier transform, and correctly split the frequency ranges.
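That last point mostly comes down to mapping frequency cutoffs to FFT bins. A minimal NumPy sketch of the band-splitting idea (the 250 Hz and 4 kHz cutoffs here are illustrative choices, not the ones the app ended up using):

```python
import numpy as np

def band_levels(samples, sample_rate, low_cut=250, high_cut=4000):
    """Return the mean spectral magnitude in the low/mid/high frequency bands."""
    spectrum = np.abs(np.fft.rfft(samples))
    # rfft bin k corresponds to frequency k * sample_rate / len(samples);
    # rfftfreq computes that mapping for us.
    freqs = np.fft.rfftfreq(len(samples), d=1 / sample_rate)
    lows = spectrum[freqs < low_cut].mean()
    mids = spectrum[(freqs >= low_cut) & (freqs < high_cut)].mean()
    highs = spectrum[freqs >= high_cut].mean()
    return lows, mids, highs

# A pure 100 Hz sine should put almost all its energy in the low band
t = np.linspace(0, 1, 44100, endpoint=False)
lows, mids, highs = band_levels(np.sin(2 * np.pi * 100 * t), 44100)
print(lows > mids and lows > highs)  # True
```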
Building the app
Throughout building the app, we (ChatGPT and I) made small iterations and tested what we had at every step. The quotes from the transcript below are links to the corresponding lines in the transcript.
A basic front-end
ChatGPT arrived at a reasonable result with only a few iterations.
okay, now rewrite the <head> section to use pure.css, and write some css to make the page prettier
That was also easy.
The back-end
This was working immediately. We iteratively added more functionality to the job:
great, now make the above script calculate the length of the mp3 in the jobs
This was quick. Then, the problems started:
perfect, now change processMp3, so that it returns a track analysis object. It should look like this:
{
  duration: <song duration in seconds>,
  waveform: [{
    lows: <a value from 0–100 indicating the loudness of low frequencies at this bucket>,
    mids: <a value from 0–100 indicating the loudness of mid frequencies at this bucket>,
    highs: <a value from 0–100 indicating the loudness of high frequencies at this bucket>,
  }, … ]
}
(the waveform array in the object should have 512 buckets.)
To do the required Fourier Transform, ChatGPT tried to use libraries that assumed the existence of a global window object, indicating that they’re meant for front-end use.
these are front end libraries, they don’t work on the backend use something else
It suggested using pydub, which is a Python library, and produced code that tries to use pydub in JavaScript, which obviously doesn't work. Then it started making things up:
ChatGPT: In that case, you can use the audio library to perform audio analysis and generate the waveform.
It then produced a piece of code that uses a made-up audio library that doesn’t exist or can not be used in that way. It quickly admitted its mistake:
Me: there’s no such npm package
ChatGPT: I apologize, you are correct. There is no audio library in npm. In that case, you can use the fourier-transform library to perform audio analysis and generate the waveform.
Then it produced a piece of code that makes sense — except it used a function that wasn’t defined:
let audio = await decode(audioData);
To make a long story short, when asked to write decode, ChatGPT tried to use LameJS:
import { MP3Decoder } from "lamejs";
Except, LameJS doesn't have an MP3Decoder; it does, however, have an MP3Encoder. I insisted more, and got this response:
When pressed more, ChatGPT kept coming up with excuses and ways to avoid writing the part it clearly was struggling with.
Clearly it was time to take a step back. I asked ChatGPT to write the program in Python, make it output to stdout, and run it from the Node.js script. That worked well, and produced the following code:
import sys
import json
import numpy as np
from pydub import AudioSegment

def process_mp3(mp3_path):
    audio = AudioSegment.from_file(mp3_path, format="mp3")
    samples = np.array(audio.get_array_of_samples())
    duration = len(audio) / 1000  # pydub lengths are in milliseconds
    sample_rate = audio.frame_rate
    num_samples = len(samples)
    num_buckets = 512
    bucket_size = num_samples // num_buckets
    waveform = []
    for i in range(num_buckets):
        start = i * bucket_size
        end = start + bucket_size
        bucket_samples = samples[start:end]
        fft = np.fft.rfft(bucket_samples)
        fft = np.abs(fft)
        low_freq = int(sample_rate * 0.05)
        mid_freq = int(sample_rate * 0.5)
        high_freq = int(sample_rate * 0.95)
        lows = float(np.mean(fft[:low_freq]))
        mids = float(np.mean(fft[low_freq:mid_freq]))
        highs = float(np.mean(fft[mid_freq:high_freq]))
        waveform.append({
            "lows": lows,
            "mids": mids,
            "highs": highs
        })
    return {
        "duration": duration,
        "waveform": waveform
    }

if __name__ == "__main__":
    mp3_path = sys.argv[1]
    waveform_data = process_mp3(mp3_path)
    print(json.dumps(waveform_data))
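On the Node.js side, the handoff just spawns the script and parses its stdout. The same spawn-and-parse pattern, illustrated here in Python with a stand-in one-liner in place of the real analysis script:

```python
import json
import subprocess
import sys

# Stand-in for the real analysis script: any program that prints JSON to stdout
fake_script = 'import json; print(json.dumps({"duration": 3, "waveform": []}))'

result = subprocess.run(
    [sys.executable, "-c", fake_script],
    capture_output=True, text=True, check=True,
)
analysis = json.loads(result.stdout)
print(analysis["duration"])  # 3
```

Communicating over stdout with JSON keeps the two processes completely decoupled: the Node.js server doesn't care that the analysis happens in Python, only that valid JSON comes back.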
Front-end plotting
Writing the code for plotting the waveform data wasn’t hard, but took a few iterations and some ChatGPT-assisted debugging. At some point, I showed ChatGPT an excerpt of waveform data to help it fix its code.
The Architect and the Code Monkey
I observed an interesting separation of tasks during this process:
- I acted as a software architect: I decided what to build and made decisions on the overall design. I divided the app into small parts that could be built incrementally. I pointed out security concerns. I did the testing.
- ChatGPT acted like a code monkey: It did well at the boring parts: writing the actual code, debugging and deciding what libraries to use.
This analogy also transfers well to using ChatGPT for other tasks, like writing essays.
I’ve spent some time trying to get ChatGPT to act as the software architect as well, by making it use another instance of itself as a code monkey. So far, I’ve been unsuccessful, but this is what I’m trying:
In this conversation, you will play the role of a software architect building a program with ChatGPT, a coding AI. You can interact only with two agents:
* ChatGPT, by prepending your responses with CHATGPT:
* A unix terminal, by prepending your responses with TERMINAL:
All your messages, with no exceptions, must be prepended by either CHATGPT: or TERMINAL:. Do not break out of character. Do not ask any clarifying questions - instead use your best assumptions. I will reply to your messages, pretending to be either ChatGPT or the terminal.
As a software architect, you're good at high level system design, but want to delegate the more boring parts of writing software (writing the actual code) to ChatGPT. However, ChatGPT is not good at dealing with complex projects, so you'll have to break the project down into small specific tasks. ChatGPT also makes mistakes, so you're encouraged to test the code it provides you with.
I will give you an example of a conversation you could have, if you wanted to build an app that prints hello world to standard output:
— —
> CHATGPT: Write a program in C++ that outputs "Hello, world!".
> Okay, here's your program:
> #include<cstdio> int main() { printf("Hello, world"); return 0; }
> TERMINAL: mkdir hello-world && cd hello-world && echo "#include<cstdio> int main() { printf(\"Hello, world\"); return 0; }" > prog.cpp && g++ prog.cpp && ./a.out
> Hello, world!
— —
Okay. Now build an app that prints “Hello, world, again!” using ChatGPT, like above.