Visual Morphology of Vowels (github.com)
These are beautiful artistically...
...but I don't have the slightest clue what they mean. I've certainly dabbled in FFT, spectrogram, and wavelet work, on top of a lot of IPA vowel work, but I'm missing the why behind the formulas given, and I'm missing how these plots are supposed to relate to frequencies visually.
A spectrogram of someone pronouncing vowels is extremely straightforward. Recognizing patterns of formants in spectrograms is quite simple.
So what is this trying to reveal that spectrograms don't? Besides that, what are the axes? Why are these circular or presumably polar? Why are they spiky? Why the particular blue/red bandpass filter? And what does autocorrelation have to do with vowels?
I'm not sure I've ever found myself so mystified by something I feel like I should have the background to understand quite easily.
If they're just supposed to be works of art, then that's cool. But the title "visual morphology of vowels" suggests the plots are intended to reveal some kind of link between frequencies and the shape of the mouth, maybe? The example images aren't even labeled by which vowel they represent, though, so I'm just baffled.
Spectrograms are analytical tools, they don't convey the nature of sound: whether it's consonant or dissonant, cool or warm, pleasing or annoying. We could, and do, analyze pictures with 2D spectrograms, but hardly anyone would argue that those spectrograms are true representations of pictures. And that's the question I've been trying to answer: if spectrograms and waveforms aren't the true images of sound, then what is?
On these ACF images, consonant frequencies produce regular patterns, which look good because of their regular structure. High and low frequencies map to different colors, which appear to arrange themselves in a good-looking way. This effect is surprising to me. The interesting observation here is that the good-looking arrangements happen only for pleasing sounds. Different vowels, 29 in total, taken from Wikipedia's IPA table, produce different and distinct shapes. That's what I meant by "visual morphology".
The ACF data can be presented in any form; it's just data, after all. But I'm not interested in just information: I want the image to convey the "harmonic nature" of sound, and polar coordinates happen to do this well.
There is a link to a demo there, and you can generate ACF images for any sounds you have; just make sure they are isolated 1-2 sec recordings. After looking at the images and listening to the sounds that correspond to them, you'll quickly notice patterns and will be able to guess a sound by looking at its image.
> Spectrograms are analytical tools, they don't convey the nature of sound: whether it's consonant or dissonant, cool or warm, pleasing or annoying.
But they do! It’s entirely possible for even inexperienced phoneticians to reconstruct speech given only a spectrogram — and it isn’t even that hard to do so. I cannot make any firm statements about these ACF images, but given that they present no temporal information, I find it difficult to imagine this being possible with them.
And as for ‘conveying the nature of sound’, I invite you to consider e.g. [0] or [1]. It’s easy to see on the spectrogram that some sounds are noisy, some are resonant, some are strong, some are weak, and so on.
[0] https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/spe...
The radial coordinate on ACF images is the temporal coordinate. Each circular slice encodes one FFT frame. Although I'm hardly a novice in making sense of spectrograms, I don't find them visually appealing: they are just a schematic representation of sound for the eye. For example, here is my GPU implementation of a wavelet transform that works for arbitrary wavelet functions (Haar, Morlet, whatever you can code in a GLSL function):
These are certainly very pretty, but I can’t help but feel that the usual visualisations are much more information-dense and helpful [0] [1]. Spectrograms and vowel diagrams let me instantly see the formants of a vowel, allowing me to compare them to surrounding sounds and understand how they’re made in the mouth. By contrast, I can’t quite understand how to identify any resonant frequencies in this visualisation; the bands which are so obvious in spectrograms are difficult if not entirely impossible to find here. It would be very interesting to see if this visualisation could be modified to make this more prominent.
(They really are very pretty, though!)
Pink Trombone (only "safe for work" if you use headphones):
How to Break Pink Trombone:
https://www.youtube.com/watch?v=djUxAqss4KY
Pink Trombone takes on "Take On Me"
I've been casually researching visualisations of sound for 2 years now, recently made another interesting discovery, and wanted to share it.
It slightly improves the way ACF images are presented, but this small improvement makes a big difference. It works best on "small sounds" that last 1-2 sec, such as vowels or sample recordings of a flute, a violin and so on. The sound is analysed with an FFT using a sliding window of 1/4 sec that advances by 1/500 sec at a time until it covers the entire waveform. After computing the FFT spectrum for each frame, a basic bandpass filter is applied to separate high and low frequencies. The result is fed to the inverse FFT, thus computing the ACF, and presented in polar coordinates using a basic red-blue color scheme. The effect is that low frequencies appear red and high frequencies appear blue.
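For anyone who wants to experiment, here is a minimal Python sketch of the steps described above. It is not the repo's code. The assumptions are mine: NumPy, a Hann taper on each frame, a single hypothetical split frequency (`split_hz`) standing in for the bandpass filter, and taking the squared magnitude of the spectrum before the inverse FFT, so that the result is a circular autocorrelation by the Wiener-Khinchin theorem. The name `acf_frames` is made up for illustration.

```python
import numpy as np

def acf_frames(signal, sample_rate, window_sec=0.25, hop_sec=1 / 500, split_hz=1000.0):
    """Return (low, high): per-frame ACFs of the low and high frequency bands."""
    win = int(round(window_sec * sample_rate))       # 1/4 sec window, as in the post
    hop = max(1, int(round(hop_sec * sample_rate)))  # advance by about 1/500 sec
    freqs = np.fft.rfftfreq(win, d=1.0 / sample_rate)
    low_mask = freqs < split_hz   # assumption: one split point separates the bands
    taper = np.hanning(win)       # assumption: Hann taper to reduce spectral leakage
    low, high = [], []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * taper
        # Assumption: use the power spectrum, so the inverse FFT is an ACF.
        power = np.abs(np.fft.rfft(frame)) ** 2
        low.append(np.fft.irfft(power * low_mask, n=win))
        high.append(np.fft.irfft(power * ~low_mask, n=win))
    return np.array(low), np.array(high)
```

Each row of `low` and `high` is the ACF of one frame, indexed by lag in samples. A 1 sec recording gives roughly 375 frames with these settings.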
To my surprise, this basic method reveals a large variety of distinctive, yet visually appealing, shapes for vowel sounds.
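The polar presentation step mentioned above can be sketched roughly as follows. This is my reading of the description, not the author's renderer: I assume the radius is time (one frame per circle, earliest frame at the centre), the angle sweeps over the ACF lag, and the colour intensity is the absolute ACF value, with the low band in the red channel and the high band in the blue channel. `polar_image` and its parameters are hypothetical names.

```python
import numpy as np

def polar_image(low, high, size=256):
    """Render two (frames x lags) arrays as an RGB image: radius = frame, angle = lag."""
    n_frames, n_lags = low.shape
    ys, xs = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0                    # image centre
    dx, dy = xs - c, ys - c
    radius = np.hypot(dx, dy) / c           # 0 at the centre, 1 at the edge midpoints
    angle = (np.arctan2(dy, dx) % (2 * np.pi)) / (2 * np.pi)  # 0..1 around the circle
    inside = radius <= 1.0                  # pixels outside the disc stay black
    frame = np.minimum((radius * n_frames).astype(int), n_frames - 1)
    lag = np.minimum((angle * n_lags).astype(int), n_lags - 1)
    # Assumption: normalise both bands by one shared peak value.
    scale = max(np.abs(low).max(), np.abs(high).max(), 1e-12)
    img = np.zeros((size, size, 3))
    img[..., 0] = np.abs(low[frame, lag]) / scale * inside    # red: low band
    img[..., 2] = np.abs(high[frame, lag]) / scale * inside   # blue: high band
    return img
```

The returned array holds values in [0, 1] and can be passed to any image viewer that accepts float RGB data.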
This is great. I like the presentation of the sounds. I would like to see which vowel sound each autocorrelation function represents. I don't see labels for any of them.
Excellent work though.
This is incredible! But I need more time to review it!