Cat meow sounds visualized with auto-correlation function

soundshader.github.io

38 points by ssgh 4 years ago · 34 comments

gbh444g 4 years ago

Hi HN!

I used the meow sounds from https://soundspunos.com/animals/10-cat-meow-sounds.html. I expected to see very little variability in the meows, maybe just 4-5 different types for basic emotions. To my surprise, each “cat meow” has astonishingly colorful, complex and unique structure, unlike human vowels that follow a more or less predictable pattern: https://soundshader.github.io/vowels.

The algorithm behind these images is fairly simple. It computes FFT to decompose the sound into a set of A·cos(2πwt+φ) waves and drops the phase φ to align all cos waves together. This is known as the auto-correlation function (ACF). Before merging them back, it colorizes each wave using its frequency w: the A notes (432·2ⁿ Hz) become red, C notes - green, E notes - blue, and so on. Finally, it merges the colored and aligned cos waves back, using the amplitude A for color opacity, and renders them in polar coordinates, where the radial coordinate is time.
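The "drop the phase" step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's actual code: `dft` and `phase_aligned` are hypothetical helper names, a naive O(n²) DFT stands in for a real FFT, and the colorizing and polar rendering steps are left out. It follows the description literally, keeping the amplitude A as-is rather than squaring it.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_aligned(x):
    """Rebuild a signal from its DFT with every phase set to zero.

    Each component A*cos(2*pi*w*t + phi) becomes A*cos(2*pi*w*t),
    so all cos waves peak together at t = 0.
    """
    n = len(x)
    amps = [abs(c) for c in dft(x)]  # keep amplitude A, drop phase phi
    return [sum(a * math.cos(2 * math.pi * k * t / n)
                for k, a in enumerate(amps)) / n
            for t in range(n)]
```

Feeding it a single phase-shifted cosine should return the same cosine with the shift removed, and the result is always symmetric around t = 0.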

dsizzle 4 years ago

I think it'd be useful to have the same graphs for some non-cat sounds, like the human vowels that you mention. It's not clear if there's something particularly interesting about cats or not here.
- gbh444g 4 years ago
  
  Here are vowels: https://soundshader.github.io/vowels
vanusa 4 years ago

I would agree that the images are rather intriguing, but ... what does all this visual structure actually mean?
I'm guessing some kind of overtone structure in these sounds (perhaps decipherable to cats, but not to us)?
I await your insight.
- nixpulvis 4 years ago
  
  I know from personal experience that cats use a lot of inflection in their voices. I'm not at all surprised by the images (though I don't know exactly what they mean). This inflection directly affects the image, because it modulates the pitch of the meow. Another factor, completely unrepresented in these images, is the lower frequency components connected by seconds of silence. Most cats are pretty quiet, but sometimes they get the oral equivalent of the zoomies.
  I would love to see more examples across species and variants of felines, under controlled conditions. Could you figure out an appropriate color map?
  Not to start a dog vs cat war, but as someone who loves both, I think I can safely say that cats put much more information in their voices than dogs, for example.
  - r00fus 4 years ago
    
    > Not to start a dog vs cat war, but as someone who loves both, I think I can safely say that cats put much more information in their voices than dogs, for example.
    My cat only gets really vocal when she wants something (and usually only with me, not my kids or spouse). She also trills a lot, usually in surprise. Could be they compress the data - dogs are very chatty - maybe the cats focus on high throughput whereas the dogs go for low-latency.
    I'd love to see the same analysis done with dog barks.
    
    nixpulvis 4 years ago
    
    > maybe the cats focus on high throughput whereas the dogs go for low-latency.
    I really like that way of thinking about it. Cats hit the low latency pretty quick with the hissing, which is what the wild cats looked more like in this project's demo. Kinda makes sense.
- gbh444g 4 years ago
  
  Interpreting ACF images:
  1. Time progresses from the center to the edge of the circle.
  2. Color means note, e.g. A4=432Hz is red, but so are A1, A2 and all other A notes. B is orange, C is yellow, D is green and so on.
  3. The amount of fine detail is frequency: the higher the frequency, the more fine detail you see. If notes of different colors and different frequencies sound simultaneously, e.g. an A2 with a G5, you’ll see a red belt with a few repetitions mixed with a blue belt with 8x more repetitions, so the result will be a purple belt with a fine structure.
  For example, on one image below there is a green belt with 10 repetitions. One repetition corresponds to 13.5 Hz here (55296 Hz sample rate, 4096 FFT bins), so 10 repetitions is 135 Hz, which corresponds to C3. On another image there is a curious red cross in the center; it’s a red belt with 2 repetitions. That’s 27 Hz, or A0, almost infrasound.
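The arithmetic in this comment (Hz per repetition, and which note a frequency belongs to) can be sketched as below. The function names are hypothetical, and the note lookup assumes 12-tone equal temperament with A4 = 432 Hz, rounding to the nearest semitone; the comment does not say which tuning system or rounding the demo uses.

```python
import math

# Pitch classes counted upward from A, one semitone apart.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def bin_width_hz(sample_rate, fft_size):
    """Frequency step between neighbouring FFT bins (one 'repetition')."""
    return sample_rate / fft_size

def pitch_class(freq_hz, a4_hz=432.0):
    """Nearest note name, ignoring the octave (assumes 12-TET)."""
    semitones = round(12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[semitones % 12]
```

With the numbers quoted above, `bin_width_hz(55296, 4096)` gives 13.5, and `pitch_class(27.0)` gives "A". Note that at low pitches one 13.5 Hz step is wider than a semitone, so note names read off a small number of repetitions are approximate.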
SomeHacker44 4 years ago

I would think that the absolute notes of the sounds are not the relevant metric (that was colored here). I naively imagine it is the relative tonal structure. That is, meaning does not come from particular frequencies, but from the relationship of the frequencies. I wonder, if so, how that might be represented and normalized. Just like we can understand two people speaking with different basic pitches, then add meaning when they add shifts from those basic pitches.
Either way super pretty visualization!
dredmorbius 4 years ago

Is it possible to see what sounds map to what ACF images?
I don't see that. And without this, the effect is ... just some pretty pictures.
- gbh444g 4 years ago
  
  The only way right now is to use the demo and download the sounds yourself (you can use the meows from my link to get the same images):
  https://soundshader.github.io/?n=4096&img=2048&acf.lr=5&sr=5...
  - dredmorbius 4 years ago
    
    A link or pairing of the images and corresponding audio would be nice to have.

errantspark 4 years ago

Is "dropping the phase" the same thing as computing the spectral power distribution?

P.S. A4 = 432Hz is a stupid fad that can't die soon enough.

gbh444g 4 years ago

432 vs 440 Hz in music is the equivalent of the C++ vs Java battle. Vivaldi was a proponent of 432 Hz, so it was only after he died that the newthinkers recalibrated pianos to 440 Hz. I believe the newthinkers simply lack taste, and rounding 432 to 440 is the same as chopping off a chunk of the Parthenon to "fix" its proportions from the golden phi ratio to 3/2.
- errantspark 4 years ago
  
  No, it's not. Java and C++, hell even Java and C#, are much more relevantly dissimilar than setting A at 440 vs 432. I'd love to see a citation on Vivaldi's love for 432; he wasn't even playing in 12-TET, was he? He'd probably be playing in meantone during that era, right? I figure he'd be way more mad about using the wrong intonation to play his music than about a different absolute pitch reference.
  > rounding 432 to 440
  Rounding from what to what? Why does it matter that the number of cycles per second is any particular number?
  - gbh444g 4 years ago
    
    Well, I don't know. My personal reason is that 432 has at least some connection to reality, e.g. half day = 43,200 sec, or speed of light = 432 x 432 miles/sec, or Sun's radius = 432,000 miles. This means that if we take distance that light covers in 1/432 sec, then Sun's radius is exactly 1000 such distances, which is pretty cool.
    On the other hand, 440 Hz seems just a random number to me picked by someone with little imagination.
dylan604 4 years ago

To quote the dude, "that's like your opinion, man."
For others less familiar with this poster's preference and their angst: https://producerhive.com/editorial/432hz-vs-440hz/
jarenmf 4 years ago

Seems so, except that in this case it is basically just the modulus |A|, whereas the PSD is the modulus squared, |A|^2.
- errantspark 4 years ago
  
  Ahh yeah, magnitude vs. power, my mistake on the terms.

nixpulvis 4 years ago

Cool idea progressing time outwards radially, for a moment I thought I was looking at the cat's eye through a strange filter!

beebeepka 4 years ago

Not sure how these images are being generated but in my experience there is a wide variety of purr and meow. Can't say unique because I have no way of knowing.

Cats are magical to me but in reality I can be friends with all sorts of animals, even people!

hereforphone 4 years ago

> It computes FFT to decompose the sound into a set of A·cos(2πwt+φ) waves and drops the phase φ to align all cos waves together. This is known as the auto-correlation function (ACF).

How is simply dropping the phase transforming FFT into ACF (according to various definitions of ACF as shown here: https://en.wikipedia.org/wiki/Autocorrelation)?

jarenmf 4 years ago

I might be mistaken, but the auto-correlation function is the inverse FFT of the power spectral density, and the power spectral density doesn't contain any information about the phase. So it's like dropping the phase and taking IFFT(|A|^2).
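The IFFT(|A|^2) claim can be checked numerically against the textbook "compare the signal with a shifted copy of itself" definition. The sketch below is an illustration with hypothetical helper names; it assumes the circular (wrap-around) form of the autocorrelation and uses a naive DFT so that it runs on the standard library alone.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse applies the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def acf_direct(x):
    """Circular autocorrelation: multiply x by itself shifted by each lag."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]

def acf_via_fft(x):
    """Inverse DFT of the power spectrum |X|^2, which carries no phase."""
    power = [abs(c) ** 2 for c in dft(x)]
    return [v.real for v in dft(power, inverse=True)]
```

For any real-valued list `x`, `acf_direct(x)` and `acf_via_fft(x)` should agree up to floating-point error, which is where the correlation "hides" in the phase-dropping description.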
- hereforphone 4 years ago
  
  They're inverse operations, but as far as I understand that's not the only thing going on with the ACF. Taking the phase out is just taking the real part of the FFT, is it not? Regardless, where's the correlation?
gbh444g 4 years ago

Wikipedia is great at obfuscating simple ideas in complex math. The "Efficient computation" section explains the idea well, but it could be made even simpler. The amplitude squaring step drops the phase there.
- hereforphone 4 years ago
  
  The phase is dropped but isn't there more going on? Comparing the signal to itself at various lags? I don't see how just dropping the phase accomplishes that.
  - gbh444g 4 years ago
    
    Not really. ACF is defined as the correlation of signal X with itself, X⋆X, i.e. a convolution of X with its time-reversed copy. But FFT turns this into an element-wise product: FFT[X⋆X] = FFT[X]·conj(FFT[X]), or just |FFT[X]|². But what is this really? If X is a sum of A·cos(2πwt+φ) waves, then FFT[X] is a set of A·exp(iφ) complex numbers. What does |FFT[X]|² do? It turns those complex numbers into A². Inverting this FFT gives a sum of A²·cos(2πwt) waves, so in effect ACF has dropped the phases and squared the amplitudes. This is also why ACF images have this bright vertical line: it is the cos(x) functions piling up together.
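For a single wave, the disappearance of the phase can also be seen directly from the lag definition, without going through the FFT. The short derivation below is a sketch that assumes a time-averaged autocorrelation R(τ) and w ≠ 0; the constant factor (A²/2 here versus A² in the comment) depends only on the normalization convention.

```latex
\[
\begin{aligned}
x(t) &= A\cos(2\pi w t + \varphi) \\
R(\tau) &= \big\langle x(t)\,x(t+\tau) \big\rangle_t \\
        &= \tfrac{A^2}{2}\,\big\langle \cos(2\pi w \tau)
           + \cos\!\big(2\pi w (2t+\tau) + 2\varphi\big) \big\rangle_t \\
        &= \tfrac{A^2}{2}\cos(2\pi w \tau)
\end{aligned}
\]
```

The phase φ appears only in the term that still oscillates in t, and that term averages to zero, so what remains is a cosine in the lag τ with squared amplitude and no phase.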
    
    hereforphone 4 years ago
    
    ACF has a lag argument correct? Also isn't the bright vertical line just the result of the fact that ACF at lag 0 is 1?
    
    gbh444g 4 years ago
    
    It does, but ACF[X] at 0 is the sum of X[i] squares, so when sound gets louder, ACF at 0 also gets higher.
    
    hereforphone 4 years ago
    
    You're probably sick of this conversation by now :). But at least in radio applications I think that acf[0] is normalized so that it's 1 (typically). And again the ACF is calculated at several lag arguments and the sum is used to build the final graph / array.
    But you obviously know more about this than me, I'm just putting out what I know. Your paragraph above, I actually copied so I can study it a few times. So thanks.
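The two conventions discussed here, raw ACF whose value at lag 0 grows with loudness and ACF normalized so that lag 0 equals 1, can be compared with a small sketch. The helper names are hypothetical and the circular form of the autocorrelation is an assumption made to keep the example short.

```python
def acf(x):
    """Raw circular autocorrelation; acf(x)[0] is the sum of squares of x."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]

def normalized_acf(x):
    """Autocorrelation divided by its lag-0 value, so that lag 0 is 1."""
    r = acf(x)
    if r[0] == 0:
        # An all-zero (silent) signal has nothing to normalize by.
        return [0.0] * len(r)
    return [v / r[0] for v in r]
```

Scaling a signal by 3 multiplies every raw ACF value by 9, while the normalized ACF stays the same, so the normalized form discards loudness and the raw form keeps it.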
    
    gbh444g 4 years ago
    
    It sounds like this is what I'm doing: taking ACF at equally spaced offsets. Not sure what the sum of ACFs would achieve, but this might turn out to be a good idea.

RosanaAnaDana 4 years ago

Things like this are why I go on the internet.

rawling 4 years ago

Well, I guess this one is never going to be more relevant: https://xkcd.com/26

ssgh (OP) 4 years ago

The xkcd is half-right. Cats are ACFs, not FFTs, because they are even functions in polar coordinates, I mean the left half of the cat mirrors the right half, and the two halves merge nicely, without discontinuities. I probably don't want to know what cats look like with the phase components restored.
