Transcribing Piano Rolls, the Pythonic Way

zulko.github.io

310 points by gcardone_ 12 years ago · 37 comments

eliteraspberrie 12 years ago

The faster way of doing this:

    def fourier_transform(signal, period, tt):
        """ See http://en.wikipedia.org/wiki/Fourier_transform
        How come Numpy and Scipy don't implement this ??? """
        f = lambda func : (signal*func(2*pi*tt/period)).sum()
        return f(cos) + 1j*f(sin)

is using the FFT.

What you want is the power spectral density in the discrete case, called the power spectrum. It can be calculated by multiplying the discrete Fourier transform (FFT) with its conjugate, and shifting. NumPy can do it. Here is an example: http://stackoverflow.com/questions/15382076/plotting-power-s...
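
A minimal sketch of that recipe with NumPy, assuming a real-valued, evenly sampled signal. The toy signal, `sample_rate`, and all variable names are made up for illustration and do not come from the article:

```python
import numpy as np

sample_rate = 100.0                     # assumed sampling rate in Hz (illustrative)
t = np.arange(1000) / sample_rate       # 10 seconds of samples
signal = np.sin(2 * np.pi * 5.0 * t)    # toy signal: a 5 Hz sine

spectrum = np.fft.fft(signal)                    # discrete Fourier transform via the FFT
power = (spectrum * np.conj(spectrum)).real      # multiply by the conjugate: |X|^2
freqs = np.fft.fftfreq(signal.size, d=1 / sample_rate)

# Shift so the frequencies run from negative to positive, with zero in the middle.
power_shifted = np.fft.fftshift(power)
freqs_shifted = np.fft.fftshift(freqs)

# The strongest component; abs() because a real signal peaks at both +f and -f.
peak_freq = abs(freqs_shifted[np.argmax(power_shifted)])
```

For this toy input the strongest bin should sit at the 5 Hz tone; normalising the result into a proper density (per Hz) is left out here.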

  • zulko 12 years ago

    I knew I was going to get this remark :) Now correct me if I am wrong, but I think the FFT (which computes the discrete Fourier transform) cannot replace the continuous Fourier transform in my case, because the optimal periods I find are non-integer values. In the first case, the holes are separated by 7.5 pixels. The FFT could only have told me that they are separated by 7 or 8 pixels, which is not precise enough. Same thing for the tempo: a beat corresponds to 7.1 frames of the video, and an FFT would have told me 7.

    If someone knows a way to use the FFT to get non-integer periods (apart from oversampling the signal) I'll gladly change the code.
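
    To make the non-integer-period point concrete, here is a rough sketch that evaluates the same kind of Fourier sum as the quoted `fourier_transform` over a grid of candidate periods. The helper name `fourier_magnitude`, the toy signal, and the search grid are assumptions for illustration, not the article's code:

```python
import numpy as np

def fourier_magnitude(signal, period, tt):
    """Magnitude of the Fourier sum of `signal` at one (possibly non-integer) period."""
    return abs((signal * np.exp(2j * np.pi * tt / period)).sum())

tt = np.arange(300)                        # pixel (or frame) indices
signal = np.cos(2 * np.pi * tt / 7.5)      # toy signal with a 7.5-sample period

periods = np.linspace(5.0, 10.0, 501)      # candidate periods, in steps of 0.01
scores = [fourier_magnitude(signal, p, tt) for p in periods]
best_period = periods[int(np.argmax(scores))]   # period with the strongest response
```

    The cost is one full pass over the signal per candidate period, which is what the FFT-based suggestions in this thread try to avoid.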

    • peterwoo 12 years ago

      The maximum frequency you can detect is limited by your sampling rate, but there's not a limit on the precision with which you can break those frequencies up.

      It's controlled by a parameter NFFT -- the PSD will compute (NFFT/2+1) values evenly spaced between 0 and the Nyquist frequency.

      So say the frame rate is 15Hz and you compute with NFFT=2048, then PSD[970] contains the amplitude at about 7.10Hz (970 × 15/2048).
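
      A small sketch of the NFFT idea using NumPy's zero-padding, assuming the 15 Hz frame rate and NFFT=2048 from the comment above. The 100-frame toy signal and its 2.11 Hz tone are invented for the example:

```python
import numpy as np

frame_rate = 15.0    # Hz, as in the comment above
nfft = 2048          # FFT length; a shorter signal is zero-padded to this size

# Toy signal (assumed): 100 frames of a sine at 2.11 Hz.
frames = np.arange(100)
signal = np.sin(2 * np.pi * 2.11 * frames / frame_rate)

spectrum = np.fft.rfft(signal, n=nfft)             # nfft/2 + 1 complex values
freqs = np.fft.rfftfreq(nfft, d=1 / frame_rate)    # from 0 Hz up to Nyquist (7.5 Hz)

bin_spacing = freqs[1] - freqs[0]                  # equals frame_rate / nfft
peak_freq = freqs[np.argmax(np.abs(spectrum))]     # centre of the strongest bin
```

      Here `freqs[970]` is the centre frequency of the bin mentioned above. Note that zero-padding only samples the spectrum on a finer grid; it does not by itself separate two tones that are closer together than the signal length allows.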

      This was a really cool project by the way!

      • Serow225 12 years ago

        Also, it's not as widely known as the FFT, but if you know roughly the frequency of interest, you can use the Goertzel algorithm to calculate a chosen number of bins around that specific freq and then pick the max of them. With the FFT you would instead have to calculate a bunch of bins using a large nFFT to get enough freq resolution, and then discard 99% of the results. Going further, the Generalized Goertzel algorithm does the same thing as the original Goertzel but allows you to query non-integer multiples of the fundamental frequency: http://asp.eurasipjournals.com/content/2012/1/56
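
        A hedged sketch of the idea: a Goertzel-style recurrence evaluated at an arbitrary frequency, with a phase correction at the end so the result matches the direct Fourier sum. This is my reading of the approach, not code from the linked paper; the function name `goertzel`, the toy signal, and the candidate grid are assumptions:

```python
import numpy as np

def goertzel(x, cycles_per_sample):
    """Fourier sum of x at one frequency, which need not fall on an FFT bin."""
    omega = 2 * np.pi * cycles_per_sample
    coeff = 2 * np.cos(omega)
    s1 = s2 = 0.0
    for sample in x:                       # second-order recurrence, real arithmetic
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    y = s1 - np.exp(-1j * omega) * s2      # filter output at the last sample
    return y * np.exp(-1j * omega * (len(x) - 1))   # undo the accumulated phase

n = np.arange(200)
x = np.cos(2 * np.pi * n / 7.5)            # toy signal, period 7.5 samples

# Probe only a narrow band around a rough guess: periods 7.00 ... 8.00.
candidates = 1.0 / np.linspace(7.0, 8.0, 101)
magnitudes = [abs(goertzel(x, f)) for f in candidates]
best_period = 1.0 / candidates[int(np.argmax(magnitudes))]
```

        Each probed frequency costs one pass over the signal, so this pays off when only a handful of frequencies near a known guess are needed.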

      • zulko 12 years ago

        Thanks, I learned something. I will try it and amend the blog when I have time.

    • GFK_of_xmaspast 12 years ago

      There are a lot of parametric methods for tracking frequency (as opposed to the nonparametric FFT). I'm not totally convinced they're applicable to this case, but I think they might be fun to try out. Maybe start here: http://en.wikipedia.org/wiki/Multiple_signal_classification

rfleck 12 years ago

See a master at work making original rolls at QRS. http://www.youtube.com/watch?v=i3FTaGwfXPM

It was a fun place to see in the '70s after watching my father rebuild our player piano.

msvan 12 years ago

What a fascinating convergence of math, music and Python. Many people I meet who don't specialize in math but have taken university-level courses in it seem to remember the Fourier transform as a highlight, probably because of its many applications.

kbd 12 years ago

I love the abundance of Python. For those unaware, even the youtube-dl command line utility he used to download the video is written in Python.

stevetjoa 12 years ago

Very cool!

Relevant: Zenph makes "re-performances" of old piano recordings. They take a recording, do music transcription magic to get the exact timings and velocities of each note event, and then feed that into a player piano. So it's as if you are listening to the ghost of Rachmaninov sitting at the piano, as shown here: https://www.youtube.com/watch?v=eevzbV6Hkkk&t=28 (music starts at 0:28)

(I just visited http://zenph.com for the first time in about a year, and it appears that they've pivoted into a music education company.)

nanidin 12 years ago

Interesting question - is the author's transcription a derivative work of the video? And if so, is he actually allowed to release his transcription into the public domain (without the permission of the author of the video)?

  • shakethemonkey 12 years ago

    No, it's only derivative in the sense of process. The video lacks originality; for the musical notes it is merely a mechanical reproduction of the punched holes. Similarly, a photograph of a public domain painting is also in the public domain. See: Bridgeman Art Library v. Corel Corp., 36 F. Supp. 2d 191 (S.D.N.Y. 1999). At least this is the law in the United States, which is sensible; absurdity of other jurisdictions may vary.

    • nanidin 12 years ago

      It's nice to know our system accounts for cases like this. Thanks for the detailed info!

ntoshev 12 years ago

What if you tried to transcribe the music solely from Fourier transform of the audio source? I expect the piano has an abundance of harmonics, but there should be some way to distinguish them from the keys. Hasn't someone done it already?

selmnoo 12 years ago

That was a lovely read, thank you so much for writing and sharing it.

elwell 12 years ago

Really fantastic hack. Now try transcribing with just the audio track.

  • anigbrowl 12 years ago

    That's a hard problem. If you have some material like that with a clear recording, the only good commercial solution that I know of is Melodyne, and he's not saying how he does it. In theory you just look for multiple peaks in the FFT, but this is much easier said than done.
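
    For what the "in theory" version looks like, here is a naive peak-picking sketch on a deliberately easy input: two pure tones with no harmonics, noise, or note onsets. The sample rate, tone frequencies, and the 30% threshold are arbitrary assumptions; real piano audio is far messier, which is the point of the comment above:

```python
import numpy as np

sample_rate = 8000                            # Hz, assumed
t = np.arange(sample_rate) / sample_rate      # one second of audio
# Toy "chord": two pure tones (unrealistically clean).
audio = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.6 * np.sin(2 * np.pi * 660 * t)

window = np.hanning(audio.size)               # taper the ends to reduce leakage
magnitude = np.abs(np.fft.rfft(audio * window))
freqs = np.fft.rfftfreq(audio.size, d=1 / sample_rate)

# Keep local maxima that rise above 30% of the strongest peak.
threshold = 0.3 * magnitude.max()
inner = magnitude[1:-1]
is_peak = (inner > magnitude[:-2]) & (inner > magnitude[2:]) & (inner > threshold)
peak_freqs = freqs[1:-1][is_peak]             # candidate note frequencies in Hz
```

    With real recordings, each note adds harmonics that overlap with other notes' fundamentals, so a fixed threshold like this quickly stops working.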

    • d_loemax 12 years ago

      i built a plogue bidule patch before melodyne rolled out "dna" and it is extremely difficult to get the optimal fft parameters to get an accurate conversion. i cant imagine an algorithm that would get it right from analyzing the sample would be any less difficult. ableton's and cubase's options are pretty rough too. i am a drummer though, i am just trying to make up for my ears.

bede 12 years ago

My favourite blog post of 2014. Thank you for sharing.

analog31 12 years ago

I think this is a nice solution because it takes care of the hardware side of things by making use of a garden variety video camera.

StavrosK 12 years ago

This is beautiful, it's one good idea after another, good job!

peapicker 12 years ago

This is really nice, thanks for sharing it with us.

cdelsolar 12 years ago

So, so cool. I love posts like this.

evidencepi 12 years ago

Nice post, thanks for sharing!

smortaz 12 years ago

fantastic. with your permission, i'd love to use this to demo python!
