In [1]:
%matplotlib inline
from ADTLib.models import ADTBDRNN
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, librosa, mir_eval, IPython.display as ipd, urllib

Drum Transcription using ADTLib

This notebook requires ADTLib. See the ADTLib repository (https://github.com/CarlSouthall/ADTLib) for installation instructions.
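ADTLib is published on PyPI, so installation (assuming a standard Python environment with the dependencies listed in the repository) is typically a single command:

pip install ADTLib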

Download a drum signal containing bass drum, snare drum, and hi-hat:

In [2]:
filename = 'Classic Rock Beat 06.wav'
urllib.urlretrieve(
    'http://audio.musicinformationretrieval.com/Jam Pack 1/' + filename,
    filename=filename
)
Out[2]:
('Classic Rock Beat 06.wav', <httplib.HTTPMessage instance at 0x10d34b098>)

Load the audio file into an array:

In [3]:
x, fs = librosa.load(filename)

Listen to the signal:

In [4]:
ipd.Audio(x, rate=fs)
Out[4]:

ADTLib

Use ADTLib to identify the location and types of each onset:

In [5]:
onset_times, onset_types = ADTBDRNN([filename])
In [6]:
onset_times
Out[6]:
array([ 0.63854875,  0.84752834,  0.85913832,  0.90557823,  1.4860771 ,
        1.91564626,  1.92725624,  2.12462585,  2.13623583,  2.57741497,
        2.57741497,  2.77478458,  3.20435374,  3.20435374,  3.41333333,
        3.42494331,  4.06349206,  4.28408163,  4.29569161,  4.91102041,
        4.92263039,  5.35219955,  5.35219955,  5.56117914,  5.56117914,
        5.77015873,  5.78176871,  6.19972789,  6.21133787,  6.43192744,
        6.43192744,  6.62929705,  6.62929705])
In [7]:
onset_types
Out[7]:
array(['BD', 'HH', 'SD', 'HH', 'BD', 'BD', 'HH', 'HH', 'BD', 'HH', 'SD',
       'BD', 'BD', 'HH', 'HH', 'BD', 'BD', 'HH', 'SD', 'HH', 'BD', 'BD',
       'HH', 'BD', 'HH', 'HH', 'BD', 'HH', 'BD', 'HH', 'SD', 'BD', 'HH'], 
      dtype='|S32')

Listen to onsets

For each type of drum, create a click track from its onsets, and listen to it mixed with the original signal.
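The next three cells all follow the same pattern, so here it is once as a helper. This is a minimal sketch of our own (the name sonify_drum is not part of any library): select the onsets of one drum class, synthesize clicks at those times with mir_eval, and mix them with the original signal.

def sonify_drum(x, fs, onset_times, onset_types, drum):
    # Select the onset times belonging to one drum class: 'BD', 'SD', or 'HH'.
    times = onset_times[onset_types == drum]
    # Synthesize a click at each onset, matching the length of the original signal.
    clicks = mir_eval.sonify.clicks(times, fs, length=len(x))
    # Mix the clicks with the original signal for playback.
    return ipd.Audio(x + clicks, rate=fs)

For example, sonify_drum(x, fs, onset_times, onset_types, 'BD') reproduces the bass drum cell below.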

Bass drum:

In [8]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='BD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[8]:

Snare drum:

In [9]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='SD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[9]:

Hi-hat:

In [10]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='HH'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[10]:

Adjust parameters

Clearly, the default parameters are not optimal for this input file: for example, you can hear that the transcription misses many hi-hat onsets. Therefore, let's adjust the drum transcription parameters.
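To quantify this before and after tuning, count the detected onsets per drum class. numpy.unique with return_counts=True (available in numpy 1.9 and later) gives the tally:

labels, counts = numpy.unique(onset_types, return_counts=True)
print(dict(zip(labels, counts)))  # e.g. {'BD': ..., 'SD': ..., 'HH': ...}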

close_error is the maximum separation, in seconds, below which two detected onsets are merged into a single onset. lambd is the threshold value used for each instrument in the peak-picking stage.
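As a rough illustration of what close_error controls (this is our own sketch, not ADTLib's actual implementation), onsets that fall closer together than the threshold collapse into a single event:

def merge_close_onsets(times, close_error=0.050):
    # Keep an onset only if it occurs at least close_error seconds
    # after the most recently kept onset.
    merged = [times[0]]
    for t in times[1:]:
        if t - merged[-1] >= close_error:
            merged.append(t)
    return numpy.array(merged)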

In [11]:
onset_times, onset_types = ADTBDRNN([filename], close_error=0.100, lambd=[1, 1, 1])

Listen to the new onsets:

In [12]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='BD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[12]:
In [13]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='SD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[13]:
In [14]:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='HH'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Out[14]:

Visualize spectrum

For each drum type, let's compute an average drum beat from the original signal and visualize its spectrum.

Convert onsets from units of seconds to samples:

In [15]:
onset_samples = librosa.time_to_samples(onset_times, sr=fs)
print(onset_samples)
[     0      0   8959  14079  14079  18687  18943  28415  32768  32768
  42239  42496  46847  47103  56832  56832  61184  61184  70656  70656
  75263  75520  84735  89344  89599  94463  94720  99327 103680 108287
 108544 118016 118016 122623 122623 127232 127487 132096 136704 136959
 141823 141823 146175 146175]
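Under the hood, librosa.time_to_samples is essentially a multiply-and-cast; roughly (this is an approximation of the library's behavior, not its exact source):

onset_samples = (onset_times * fs).astype(int)  # approximately what librosa.time_to_samples does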

Create a function that computes and plots the log-amplitude spectrum of an average drum beat for a particular drum type:

In [16]:
def plot_avg_spectrum(x, onset_type):
    
    # Average the signal across all onsets of the given drum type.
    # Uses the globals onset_samples, onset_types, and fs defined above.
    frame_sz = int(0.100*fs)
    def normalize(z): 
        return z/numpy.linalg.norm(z)
    frames = [normalize(x[i:i+frame_sz])
              for i in onset_samples[onset_types==onset_type]
              if i+frame_sz <= len(x)]  # skip onsets too close to the end of the signal
    x_avg = numpy.mean(frames, axis=0)
    
    # Compute the log-amplitude spectrum of the average drum beat.
    X = numpy.fft.fft(x_avg)
    Xmag = librosa.logamplitude(numpy.abs(X))  # amplitude_to_db in newer librosa
    
    # Plot the spectrum up to the Nyquist frequency.
    f = numpy.arange(frame_sz)*float(fs)/frame_sz
    plt.plot(f[:frame_sz/2], Xmag[:frame_sz/2])
    plt.xlim([0, f[frame_sz/2]])
    plt.ylim([-40, 10])
    plt.xlabel('Frequency (Hertz)')
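One design note on the function above: it takes the FFT of a raw 100 ms slice, so the spectrum includes some leakage from the implicit rectangular window. If the plots look smeared, a common refinement (our suggestion, not part of the original recipe) is to taper the slice with a Hann window before the FFT:

# Inside plot_avg_spectrum, just before the FFT:
x_avg = x_avg*numpy.hanning(len(x_avg))  # taper the frame edges to reduce spectral leakage
X = numpy.fft.fft(x_avg)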

Plot the spectrum for an average bass drum:

In [17]:
plot_avg_spectrum(x, 'BD')

Snare drum:

In [18]:
plot_avg_spectrum(x, 'SD')

Hi-hat:

In [19]:
plot_avg_spectrum(x, 'HH')
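Because plot_avg_spectrum draws into the current axes, you can also overlay all three spectra in a single figure for a direct comparison:

for drum in ['BD', 'SD', 'HH']:
    plot_avg_spectrum(x, drum)
plt.legend(['BD', 'SD', 'HH'])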