%matplotlib inline
from ADTLib.models import ADTBDRNN
import seaborn
import numpy, scipy, scipy.linalg, matplotlib.pyplot as plt, librosa, mir_eval, IPython.display as ipd, urllib
This notebook requires ADTLib. See the ADTLib repo for installation instructions.
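At the time of writing, the ADTLib README suggests installing via pip; defer to the repo if this has changed:
!pip install ADTLib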
Download a drum signal containing bass drum, snare drum, and hi-hat:
filename = 'Classic Rock Beat 06.wav'
urllib.urlretrieve(
'http://audio.musicinformationretrieval.com/Jam Pack 1/' + filename,
filename=filename
)
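As written, this download targets Python 2. Under Python 3, the equivalent call lives in urllib.request; note that the space in the URL may need percent-encoding ('Jam%20Pack%201') on some setups:
import urllib.request
urllib.request.urlretrieve(
    'http://audio.musicinformationretrieval.com/Jam Pack 1/' + filename,
    filename=filename
)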
Load the audio file into an array:
x, fs = librosa.load(filename)
Listen to the signal:
ipd.Audio(x, rate=fs)
Use ADTLib to identify the location and types of each onset:
onset_times, onset_types = ADTBDRNN([filename])
onset_times
onset_types
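The boolean masks used below, e.g. onset_types=='BD', require numpy arrays. If your version of ADTLib returns plain lists, convert them first (a small safeguard, not part of ADTLib):
onset_times = numpy.array(onset_times)
onset_types = numpy.array(onset_types)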
For each type of drum, create a click track from its onsets, and listen to it mixed with the original signal.
Bass drum:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='BD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Snare drum:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='SD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Hi-hat:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='HH'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
Clearly, the default parameters are not optimized for this input file. For example, you can hear many hi-hat onsets missed by the transcription system. Therefore, let's adjust the drum transcription parameters.
close_error is the maximum distance, in seconds, between two onsets before they are merged into a single onset. lambd controls the peak-picking threshold used for each instrument.
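As a toy illustration of the merging idea only, not ADTLib's actual implementation: with close_error=0.100, onsets at 1.00 s and 1.05 s would collapse into one, while an onset at 1.40 s would survive.
def merge_close_onsets(times, close_error):
    # Keep an onset only if it falls at least close_error seconds
    # after the previously kept onset. Assumes times is sorted and nonempty.
    merged = [times[0]]
    for t in times[1:]:
        if t - merged[-1] >= close_error:
            merged.append(t)
    return numpy.array(merged)

merge_close_onsets(numpy.array([1.00, 1.05, 1.40]), 0.100)  # array([ 1. ,  1.4])
Now re-run the transcription with the adjusted parameters: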
onset_times, onset_types = ADTBDRNN([filename], close_error=0.100, lambd=[1, 1, 1])
Listen to the new onsets:
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='BD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='SD'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
x_with_beeps = mir_eval.sonify.clicks(onset_times[onset_types=='HH'], fs, length=len(x))
ipd.Audio(x + x_with_beeps, rate=fs)
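Since each drum type is sonified the same way, a small helper, hypothetical and not part of ADTLib or mir_eval, avoids the repetition:
def sonify_drum(onset_type):
    # Overlay clicks at the onsets of one drum type on the original signal.
    clicks = mir_eval.sonify.clicks(onset_times[onset_types==onset_type], fs, length=len(x))
    return ipd.Audio(x + clicks, rate=fs)

sonify_drum('HH')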
For each drum type, let's compute an average drum beat from the original signal and visualize its spectrum.
Convert onsets from units of seconds to samples:
onset_samples = librosa.time_to_samples(onset_times, sr=fs)
print(onset_samples)
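time_to_samples simply scales seconds by the sampling rate and casts to integers; in the librosa version used here it is roughly equivalent to:
(onset_times*fs).astype(int)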
Create a function that plots the log-amplitude spectrum of an average drum beat for a particular drum type:
def plot_avg_spectrum(x, onset_type):
    # Average the normalized 100 ms frames that follow each onset
    # of the given drum type, skipping frames that run past the signal's end.
    frame_sz = int(0.100*fs)
    def normalize(z):
        return z/scipy.linalg.norm(z)
    frames = [normalize(x[i:i+frame_sz])
              for i in onset_samples[onset_types==onset_type]
              if i+frame_sz <= len(x)]
    x_avg = numpy.mean(frames, axis=0)

    # Compute the log-amplitude spectrum of the average beat.
    X = numpy.fft.fft(x_avg)
    Xmag = librosa.logamplitude(numpy.abs(X))

    # Plot the spectrum up to the Nyquist frequency.
    f = numpy.arange(frame_sz)*fs/frame_sz
    plt.plot(f[:frame_sz//2], Xmag[:frame_sz//2])
    plt.xlim([0, f[frame_sz//2]])
    plt.ylim([-40, 10])
    plt.xlabel('Frequency (Hertz)')
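Note that with a 100 ms frame, the FFT bin spacing is fs/frame_sz = 1/0.100 = 10 Hz regardless of the sampling rate, which is enough resolution to separate the low-frequency energy of a bass drum from the broadband energy of a hi-hat.
fs/int(0.100*fs)  # bin spacing in Hz: 10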
Plot the spectrum for an average bass drum:
plot_avg_spectrum(x, 'BD')
Snare drum:
plot_avg_spectrum(x, 'SD')
Hi-hat:
plot_avg_spectrum(x, 'HH')