In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (13, 5)

Onset Detection

Automatic detection of musical events in an audio signal is one of the most fundamental tasks in music information retrieval. Here, we will show how to detect an onset, the very instant that marks the beginning of the transient part of a sound, or the earliest moment at which a transient can be reliably detected.

Load an audio file into the NumPy array x and sampling rate sr.

In [2]:
x, sr = librosa.load('audio/classic_rock_beat.wav')
print(x.shape, sr)
(151521,) 22050

Plot the signal:

In [3]:
librosa.display.waveplot(x, sr=sr)
Out[3]:
<matplotlib.collections.PolyCollection at 0x116b729d0>

Listen:

In [4]:
ipd.Audio(x, rate=sr)
Out[4]:

librosa.onset.onset_detect

librosa.onset.onset_detect works in the following way:

  1. Compute a spectral novelty function.
  2. Find peaks in the spectral novelty function.
  3. [optional] Backtrack from each peak to a preceding local minimum. Backtracking can be useful for finding segmentation points such that the onset occurs shortly after the beginning of the segment.
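
The three steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not librosa's implementation: the helper names (`spectral_flux`, `pick_peaks`, `backtrack`), the fixed threshold, and the synthetic two-burst signal are all hypothetical, and the frames are not centered the way librosa's are.

```python
import numpy as np

def spectral_flux(x, n_fft=2048, hop_length=512):
    """Half-wave rectified spectral flux, one value per frame (illustrative)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([x[i * hop_length: i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrogram
    diff = np.diff(np.log1p(mag), axis=0)       # frame-to-frame change
    flux = np.maximum(diff, 0.0).sum(axis=1)    # keep only increases in energy
    return np.concatenate([[0.0], flux])        # pad so there is one value per frame

def pick_peaks(novelty, threshold):
    """Indices of local maxima that exceed a fixed threshold."""
    n = novelty
    is_peak = (n[1:-1] > n[:-2]) & (n[1:-1] >= n[2:]) & (n[1:-1] > threshold)
    return np.flatnonzero(is_peak) + 1

def backtrack(peaks, novelty):
    """Move each peak back to the nearest preceding local minimum."""
    out = []
    for p in peaks:
        while p > 0 and novelty[p - 1] < novelty[p]:
            p -= 1
        out.append(p)
    return np.array(out, dtype=int)

# Synthetic signal (an assumption for this sketch): silence with two short
# noise bursts starting at 0.5 s and 1.5 s.
sr = 22050
rng = np.random.default_rng(0)
x = np.zeros(2 * sr)
for t in (0.5, 1.5):
    start = int(t * sr)
    x[start:start + 2000] = rng.standard_normal(2000)

novelty = spectral_flux(x)                                        # step 1
peaks = pick_peaks(novelty, threshold=0.5 * novelty.max())        # step 2
onsets = backtrack(peaks, novelty)                                # step 3
print(peaks, onsets)
```

Each backtracked frame is at or before its peak, which is why backtracking suits segmentation: a segment cut at the backtracked frame contains the whole attack.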

Compute the frame indices for estimated onsets in a signal:

In [5]:
onset_frames = librosa.onset.onset_detect(y=x, sr=sr)
print(onset_frames) # frame numbers of estimated onsets
[ 20  29  38  57  66  75  84  93 103 112 121 131 140 149 158 167 176 185
 196 204 213 232 241 250 260 269 278 288]

Convert onsets to units of seconds:

In [6]:
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)
[ 0.46439909  0.67337868  0.88235828  1.32353741  1.53251701  1.7414966
  1.95047619  2.15945578  2.39165533  2.60063492  2.80961451  3.04181406
  3.25079365  3.45977324  3.66875283  3.87773243  4.08671202  4.29569161
  4.55111111  4.73687075  4.94585034  5.38702948  5.59600907  5.80498866
  6.03718821  6.2461678   6.45514739  6.68734694]
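
The conversion is plain arithmetic: a frame index times the hop length gives a sample index, and dividing by the sampling rate gives seconds. A minimal check, assuming the hop length is librosa's default of 512 samples (the notebook does not set it explicitly):

```python
import numpy as np

sr = 22050         # sampling rate reported when the file was loaded
hop_length = 512   # assumed: librosa's default hop length

onset_frames = np.array([20, 29, 38, 57])   # first four detected frames
onset_times = onset_frames * hop_length / sr
print(onset_times)
```

These values agree with the first four onset times printed above.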

Plot the onsets on top of a spectrogram of the audio:

In [7]:
S = librosa.stft(x)
logS = librosa.amplitude_to_db(numpy.abs(S))  # log-magnitude spectrogram in dB
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log')
plt.vlines(onset_times, 0, 10000, color='k')
Out[7]:
<matplotlib.collections.LineCollection at 0x116b8be50>

Let's also plot the onsets with the time-domain waveform.

In [8]:
librosa.display.waveplot(x, sr=sr)
plt.vlines(onset_times, -0.8, 0.79, color='r', alpha=0.8) 
Out[8]:
<matplotlib.collections.LineCollection at 0x116bab490>

librosa.clicks

We can add a click at the location of each detected onset.

In [9]:
clicks = librosa.clicks(frames=onset_frames, sr=sr, length=len(x))
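
To see what a click track looks like as a signal, here is a rough NumPy sketch. `make_clicks` is a hypothetical helper, and its click shape (a 1 kHz sine with an exponential decay, 0.1 s long) is an assumption chosen for illustration rather than a copy of what librosa.clicks synthesizes.

```python
import numpy as np

def make_clicks(times, sr, length, freq=1000.0, duration=0.1):
    """Hypothetical helper: place a decaying sine burst at each time (seconds)."""
    n = int(duration * sr)
    t = np.arange(n) / sr
    click = np.sin(2 * np.pi * freq * t) * np.exp(-t / (duration / 5))
    out = np.zeros(length)
    for start in (np.asarray(times) * sr).astype(int):
        if start >= length:
            continue                       # ignore clicks past the end
        end = min(start + n, length)       # truncate a click that runs over
        out[start:end] += click[:end - start]
    return out

sr = 22050
clicks = make_clicks([0.5, 1.0, 1.5], sr=sr, length=2 * sr)
print(clicks.shape)
```

Because the output has the same length as the audio, it can be added to the signal sample by sample.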

Listen to the original audio plus the detected onsets:

In [10]:
ipd.Audio(x + clicks, rate=sr)
Out[10]:

Questions

In librosa.onset.onset_detect, use the backtrack=True parameter. What does that do, and how does it affect the detected onsets? (See librosa.onset.onset_backtrack.)

In librosa.onset.onset_detect, you can use the keyword parameters found in librosa.util.peak_pick, e.g. pre_max, post_max, pre_avg, post_avg, delta, and wait, to control the peak picking algorithm. Adjust these parameters. How do they affect the detected onsets?
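
As a starting point for experimenting, here is a small sketch of the selection rule described in the librosa.util.peak_pick documentation: a sample is kept if it is the maximum of a local window, exceeds the local mean by at least delta, and lies more than wait samples after the previous peak. `peak_pick_sketch` and the toy novelty curve are hypothetical, and edge handling here may differ from librosa's.

```python
import numpy as np

def peak_pick_sketch(x, pre_max, post_max, pre_avg, post_avg, delta, wait):
    """Keep n when x[n] is the local max, is at least the local mean plus
    delta, and lies more than `wait` samples after the previous peak."""
    peaks = []
    last = -np.inf
    for n in range(len(x)):
        max_win = x[max(0, n - pre_max): n + post_max]
        avg_win = x[max(0, n - pre_avg): n + post_avg]
        if (x[n] == max_win.max()
                and x[n] >= avg_win.mean() + delta
                and n - last > wait):
            peaks.append(n)
            last = n
    return np.array(peaks, dtype=int)

# Toy novelty curve: a strong peak, a weak peak, then another strong peak.
novelty = np.array([0, 0, 1, 0, 0, 0.4, 0, 0, 0.9, 0.8, 0, 0], dtype=float)

loose = peak_pick_sketch(novelty, 1, 2, 2, 3, delta=0.1, wait=0)
strict = peak_pick_sketch(novelty, 1, 2, 2, 3, delta=0.5, wait=0)
spaced = peak_pick_sketch(novelty, 1, 2, 2, 3, delta=0.1, wait=3)
print(loose, strict, spaced)   # → [2 5 8] [2 8] [2 8]
```

Raising delta drops the weak peak at index 5 because it no longer clears the local mean by enough; raising wait drops it because it falls too soon after the peak at index 2.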

Try with other audio files:

In [11]:
ls audio
125_bounce.wav         classic_rock_beat.wav  oboe_c6.wav
58bpm.wav              conga_groove.wav       prelude_cmaj.wav
beatbox_steve.wav      funk_groove.mp3        simple_loop.wav
c_strum.wav            jangle_pop.mp3         simple_piano.wav
clarinet_c6.wav        latin_groove.mp3       tone_440.wav