In [1]:
%matplotlib inline
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
import stanford_mir; stanford_mir.init()

Onset Detection

Automatic detection of musical events in an audio signal is one of the most fundamental tasks in music information retrieval. Here, we will show how to detect an onset, the very instant that marks the beginning of the transient part of a sound, or the earliest moment at which a transient can be reliably detected.


Load an audio file into the NumPy array x and sampling rate sr.

In [2]:
x, sr = librosa.load('audio/classic_rock_beat.wav')
print(x.shape, sr)
(151521,) 22050

Plot the signal:

In [3]:
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)
Out[3]:
<matplotlib.collections.PolyCollection at 0x10f8f4668>

Listen:

In [4]:
ipd.Audio(x, rate=sr)
Out[4]:

librosa.onset.onset_detect

librosa.onset.onset_detect works in the following way:

  1. Compute a spectral novelty function.
  2. Find peaks in the spectral novelty function.
  3. [optional] Backtrack from each peak to a preceding local minimum. Backtracking can be useful for finding segmentation points such that the onset occurs shortly after the beginning of the segment.

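These three steps map onto separate librosa functions. Below is a rough sketch of the same pipeline assembled from librosa.onset.onset_strength, librosa.util.peak_pick, and librosa.onset.onset_backtrack; the peak-picking thresholds shown are illustrative guesses, not the exact values that onset_detect uses internally.

In [ ]:
# 1. Compute a spectral novelty (onset strength) envelope.
onset_env = librosa.onset.onset_strength(x, sr=sr)
# 2. Pick peaks in the novelty function. These threshold values are only examples.
peak_frames = librosa.util.peak_pick(onset_env,
                                     pre_max=3, post_max=3,
                                     pre_avg=3, post_avg=5,
                                     delta=0.5, wait=10)
# 3. (Optional) Backtrack each peak to the preceding local minimum of the envelope.
backtracked_frames = librosa.onset.onset_backtrack(peak_frames, onset_env)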
Compute the frame indices for estimated onsets in a signal:

In [5]:
onset_frames = librosa.onset.onset_detect(x, sr=sr)
print(onset_frames) # frame numbers of estimated onsets
[ 20  29  38  57  66  75  84  93 103 112 121 131 140 149 158 167 176 185
 196 204 213 232 241 250 260 269 278 288]

Convert onsets to units of seconds:

In [6]:
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)
[0.46439909 0.67337868 0.88235828 1.32353741 1.53251701 1.7414966
 1.95047619 2.15945578 2.39165533 2.60063492 2.80961451 3.04181406
 3.25079365 3.45977324 3.66875283 3.87773243 4.08671202 4.29569161
 4.55111111 4.73687075 4.94585034 5.38702948 5.59600907 5.80498866
 6.03718821 6.2461678  6.45514739 6.68734694]
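Each frame index refers to one hop of the short-time analysis, so the conversion is simply frames * hop_length / sr. Assuming librosa's default hop length of 512 samples, the same values can be computed by hand:

In [ ]:
hop_length = 512  # librosa's default hop length
print(onset_frames * hop_length / sr)  # should match the output of frames_to_time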

Plot the onsets on top of a spectrogram of the audio:

In [7]:
S = librosa.stft(x)
logS = librosa.amplitude_to_db(abs(S))
In [8]:
plt.figure(figsize=(14, 5))
librosa.display.specshow(logS, sr=sr, x_axis='time', y_axis='log', cmap='Reds')
plt.vlines(onset_times, 0, 10000, color='#3333FF')
Out[8]:
<matplotlib.collections.LineCollection at 0x10fbb2ef0>

Let's also plot the onsets with the time-domain waveform.

In [9]:
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)
plt.vlines(onset_times, -0.8, 0.79, color='r', alpha=0.8) 
Out[9]:
<matplotlib.collections.LineCollection at 0x10fba4208>

librosa.clicks

We can add a click at the location of each detected onset.

In [10]:
clicks = librosa.clicks(frames=onset_frames, sr=sr, length=len(x))

Listen to the original audio plus the detected onsets:

In [11]:
ipd.Audio(x + clicks, rate=sr)
Out[11]:
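librosa.clicks also accepts onset times in seconds (via times) instead of frame indices, and the click tone can be customized. As one possible variation, the sketch below uses a lower, longer click; the parameter values are arbitrary choices, not recommendations.

In [ ]:
low_clicks = librosa.clicks(times=onset_times, sr=sr, length=len(x),
                            click_freq=660.0, click_duration=0.15)
ipd.Audio(x + low_clicks, rate=sr)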

Questions

In librosa.onset.onset_detect, use the backtrack=True parameter. What does that do, and how does it affect the detected onsets? (See librosa.onset.onset_backtrack.)
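As a starting point, one way to explore this is to run the detector with and without backtracking and compare the resulting onset times; the snippet below is only a suggestion:

In [ ]:
onset_frames_bt = librosa.onset.onset_detect(x, sr=sr, backtrack=True)
print(librosa.frames_to_time(onset_frames_bt, sr=sr))  # compare with onset_times above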

In librosa.onset.onset_detect, you can use the keyword parameters of librosa.util.peak_pick, e.g. pre_max, post_max, pre_avg, post_avg, delta, and wait, to control the peak-picking algorithm. Adjust these parameters. How do they affect the detected onsets?
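For example, you might pass explicit delta (peak threshold) and wait (minimum gap between onsets) values and see how the set of detected onsets changes; the values below are arbitrary starting points:

In [ ]:
sparse_frames = librosa.onset.onset_detect(x, sr=sr, delta=0.2, wait=20)
print(librosa.frames_to_time(sparse_frames, sr=sr))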

Try with other audio files:

In [12]:
ls audio
125_bounce.wav                  jangle_pop.mp3
58bpm.wav                       latin_groove.mp3
README.md                       oboe_c6.wav
brahms_hungarian_dance_5.mp3    prelude_cmaj.wav
busta_rhymes_hits_for_days.mp3  simple_loop.wav
c_strum.wav                     simple_piano.wav
clarinet_c6.wav                 sir_duke_piano_fast.mp3
classic_rock_beat.wav           sir_duke_piano_slow.mp3
conga_groove.wav                sir_duke_trumpet_fast.mp3
drum_samples/                   sir_duke_trumpet_slow.mp3
funk_groove.mp3                 tone_440.wav
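To repeat the experiment on another recording, the same steps can be wrapped in a small helper. The function name below is made up for this example, and conga_groove.wav from the listing above is just one possible choice:

In [ ]:
def sonify_onsets(filename):
    """Load a file, detect onsets, and overlay a click at each detected onset."""
    y, sr = librosa.load(filename)
    frames = librosa.onset.onset_detect(y, sr=sr)
    click_track = librosa.clicks(frames=frames, sr=sr, length=len(y))
    return ipd.Audio(y + click_track, rate=sr)

sonify_onsets('audio/conga_groove.wav')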