In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (13, 5)

Novelty Functions

To detect note onsets, we want to locate sudden changes in the audio signal that mark the beginning of transient regions. Often, an increase in the signal's amplitude envelope marks an onset candidate. That is not always the case, however: notes can change from one pitch to another without a change in amplitude, e.g. a violin playing slurred notes.

A novelty function measures local changes in signal properties such as energy or spectral content. We will look at two kinds of novelty function:

  1. Energy-based novelty functions (FMP, p. 306)
  2. Spectral-based novelty functions (FMP, p. 309)

Energy-based Novelty Functions

Playing a note often coincides with a sudden increase in signal energy. To detect this sudden increase, we will compute an energy novelty function (FMP, p. 307):

  1. Compute the short-time energy in the signal.
  2. Compute the first-order difference in the energy.
  3. Half-wave rectify the first-order difference.
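
The three steps above can be sketched with NumPy alone. This is a minimal illustration, not the code used in the rest of this notebook: the helper name energy_novelty_sketch and the synthetic test signal (silence followed by a 440 Hz tone) are assumptions made for the example, and the framing uses no padding, so frame indices will not line up exactly with librosa's.

```python
import numpy

def energy_novelty_sketch(x, frame_length=1024, hop_length=512):
    """Hypothetical helper: RMS energy -> first difference -> half-wave rectification."""
    # Step 1: short-time RMS energy, one value per frame (no padding at the edges).
    n_frames = 1 + (len(x) - frame_length) // hop_length
    rms = numpy.array([
        numpy.sqrt(numpy.mean(x[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Step 2: first-order difference, with a leading zero so the length is unchanged.
    diff = numpy.zeros_like(rms)
    diff[1:] = numpy.diff(rms)
    # Step 3: half-wave rectification, i.e. negative values become zero.
    return numpy.maximum(0, diff)

# Synthetic signal (an assumption for illustration): 0.5 s of silence, then 0.5 s of a 440 Hz tone.
sr = 22050
hop_length = 512
silence = numpy.zeros(sr // 2)
tone = numpy.sin(2 * numpy.pi * 440 * numpy.arange(sr // 2) / float(sr))
x_demo = numpy.concatenate([silence, tone])

novelty = energy_novelty_sketch(x_demo, hop_length=hop_length)
onset_frame = int(numpy.argmax(novelty))
print(onset_frame, onset_frame * hop_length / float(sr))
```

The largest novelty value should fall in one of the first frames that overlap the start of the tone, which is roughly 0.5 seconds into the signal.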

First, load an audio file as a NumPy array x, together with its sampling rate sr.

In [2]:
x, sr = librosa.load('audio/simple_loop.wav')
print(x.shape, sr)
(49613,) 22050

Plot the signal:

In [3]:
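# waveplot was removed in librosa 0.10; on newer versions use librosa.display.waveshow(x, sr=sr).
librosa.display.waveplot(x, sr=sr)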
Out[3]:
<matplotlib.collections.PolyCollection at 0x119a68a10>

Listen:

In [4]:
ipd.Audio(x, rate=sr)
Out[4]:

RMS Energy

librosa.feature.rmse (named librosa.feature.rms in librosa 0.7 and later) returns the root-mean-square (RMS) energy for each frame of audio. We will compute the RMS energy as well as its first-order difference.

In [5]:
hop_length = 512
frame_length = 1024
rmse = librosa.feature.rmse(x, frame_length=frame_length, hop_length=hop_length).flatten()
rmse_diff = numpy.zeros_like(rmse)
rmse_diff[1:] = numpy.diff(rmse)
In [6]:
print(rmse.shape)
print(rmse_diff.shape)
(95,)
(95,)

To obtain an energy novelty function, we perform half-wave rectification (FMP, p. 307) on rmse_diff, i.e. any negative values are set to zero. Equivalently, we can apply the function $\max(0, x)$:

In [7]:
energy_novelty = numpy.maximum(0, rmse_diff)  # element-wise max(0, x)
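
As a quick check of what half-wave rectification does, here is a small worked example on a hand-made array of differences (the values are made up for illustration). It also confirms that the element-wise numpy.maximum form and the stacked numpy.max form give the same result.

```python
import numpy

# Hypothetical first-order differences: positive values are increases, negative are decreases.
d = numpy.array([0.0, 0.3, -0.2, 0.1, -0.4])

# Element-wise max(0, x): negative entries are replaced by zero.
rectified = numpy.maximum(0, d)

# Equivalent form: stack a zero array with d and take the maximum along the stacking axis.
stacked = numpy.max([numpy.zeros_like(d), d], axis=0)

print(rectified)
print(numpy.array_equal(rectified, stacked))
```

Only the increases (0.3 and 0.1) survive, which is why the rectified curve responds to note onsets but not to decays.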

Plot all three functions together:

In [8]:
frames = numpy.arange(len(rmse))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)  # pass hop_length so the time axis stays correct if it changes
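
For reference, the frame-to-time conversion is simple arithmetic: frame index times hop length, divided by the sampling rate. The sketch below reproduces it by hand for the 95 frames computed above, assuming the default behaviour of librosa.frames_to_time (no extra offset); the name t_manual is made up for this example.

```python
import numpy

sr = 22050        # sampling rate of the loaded signal
hop_length = 512  # hop length used when computing the RMS energy
n_frames = 95     # number of RMS frames reported above

frames = numpy.arange(n_frames)

# Time in seconds at the start of each hop: frame index * hop_length / sr.
t_manual = frames * hop_length / float(sr)

print(t_manual[:3])
print(t_manual[-1])
```

Each frame advances the time axis by hop_length / sr seconds, about 23 ms here, so a different hop length changes both the number of frames and their spacing in time.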
In [9]:
plt.figure(figsize=(15, 6))
plt.plot(t, rmse, 'b--', t, rmse_diff, 'g--^', t, energy_novelty, 'r-')
plt.xlim(0, t.max())
plt.xlabel('Time (sec)')
plt.legend(('RMSE', 'delta RMSE', 'energy novelty')) 
Out[9]:
<matplotlib.legend.Legend at 0x11994c6d0>

Log Energy

The human perception of sound intensity is logarithmic in nature. To account for this property, we can apply a logarithm function to the energy before taking the first-order difference.

Because $\log(x)$ diverges as $x$ approaches zero, a common alternative is to use $\log(1 + \lambda x)$. This function equals zero when $x$ is zero, but it behaves like $\log(\lambda x)$ when $\lambda x$ is large. This operation is sometimes called logarithmic compression (FMP, p. 310).
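
To make the two limiting cases concrete, the following sketch compares $\log(1 + \lambda x)$ with $\log(\lambda x)$ on a few hand-picked values. The choice $\lambda = 10$ and the sample values are assumptions for illustration only.

```python
import numpy

lam = 10.0  # compression constant (lambda), chosen here for illustration
x_vals = numpy.array([0.0, 1e-3, 1e-1, 1e1, 1e3])

# Logarithmic compression: finite everywhere, exactly zero at x = 0.
compressed = numpy.log1p(lam * x_vals)

# Plain logarithm: -inf at x = 0 (the divide-by-zero warning is silenced on purpose).
with numpy.errstate(divide='ignore'):
    plain = numpy.log(lam * x_vals)

# The gap between the two shrinks as lam * x grows.
gap = compressed - plain

print(compressed)
print(plain)
print(gap)
```

The gap equals $\log(1 + 1/(\lambda x))$, so it is infinite at zero and shrinks toward zero as $\lambda x$ grows.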

In [10]:
log_rmse = numpy.log1p(10*rmse)
log_rmse_diff = numpy.zeros_like(log_rmse)
log_rmse_diff[1:] = numpy.diff(log_rmse)
In [11]:
log_energy_novelty = numpy.maximum(0, log_rmse_diff)  # element-wise max(0, x)
In [12]:
plt.figure(figsize=(15, 6))
plt.plot(t, log_rmse, 'b--', t, log_rmse_diff, 'g--^', t, log_energy_novelty, 'r-')
plt.xlim(0, t.max())
plt.xlabel('Time (sec)')
plt.legend(('log RMSE', 'delta log RMSE', 'log energy novelty')) 
Out[12]:
<matplotlib.legend.Legend at 0x119ac3e50>

Spectral-based Novelty Functions

There are two problems with the energy novelty function:

  1. It is sensitive to energy fluctuations belonging to the same note.
  2. It is not sensitive to spectral fluctuations between notes where amplitude remains the same.

For example, consider the following audio signal composed of pure tones of equal magnitude:

In [13]:
sr = 22050
def generate_tone(midi):
    T = 0.5
    t = numpy.linspace(0, T, int(T*sr), endpoint=False)
    f = librosa.midi_to_hz(midi)
    return numpy.sin(2*numpy.pi*f*t)
In [14]:
x = numpy.concatenate([generate_tone(midi) for midi in [48, 52, 55, 60, 64, 67, 72, 76, 79, 84]])

Listen:

In [15]:
ipd.Audio(x, rate=sr)
Out[15]:

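Because every tone has the same amplitude, the RMS energy stays roughly constant and the energy novelty function shows no clear peaks at the note changes: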

In [16]:
hop_length = 512
frame_length = 1024
rmse = librosa.feature.rmse(x, frame_length=frame_length, hop_length=hop_length).flatten()
rmse_diff = numpy.zeros_like(rmse)
rmse_diff[1:] = numpy.diff(rmse)
In [17]:
energy_novelty = numpy.maximum(0, rmse_diff)  # element-wise max(0, x)
In [18]:
frames = numpy.arange(len(rmse))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)  # pass hop_length so the time axis stays correct if it changes
In [19]:
plt.figure(figsize=(15, 4))
plt.plot(t, rmse, 'b--', t, rmse_diff, 'g--^', t, energy_novelty, 'r-')
plt.xlim(0, t.max())
plt.xlabel('Time (sec)')
plt.legend(('RMSE', 'delta RMSE', 'energy novelty')) 
Out[19]:
<matplotlib.legend.Legend at 0x11c547550>

Instead, we will compute a spectral novelty function (FMP, p. 309):

  1. Compute the log-amplitude spectrogram.
  2. Within each frequency bin, $k$, compute the energy novelty function as shown earlier, i.e. (a) first-order difference, and (b) half-wave rectification.
  3. Sum across all frequency bins, $k$.
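
The three steps above can be sketched directly in NumPy. This is a rough illustration under stated assumptions, not a reimplementation of any library routine: the helper name spectral_novelty_sketch, the Hann window, the compression constant lam, and the two-tone test signal are all choices made for the example, and the magnitudes will not match those produced by librosa.

```python
import numpy

def spectral_novelty_sketch(x, n_fft=1024, hop_length=512, lam=10.0):
    """Hypothetical helper: log-compressed spectral flux from a plain NumPy STFT."""
    window = numpy.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    # Magnitude spectrogram with shape (n_frames, n_fft // 2 + 1); no padding at the edges.
    S = numpy.array([
        numpy.abs(numpy.fft.rfft(window * x[i * hop_length : i * hop_length + n_fft]))
        for i in range(n_frames)
    ])
    # Step 1: log-amplitude spectrogram via logarithmic compression.
    Y = numpy.log1p(lam * S)
    # Step 2: within each frequency bin, first-order difference over time, then rectification.
    flux = numpy.maximum(0, numpy.diff(Y, axis=0))
    # Step 3: sum across frequency bins; a leading zero keeps one value per frame.
    return numpy.concatenate([[0.0], flux.sum(axis=1)])

# Two pure tones of equal amplitude (440 Hz then 880 Hz), 0.5 s each.
sr = 22050
t_demo = numpy.arange(sr // 2) / float(sr)
x_demo = numpy.concatenate([
    numpy.sin(2 * numpy.pi * 440 * t_demo),
    numpy.sin(2 * numpy.pi * 880 * t_demo),
])

novelty = spectral_novelty_sketch(x_demo)
peak_frame = int(numpy.argmax(novelty))
print(peak_frame, peak_frame * 512 / float(sr))
```

Although the amplitude never changes, the novelty curve should peak near the frames that straddle the pitch change at 0.5 seconds, because energy appears in frequency bins that were previously almost empty.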

Conveniently, librosa provides librosa.onset.onset_strength, which computes a novelty function based on spectral flux.

In [20]:
spectral_novelty = librosa.onset.onset_strength(y=x, sr=sr)
In [21]:
frames = numpy.arange(len(spectral_novelty))
t = librosa.frames_to_time(frames, sr=sr)
In [22]:
plt.figure(figsize=(15, 4))
plt.plot(t, spectral_novelty, 'r-')
plt.xlim(0, t.max())
plt.xlabel('Time (sec)')
plt.legend(('Spectral Novelty',))
Out[22]:
<matplotlib.legend.Legend at 0x10c170690>

Questions

Novelty functions depend on frame_length and hop_length. Adjust these two parameters. How do they affect the novelty function?

Try with other audio files. How do the novelty functions compare?

In [23]:
ls audio
125_bounce.wav         classic_rock_beat.wav  oboe_c6.wav
58bpm.wav              conga_groove.wav       prelude_cmaj.wav
beatbox_steve.wav      funk_groove.mp3        simple_loop.wav
c_strum.wav            jangle_pop.mp3         simple_piano.wav
clarinet_c6.wav        latin_groove.mp3       tone_440.wav