In [2]:
import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa, librosa.display, IPython.display as ipd, urllib

Audio Representation

In performance, musicians convert sheet music representations into sound, which is transmitted through the air as air pressure oscillations. In essence, sound is simply air vibrating (Wikipedia). Sound travels through the air as longitudinal waves, i.e. the oscillations are parallel to the direction of propagation.

Audio refers to the production, transmission, or reception of sounds that are audible to humans. An audio signal is a representation of sound that describes the fluctuation in air pressure caused by the vibration, as a function of time. Unlike sheet music or symbolic representations, audio representations encode everything that is necessary to reproduce an acoustic realization of a piece of music. However, note parameters such as onsets, durations, and pitches are not encoded explicitly. This makes converting from an audio representation to a symbolic representation a difficult and ill-defined task.

Waveforms and the Time Domain

The basic representation of an audio signal is in the time domain.

Let's download and listen to a file:

In [4]:
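import urllib.request
# Save the download under the name that librosa.load expects below.
urllib.request.urlretrieve('http://audio.musicinformationretrieval.com/c_strum.wav', filename='c_strum.wav')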
x, fs = librosa.load('c_strum.wav', sr=44100)
ipd.Audio(x, rate=fs)
Out[4]:

The change in air pressure over time is represented graphically by a pressure-time plot, or simply a waveform.

To plot a waveform, use librosa.display.waveplot (in librosa 0.10 and later, waveplot has been removed; use librosa.display.waveshow instead):

In [3]:
librosa.display.waveplot(x, fs, alpha=0.5)
Out[3]:
<matplotlib.collections.PolyCollection at 0x112d35890>

Let's zoom in:

In [4]:
plt.plot(x[8000:9000])
Out[4]:
[<matplotlib.lines.Line2D at 0x11321eed0>]

Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the sampling frequency (often abbreviated fs) or sampling rate (often abbreviated sr). For this workshop, we will mostly work with a sampling frequency of 44100 Hz, the sampling rate of CD recordings.
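
As a minimal sketch of what sampling means, the following generates one second of a sine tone at 44100 Hz and maps each sample index to its timestamp. The 440 Hz tone, the one-second duration, and the variable names are illustrative assumptions and are not taken from the recording above.

```python
import numpy

fs = 44100          # sampling frequency in Hz (CD quality)
duration = 1.0      # length of the signal in seconds (assumed for illustration)
f0 = 440.0          # frequency of the test tone in Hz (assumed for illustration)

# Sample n is taken at time n / fs seconds.
n = numpy.arange(int(round(fs * duration)))
t = n / float(fs)

# The discrete-time signal holds one value per sampling instant.
tone = numpy.sin(2 * numpy.pi * f0 * t)

# Time between consecutive samples, in seconds.
sampling_period = 1.0 / fs
```

One second of audio at this rate therefore occupies 44100 array entries, and neighboring samples are about 22.7 microseconds apart.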

Frequency and Pitch

Dynamics, Intensity, and Loudness

Timbre

Timbre is the quality of sound that distinguishes the tone of different instruments and voices even if the sounds have the same pitch and loudness.

One characteristic of timbre is its temporal evolution. The envelope of a signal is a smooth curve that approximates the amplitude extremes of a waveform over time.
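
One simple way to approximate an envelope is to take the maximum absolute amplitude within successive short frames. The sketch below does this for a synthetic decaying tone; the function name amplitude_envelope, the frame and hop lengths, and the test signal are assumptions chosen for illustration, and other envelope estimators exist.

```python
import numpy

def amplitude_envelope(signal, frame_length=1024, hop_length=512):
    """Return the maximum absolute amplitude of each frame of `signal`."""
    starts = range(0, max(len(signal) - frame_length + 1, 1), hop_length)
    return numpy.array([numpy.max(numpy.abs(signal[s:s + frame_length]))
                        for s in starts])

# Synthetic test tone: a 220 Hz sine with an exponentially decaying amplitude.
fs = 44100
t = numpy.arange(fs) / float(fs)
decaying_tone = numpy.exp(-3.0 * t) * numpy.sin(2 * numpy.pi * 220.0 * t)

envelope = amplitude_envelope(decaying_tone)
```

Plotting envelope against the frame start times traces the decay of the tone without the rapid oscillation of the waveform itself.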

Envelopes are often described using the ADSR model (Wikipedia), which divides a sound into four phases: attack, decay, sustain, and release.

During the attack phase, the sound builds up, usually with noise-like components over a broad frequency range. Such a noise-like short-duration sound at the start of a sound is often called a transient.

During the decay phase, the sound stabilizes and reaches a steady periodic pattern.

During the sustain phase, the energy remains fairly constant.

During the release phase, the sound fades away.

The ADSR model is a simplification and does not necessarily model the amplitude envelopes of all sounds.
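
To make the four phases concrete, here is a sketch that builds a piecewise-linear ADSR envelope and applies it to a sine tone. The helper adsr_envelope, its parameters, and the chosen durations and levels are hypothetical; real instruments rarely follow straight-line segments.

```python
import numpy

def adsr_envelope(attack, decay, sustain_level, sustain_time, release, fs=44100):
    """Piecewise-linear ADSR envelope; durations in seconds, levels in [0, 1]."""
    a = numpy.linspace(0.0, 1.0, int(round(attack * fs)), endpoint=False)
    d = numpy.linspace(1.0, sustain_level, int(round(decay * fs)), endpoint=False)
    s = numpy.full(int(round(sustain_time * fs)), sustain_level)
    r = numpy.linspace(sustain_level, 0.0, int(round(release * fs)))
    return numpy.concatenate([a, d, s, r])

fs = 44100
env = adsr_envelope(attack=0.05, decay=0.1, sustain_level=0.6,
                    sustain_time=0.5, release=0.35, fs=fs)

# Shape a 440 Hz sine tone with the envelope.
t = numpy.arange(len(env)) / float(fs)
shaped_tone = env * numpy.sin(2 * numpy.pi * 440.0 * t)
```

The result can be auditioned with ipd.Audio(shaped_tone, rate=fs); changing the attack time alone noticeably changes how percussive the tone sounds.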

In [6]:
ipd.Image("https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/ADSR_parameter.svg/640px-ADSR_parameter.svg.png")
Out[6]:

Another property used to characterize timbre is the presence of partials and their relative strengths. Partials are the dominant frequencies in a musical tone, with the lowest partial being the fundamental frequency.
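
The following sketch illustrates partials on a synthetic tone: it sums three sinusoids and then recovers their frequencies from the magnitude spectrum. The frequencies and amplitudes are made up for illustration, and picking the three largest bins works only because this signal is clean and each partial falls exactly on a frequency bin.

```python
import numpy

fs = 44100
t = numpy.arange(fs) / float(fs)          # one second of samples

# Hypothetical tone: fundamental at 220 Hz plus two weaker harmonic partials.
partial_freqs = [220.0, 440.0, 660.0]
partial_amps = [1.0, 0.5, 0.25]
tone = sum(a * numpy.sin(2 * numpy.pi * f * t)
           for f, a in zip(partial_freqs, partial_amps))

# Magnitude spectrum; with a one-second signal, the bins are spaced 1 Hz apart.
magnitude = numpy.abs(numpy.fft.rfft(tone))
freqs = numpy.fft.rfftfreq(len(tone), d=1.0 / fs)

# The three largest bins recover the partial frequencies.
strongest = numpy.sort(freqs[numpy.argsort(magnitude)[-3:]])
fundamental = strongest[0]
```

Changing partial_amps while keeping partial_freqs fixed leaves the pitch unchanged but alters the timbre.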

The partials of a sound can be visualized with a spectrogram, which shows the intensity of each frequency component over time.
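
As a rough sketch of how a spectrogram is computed, the code below takes the magnitude of a short-time Fourier transform using only numpy. The function magnitude_spectrogram, the Hann window, and the frame and hop sizes are assumptions made for this example; library routines typically add padding and other options.

```python
import numpy

def magnitude_spectrogram(signal, n_fft=2048, hop_length=512):
    """Magnitude STFT: rows are frequency bins, columns are time frames."""
    window = numpy.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = numpy.stack([signal[i * hop_length:i * hop_length + n_fft] * window
                          for i in range(n_frames)])
    return numpy.abs(numpy.fft.rfft(frames, axis=1)).T

# Test signal: a 440 Hz tone for half a second, then an 880 Hz tone.
fs = 44100
t = numpy.arange(fs // 2) / float(fs)
two_tones = numpy.concatenate([numpy.sin(2 * numpy.pi * 440.0 * t),
                               numpy.sin(2 * numpy.pi * 880.0 * t)])

S = magnitude_spectrogram(two_tones)
bin_hz = fs / 2048.0                      # frequency spacing between rows
first_peak_hz = numpy.argmax(S[:, 0]) * bin_hz
last_peak_hz = numpy.argmax(S[:, -1]) * bin_hz
```

The array S can be displayed with plt.imshow(S, origin='lower', aspect='auto'); the strongest row moves from near 440 Hz in the first frame to near 880 Hz in the last.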

Autocorrelation

The autocorrelation of a signal measures the similarity between the signal and a delayed version of itself, as a function of the delay (lag).

In [10]:
# The autocorrelation is symmetric about lag zero, so we keep only the non-negative lags (the "right half").
r = numpy.correlate(x, x, mode='full')[len(x)-1:]
print(x.shape, r.shape)
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')
(204800,) (204800,)
Out[10]:
<matplotlib.text.Text at 0x113002950>
In [11]:
r = librosa.autocorrelate(x, max_size=10000)
plt.plot(r)
plt.xlabel('Lag (samples)')
Out[11]:
<matplotlib.text.Text at 0x113039ad0>
In [12]:
from essentia.standard import AutoCorrelation
autocorr = AutoCorrelation()
r = autocorr(x)
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')
Out[12]:
<matplotlib.text.Text at 0x114c65d90>
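
One use of the autocorrelation is estimating the period of a signal: a periodic signal matches itself best at lags equal to multiples of its period. The sketch below applies the same numpy.correlate computation to a synthetic sine tone. The sampling rate, the tone frequency, and the lag search range are assumptions chosen to keep the example small.

```python
import numpy

fs = 8000                                  # low rate keeps the example fast (assumed)
f0 = 200.0                                 # true frequency of the test tone in Hz
t = numpy.arange(2000) / float(fs)
tone = numpy.sin(2 * numpy.pi * f0 * t)

# Same computation as above: keep only the non-negative lags.
r_tone = numpy.correlate(tone, tone, mode='full')[len(tone) - 1:]

# Ignore very short lags, then find the lag where the signal best matches itself.
min_lag = 20
peak_lag = min_lag + numpy.argmax(r_tone[min_lag:200])
estimated_f0 = fs / float(peak_lag)
```

Skipping the shortest lags matters because the autocorrelation is always largest at lag zero, where the signal is compared with an identical copy of itself.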