In [1]:
%matplotlib inline
import seaborn
import numpy, scipy.fft, matplotlib.pyplot as plt, pandas, librosa, IPython.display as ipd, urllib.request, os.path

Audio Representation

In performance, musicians convert sheet music representations into sound which is transmitted through the air as air pressure oscillations. In essence, sound is simply air vibrating (Wikipedia). Sound vibrates through the air as longitudinal waves, i.e. the oscillations are parallel to the direction of propagation.

Audio refers to the production, transmission, or reception of sounds audible by humans. An audio signal represents the fluctuation in air pressure caused by this vibration as a function of time. Unlike sheet music or other symbolic representations, audio representations encode everything necessary to reproduce an acoustic realization of a piece of music. However, note parameters such as onsets, durations, and pitches are not encoded explicitly. This makes converting from an audio representation to a symbolic representation a difficult and ill-defined task.

Waveforms and the Time Domain

The basic representation of an audio signal is in the time domain.

Let's download and listen to a file:

In [2]:
if not os.path.exists("c_strum.wav"):
    urllib.request.urlretrieve(
        'http://audio.musicinformationretrieval.com/c_strum.wav',
        filename='c_strum.wav'
    )
x, fs = librosa.load('c_strum.wav', sr=44100)
ipd.Audio(x, rate=fs)
Out[2]:

The change in air pressure at a certain time is graphically represented by a pressure-time plot, or simply waveform.

To plot a waveform, use librosa.display.waveshow (named librosa.display.waveplot in older versions of librosa):

In [3]:
librosa.display.waveshow(x, sr=fs, alpha=0.5)
Out[3]:

Let's zoom in:

In [4]:
plt.plot(x[8000:9000])
Out[4]:
[<matplotlib.lines.Line2D at 0x110274350>]

Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the sampling frequency (often abbreviated fs) or sampling rate (often abbreviated sr). For this workshop, we will mostly work with a sampling frequency of 44100 Hz, the sampling rate of CD recordings.
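To make the relationship between sampling rate, time, and sample count concrete, here is a minimal sketch (the duration chosen here is illustrative):

```python
import numpy

fs = 44100                            # sampling frequency in Hz (CD quality)
duration = 2.0                        # seconds of audio to represent
n = numpy.arange(int(duration * fs))  # discrete sample indices 0, 1, 2, ...
t = n / fs                            # time (in seconds) of each sample

num_samples = len(t)                  # 2 s at 44100 Hz -> 88200 samples
```

Each sample is spaced 1/fs seconds apart, so higher sampling rates capture the waveform at finer time resolution.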

Timbre: Temporal Indicators

Timbre is the quality of sound that distinguishes the tone of different instruments and voices even if the sounds have the same pitch and loudness.

One characteristic of timbre is its temporal evolution. The envelope of a signal is a smooth curve that approximates the amplitude extremes of a waveform over time.
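One common way to estimate such an envelope (one option among several, not prescribed by this text) is to take the magnitude of the analytic signal obtained via a Hilbert transform. The decaying tone below is synthetic, standing in for a recorded note:

```python
import numpy
from scipy.signal import hilbert

fs = 44100
t = numpy.arange(fs) / fs                                  # one second of samples
# a decaying 440 Hz tone as a stand-in for a recorded note
x = numpy.exp(-3 * t) * numpy.sin(2 * numpy.pi * 440 * t)

# the magnitude of the analytic signal approximates the amplitude envelope
envelope = numpy.abs(hilbert(x))
```

Plotting `envelope` on top of `x` shows a smooth curve tracing the amplitude extremes of the oscillating waveform.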

Envelopes are often modeled by the ADSR model (Wikipedia) which describes four phases of a sound: attack, decay, sustain, release.

During the attack phase, the sound builds up, usually with noise-like components over a broad frequency range. Such a short-duration, noise-like burst at the start of a sound is often called a transient.

During the decay phase, the amplitude decreases from its attack peak, and the sound stabilizes into a steady periodic pattern.

During the sustain phase, the energy remains fairly constant.

During the release phase, the sound fades away.

The ADSR model is a simplification and does not necessarily model the amplitude envelopes of all sounds.

In [5]:
ipd.Image("https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/ADSR_parameter.svg/640px-ADSR_parameter.svg.png")
Out[5]:
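The four phases can be sketched as a piecewise-linear envelope applied to a sine tone. This is a toy illustration of the model, not how any real instrument's envelope actually looks; all durations and levels below are made up:

```python
import numpy

def adsr_envelope(attack, decay, sustain_level, sustain, release, fs=44100):
    """Piecewise-linear ADSR envelope; all durations in seconds."""
    a = numpy.linspace(0.0, 1.0, int(round(attack * fs)), endpoint=False)        # rise to peak
    d = numpy.linspace(1.0, sustain_level, int(round(decay * fs)), endpoint=False)  # fall to sustain level
    s = numpy.full(int(round(sustain * fs)), sustain_level)                      # hold steady
    r = numpy.linspace(sustain_level, 0.0, int(round(release * fs)))             # fade to silence
    return numpy.concatenate([a, d, s, r])

fs = 44100
env = adsr_envelope(0.05, 0.10, 0.7, 0.50, 0.30, fs)
t = numpy.arange(len(env)) / fs
tone = env * numpy.sin(2 * numpy.pi * 440 * t)   # ADSR-shaped 440 Hz tone
```

Multiplying the envelope against a raw sine tone is what makes the synthetic tone start, swell, and die away rather than buzz at constant amplitude.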

Timbre: Spectral Indicators

Another property used to characterize timbre is the existence of partials and their relative strengths. Partials are the dominant frequencies in a musical tone, the lowest partial being the fundamental frequency.

The partials of a sound are visualized with a spectrogram. A spectrogram shows the intensity of frequency components over time. (See Fourier Transform and Short-Time Fourier Transform for more.)
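In librosa, spectrograms are typically computed with librosa.stft and displayed with librosa.display.specshow. As a self-contained sketch of the idea, scipy.signal.spectrogram applied to a synthetic pure tone shows how energy concentrates at one frequency across time (the tone and FFT parameters here are illustrative):

```python
import numpy
from scipy import signal

fs = 44100
t = numpy.arange(fs) / fs
x = numpy.sin(2 * numpy.pi * 1047 * t)        # one second of a pure C6 tone

# Sxx[i, j] holds the power near frequency f[i] during time frame t_seg[j]
f, t_seg, Sxx = signal.spectrogram(x, fs, nperseg=4096)

# the strongest bin in every frame should sit near the 1047 Hz fundamental
peak_freqs = f[numpy.argmax(Sxx, axis=0)]
```

For a pure tone, every column of the spectrogram has a single bright bin; for an instrument, additional rows appear at the upper partials.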

Pure Tone

Let's synthesize a pure tone at 1047 Hz, concert C6:

In [6]:
n = numpy.arange(50000)           # sample indices
f0 = 1047.0                       # frequency of C6 in Hz
fs = 44100.0                      # sampling frequency in Hz
t = n/fs                          # time of each sample in seconds
x = numpy.sin(2*numpy.pi*f0*t)    # pure sine tone at f0
ipd.Audio(x, rate=fs)
Out[6]:

Display the spectrum of the pure tone:

In [7]:
X = scipy.fft.fft(x[:4096])                      # DFT of the first 4096 samples
X_mag = numpy.absolute(X)
f = numpy.linspace(0, fs, 4096, endpoint=False)  # frequency of each DFT bin
plt.plot(f[:1000], X_mag[:1000]) # magnitude spectrum
plt.xlabel('Frequency (Hz)')
Out[7]:
<matplotlib.text.Text at 0x1104403d0>

Oboe

Let's download an oboe playing a C6:

In [8]:
if not os.path.exists("oboeC6.wav"):
    urllib.request.urlretrieve(
        'http://audio.musicinformationretrieval.com/Instrument%20Samples/oboe/Oboe1/Oboe-C6.wav',
        filename="oboeC6.wav"
    )
x, fs = librosa.load('oboeC6.wav', sr=44100)
ipd.Audio(x, rate=fs)
Out[8]:
In [9]:
print(x.shape)
(47249,)

Display the spectrum of the oboe:

In [10]:
X = scipy.fft.fft(x[10000:14096])
X_mag = numpy.absolute(X)
plt.plot(f[:1000], X_mag[:1000]) # magnitude spectrum
plt.xlabel('Frequency (Hz)')
Out[10]:
<matplotlib.text.Text at 0x11046bd10>

Clarinet

Let's download a clarinet playing a concert C6:

In [11]:
if not os.path.exists("clarinetC6.wav"):
    # despite the misleading filename, this is actually a concert C6
        urllib.request.urlretrieve(
        "http://audio.musicinformationretrieval.com/Instrument%20Samples/clarinet/Clar/ClarBb-ff-C5.wav",
        filename="clarinetC6.wav"
    )
x, fs = librosa.load('clarinetC6.wav', sr=44100)
ipd.Audio(x, rate=fs)
Out[11]:
In [12]:
print(x.shape)
(102772,)
In [13]:
X = scipy.fft.fft(x[10000:14096])
X_mag = numpy.absolute(X)
plt.plot(f[:1000], X_mag[:1000]) # magnitude spectrum
plt.xlabel('Frequency (Hz)')
Out[13]:
<matplotlib.text.Text at 0x110958350>

Notice the difference in the relative amplitudes of the partial components. All three signals have approximately the same pitch and fundamental frequency, yet their timbres differ.
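This comparison can be made quantitative by picking peaks in the magnitude spectrum and reading off their relative heights. As a sketch, the toy tone below has two partials whose frequencies are deliberately chosen to fall exactly on FFT bins so the peaks are clean; real recordings would need windowing and more careful peak picking:

```python
import numpy
from scipy.signal import find_peaks

fs = 44100.0
n = 4096
t = numpy.arange(n) / fs
f0 = 100 * fs / n                 # fundamental placed exactly on FFT bin 100
# toy tone: a fundamental plus one weaker partial at twice its frequency
x = 1.0 * numpy.sin(2 * numpy.pi * f0 * t) \
  + 0.4 * numpy.sin(2 * numpy.pi * 2 * f0 * t)

X_mag = numpy.abs(numpy.fft.rfft(x))
f = numpy.fft.rfftfreq(n, d=1 / fs)

# keep only peaks at least 10% as strong as the largest one
peaks, _ = find_peaks(X_mag, height=0.1 * X_mag.max())
partial_freqs = f[peaks]                                 # partial frequencies
relative_strengths = X_mag[peaks] / X_mag[peaks].max()   # strengths vs. fundamental
```

Applied to the oboe and clarinet spectra above, this kind of analysis would return the same fundamental but very different relative strength profiles, which is one fingerprint of timbre.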