import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa
import librosa.display
The basic representation of an audio signal is in the time domain.
Sound is air vibrating. An audio signal represents the fluctuation in air pressure caused by the vibration as a function of time.
Let's download and listen to a file:
import urllib.request
# Save under the same name that librosa.load reads below.
urllib.request.urlretrieve('http://audio.musicinformationretrieval.com/c_strum.wav', filename='c_strum.wav')
x, fs = librosa.load('c_strum.wav', sr=44100)
from IPython.display import Audio
Audio(x, rate=fs)
To plot a signal in the time domain, use librosa.display.waveplot (replaced by librosa.display.waveshow in librosa 0.10 and later):
librosa.display.waveplot(x, sr=fs, alpha=0.5)
Let's zoom in:
plt.plot(x[8000:9000])
Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the sampling frequency (abbreviated fs) or sample rate (abbreviated sr). For this workshop, we will mostly work with a sampling frequency of 44100 Hz.
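The relationship between sample index and time can be sketched in a few lines. This is a minimal illustration, not part of the workshop data: the 440 Hz sine, the 0.5 second duration, and the names x_sine, num_samples, and length_seconds are all assumptions chosen for the example.

```python
import numpy

fs = 44100                               # sampling frequency in Hz
duration = 0.5                           # seconds (illustrative value)
n = numpy.arange(int(fs * duration))     # integer sample indices 0, 1, 2, ...
t = n / float(fs)                        # time in seconds of each sample

# Hypothetical test signal: a 440 Hz sine sampled at fs.
x_sine = numpy.sin(2 * numpy.pi * 440.0 * t)

num_samples = len(x_sine)
length_seconds = num_samples / float(fs) # samples divided by rate gives duration
sample_period = 1.0 / fs                 # time between consecutive samples
```

Dividing a sample index by fs gives its time in seconds, which is also how a sample-based axis (such as the zoomed plot above) can be converted to a time axis.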
The autocorrelation of a signal measures the similarity between the signal and a delayed copy of itself, as a function of the delay (the lag).
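One common way to write this down for a finite signal is r[k] = sum over n of x[n] * x[n + k], for lags k >= 0. The sketch below implements that sum directly. The function name autocorrelate_direct and the short periodic test signal are hypothetical and exist only for illustration; the loop is slow and is not how the libraries used in this workshop compute it.

```python
import numpy

def autocorrelate_direct(signal, max_lag):
    """Return r[k] = sum_n signal[n] * signal[n + k] for k = 0 .. max_lag - 1."""
    signal = numpy.asarray(signal, dtype=float)
    r = numpy.zeros(max_lag)
    for k in range(max_lag):
        # Multiply the signal by a copy of itself shifted by k samples,
        # keeping only the region where the two copies overlap.
        r[k] = numpy.dot(signal[:len(signal) - k], signal[k:])
    return r

# Illustrative signal that repeats every 4 samples.
periodic = numpy.tile([1.0, 0.0, -1.0, 0.0], 8)
r_direct = autocorrelate_direct(periodic, max_lag=9)
```

For this test signal, r_direct is largest at lag 0 and has further peaks at lags 4 and 8, the multiples of the period. The peaks shrink slightly as the lag grows because fewer samples overlap.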
Using numpy.correlate:
# Because the autocorrelation produces a symmetric signal, we only care about the "right half".
r = numpy.correlate(x, x, mode='full')[len(x)-1:]
print(x.shape, r.shape)
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')
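To see why the slice starts at index len(x)-1, it can help to look at a tiny input. The three-element array v below is a made-up example: with mode='full', the output covers lags from -(N-1) to N-1, so lag zero sits at index N-1 and the values on either side mirror each other.

```python
import numpy

v = numpy.array([1.0, 2.0, 3.0])             # hypothetical toy signal, N = 3
full = numpy.correlate(v, v, mode='full')    # lags -2, -1, 0, 1, 2
zero_lag_index = len(v) - 1                  # position of lag 0 in the output
right_half = full[zero_lag_index:]           # lags 0, 1, 2

print(full)         # [ 3.  8. 14.  8.  3.]
print(right_half)   # [14.  8.  3.]
```

The value at lag zero equals the sum of squared samples (the signal's energy), which is always the maximum of the autocorrelation.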
Using librosa.autocorrelate:
r = librosa.autocorrelate(x, max_size=10000)
plt.plot(r)
plt.xlabel('Lag (samples)')
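For long signals, the autocorrelation is often computed through the FFT rather than by direct summation, since the inverse transform of the power spectrum gives the autocorrelation. The following is a rough sketch of that idea under stated assumptions, not a copy of any library's implementation: the function autocorrelate_fft, its max_size argument, and the random test signal are hypothetical.

```python
import numpy

def autocorrelate_fft(signal, max_size=None):
    """Autocorrelation for lags 0 .. N-1 via the power spectrum."""
    signal = numpy.asarray(signal, dtype=float)
    n = len(signal)
    # Zero-pad to 2N-1 so the circular correlation computed by the FFT
    # matches the linear correlation for every lag.
    n_fft = 2 * n - 1
    spectrum = numpy.fft.rfft(signal, n=n_fft)
    power = spectrum.real ** 2 + spectrum.imag ** 2
    r = numpy.fft.irfft(power, n=n_fft)[:n]
    if max_size is not None:
        r = r[:max_size]   # keep only the first max_size lags
    return r

# Compare against the direct method on a short random signal.
rng = numpy.random.RandomState(0)
noise = rng.randn(256)
r_fft = autocorrelate_fft(noise, max_size=50)
r_ref = numpy.correlate(noise, noise, mode='full')[len(noise) - 1:][:50]
print(numpy.allclose(r_fft, r_ref))
```

The two results agree up to floating-point error, while the FFT route scales far better with signal length than the direct sum.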
Using essentia.standard.AutoCorrelation:
from essentia.standard import AutoCorrelation
autocorr = AutoCorrelation()
r = autocorr(x)
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')