In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa, librosa.display, IPython.display as ipd, urllib

Autocorrelation

The autocorrelation of a signal measures how similar the signal is to a time-shifted copy of itself. For a real-valued signal $x$, the autocorrelation $r$ is:

$$ r(k) = \sum_n x(n) x(n-k) $$

In this equation, $k$ is often called the lag parameter. $r(k)$ is maximized at $k = 0$ and is symmetric about $k = 0$, i.e. $r(k) = r(-k)$.
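
As a quick check of the definition, the sum can be evaluated literally on a short sequence. This is only a sketch: `autocorrelation_direct`, `x_demo`, and `lags_demo` are names made up for this illustration, and samples outside the sequence are assumed to be zero.

```python
def autocorrelation_direct(x, k):
    """Evaluate r(k) = sum_n x(n) * x(n - k), treating samples outside x as zero."""
    total = 0.0
    for n in range(len(x)):
        m = n - k
        # Only index pairs that fall inside the sequence contribute to the sum.
        if 0 <= m < len(x):
            total += x[n] * x[m]
    return total

x_demo = [1.0, 2.0, 3.0, 4.0]
lags_demo = list(range(-3, 4))  # k = -3, ..., 3
r_demo = [autocorrelation_direct(x_demo, k) for k in lags_demo]
print(r_demo)
```

For this sequence the largest value sits at $k = 0$ (it is the sum of squares, 30), and the list reads the same forwards and backwards, which is the symmetry $r(k) = r(-k)$.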

The autocorrelation is useful for finding repeated patterns in a signal: a pattern that repeats every $k$ samples produces a peak in $r$ at lag $k$. At short lags, such peaks can reveal the signal's fundamental frequency (a peak at lag $k$ corresponds to a frequency of $f_s / k$, where $f_s$ is the sampling rate). At longer lags, they may reveal the tempo of a musical signal.
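
To make the short-lag case concrete, here is a minimal sketch that estimates the fundamental frequency of a synthetic sine tone from its autocorrelation peak. Everything in it is an assumption chosen for illustration: the 8000 Hz sampling rate, the 200 Hz tone, the 50 Hz to 1000 Hz search range, and all variable names. A real recording would need more care (windowing, peak interpolation, octave errors).

```python
import numpy

fs_demo = 8000      # assumed sampling rate in Hz
f0_true = 200.0     # assumed frequency of the test tone in Hz
n_samples = 800     # 0.1 seconds of audio at fs_demo

t = numpy.arange(n_samples) / fs_demo
tone = numpy.sin(2 * numpy.pi * f0_true * t)

# Autocorrelation at non-negative lags only.
r_tone = numpy.correlate(tone, tone, mode='full')[len(tone) - 1:]

# Restrict the search to lags that correspond to 1000 Hz down to 50 Hz,
# which skips the trivial maximum at lag 0.
lag_min = int(fs_demo / 1000)
lag_max = int(fs_demo / 50)
peak_lag = lag_min + numpy.argmax(r_tone[lag_min:lag_max])

f0_estimate = fs_demo / peak_lag
print(peak_lag, f0_estimate)
```

The tone repeats every 8000 / 200 = 40 samples, so the strongest peak inside the search range falls at lag 40 and the estimate comes out at 200 Hz.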

Let's download and listen to a file:

In [2]:
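# Pass an explicit filename; otherwise urlretrieve saves the download to a temporary file.
import urllib.request
urllib.request.urlretrieve('http://audio.musicinformationretrieval.com/c_strum.wav', filename='c_strum.wav')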
x, fs = librosa.load('c_strum.wav', sr=44100)
ipd.Audio(x, rate=fs)
Out[2]:
In [3]:
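# waveplot was removed in librosa 0.10; on newer versions use librosa.display.waveshow(x, sr=fs, alpha=0.5).
librosa.display.waveplot(x, sr=fs, alpha=0.5)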
Out[3]:
<matplotlib.collections.PolyCollection at 0x109fc6f50>

numpy.correlate

Use numpy.correlate to compute the autocorrelation:

In [4]:
# The autocorrelation is symmetric about lag 0, so keep only the non-negative lags (the "right half").
r = numpy.correlate(x, x, mode='full')[len(x)-1:]
print(x.shape, r.shape)
(204800,) (204800,)
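
The slice `[len(x)-1:]` is easier to follow on a tiny input. The sketch below shows how the `mode='full'` output is laid out; `x_small`, `r_full`, `lags_full`, and `r_right` are illustrative names that are not part of the notebook above.

```python
import numpy

x_small = numpy.array([1.0, 2.0, 3.0, 4.0])
n = len(x_small)

# mode='full' returns one value per lag from -(n-1) to +(n-1): 2n-1 values in total.
r_full = numpy.correlate(x_small, x_small, mode='full')
lags_full = numpy.arange(-(n - 1), n)

# Lag 0 therefore sits at index n-1, and everything from there on is the "right half".
r_right = r_full[n - 1:]

for lag, value in zip(lags_full, r_full):
    print(lag, value)
```

Here `r_full` has 7 entries for a 4-sample input, its middle entry (index 3) is the zero-lag value, and `r_right` has the same length as the input, which matches the two equal shapes printed above.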

Plot the autocorrelation:

In [5]:
plt.plot(r[:10000])
plt.xlabel('Lag (samples)')
Out[5]:
<matplotlib.text.Text at 0x10a03d650>

librosa.autocorrelate

librosa.autocorrelate computes the same quantity; its max_size argument keeps only the first max_size lags:

In [6]:
r = librosa.autocorrelate(x, max_size=10000)
plt.plot(r)
plt.xlabel('Lag (samples)')
Out[6]:
<matplotlib.text.Text at 0x10a03dfd0>
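
For long signals, a direct correlation is slow because its cost grows with the square of the signal length. A common alternative is to compute the autocorrelation through the FFT of the zero-padded signal. The sketch below illustrates that idea with NumPy only. It is not librosa's source code: `autocorrelate_fft` is a hypothetical helper whose `max_size` argument merely mimics the truncation shown above, and the random test signal is an assumption.

```python
import numpy

def autocorrelate_fft(y, max_size=None):
    """Autocorrelation at lags 0..max_size-1, computed via the FFT (illustrative helper)."""
    y = numpy.asarray(y, dtype=float)
    n = len(y)
    if max_size is None:
        max_size = n
    max_size = min(max_size, n)

    # Zero-pad to 2n-1 samples so the circular correlation equals the linear one.
    n_fft = 2 * n - 1
    spectrum = numpy.fft.rfft(y, n=n_fft)
    power = spectrum.real ** 2 + spectrum.imag ** 2

    # The inverse transform of the power spectrum is the autocorrelation.
    r = numpy.fft.irfft(power, n=n_fft)
    return r[:max_size]

# Compare against numpy.correlate on a short random signal.
rng = numpy.random.default_rng(0)
y_demo = rng.standard_normal(256)
r_fft = autocorrelate_fft(y_demo, max_size=100)
r_ref = numpy.correlate(y_demo, y_demo, mode='full')[len(y_demo) - 1:][:100]
print(numpy.allclose(r_fft, r_ref))
```

Both routes give the same values up to floating-point error; the FFT route scales as $N \log N$ rather than $N^2$, which matters for a signal of 204800 samples.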