The mel frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10-20) that concisely describe the overall shape of a spectral envelope. In MIR, MFCCs are often used to describe timbre.
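To make the definition concrete, here is a minimal NumPy sketch of the usual MFCC pipeline for a single frame: window, power spectrum, triangular mel filterbank, log, then a DCT whose first few coefficients are kept. The helper names (`mfcc_frame`, `mel_filterbank`, `dct_ii_ortho`), the HTK-style mel formula, the 40 bands and the natural log are illustrative assumptions. essentia and librosa make their own default choices (mel formula, filter normalization, log type), so their numbers will not match this sketch exactly.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_bands, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_bands + 2 edge frequencies, evenly spaced in mel between 0 Hz and Nyquist
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 2))
    fb = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, center, hi = hz_edges[b], hz_edges[b + 1], hz_edges[b + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[b] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_ortho(v):
    """Orthonormal DCT-II along the last axis."""
    n = v.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return v @ basis.T

def mfcc_frame(frame, sr, n_bands=40, n_coeffs=13):
    """MFCCs of one frame: window -> power spectrum -> mel bands -> log -> DCT."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    mel_energies = mel_filterbank(n_bands, n_fft, sr) @ power
    log_mel = np.log(mel_energies + 1e-10)  # small floor avoids log(0)
    return dct_ii_ortho(log_mel)[:n_coeffs]

# Usage: a 1024-sample frame of a 440 Hz sine at 44.1 kHz
sr = 44100
frame = np.sin(2 * np.pi * 440.0 * np.arange(1024) / sr)
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)  # (13,)
```

Only the first few DCT coefficients are kept because they capture the slowly varying part of the log mel spectrum, i.e. the overall envelope shape, while the higher ones describe finer detail.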
import urllib.request
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/simpleLoop.wav'
urllib.request.urlretrieve(url, filename='simpleLoop.wav')
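import numpy as np
import matplotlib.pyplot as plt
from essentia.standard import MonoLoader
x = MonoLoader(filename='simpleLoop.wav')()  # mono, resampled to 44100 Hz by default
fs = 44100.0
t = np.arange(len(x))/fs
plt.plot(t, x)
plt.xlabel('Time (seconds)')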
from IPython.display import Audio
Audio(x, rate=fs)
essentia.standard.MFCC
We will use essentia.standard.MFCC to compute MFCCs across a signal, and we will display them as an "MFCC-gram":
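Conceptually, an MFCC-gram comes from slicing the signal into overlapping frames of `frame_sz` samples spaced `hop_sz` samples apart and computing one MFCC vector per frame. The sketch below uses hypothetical helpers `frame_signal` and `frame_times`; in it, frame k starts at sample k·hop_sz, i.e. at k·hop_sz/fs seconds. It keeps only full frames with no padding, so its frame count will generally differ slightly from essentia's `FrameGenerator`, which by default pads the signal edges and centers the first frame at time 0.

```python
import numpy as np

def frame_signal(x, frame_sz, hop_sz):
    """Split x into overlapping frames, shape (n_frames, frame_sz).
    Keeps only full frames (assumes len(x) >= frame_sz); no padding at either end."""
    n_frames = 1 + (len(x) - frame_sz) // hop_sz
    # row k holds the sample indices k*hop_sz ... k*hop_sz + frame_sz - 1
    idx = np.arange(frame_sz)[None, :] + hop_sz * np.arange(n_frames)[:, None]
    return x[idx]

def frame_times(n_frames, hop_sz, fs):
    """Start time in seconds of each frame produced by frame_signal."""
    return np.arange(n_frames) * hop_sz / fs

fs = 44100.0
x = np.random.default_rng(0).standard_normal(44100)  # 1 s of noise as a stand-in signal
frames = frame_signal(x, frame_sz=1024, hop_sz=500)
print(frames.shape)  # (87, 1024)
```

Applying an MFCC function to each row of `frames` and stacking the results gives a (frames × coefficients) array, which is exactly what the MFCC-gram displays after transposing.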
import numpy as np
from essentia.standard import MFCC, Spectrum, Windowing, FrameGenerator
frame_sz = 1024
hop_sz = 500
hamming_window = Windowing(type='hamming')
spectrum = Spectrum()  # we just want the magnitude spectrum
mfcc = MFCC(inputSize=frame_sz//2 + 1, numberCoefficients=13)  # spectrum of a 1024-sample frame has 513 bins
# MFCC returns (mel band energies, MFCCs); [1] keeps only the MFCCs
mfccs = np.array([mfcc(spectrum(hamming_window(frame)))[1]
                  for frame in FrameGenerator(x, frameSize=frame_sz, hopSize=hop_sz)])
print(mfccs.shape)  # (number of frames, 13)
import matplotlib.pyplot as plt
plt.imshow(mfccs[:,1:].T, origin='lower', aspect='auto', interpolation='nearest')  # ignore the 0th MFCC
plt.yticks(range(12), [str(i) for i in range(1, 13)])  # rows are coefficients 1..12
plt.ylabel('MFCC Coefficient Index')
plt.xlabel('Frame Index')
The very first MFCC, the 0th coefficient, does not convey information about the shape of the spectral envelope. It only conveys a vertical offset: adding a constant to the entire log mel spectrum (equivalently, changing the overall level of the signal) changes only the 0th coefficient. Therefore, we discard the 0th MFCC when performing classification.
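A quick numerical check of this claim, assuming an orthonormal DCT-II (the `dct_ii_ortho` helper and the random stand-in log mel values are illustrative): adding the same constant to every log mel-band energy changes only the 0th coefficient, because every DCT basis vector with index k ≥ 1 sums to zero. Other DCT normalizations scale the change in the 0th coefficient differently but still leave the higher coefficients untouched.

```python
import numpy as np

def dct_ii_ortho(v):
    """Orthonormal DCT-II of a 1-D array."""
    n = len(v)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)   # constant basis vector
    basis[1:] *= np.sqrt(2.0 / n)  # cosine basis vectors, each summing to zero
    return basis @ v

rng = np.random.default_rng(0)
log_mel = rng.standard_normal(40)  # stand-in log mel-band energies for one frame
gain = 3.0                         # constant added to every band, i.e. a level change

diff = dct_ii_ortho(log_mel + gain) - dct_ii_ortho(log_mel)
# diff[0] equals gain * sqrt(40), about 18.97; diff[1:] is zero up to rounding error
```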
librosa.feature.mfcc
import librosa
import librosa.display
# One call computes every frame; the result has shape (n_mfcc, number of frames)
mfccs = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13, n_fft=frame_sz, hop_length=hop_sz)
print(mfccs.shape)
librosa.display.specshow(mfccs[1:], sr=fs, hop_length=hop_sz, x_axis='time')  # ignore the 0th MFCC