K-Nearest Neighbor

My first audio classifier: introducing K-NN!

We can now appreciate why we need additional intelligence in our systems -- heuristics don't go very far in the world of complex audio signals. We'll use scikit-learn's implementation of k-NN for our work here. It proves to be a straightforward, easy-to-use implementation, and the steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers.
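
If you haven't used scikit-learn before, the whole workflow boils down to fit and predict. The toy sketch below (with made-up 2-D feature vectors, not audio features) shows the pattern we'll follow throughout this notebook:

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D feature vectors and their class labels (0 or 1), purely for illustration.
toy_features = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
toy_labels = [0, 0, 1, 1]

toy_model = KNeighborsClassifier(n_neighbors=1)
toy_model.fit(toy_features, toy_labels)      # "training" a 1-NN model just stores the examples
print(toy_model.predict([[0.95, 0.75]]))     # the nearest stored example has label 1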

Training Data

In [1]:
%pylab inline
from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
import essentia
import essentia.standard as ess
import urllib
import urllib2

Read in a list of URLs pointing to ten kick drum samples and ten snare drum samples:

In [2]:
kicks_URL = "https://ccrma.stanford.edu/workshops/mir2014/KickCorpus.txt"
kick_file_list = [audio_file_URL.rstrip() for audio_file_URL in urllib2.urlopen(kicks_URL)]
kick_file_list
Out[2]:
['https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_01_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_02_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_03_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_04_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_05_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_06_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_07_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_08_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_09_V01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/kicks/Bass_Drum_10_V01.WAV']
In [3]:
snares_URL = "https://ccrma.stanford.edu/workshops/mir2014/SnareCorpus.txt"
snare_file_list = [audio_file_URL.rstrip() for audio_file_URL in urllib2.urlopen(snares_URL)]
snare_file_list
Out[3]:
['https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_01_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_02_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_04_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_05_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_06_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_07_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_08_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_09_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_10_01.WAV',
 'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/snare_mono.wav']

To access the URLs contained in the list, use square brackets [ ] to index the element you want:

In [4]:
snare_file_list[0]
Out[4]:
'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_01_01.WAV'

To extract features from a sample collection, we need to sequentially access the audio files, segment them (or not), and extract features from each one. Loading many audio files into memory at once is not always feasible or desirable, so instead we create a loop which loads one audio file, extracts its features, and discards the audio before moving on. Note that the only information retained in memory is the extracted features.
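
In other words, the loop will look roughly like the sketch below, where extract_features stands for the per-file routine we define next:

feature_table = []
for url in kick_file_list + snare_file_list:
    features = extract_features(url)   # load the audio, compute features, discard the audio
    feature_table.append(features)     # only the small feature vectors stay in memory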

Create a function that reads an audio file from a remote URL and outputs a feature vector. We'll split the work in two: a helper that computes features from a single audio frame, and a wrapper that downloads the file and calls the helper.

In [5]:
zcr = ess.ZeroCrossingRate()
centroid = ess.Centroid()
spectrum = ess.Spectrum()
centralmoments = ess.CentralMoments()
distributionshape = ess.DistributionShape()
hamming_window = ess.Windowing(type='hamming')

def extract_features_from_audio(frame):
    spectral_magnitude = spectrum(hamming_window(frame))
    spectral_moments = distributionshape(centralmoments(spectral_magnitude))
    feature_vector = [zcr(frame), centroid(spectral_magnitude)]
    feature_vector.extend( spectral_moments )
    return feature_vector

def extract_features(url, fs=44100, frame_sz=0.200):
    urllib.urlretrieve(url, filename='temp.wav')
    try:
        audio = ess.MonoLoader(filename='temp.wav', sampleRate=fs)() 
        frame = audio[:int(frame_sz*fs)]
        return extract_features_from_audio(frame)
    except RuntimeError:
        print 'error:', url
        return None

Use this function to compute one feature vector for each of the training audio files:

In [6]:
feature_table = array([extract_features(url) for url in kick_file_list + snare_file_list])
print feature_table.min(axis=0)
print feature_table.max(axis=0)
[ 0.00408163  0.01719355  0.00287186  0.55032426 -0.47705555]
[  1.62471652e-01   2.71532953e-01   4.44976576e-02   7.22230673e+00
   7.60733871e+01]

Feature Scaling

Since the features are on different scales, we will want to normalize each feature to a common range, storing the scaling parameters for later use. Many techniques exist for scaling features; we'll use linear min-max scaling, which forces each feature into the range -1 to 1.

For this, we'll use a scikit-learn class called MinMaxScaler. MinMaxScaler fits and transforms in one step, returning an array of scaled values, and it retains the per-column coefficients that were used to scale each feature into -1 to 1, so the same transformation can be applied to new data later.
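
If you want to see the arithmetic behind it, a rough NumPy sketch of min-max scaling (not scikit-learn's actual implementation) looks like this:

import numpy as np

def minmax_scale(X, lo=-1.0, hi=1.0):
    # Linearly map each column so its minimum lands on lo and its maximum on hi.
    # (Assumes no column is constant, i.e. col_max > col_min everywhere.)
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min) * (hi - lo) + lo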

In [7]:
scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1))
training_features = scaler.fit_transform(feature_table)
print training_features.min(axis=0)
print training_features.max(axis=0)
[-1. -1. -1. -1. -1.]
[ 1.  1.  1.  1.  1.]

Finally, create the associated labels for the training files, where 0 denotes a kick drum, and 1 denotes a snare drum:

In [8]:
training_labels = concatenate([zeros(10), ones(10)])

Plot the first two training features, zero crossing rate and spectral centroid (kicks in blue, snares in red):

In [9]:
scatter(training_features[:10,0], training_features[:10,1], c='b')
scatter(training_features[10:,0], training_features[10:,1], c='r')
xlabel('Zero Crossing Rate')
ylabel('Spectral Centroid')
Out[9]:
<matplotlib.text.Text at 0x117482e50>

Building the K-NN Model

Build a k-NN model to classify kick drums versus snare drums using scikit-learn's KNeighborsClassifier class.

In [10]:
model = KNeighborsClassifier(n_neighbors=1)
#model = SVC()
#model = LogisticRegression()
#model = GaussianNB()
#model = AdaBoostClassifier()
model.fit(training_features, training_labels)
Out[10]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
          metric_params=None, n_neighbors=1, p=2, weights='uniform')

Try it out:

In [11]:
model.predict(training_features[0])
Out[11]:
array([ 0.])

Evaluate the model on the training data (with n_neighbors=1, each training point is its own nearest neighbor, so the training score is trivially perfect):

In [12]:
model.score(training_features, training_labels)
Out[12]:
1.0

Testing

In [13]:
#test_URL = "https://ccrma.stanford.edu/workshops/mir2014/TestKicksCorpus.txt"
test_URL = "https://ccrma.stanford.edu/workshops/mir2014/TestSnaresCorpus.txt"
test_file_list = [audio_file_URL.replace('#', '%23').rstrip() for audio_file_URL in urllib2.urlopen(test_URL)]
In [14]:
test_features = [extract_features(url) for url in test_file_list]
test_features = [features for features in test_features if features is not None]
error: https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/test%20snares/15%20Mid%20Rim.wav
error: https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/test%20snares/16%20Tone%20syn.wav
error: https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/test%20snares/17%20Tiny.wav
error: https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/test%20snares/18%20Shortest.wav
In [15]:
test_features = scaler.transform(test_features)
print test_features.min(axis=0)
print test_features.max(axis=0)
[-0.4158912  -0.10375141 -0.46853682 -1.04463977 -0.99018434]
[ 1.1159628   1.48739906  1.60301345 -0.50959573 -0.77698699]

Evaluate (the test corpus here contains only snare drums, so every ground-truth label is 1):

In [16]:
test_labels = ones(len(test_features))
model.score(test_features, test_labels)
Out[16]:
1.0

Exercise: Instrument Classification using K-NN

This exercise is loosely based upon "Lab 1" from previous MIR workshops (2010).

Goals

  1. Extract spectral features from an audio signal.
  2. Train a K-Nearest Neighbor classifier.
  3. Use the classifier to classify beats in a drum loop.

Step 1: Retrieve Audio, Detect Onsets, and Segment

Follow these steps:

  1. Download the file simpleLoop.wav onto your local machine.
  2. Save the audio signal into an array.
  3. Find the times, in seconds, when onsets occur in the audio signal.
  4. Save, into an array called segments, a 200-ms segment beginning at each onset.
In [17]:
import urllib
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/simpleLoop.wav'
#url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/125BOUNC-mono.WAV'
urllib.urlretrieve(url, filename='test.wav')
Out[17]:
('test.wav', <httplib.HTTPMessage instance at 0x118e39200>)
In [18]:
fs = 44100
loader = ess.MonoLoader(filename='test.wav')
audio = loader()
In [19]:
from IPython.display import Audio
Audio(data=audio, rate=fs)
Out[19]:
In [20]:
get_onsets = ess.OnsetRate()
onset_times, onset_rate = get_onsets(audio)
print onset_times
[ 0.01160998  0.48761904  0.98684806  1.24226761  1.48607707]
In [21]:
onset_samples = [int(fs*i) for i in onset_times]
print onset_samples
[511, 21503, 43519, 54784, 65535]
In [22]:
onsets_marker = ess.AudioOnsetsMarker(onsets=onset_times, type='beep')
with_beeps = onsets_marker(audio)
Audio(data=with_beeps, rate=fs)
Out[22]:
In [23]:
frame_sz = int(0.200*fs)
segments = array([audio[i:i+frame_sz] for i in onset_samples])

Step 2: Extract Features

For each segment, compute the same features as above: zero crossing rate, spectral centroid, and the spectral shape moments (spread, skewness, and kurtosis).

In [24]:
feature_table = [extract_features_from_audio(segment) for segment in segments]
test_features = scaler.transform(feature_table)

Step 3: Train K-NN Classifier

The classifier was already trained in the Training Data and Building the K-NN Model sections above, so we simply reuse model here.

Step 4: Run the Classifier

In [25]:
test_labels = model.predict(test_features)
test_labels
Out[25]:
array([ 0.,  1.,  0.,  0.,  1.])

Step 5: Sonify the Classifier Output

Play a "beep" at each onset that was classified as a snare drum (label 1).

In [26]:
onsets_marker = ess.AudioOnsetsMarker(onsets=onset_times[test_labels==1], type='beep')
with_beeps = onsets_marker(audio)
Audio(data=with_beeps, rate=fs)
Out[26]:

Bonus

Try adding MFCCs to the feature vector, alongside the spectral shape features already in use:

  • spectral centroid
  • spectral spread
  • spectral skewness
  • spectral kurtosis.
In [27]:
from essentia.standard import CentralMoments, DistributionShape
DistributionShape?
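
One possible way to extend the feature vector with MFCCs is sketched below. It assumes Essentia's standard MFCC algorithm, whose inputSize parameter must match the length of the spectra we pass in (our 200-ms frames of 8820 samples yield spectra of 4411 bins); treat it as a starting point, not a reference solution.

mfcc = ess.MFCC(inputSize=4411)   # assumption: every frame is exactly 200 ms at 44.1 kHz

def extract_features_with_mfcc(frame):
    spectral_magnitude = spectrum(hamming_window(frame))
    spectral_moments = distributionshape(centralmoments(spectral_magnitude))
    mfcc_bands, mfcc_coeffs = mfcc(spectral_magnitude)   # returns (mel band energies, MFCCs)
    feature_vector = [zcr(frame), centroid(spectral_magnitude)]
    feature_vector.extend(spectral_moments)
    feature_vector.extend(mfcc_coeffs)                   # append the MFCCs
    return feature_vector

Remember to re-fit the MinMaxScaler if you do this, since the feature table will have more columns.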

Re-train the classifier with the extended feature vector, and re-run it over the test audio signal. Do the results change?

Repeat the steps above for the following audio files:

In [28]:
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/CongaGroove-mono.wav'
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/125BOUNC-mono.WAV'