My first audio classifier: introducing K-NN!
We can now appreciate why we need additional intelligence in our systems -- heuristics don't go very far in the world of complex audio signals. We'll be using scikit-learn's implementation of k-NN for our work here. It proves to be a straightforward and easy-to-use implementation, and the steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers.
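Before reaching for the library, here is a minimal sketch of the idea behind a 1-nearest-neighbor classifier on made-up 2-D points: each test point simply receives the label of its closest training point. All names and values here are illustrative, not part of the drum corpus below.
import numpy as np
# Toy training set: two clusters with labels 0 and 1.
train_points = np.array([[0.0, 0.2], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]])
train_labels = np.array([0, 0, 1, 1])
def predict_1nn(x):
    # Euclidean distance from x to every training point; return the label of the nearest one.
    distances = np.sqrt(((train_points - x)**2).sum(axis=1))
    return train_labels[distances.argmin()]
print predict_1nn(np.array([0.05, 0.1]))  # near the first cluster -> 0
print predict_1nn(np.array([0.95, 0.9]))  # near the second cluster -> 1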
from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
import essentia
import essentia.standard as ess
import urllib
import urllib2
Read in a list of filenames of ten kick drums and ten snare drums:
kicks_URL = "https://ccrma.stanford.edu/workshops/mir2014/KickCorpus.txt"
kick_file_list = [audio_file_URL.rstrip() for audio_file_URL in urllib2.urlopen(kicks_URL)]
kick_file_list
snares_URL = "https://ccrma.stanford.edu/workshops/mir2014/SnareCorpus.txt"
snare_file_list = [audio_file_URL.rstrip() for audio_file_URL in urllib2.urlopen(snares_URL)]
snare_file_list
To access the filenames contained in the list, use square brackets [ ] to get the element you want:
snare_file_list[0]
When we extract features from a sample collection, we need to sequentially access the audio files, segment them (or not), and extract features from each one. Loading many audio files into memory at once is not always feasible or desirable, so we will write a loop which loads an audio file, extracts its features, and closes the file. Note that the only information retained in memory is the extracted features.
Create a function that reads an audio file from a remote URL and outputs a feature vector.
# Instantiate the Essentia extractors we will reuse for every frame.
zcr = ess.ZeroCrossingRate()
centroid = ess.Centroid()
spectrum = ess.Spectrum()
centralmoments = ess.CentralMoments()
distributionshape = ess.DistributionShape()
hamming_window = ess.Windowing(type='hamming')
def extract_features_from_audio(frame):
    # Magnitude spectrum of the windowed frame.
    spectral_magnitude = spectrum(hamming_window(frame))
    # Spectral spread, skewness, and kurtosis from the central moments.
    spectral_moments = distributionshape(centralmoments(spectral_magnitude))
    # Feature vector: zero crossing rate, spectral centroid, and the three shape moments.
    feature_vector = [zcr(frame), centroid(spectral_magnitude)]
    feature_vector.extend(spectral_moments)
    return feature_vector
def extract_features(url, fs=44100, frame_sz=0.200):
    # Download the remote audio file to a temporary local file.
    urllib.urlretrieve(url, filename='temp.wav')
    try:
        audio = ess.MonoLoader(filename='temp.wav', sampleRate=fs)()
        # Keep only the first frame_sz seconds of audio.
        frame = audio[:int(frame_sz*fs)]
        return extract_features_from_audio(frame)
    except RuntimeError:
        print 'error:', url
        return None
Use this function to compute one feature vector for each of the training audio files:
feature_table = array([extract_features(url) for url in kick_file_list + snare_file_list])
print feature_table.min(axis=0)
print feature_table.max(axis=0)
Since the features are on different scales, we will want to normalize each feature to a common range, storing the scaling coefficients for later use. Many techniques exist for scaling features; we'll use linear scaling, which forces the features into the range -1 to 1.
For this, we'll use a scikit-learn class called MinMaxScaler. MinMaxScaler fits and transforms in one step, returning an array of scaled values, and retains the coefficients used to scale each column into [-1, 1].
scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1))
training_features = scaler.fit_transform(feature_table)
print training_features.min(axis=0)
print training_features.max(axis=0)
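As a sanity check, here is a sketch of the arithmetic MinMaxScaler performs column-wise on feature_table; the result should match training_features up to floating-point error.
# Column-wise linear scaling into [-1, 1]: x' = (x - min) / (max - min) * 2 - 1
col_min = feature_table.min(axis=0)
col_max = feature_table.max(axis=0)
manual_scaled = (feature_table - col_min) / (col_max - col_min) * 2.0 - 1.0
print abs(manual_scaled - training_features).max()  # should be approximately 0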
Finally, create the associated labels for the training files, where 0 denotes a kick drum, and 1 denotes a snare drum:
training_labels = concatenate([zeros(10), ones(10)])
Plot the training features:
scatter(training_features[:10,0], training_features[:10,1], c='b')
scatter(training_features[10:,0], training_features[10:,1], c='r')
xlabel('Zero Crossing Rate')
ylabel('Spectral Centroid')
Build a k-NN model to classify kick drums and snare drums using scikit-learn's KNeighborsClassifier class.
model = KNeighborsClassifier(n_neighbors=1)
#model = SVC()
#model = LogisticRegression()
#model = GaussianNB()
#model = AdaBoostClassifier()
model.fit(training_features, training_labels)
Try it out:
model.predict(training_features[:1])  # predict expects a 2-D array of samples
Evaluate the model on the training data:
model.score(training_features, training_labels)
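Here, score is simply classification accuracy: the fraction of examples whose predicted label matches the true label. As a quick sketch, the same number can be computed by hand:
predicted = model.predict(training_features)
print (predicted == training_labels).mean()  # fraction of training examples classified correctly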
Now, test the classifier with these test examples. Two test corpora are available; we'll use the snare corpus here (uncomment the kick corpus to try it instead):
#test_URL = "https://ccrma.stanford.edu/workshops/mir2014/TestKicksCorpus.txt"
test_URL = "https://ccrma.stanford.edu/workshops/mir2014/TestSnaresCorpus.txt"
test_file_list = [audio_file_URL.replace('#', '%23').rstrip() for audio_file_URL in urllib2.urlopen(test_URL)]
test_features = [extract_features(url) for url in test_file_list]
test_features = [f for f in test_features if f is not None]  # drop files that failed to load
test_features = scaler.transform(test_features)
print test_features.min(axis=0)
print test_features.max(axis=0)
Evaluate the model. Since every file in this test corpus is a snare drum, the ground-truth label for each example is 1:
test_labels = ones(len(test_features))
model.score(test_features, test_labels)
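The same evaluation can be repeated on the kick test corpus; a sketch, reusing extract_features and the fitted scaler (here the ground-truth label for every file is 0):
kick_test_URL = "https://ccrma.stanford.edu/workshops/mir2014/TestKicksCorpus.txt"
kick_test_list = [u.replace('#', '%23').rstrip() for u in urllib2.urlopen(kick_test_URL)]
kick_test_features = [extract_features(u) for u in kick_test_list]
kick_test_features = scaler.transform([f for f in kick_test_features if f is not None])
print model.score(kick_test_features, zeros(len(kick_test_features)))  # 0 denotes a kick drum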
This exercise is loosely based upon "Lab 1" from previous MIR workshops (2010).
Let's test the classifier on a real drum loop. Download the audio file simpleLoop.wav onto your local machine, detect its onsets, and extract short segments (200 ms, matching the training frames) beginning at each onset.
import urllib
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/simpleLoop.wav'
#url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/125BOUNC-mono.WAV'
urllib.urlretrieve(url, filename='test.wav')
fs = 44100
loader = ess.MonoLoader(filename='test.wav')
audio = loader()
from IPython.display import Audio
Audio(data=audio, rate=fs)
get_onsets = ess.OnsetRate()
onset_times, onset_rate = get_onsets(audio)
print onset_times
onset_samples = [int(fs*i) for i in onset_times]
print onset_samples
onsets_marker = ess.AudioOnsetsMarker(onsets=onset_times, type='beep')
with_beeps = onsets_marker(audio)
Audio(data=with_beeps, rate=fs)
frame_sz = int(0.200*fs)
segments = array([audio[i:i+frame_sz] for i in onset_samples])
For each segment, compute the same features as above: zero crossing rate, spectral centroid, and the spectral shape moments.
feature_table = [extract_features_from_audio(segment) for segment in segments]
test_features = scaler.transform(feature_table)
Use the trained model to classify each segment. (For more on k-NN, see the notebook on K-NN.)
test_labels = model.predict(test_features)
test_labels
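To make the predictions easier to read, here is a small sketch that prints each onset time with its predicted class, using the label convention from above (0 = kick, 1 = snare):
for t, label in zip(onset_times, test_labels):
    print '%.3f s: %s' % (t, 'snare' if label == 1 else 'kick')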
Play a "beep" for each detected kick drum.
onsets_marker = ess.AudioOnsetsMarker(onsets=onset_times[test_labels==1], type='beep')
with_beeps = onsets_marker(audio)
Audio(data=with_beeps, rate=fs)
As an exercise, extend the feature vector with additional features. Use IPython's question-mark help to explore Essentia's feature extractors, e.g.:
from essentia.standard import CentralMoments, DistributionShape
DistributionShape?
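For example, one sketch of an extended extractor appends a time-domain energy and a spectral flatness value to the existing feature vector. Energy and Flatness are assumed to be available in your Essentia build; any other descriptors would work the same way.
energy = ess.Energy()      # time-domain energy of the frame (assumed available in essentia.standard)
flatness = ess.Flatness()  # flatness of the magnitude spectrum (assumed available in essentia.standard)
def extract_more_features_from_audio(frame):
    spectral_magnitude = spectrum(hamming_window(frame))
    feature_vector = extract_features_from_audio(frame)  # the original five features
    feature_vector.extend([energy(frame), flatness(spectral_magnitude)])
    return feature_vector
After re-extracting features for the training files, remember to re-fit the scaler as well, since the feature dimensionality changes.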
Re-train the classifier, and re-run it over the test audio signal. Do the results change?
Repeat the steps above for the following audio files:
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/CongaGroove-mono.wav'
url = 'https://ccrma.stanford.edu/workshops/mir2014/audio/125BOUNC-mono.WAV'
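One way to repeat the steps above for each file is to wrap the segment-and-classify pipeline in a helper. The function name classify_drum_loop is just illustrative; this sketch reuses the scaler and model fitted earlier, mirrors the cells above, and assumes each onset is at least 200 ms from the end of the file.
def classify_drum_loop(url, fs=44100, frame_sz_sec=0.200):
    # Download, detect onsets, segment, extract features, scale, and classify.
    urllib.urlretrieve(url, filename='test.wav')
    audio = ess.MonoLoader(filename='test.wav', sampleRate=fs)()
    onsets, onset_rate = ess.OnsetRate()(audio)
    onset_samples = [int(fs*t) for t in onsets]
    frame_sz = int(frame_sz_sec*fs)
    segments = [audio[i:i+frame_sz] for i in onset_samples]
    features = scaler.transform([extract_features_from_audio(s) for s in segments])
    return onsets, model.predict(features)
loop_onsets, loop_labels = classify_drum_loop(url)
print loop_labels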