Commit 28ac4286 authored by Steve Tjoa

kmeans instrument classification

parent 6522dbee
{
"metadata": {
"name": "",
"signature": "sha256:9f061a2cc6cbebeaca7a1ea19b5b37a68914601c93400dee837284902e04876f"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Unsupervised Instrument Classification Using K-Means "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Read Audio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve an audio file, load it into an array, and listen to it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"urllib.urlretrieve?\n",
"\n",
"from essentia.standard import MonoLoader\n",
"MonoLoader?\n",
"\n",
"from IPython.display import Audio\n",
"Audio?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
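{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch. The URL and filename below are placeholders, not files provided with this exercise; substitute any audio file you like."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Placeholder URL and filename -- substitute your own audio file.\n",
"# urllib.urlretrieve('http://example.com/simple_loop.wav', 'simple_loop.wav')\n",
"x = MonoLoader(filename='simple_loop.wav')()\n",
"Audio(x, rate=44100)"
],
"language": "python",
"metadata": {},
"outputs": []
},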
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Extract Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract a set of features from the audio. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [Essentia algorithm overview](http://essentia.upf.edu/documentation/algorithms_overview.html)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"num_frames = 100 # placeholder\n",
"num_features = 5 # placeholder\n",
"features = zeros([num_frames, num_features]) # placeholder"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
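{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch. The frame size, hop size, and the two features below (zero-crossing rate and energy) are arbitrary choices, and `x` is assumed to be the audio array loaded above."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy\n",
"from essentia.standard import FrameGenerator, ZeroCrossingRate, Energy\n",
"\n",
"zcr = ZeroCrossingRate()\n",
"energy = Energy()\n",
"\n",
"# One row per frame, one column per feature.\n",
"features = numpy.array([[zcr(frame), energy(frame)]\n",
"                        for frame in FrameGenerator(x, frameSize=1024, hopSize=512)])"
],
"language": "python",
"metadata": {},
"outputs": []
},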
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Scale Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `sklearn.preprocessing.MinMaxScaler` to scale your features to be within `[-1, 1]`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.preprocessing import MinMaxScaler\n",
"MinMaxScaler?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
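{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `features` is the feature matrix built above:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scaler = MinMaxScaler(feature_range=(-1, 1))\n",
"features_scaled = scaler.fit_transform(features)\n",
"features_scaled.min(axis=0), features_scaled.max(axis=0)"
],
"language": "python",
"metadata": {},
"outputs": []
},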
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Plot Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `scatter` to plot features on a 2-D plane. (Choose two features at a time.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
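{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, plotting the first two (scaled) feature columns against each other. The explicit `matplotlib` import is included in case pylab mode is not active."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.scatter(features_scaled[:,0], features_scaled[:,1])\n",
"plt.xlabel('Feature 1 (scaled)')\n",
"plt.ylabel('Feature 2 (scaled)')"
],
"language": "python",
"metadata": {},
"outputs": []
},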
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Cluster Using K-Means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `KMeans` to cluster your features and compute labels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"KMeans?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
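{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch with an arbitrary choice of two clusters:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"model = KMeans(n_clusters=2)\n",
"labels = model.fit_predict(features_scaled)\n",
"labels"
],
"language": "python",
"metadata": {},
"outputs": []
},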
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Plot Features by Class Label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `scatter`, but this time choose a different marker color (or type) for each class."
]
},
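{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, with two clusters, blue for cluster 0 and red for cluster 1:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')\n",
"plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')"
],
"language": "python",
"metadata": {},
"outputs": []
},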
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Listen to Clustered Frames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `concatenated_signal` function from the previous exercise to concatenate frames from the same cluster into one signal. Then listen to the signal. Compare across separate classes. What do you hear?\n",
"\n",
"You may want to do this for every frame, or only for onset-detected frames (using `essentia.standard.OnsetRate`)."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Bonus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different number of clusters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different initialization method in `KMeans`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different features. Compare tonal features against timbral features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different audio files."
]
}
],
"metadata": {}
}
]
}
\ No newline at end of file
@@ -292,7 +292,7 @@ div#notebook {
<li><a href="knn.html">K-Nearest Neighbor Classification</a> (<a href="knn.ipynb">ipynb</a>)</li>
<li><a href="knn_instrument_classification.html">Exercise: K-Nearest Neighbor Instrument Classification</a> (<a href="knn_instrument_classification.ipynb">ipynb</a>)</li>
<li><a href="kmeans.html">K-Means Clustering</a> (<a href="kmeans.ipynb">ipynb</a>)</li>
<li><a href="exercises/kmeans_instrument_classification.ipynb">Exercise: Unsupervised Instrument Classification using K-Means</a></li>
<li><a href="kmeans_instrument_classification.html">Exercise: Unsupervised Instrument Classification using K-Means</a> (<a href="kmeans_instrument_classification.ipynb">ipynb</a>)</li>
</ol>
</div>
@@ -81,7 +81,7 @@
"1. [K-Nearest Neighbor Classification](knn.html) ([ipynb](knn.ipynb))\n",
"1. [Exercise: K-Nearest Neighbor Instrument Classification](knn_instrument_classification.html) ([ipynb](knn_instrument_classification.ipynb))\n",
"1. [K-Means Clustering](kmeans.html) ([ipynb](kmeans.ipynb))\n",
"1. [Exercise: Unsupervised Instrument Classification using K-Means](exercises/kmeans_instrument_classification.ipynb)"
"1. [Exercise: Unsupervised Instrument Classification using K-Means](kmeans_instrument_classification.html) ([ipynb](kmeans_instrument_classification.ipynb))"
]
},
{
This source diff could not be displayed because it is too large.
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, mir_eval, IPython.display, urllib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[&larr; Back to Index](index.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unsupervised Instrument Classification Using K-Means "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read Audio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve an audio file, load it into an array, and listen to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"urllib.urlretrieve?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.load?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio?"
]
},
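{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch. The URL and filename below are placeholders, not files provided with this exercise; substitute any audio file you like."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Placeholder URL and filename -- substitute your own audio file.\n",
"# urllib.urlretrieve('http://example.com/simple_loop.wav', 'simple_loop.wav')\n",
"x, fs = librosa.load('simple_loop.wav')\n",
"IPython.display.Audio(x, rate=fs)"
]
},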
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detect Onsets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Detect onsets in the audio signal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.onset.onset_detect?"
]
},
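{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `x` and `fs` from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"onset_frames = librosa.onset.onset_detect(y=x, sr=fs)\n",
"onset_frames"
]
},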
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the onsets from units of frames to seconds (and samples):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.frames_to_time?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.frames_to_samples?"
]
},
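{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"onset_times = librosa.frames_to_time(onset_frames, sr=fs)\n",
"onset_samples = librosa.frames_to_samples(onset_frames)"
]
},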
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Listen to detected onsets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mir_eval.sonify.clicks?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio?"
]
},
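{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check the onsets by ear is to mix a click track with the original signal, a sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"clicks = mir_eval.sonify.clicks(onset_times, fs, length=len(x))\n",
"IPython.display.Audio(x + clicks, rate=fs)"
]
},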
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [librosa API reference](http://bmcfee.github.io/librosa/index.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, define which features to extract:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def extract_features(x, fs):\n",
" feature_1 = librosa.zero_crossings(x).sum() # placeholder\n",
" feature_2 = 0 # placeholder\n",
" return [feature_1, feature_2]"
]
},
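{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible completion of `extract_features`, pairing the zero-crossing count with the mean spectral centroid. Other feature choices work just as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def extract_features(x, fs):\n",
"    zero_crossings = librosa.zero_crossings(x).sum()\n",
"    spectral_centroid = librosa.feature.spectral_centroid(y=x, sr=fs).mean()\n",
"    return [zero_crossings, spectral_centroid]"
]
},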
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For each onset, extract a feature vector from the signal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Assumptions:\n",
"# x: input audio signal\n",
"# fs: sampling frequency\n",
"# onset_samples: onsets in units of samples\n",
"frame_sz = fs*0.100\n",
"features = numpy.array([extract_features(x[i:i+frame_sz], fs) for i in onset_samples])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scale Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `sklearn.preprocessing.MinMaxScaler` to scale your features to be within `[-1, 1]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sklearn.preprocessing.MinMaxScaler?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sklearn.preprocessing.MinMaxScaler.fit_transform?"
]
},
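{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `features` is the array built above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"min_max_scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))\n",
"features_scaled = min_max_scaler.fit_transform(features)\n",
"features_scaled.min(axis=0), features_scaled.max(axis=0)"
]
},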
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `scatter` to plot features on a 2-D plane. (Choose two features at a time.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.scatter?"
]
},
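{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, plotting the two (scaled) feature columns against each other:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.scatter(features_scaled[:,0], features_scaled[:,1])\n",
"plt.xlabel('Feature 1 (scaled)')\n",
"plt.ylabel('Feature 2 (scaled)')"
]
},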
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cluster Using K-Means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `KMeans` to cluster your features and compute labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sklearn.cluster.KMeans?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sklearn.cluster.KMeans.fit_predict?"
]
},
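{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch with an arbitrary choice of two clusters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = sklearn.cluster.KMeans(n_clusters=2)\n",
"labels = model.fit_predict(features_scaled)\n",
"labels"
]
},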
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot Features by Class Label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `scatter`, but this time choose a different marker color (or type) for each class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.scatter?"
]
},
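{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, with two clusters, blue for cluster 0 and red for cluster 1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plt.scatter(features_scaled[labels==0,0], features_scaled[labels==0,1], c='b')\n",
"plt.scatter(features_scaled[labels==1,0], features_scaled[labels==1,1], c='r')\n",
"plt.xlabel('Feature 1 (scaled)')\n",
"plt.ylabel('Feature 2 (scaled)')"
]
},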
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Listen to Click Track"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a beep for each onset within a class:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"beeps = mir_eval.sonify.clicks(onset_times[labels==0], fs, length=len(x))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio?"
]
},
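{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, mixed with the original signal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio(x + beeps, rate=fs)"
]
},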
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Listen to Clustered Frames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `concatenated_segments` function from the [feature sonification exercise](feature_sonification.html) to concatenate frames from the same cluster into one signal. Then listen to the signal. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def concatenate_segments(segments, fs=44100, pad_time=0.300):\n",
" padded_segments = [numpy.concatenate([segment, numpy.zeros(int(pad_time*fs))]) for segment in segments]\n",
" return numpy.concatenate(padded_segments)\n",
"concatenated_signal = concatenate_segments(segments, fs)"
]
},
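{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of how `segments` might be built for a single cluster, assuming `x`, `frame_sz`, `onset_samples`, and `labels` from the cells above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"segments = [x[i:i+frame_sz] for i in onset_samples[labels==0]]\n",
"concatenated_signal = concatenate_segments(segments, fs)\n",
"IPython.display.Audio(concatenated_signal, rate=fs)"
]
},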
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare across separate classes. What do you hear?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## For Further Exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different number of clusters in `KMeans`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different initialization method in `KMeans`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different features. Compare tonal features against timbral features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.feature?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different audio files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#filename = '1_bar_funk_groove.mp3'\n",
"#filename = '58bpm.wav'\n",
"#filename = '125_bounce.wav'\n",
"#filename = 'prelude_cmaj_10s.wav'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[&larr; Back to Index](index.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}