"Unsupervised Instrument Classification Using K-Means "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Read Audio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve an audio file, load it into an array, and listen to it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bbb"
"import urllib\n",
"urllib.urlretrieve?\n",
"\n",
"from essentia.standard import MonoLoader\n",
"MonoLoader?\n",
"\n",
"from IPython.display import Audio\n",
"Audio?"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": 6
},
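{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one way to do this. The URL and filename below are placeholders, not files provided with this lab; substitute any audio file you like."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"from essentia.standard import MonoLoader\n",
"from IPython.display import Audio\n",
"\n",
"# Download an audio file into the working directory.\n",
"# NOTE: placeholder URL and filename; substitute your own.\n",
"url = 'http://example.com/simple_loop.wav'\n",
"filename = 'simple_loop.wav'\n",
"urllib.urlretrieve(url, filename)\n",
"\n",
"# Load the audio into an array at 44.1 kHz.\n",
"fs = 44100\n",
"x = MonoLoader(filename=filename, sampleRate=fs)()\n",
"\n",
"# Listen to the signal inside the notebook.\n",
"Audio(x, rate=fs)"
],
"language": "python",
"metadata": {},
"outputs": []
},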
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Extract Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract a set of features from the audio. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [Essentia algorithm overview](http://essentia.upf.edu/documentation/algorithms_overview.html)."
"Write a script to list which audio slices (or audio files) were categorized as Cluster # 1. Do the same or Cluster # 2. Do the clusters make sense? Now, modify the script to play the audio slices that in each cluster - listening to the clusters will help us build intuition of what's in each cluster. \n",
"\n",
"Repeat this clustering (steps 3-7), and listening to the contents of the clusters with CongaGroove-mono.wav. \n",
"\n",
"Repeat this clustering (steps 3-7) using the CongaGroove and 3 clusters. Listen to the results. Try again with 4 clusters. Listen to the results. (etc, etc\u2026)\n",
"\n",
"Once you complete this, try out some of the many, many other audio loops in the audio loops. (Located In audio\\Miscellaneous Loops Samples and SFX)\n"
"Use `scatter` to plot features on a 2-D plane. (Choose two features at a time.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"input": [
"scatter?"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": 1
},
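{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of what a simple frame-wise feature extractor and scatter plot might look like. It assumes the signal `x` and sample rate `fs` from the Read Audio step, and it computes only two features per frame (zero crossing rate and spectral centroid); swap in MFCCs, chroma, or any other Essentia features you prefer."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import essentia.standard as ess\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Configure the Essentia extractors once, then apply them to every frame.\n",
"zcr = ess.ZeroCrossingRate()\n",
"window = ess.Windowing(type='hann')\n",
"spectrum = ess.Spectrum()\n",
"centroid = ess.Centroid(range=fs/2.0)  # report the centroid in Hz\n",
"\n",
"features = []\n",
"for frame in ess.FrameGenerator(x, frameSize=1024, hopSize=512):\n",
"    features.append([zcr(frame), centroid(spectrum(window(frame)))])\n",
"features = np.array(features)\n",
"\n",
"# Plot one feature against the other, one point per frame.\n",
"plt.scatter(features[:,0], features[:,1])\n",
"plt.xlabel('Zero Crossing Rate')\n",
"plt.ylabel('Spectral Centroid (Hz)')"
],
"language": "python",
"metadata": {},
"outputs": []
},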
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Cluster Using K-Means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's add MFCCs to the mix. Extract the mean of the 12 MFCCs (coefficients 1-12, do not use the \"0th\" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing and centroid. We should now have 14 features being extracted - this is started to get \"real world\"! With this simple example (and limited collection of audio slices, you probably won't notice a difference - but at least it didn't break, right?) Let's try it with the some other audio to truly appreciate the power of timbral clustering."
"Use `KMeans` to cluster your features and compute labels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"input": [
"from sklearn.cluster import KMeans\n",
"KMeans?"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": 10
},
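{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming `features` is the array of frame-wise features built above. Scaling each feature to a common range first usually helps, because zero crossing rate and spectral centroid have very different magnitudes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.preprocessing import MinMaxScaler\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Scale each feature to a common range before clustering.\n",
"scaler = MinMaxScaler(feature_range=(-1, 1))\n",
"features_scaled = scaler.fit_transform(features)\n",
"\n",
"# Fit k-means with two clusters and get one integer label per frame.\n",
"model = KMeans(n_clusters=2)\n",
"labels = model.fit_predict(features_scaled)\n",
"print(labels[:20])"
],
"language": "python",
"metadata": {},
"outputs": []
},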
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Plot Features by Class Label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"BONUS (ONLY IF YOU HAVE EXTRA TIME\u2026)\n",
"Now that we can take ANY LOOP, onset detect, feature extract, and cluster it, let's have some fun. \n",
"Choose any audio file from our collection and use the above techniques break it up into clusters. \n",
"Listen to those clusters.\n",
"\n",
"Some rules of thumb: since you need to pick the number of clusters ahead of time, listen to your audio files first. \n",
"You can break a drum kit or percussion loop into 3 - 6 clusters for it to segment well. More is OK too.\n",
"Musical loops: 3-6 clusters should work nicely. \n",
"Songs - lots of clusters for them to segment well. Try 'em out!\n",
"\n",
"BONUS (ONLY IF YOU REALLY HAVE EXTRA TIME\u2026)\n",
"Review your script that PLAYs all of the audio files that were categorized as Cluster # 1 or Cluster # 2. \n",
"Now, modify your script to play and plot the audio files which are closest to the center of your clusters.\n",
"Use `scatter`, but this time choose a different marker color (or type) for each class."
]
},
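{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible approach, assuming `features_scaled` and `labels` from the previous steps:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Draw each cluster with its own color so the classes are easy to tell apart.\n",
"colors = ['b', 'r', 'g', 'c', 'm']\n",
"for k in range(labels.max() + 1):\n",
"    members = (labels == k)\n",
"    plt.scatter(features_scaled[members, 0], features_scaled[members, 1],\n",
"                c=colors[k % len(colors)], label='Cluster %d' % k)\n",
"plt.xlabel('Zero Crossing Rate (scaled)')\n",
"plt.ylabel('Spectral Centroid (scaled)')\n",
"plt.legend()"
],
"language": "python",
"metadata": {},
"outputs": []
},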
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Listen to Clustered Frames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `concatenated_signal` function from the previous exercise to concatenate frames from the same cluster into one signal. Then listen to the signal. Compare across separate classes. What do you hear?\n",
"\n",
"This hopefully provides you with which files are representative of your cluster. \n"
"You may want to do this for every frame, or only for onset-detected frames (using `essentia.standard.OnsetRate`)."
]
},
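{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `concatenated_signal` from the previous exercise is not in this notebook's namespace, a plain NumPy stand-in like the one below works. It assumes `x`, `fs`, and `labels` from earlier, that the frame and hop sizes match the ones used during feature extraction, and (as a simplification) that the first frame starts at sample 0."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from IPython.display import Audio\n",
"\n",
"def cluster_signal(x, labels, k, frame_size=1024, hop_size=512):\n",
"    \"\"\"Concatenate all frames assigned to cluster k into one signal.\"\"\"\n",
"    frames = []\n",
"    for i, label in enumerate(labels):\n",
"        if label == k:\n",
"            start = i*hop_size\n",
"            frames.append(x[start:start + frame_size])\n",
"    return np.concatenate(frames) if frames else np.zeros(1)\n",
"\n",
"# Listen to cluster 0; re-run with k = 1, 2, ... and compare what you hear.\n",
"Audio(cluster_signal(x, labels, 0), rate=fs)"
],
"language": "python",
"metadata": {},
"outputs": []
},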
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Bonus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different number of clusters."
]
},
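{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, refit `KMeans` over a few values of `n_clusters` (this assumes `features_scaled` from above), compare the within-cluster sum of squares, and listen to how each clustering sounds:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"\n",
"# Lower inertia means tighter clusters, but it always drops as k grows;\n",
"# trust your ears more than this number.\n",
"for k in [2, 3, 4, 5, 6]:\n",
"    model = KMeans(n_clusters=k).fit(features_scaled)\n",
"    print('n_clusters = %d  inertia = %.3f' % (k, model.inertia_))"
],
"language": "python",
"metadata": {},
"outputs": []
},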
{
"cell_type": "markdown",
"metadata": {},
"outputs": []
"source": [
"Use a different initialization method in `KMeans`."
]
},
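{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, `KMeans` accepts `init='k-means++'` (the default), `init='random'`, or an explicit array of starting centroids (again assuming `features_scaled` from above):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"\n",
"# Random initialization, restarted n_init times; compare against the default.\n",
"model = KMeans(n_clusters=2, init='random', n_init=10)\n",
"labels_random_init = model.fit_predict(features_scaled)"
],
"language": "python",
"metadata": {},
"outputs": []
},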
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different features. Compare tonal features against timbral features."