" #return array( [zcr(frame), centroid(spectrum(hamming_window(frame)))] ) # Try this too!\n",
" return array( [zcr(frame), energy(frame)] )\n",
"\n",
"features = array([compute_features(frame) for frame in FrameGenerator(simple_loop, frameSize=1024, hopSize=500)])\n",
"print features.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(266, 2)\n"
]
}
],
"prompt_number": 13
},
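{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before scaling, it's worth inspecting the raw ranges: zero crossing rate and energy live on very different scales, which would let one feature dominate any distance-based clustering. A quick check (assuming `features` from the cell above):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Per-column minima and maxima of the raw feature matrix\n",
"print(features.min(axis=0))\n",
"print(features.max(axis=0))"
],
"language": "python",
"metadata": {},
"outputs": []
},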
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Feature Scaling"
]
},
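{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the scaling step, assuming only NumPy. The `scale` helper below is hypothetical (written here for illustration, not part of Essentia); it assumes every feature column actually varies, so that the max and min differ:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"def scale(x, lo=-1.0, hi=1.0):\n",
"    # Map each feature column linearly onto [lo, hi].\n",
"    # Assumes mx > mn for every column (no constant features).\n",
"    mn, mx = x.min(axis=0), x.max(axis=0)\n",
"    return (x - mn) / (mx - mn) * (hi - lo) + lo\n",
"\n",
"# features_scaled = scale(features)  # each column now spans -1 to 1"
],
"language": "python",
"metadata": {},
"outputs": []
},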
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scale the features (using the scale function) from -1 to 1.\n",
"\n",
"Plot the scaled features.\n",
"\n",
"We'll then cluster the features with the k-means algorithm, which will output two things of interest to you:\n",
"\n",
"1. The center points of the clusters. You can use the coordinates of a cluster's center to measure the distance of any point from that center. This not only gives you a metric of how well a point fits into a given cluster, but it also lets you sort the points by how close they are to the center. Quite useful.\n",
"2. A label (cluster number) for each point. You can then use this label to produce a transcription, do creative stuff, or further train another downstream classifier.\n",
"\n",
"Write a script to list which audio slices (or audio files) were categorized as Cluster 1. Do the same for Cluster 2. Do the clusters make sense? Now modify the script to play the audio slices in each cluster - listening to the clusters will help us build intuition about what each one contains.\n",
"\n",
"Repeat this clustering, and listen to the contents of the clusters, with CongaGroove-mono.wav.\n",
"\n",
"Repeat the clustering with the CongaGroove and 3 clusters. Listen to the results. Try again with 4 clusters. Listen to the results. (Etc., etc.)\n",
"\n",
"Let's add MFCCs to the mix. Extract the mean of 12 MFCCs (coefficients 1-12; do not use the 0th coefficient) for each onset using the code that you wrote. Add those to the feature vector, along with the zero crossing rate and the spectral centroid. We should now be extracting 14 features - this is starting to get \"real world\"! With this simple example (and a limited collection of audio slices) you probably won't notice a difference - but at least it didn't break, right? Let's try it with some other audio to truly appreciate the power of timbral clustering.\n",
"\n",
"Once you complete this, try out some of the many other audio loops in the collection. (Located in audio\\\\Miscellaneous Loops Samples and SFX.)\n",
"\n",
"Time to cluster! Let's initialize the algorithm to find three clusters."