"Unsupervised Instrument Classification Using K-Means "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Read Audio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve an audio file, load it into an array, and listen to it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"urllib.urlretrieve?\n",
"\n",
"from essentia.standard import MonoLoader\n",
"MonoLoader?\n",
"\n",
"from IPython.display import Audio\n",
"Audio?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
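{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch using the imports from the cell above; the URL and filename here are hypothetical placeholders, so substitute your own audio file:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical example URL; substitute any audio file you like.\n",
"filename = urllib.urlretrieve('http://example.com/simple_loop.wav', filename='simple_loop.wav')[0]\n",
"\n",
"# Load the audio as a mono floating-point array at 44100 Hz.\n",
"x = MonoLoader(filename=filename, sampleRate=44100)()\n",
"\n",
"# Listen to it.\n",
"Audio(x, rate=44100)"
],
"language": "python",
"metadata": {},
"outputs": []
},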
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Extract Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract a set of features from the audio. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [Essentia algorithm overview](http://essentia.upf.edu/documentation/algorithms_overview.html)."
"Use `scatter` to plot features on a 2-D plane. (Choose two features at a time.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scatter?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
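{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch: zero crossing rate and spectral centroid for every frame of the signal `x` loaded above. The frame size, hop size, and choice of features are arbitrary assumptions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from essentia.standard import FrameGenerator, Windowing, Spectrum, ZeroCrossingRate, Centroid\n",
"import numpy\n",
"import matplotlib.pyplot as plt\n",
"\n",
"window = Windowing(type='hann')\n",
"spectrum = Spectrum()\n",
"zcr = ZeroCrossingRate()\n",
"centroid = Centroid()  # applied to a spectrum, this gives a normalized spectral centroid\n",
"\n",
"# Compute two features for every frame.\n",
"features = []\n",
"for frame in FrameGenerator(x, frameSize=1024, hopSize=512):\n",
"    features.append([zcr(frame), centroid(spectrum(window(frame)))])\n",
"features = numpy.array(features)\n",
"\n",
"# Plot one feature against the other.\n",
"plt.scatter(features[:,0], features[:,1])\n",
"plt.xlabel('Zero Crossing Rate')\n",
"plt.ylabel('Spectral Centroid')"
],
"language": "python",
"metadata": {},
"outputs": []
},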
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Cluster Using K-Means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `KMeans` to cluster your features and compute labels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"KMeans?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
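{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of one way to do this, assuming the `features` matrix from above. Scaling the features first is optional but usually helps, and the choice of 2 clusters is arbitrary."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"# Scale each feature to the same range so no single feature dominates the distances.\n",
"features_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(features)\n",
"\n",
"# Fit k-means and obtain a cluster label for every frame.\n",
"model = KMeans(n_clusters=2)\n",
"labels = model.fit_predict(features_scaled)\n",
"print(labels)"
],
"language": "python",
"metadata": {},
"outputs": []
},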
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Plot Features by Class Label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `scatter`, but this time choose a different marker color (or type) for each class."
]
},
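{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch, assuming `features_scaled` and `labels` from the cells above and two clusters; the colors are arbitrary."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import matplotlib.pyplot as plt\n",
"\n",
"# One color per cluster.\n",
"plt.scatter(features_scaled[labels==0, 0], features_scaled[labels==0, 1], c='r')\n",
"plt.scatter(features_scaled[labels==1, 0], features_scaled[labels==1, 1], c='b')\n",
"plt.xlabel('Zero Crossing Rate (scaled)')\n",
"plt.ylabel('Spectral Centroid (scaled)')"
],
"language": "python",
"metadata": {},
"outputs": []
},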
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Listen to Clustered Frames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `concatenated_signal` function from the previous exercise to concatenate frames from the same cluster into one signal. Then listen to the signal. Compare across separate classes. What do you hear?\n",
"\n",
"You may want to do this for every frame, or only for onset-detected frames (using `essentia.standard.OnsetRate`)."
]
},
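{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `concatenated_signal` is not at hand, a rough alternative is sketched below: it re-generates the frames with the same (assumed) parameters used during feature extraction and simply abuts the frames of one cluster end to end, ignoring any overlap."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy\n",
"\n",
"# Re-generate the frames with the same parameters used for feature extraction,\n",
"# so that frame i corresponds to labels[i]. Copy each frame explicitly.\n",
"frames = numpy.array([numpy.array(frame) for frame in FrameGenerator(x, frameSize=1024, hopSize=512)])\n",
"\n",
"# Concatenate every frame assigned to cluster 0 and listen to the result.\n",
"cluster_signal = frames[labels==0].flatten()\n",
"Audio(cluster_signal, rate=44100)"
],
"language": "python",
"metadata": {},
"outputs": []
},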
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Bonus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different number of clusters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a different initialization method in `KMeans`."
]
},
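{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, combining the two prompts above (the particular values are arbitrary):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.cluster import KMeans\n",
"\n",
"# More clusters, and random initialization instead of the default k-means++.\n",
"model = KMeans(n_clusters=5, init='random')\n",
"labels = model.fit_predict(features_scaled)"
],
"language": "python",
"metadata": {},
"outputs": []
},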
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use different features. Compare tonal features against timbral features."
"# Unsupervised Instrument Classification Using K-Means "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This lab is loosely based on [Lab 3](https://ccrma.stanford.edu/workshops/mir2010/Lab3_2010.pdf) (2010)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read Audio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve an audio file, load it into an array, and listen to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"urllib.urlretrieve?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.load?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio?"
]
},
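{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch; the URL and filename are hypothetical placeholders, and under Python 3 `urllib.urlretrieve` becomes `urllib.request.urlretrieve`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import urllib\n",
"import librosa\n",
"import IPython.display\n",
"\n",
"# Hypothetical example URL; substitute any audio file you like.\n",
"filename = urllib.urlretrieve('http://example.com/simple_loop.wav', filename='simple_loop.wav')[0]\n",
"\n",
"# librosa.load returns the signal and its sampling rate (22050 Hz by default).\n",
"x, sr = librosa.load(filename)\n",
"\n",
"IPython.display.Audio(x, rate=sr)"
]
},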
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detect Onsets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Detect onsets in the audio signal:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.onset.onset_detect?"
]
},
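{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `x` and `sr` from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# onset_detect returns the frame indices of the detected onsets.\n",
"onset_frames = librosa.onset.onset_detect(y=x, sr=sr)\n",
"print(onset_frames)"
]
},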
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the onsets from units of frames to seconds (and samples):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.frames_to_time?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"librosa.frames_to_samples?"
]
},
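{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `onset_frames` from above (both conversions assume librosa's default hop length of 512):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Convert frame indices to seconds and to sample indices.\n",
"onset_times = librosa.frames_to_time(onset_frames, sr=sr)\n",
"onset_samples = librosa.frames_to_samples(onset_frames)\n",
"print(onset_times)\n",
"print(onset_samples)"
]
},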
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Listen to detected onsets:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mir_eval.sonify.clicks?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"IPython.display.Audio?"
]
},
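{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, assuming `onset_times`, `x`, and `sr` from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import mir_eval.sonify\n",
"\n",
"# Synthesize a click at every onset time, matching the length of the original signal.\n",
"clicks = mir_eval.sonify.clicks(onset_times, sr, length=len(x))\n",
"\n",
"# Listen to the original signal with the clicks superimposed.\n",
"IPython.display.Audio(x + clicks, rate=sr)"
]
},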
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract a set of features from the audio at each onset. Use any of the features we have learned so far: zero crossing rate, spectral moments, MFCCs, chroma, etc. For more, see the [librosa API reference](http://bmcfee.github.io/librosa/index.html)."
"Use the `concatenated_segments` function from the [feature sonification exercise](feature_sonification.html) to concatenate frames from the same cluster into one signal. Then listen to the signal. "