Commit 07be62ec authored by Leigh Smith

First version of python corpus retrieval and feature extraction

parent 4f1fe9d1
{
"metadata": {
"name": "",
"signature": "sha256:98b053589d06e625718d1e1c0a2b65a02f9098813c4b80e061bb5732bb95b293"
"signature": "sha256:7ed94cc2b18891285f61c97615ead2cb2f265d2e6a8d305e44280affbff5a673"
},
"nbformat": 3,
"nbformat_minor": 0,
......@@ -19,7 +19,7 @@
"\n",
"As in Lab 1, extract features from each training sample in the kick and snare drum directories.\n",
"\n",
"1. Train a K-NN model using the kick and snare drum samples."
"Train a K-NN model using the kick and snare drum samples:"
]
},
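{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below is only an illustration of scikit-learn's KNeighborsClassifier interface, not the lab solution: featuresKick and featuresSnare are placeholder arrays standing in for the features you will extract yourself."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# Placeholder feature arrays (one row per audio file, one column per feature);\n",
"# in the lab these will be the features extracted from the kick and snare samples.\n",
"featuresKick = np.array([[0.10, 140.0], [0.12, 150.0]])\n",
"featuresSnare = np.array([[0.55, 1900.0], [0.59, 2280.0]])\n",
"\n",
"# Stack the two classes and label them: 0 = kick, 1 = snare.\n",
"features = np.vstack((featuresKick, featuresSnare))\n",
"labels = np.concatenate((np.zeros(len(featuresKick)), np.ones(len(featuresSnare))))\n",
"\n",
"# Fit a 1-nearest-neighbour classifier and classify a new feature vector.\n",
"model = KNeighborsClassifier(n_neighbors = 1)\n",
"model.fit(features, labels)\n",
"print model.predict([[0.57, 2000.0]])"
],
"language": "python",
"metadata": {},
"outputs": []
},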
{
......@@ -59,14 +59,6 @@
"\n",
"Good luck!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
......
{
"metadata": {
"name": "",
"signature": "sha256:c5ceab4dd15cd4c672761897b42ca810c7e32f1ca641cc1b0ee141dc0df0a57f"
"signature": "sha256:efc8cbbea3ba5f9eea210026cb9f5e9a85fa6b79bca7393a29e27e7810eab0db"
},
"nbformat": 3,
"nbformat_minor": 0,
......@@ -15,38 +15,107 @@
"Section 2: Spectral Features & k-NN\n",
"------------------------------------\n",
"\n",
"My first audio classifier: introducing K-NN! We can now appreciate why we need additional intelligence in our systems - heuristics can't very far in the world of complex audio signals. We'll be using Netlab's implementation of the k-NN for our work here. It proves be a straight-forward and easy to use implementation. The steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers. \n",
"My first audio classifier: introducing K-NN! We can now appreciate why we need additional intelligence in our systems - heuristics can't go very far in the world of complex audio signals. We'll be using scikit.learn's implementation of the k-NN for our work here. It proves be a straight-forward and easy to use implementation. The steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers. \n",
"\n",
"We're also going to be using the new features in our arsenal: cherishing those \"spectral moments\" (centroid, bandwidth, skewness, kurtosis) and also examining other spectral statistics. \n",
" \n",
"### TRAINING DATA\n",
"\n",
"First off, we want to analyze and feature extract a small collection of audio samples - storing their feature data as our \"training data\". The below commands read all of the .wav files in a directory into a structure, snareFileList. \n",
"First off, we want to analyze and feature extract a small collection of audio samples - storing their feature data as our \"training data\". The commands below read all of the drum example .wav files from the MIR web site into an array, snareFileList. \n",
"\n",
"1. Use these commands to read in a list of filenames (samples) in a directory, replacing the path with the actual directory that the audio \\ drum samples are stored in.\n",
"\n",
" snareDirectory = ['/usr/ccrma/courses/mir2013/audio/drum samples/snares/'];\n",
" snareFileList = getFileNames(snareDirectory ,'wav')\n",
"\n",
" kickDirectory = ['/usr/ccrma/courses/mir2013/audio/drum samples/kicks/'];\n",
" kickFileList = getFileNames(kickDirectory ,'wav')\n",
"\n",
"2. To access the filenames contained in the cell array, use the brackets { } to get to the element that you want to access. \n",
"\n",
" For example, to access the text file name of the 1st file in the list, you would type:\n",
"\n",
" snareFileList{1}\n",
"First we define a function to retrieve a list of URLs from a text file."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib2\n",
"\n",
"def process_corpus(corpus_URL):\n",
" \"\"\"Read a list of files to process from the text file at corpusURL. Return a list of URLs\"\"\" \n",
" # Open and read each line\n",
" url_list_text_data = urllib2.urlopen(corpus_URL) # it's a file like object and works just like a file\n",
" for file_URL in url_list_text_data: # files are iterable\n",
" yield file_URL.rstrip()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use these commands to read in a list of filenames (samples) in a directory, replacing the path with the actual directory that the audio / drum samples are stored in."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"snares_URL = \"https://ccrma.stanford.edu/workshops/mir2014/SnareCorpus.txt\"\n",
"snare_file_list = [audio_file_URL for audio_file_URL in process_corpus(snares_URL)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"kicks_URL = \"https://ccrma.stanford.edu/workshops/mir2014/KickCorpus.txt\"\n",
"kick_file_list = [audio_file_URL for audio_file_URL in process_corpus(kicks_URL)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
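{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an optional step, not part of the original lab), we can confirm how many audio file URLs were retrieved for each corpus."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Count the URLs retrieved for each corpus.\n",
"print len(kick_file_list), len(snare_file_list)"
],
"language": "python",
"metadata": {},
"outputs": []
},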
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access the filenames contained in the array, use the square brackets [ ] to get to the element that you want to access. \n",
"\n",
" When we feature extract a sample collection, we need to sequentially access audio files, segment them (or not), and feature extract them. Loading a lot of audio files into memory is not always a feasible or desirable operation, so you will create a loop which loads an audio file, feature extracts it, and closes the audio file. Note that the only information that we retain in memory are the features that are extracted.\n",
"For example, to access the text URL file name of the first file in the list, you would type:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"snare_file_list[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"'https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_01_01.WAV'"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we feature extract a sample collection, we need to sequentially access audio files, segment them (or not), and feature extract them. Loading a lot of audio files into memory is not always a feasible or desirable operation, so you will create a loop which loads an audio file, feature extracts it, and closes the audio file. Note that the only information that we retain in memory are the features that are extracted.\n",
"\n",
"3. Create a loop which reads in an audio file, extracts the zero crossing rate, and some spectral statistics. The feature information for each audio file (the \"feature vector\") should be stored as a feature array, with columns being the features and rows for each file. \n",
"1. Create a loop which reads in an audio file, extracts the zero crossing rate, and some spectral statistics. You can use the \"in\" operator to retrieve each audio file URL from process_corpus(), as used above. The feature information for each audio file (the \"feature vector\") should be stored as a feature array, with columns being the features and rows for each file. \n",
" \n",
" Or in Matlab, for example:\n",
" for example:\n",
"\n",
" featuresSnare =\n",
"\n",
" 1.0e+003 *\n",
" \n",
" 0.5730 1.9183 2.9713 0.0004 0.0002\n",
" 0.4750 1.4834 2.4463 0.0004 0.0012\n",
" 0.5900 2.2857 3.1788 0.0003 0.0041\n",
......@@ -58,23 +127,84 @@
" 0.5490 2.0137 3.0342 0.0004 0.0016\n",
" 0.5900 2.2857 3.1788 0.0003 0.0012\n",
" \n",
" In your loop, here's how to read in your wav files, using a structure of file names:\n",
" [x,fs]=wavread([snareDirectory snareFileList{i}]); %note the use of brackets for snareFileList\n",
" \n",
" Here's an example of how to feature extract for the current audio file..\n",
" frameSize = 0.100 * fs; % 100ms\n",
" currentFrame = x(1:frameSize)\n",
" featuresSnare(i,1) = zcr(currentFrame);\n",
" [centroid, bandwidth, skew, kurtosis]=spectralMoments(currentFrame,fs,8192)\n",
" featuresSnare(i,2:5) = [centroid, bandwidth, skew, kurtosis];\n",
" \n",
" Within your loop, here's a reminder how to read in your wav files, using an array of audio file URLs:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from essentia.standard import MonoLoader\n",
"file_index = 0\n",
"audio = MonoLoader(filename = snare_file_list[file_index])()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "RuntimeError",
"evalue": "Error while configuring MonoLoader: AudioLoader: Could not open file \"https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_01_01.WAV\"",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-17-fdc46de435f1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0messentia\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstandard\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mMonoLoader\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mfile_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0maudio\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMonoLoader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msnare_file_list\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mfile_index\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/usr/local/lib/python2.7/site-packages/essentia/standard.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, **kwargs)\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;31m# configure the algorithm\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 43\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconfigure\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 44\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mconfigure\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python2.7/site-packages/essentia/standard.py\u001b[0m in \u001b[0;36mconfigure\u001b[0;34m(self, **kwargs)\u001b[0m\n\u001b[1;32m 56\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconvertedVal\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 57\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 58\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__configure__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 59\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 60\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcompute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mRuntimeError\u001b[0m: Error while configuring MonoLoader: AudioLoader: Could not open file \"https://ccrma.stanford.edu/workshops/mir2014/audio/drum%20samples/snares/SNARE_01_01.WAV\""
]
}
],
"prompt_number": 17
},
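{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the error above shows, essentia's MonoLoader cannot open a remote URL directly. One possible workaround (a sketch, not part of the original lab text) is to first download each file to a local temporary path, for example with urllib.urlretrieve, and pass that local filename to MonoLoader."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"from essentia.standard import MonoLoader\n",
"\n",
"# Workaround sketch: download the remote .wav file to a local temporary file,\n",
"# then load that local copy with MonoLoader.\n",
"local_filename, headers = urllib.urlretrieve(snare_file_list[file_index])\n",
"audio = MonoLoader(filename = local_filename)()"
],
"language": "python",
"metadata": {},
"outputs": []
},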
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an example of how to feature extract the first from for the current audio file..."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
" frameSize = 0.100 * sample_rate # 100ms\n",
" currentFrame = audio[0 : frameSize]\n",
" featuresSnare[i, 0] = zcr(currentFrame)\n",
" [centroid, bandwidth, skew, kurtosis] = spectralMoments(currentFrame, sample_rate, 8192)\n",
" featuresSnare[i, 1:4] = [centroid, bandwidth, skew, kurtosis]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'sample_rate' is not defined",
"output_type": "pyerr",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-18-4b1c17fe2ac8>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mframeSize\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.100\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0msample_rate\u001b[0m \u001b[0;31m# 100ms\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mcurrentFrame\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0maudio\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m \u001b[0;34m:\u001b[0m \u001b[0mframeSize\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mfeaturesSnare\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mzcr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcurrentFrame\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mcentroid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbandwidth\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mskew\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkurtosis\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mspectralMoments\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcurrentFrame\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msample_rate\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m8192\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mfeaturesSnare\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mcentroid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbandwidth\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mskew\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkurtosis\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'sample_rate' is not defined"
]
}
],
"prompt_number": 18
},
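{
"cell_type": "markdown",
"metadata": {},
"source": [
"To tie the pieces above together, here is a minimal sketch of one possible feature-extraction loop for the snare corpus. It assumes the download workaround shown earlier, and it assumes hypothetical helper functions zcr() and spectralMoments() (as named in the snippet above) that you will implement or replace with your own feature extractors."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib\n",
"import numpy as np\n",
"from essentia.standard import MonoLoader\n",
"\n",
"# Sketch only: assumes zcr() and spectralMoments() are defined elsewhere.\n",
"sample_rate = 44100  # MonoLoader resamples to 44100 Hz by default\n",
"featuresSnare = np.zeros((len(snare_file_list), 5))\n",
"\n",
"for i, file_URL in enumerate(snare_file_list):\n",
"    local_filename, headers = urllib.urlretrieve(file_URL)  # download, then load locally\n",
"    audio = MonoLoader(filename = local_filename)()\n",
"    frameSize = int(0.100 * sample_rate)  # analyse the first 100 ms\n",
"    currentFrame = audio[0 : frameSize]\n",
"    featuresSnare[i, 0] = zcr(currentFrame)\n",
"    centroid, bandwidth, skew, kurtosis = spectralMoments(currentFrame, sample_rate, 8192)\n",
"    featuresSnare[i, 1:5] = [centroid, bandwidth, skew, kurtosis]"
],
"language": "python",
"metadata": {},
"outputs": []
},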
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. First, extract all of the feature data for the kick drums and store it in a feature array. (For my example, above, I'd put it in \"featuresKick\")\n",
"\n",
"5. Next, extract all of the feature data for the snares, storing them in a different array. \n",
"Again, the kick and snare features should be separated in two different arrays!\n",
" \n",
" OK, no more help. The rest is up to you! \n",
"\n",
"OK, no more help. The rest is up to you!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building Models\n",
"\n",
"1. Examine the feature array for the various snare samples. What do you notice? \n",
......
{
"metadata": {
"name": "",
"signature": "sha256:eb40e34116f231d0e2616b21dbd3eaac2494c326f9b61a67dc99f2bd3b9cb7ba"
"signature": "sha256:6fdc1ab1e764720916a430a54d66054f9d7b5acbd257f42c0315df27b32abc92"
},
"nbformat": 3,
"nbformat_minor": 0,
......@@ -126,14 +126,6 @@
"\n",
" Which output contains more noise? Why?\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
......