"First, we want to analyze and feature extract a small collection of audio samples - storing their feature data as our \"training data\". The commands below read all of the drum example .wav files from the MIR web site into an array, `snare_file_list`. \n",
"\n",
"Let's define a function to retrieve a list of URLs from a text file."
" \"\"\"Read a list of files to process from the text file at corpusURL. Return a list of URLs\"\"\" \n",
" # Open and read each line\n",
" url_list_text_data = urllib2.urlopen(corpus_URL) # it's a file like object and works just like a file\n",
" for file_URL in url_list_text_data: # files are iterable\n",
" yield file_URL.rstrip()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read a training set of drum samples. For each test signal, extract MFCCs, and use `mean` to obtain one MFCC vector per signal.\n",
"\n",
"Train a K-NN classifier using test signals. When training, discard the 0th MFCC coefficient, because it only represents the energy in the frame and does not add any discriminative power. \n",
"\n",
"\n",
"\n",
"For each segment in the test audio signal, feed it into the trained K-NN classifier, and save the label."
"To access the filenames contained in the array, use the square brackets [ ] to get to the element that you want to access. For example, to access the text URL file name of the first file in the list, you would type:"
"In addition to the MFCCs, extract the following features:\n",
"When we feature extract a sample collection, we need to sequentially access audio files, segment them (or not), and feature extract them. Loading a lot of audio files into memory is not always a feasible or desirable operation, so you will create a loop which loads an audio file, feature extracts it, and closes the audio file. Note that the only information that we retain in memory are the features that are extracted.\n",
"\n",
"- spectral centroid\n",
"- spectral spread\n",
"- spectral skewness\n",
"- spectral kurtosis. "
"Create a loop which reads in an audio file, extracts the zero crossing rate, and some spectral statistics. You can use the \"in\" operator to retrieve each audio file URL from process_corpus(), as used above. The feature information for each audio file (the \"feature vector\") should be stored as a feature array, with columns being the features and rows for each file. For example:"
"Read a training set of drum samples. For each test signal, extract MFCCs, and use `mean` to obtain one MFCC vector per signal.\n",
"\n",
"Train a K-NN classifier using test signals. When training, discard the 0th MFCC coefficient, because it only represents the energy in the frame and does not add any discriminative power. \n",
"\n",
"\n",
"\n",
"For each segment in the test audio signal, feed it into the trained K-NN classifier, and save the label.\n",
"\n",
"4. First, extract all of the feature data for the kick drums and store it in a feature array. (For my example, above, I'd put it in \"features_kick\")\n",
"\n",
"5. Next, extract all of the feature data for the snares, storing them in a different array. \n",
"Again, the kick and snare features should be separated in two different arrays!\n",
"For classification, we're going to be using the new features in our arsenal: cherishing those \"spectral moments\" (centroid, bandwidth, skewness, kurtosis) and also examining other spectral statistics."
]
},
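{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of the MFCC / K-NN training step described above. It assumes Essentia and scikit-learn are installed, and that `kick_signals` and `snare_signals` are lists of audio signals you have already loaded (hypothetical names); the helper `mean_mfcc` is our own name, not a library function."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch only: assumes Essentia + scikit-learn, and lists of already-loaded\n",
"# mono signals `kick_signals` and `snare_signals` (hypothetical names).\n",
"import numpy as np\n",
"import essentia.standard as ess\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"hann = ess.Windowing(type='hann')\n",
"spectrum = ess.Spectrum()\n",
"mfcc = ess.MFCC(numberCoefficients=13)  # expects a 1025-bin spectrum by default\n",
"\n",
"def mean_mfcc(signal):\n",
"    \"\"\"Average the MFCC vectors over all frames of one signal,\n",
"    discarding the 0th (energy) coefficient.\"\"\"\n",
"    frames = ess.FrameGenerator(signal, frameSize=2048, hopSize=1024)\n",
"    coeffs = [mfcc(spectrum(hann(frame)))[1][1:] for frame in frames]\n",
"    return np.mean(coeffs, axis=0)\n",
"\n",
"X = np.array([mean_mfcc(s) for s in kick_signals + snare_signals])\n",
"y = np.array([0] * len(kick_signals) + [1] * len(snare_signals))  # 0 = kick, 1 = snare\n",
"\n",
"model = KNeighborsClassifier(n_neighbors=3)\n",
"model.fit(X, y)"
],
"language": "python",
"metadata": {},
"outputs": []
},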
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Training Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we want to analyze and feature extract a small collection of audio samples - storing their feature data as our \"training data\". The commands below read all of the drum example .wav files from the MIR web site into an array, `snare_file_list`. \n",
"\n",
"Let's define a function to retrieve a list of URLs from a text file."
"*Moments* is a term used in physics and statistics. There are raw moments and central moments. The first raw moment is known as the mean. The second central moment is known as the variance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib2\n",
"\n",
"def process_corpus(corpus_URL):\n",
" \"\"\"Read a list of files to process from the text file at corpusURL. Return a list of URLs\"\"\" \n",
" # Open and read each line\n",
" url_list_text_data = urllib2.urlopen(corpus_URL) # it's a file like object and works just like a file\n",
" for file_URL in url_list_text_data: # files are iterable\n",
" yield file_URL.rstrip()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use these commands to read in a list of filenames (samples) in a directory. Replace the URL with one that points to a text file listing the URLs (one per line) of where the audio / drum samples are stored."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Placeholder URL: point this at the actual text file listing your kick drum\n",
"# sample URLs, one per line.\n",
"kicks_URL = \"http://example.com/kick_drum_list.txt\"\n",
"kick_file_list = [audio_file_URL for audio_file_URL in process_corpus(kicks_URL)]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access the filenames contained in the list, use square brackets [ ] to index the element that you want. For example, to access the URL of the first file in the list, you would type:"
]
},
"To compute the spectral centroid, we will use [`essentia.standard.Centroid`](http://essentia.upf.edu/documentation/reference/std_Centroid.html):"
"When we feature extract a sample collection, we need to sequentially access audio files, segment them (or not), and feature extract them. Loading a lot of audio files into memory is not always a feasible or desirable operation, so you will create a loop which loads an audio file, feature extracts it, and closes the audio file. Note that the only information that we retain in memory are the features that are extracted.\n",
"\n",
"Create a loop which reads in an audio file, extracts the zero crossing rate, and some spectral statistics. You can use the \"in\" operator to retrieve each audio file URL from process_corpus(), as used above. The feature information for each audio file (the \"feature vector\") should be stored as a feature array, with columns being the features and rows for each file. For example:"
" For example, when populated, features_snare might look like:\n",
" \n",
" features_snare =\n",
"\n",
" 0.5730 1.9183 2.9713 0.0004 0.0002\n",
" 0.4750 1.4834 2.4463 0.0004 0.0012\n",
" 0.5900 2.2857 3.1788 0.0003 0.0041\n",
" 0.5090 1.6622 2.6369 0.0004 0.0051\n",
" 0.4860 1.4758 2.2085 0.0004 0.0021\n",
" 0.6060 2.2119 3.2798 0.0004 0.0651\n",
" 0.4990 2.0607 2.7654 0.0004 0.0721\n",
" 0.6360 2.3153 3.0256 0.0003 0.0221\n",
" 0.5490 2.0137 3.0342 0.0004 0.0016\n",
" 0.5900 2.2857 3.1788 0.0003 0.0012\n",
" \n",
" Within your loop, here's a reminder how to read in your wav files, using an array of audio file URLs:"
"This value is normalized between 0 and 1. If 0, then the centroid is at zero. If 1, then the centroid is all the way to the \"right\", i.e., equal to `fs/2`, the Nyquist frequency, or the highest frequency a digital signal can possibly have."
"Here's an example of how to feature extract the first frame from the current audio file, using Essentia's [ZeroCrossingRate](http://essentia.upf.edu/documentation/reference/streaming_ZeroCrossingRate.html) and [CentralMoments](http://essentia.upf.edu/documentation/reference/std_CentralMoments.html) classes..."
"To compute the spectral spread, skewness, and kurtosis, we use [`essentia.standard.DistributionShape`](http://essentia.upf.edu/documentation/reference/std_DistributionShape.html):"
]
}
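{
"cell_type": "code",
"collapsed": false,
"input": [
"# A minimal sketch, assuming `audio` holds the mono samples loaded in the\n",
"# previous cell and that Essentia is installed.\n",
"import essentia.standard as ess\n",
"\n",
"frame_size = 1024\n",
"frame = audio[0:frame_size]                # the first frame of the signal\n",
"\n",
"zcr = ess.ZeroCrossingRate()\n",
"hann = ess.Windowing(type='hann')\n",
"spectrum = ess.Spectrum()\n",
"centroid = ess.Centroid()                  # normalized: 1.0 corresponds to fs/2\n",
"central_moments = ess.CentralMoments()\n",
"distribution_shape = ess.DistributionShape()\n",
"\n",
"spec = spectrum(hann(frame))               # magnitude spectrum of the windowed frame\n",
"moments = central_moments(spec)            # 0th through 4th central moments\n",
"spread, skewness, kurtosis = distribution_shape(moments)\n",
"\n",
"print zcr(frame), centroid(spec), spread, skewness, kurtosis"
],
"language": "python",
"metadata": {},
"outputs": []
},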
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Remember the zeroth column of features_snare is the ZCR.\n",
"4. First, extract all of the feature data for the kick drums and store it in a feature array. (For my example, above, I'd put it in \"features_kick\")\n",
"\n",
"5. Next, extract all of the feature data for the snares, storing them in a different array. \n",
"Again, the kick and snare features should be separated in two different arrays!\n",