"At each onset, extract one 100-ms segment from the audio signal.\n",
"\n",
"For each segment, compute the MFCCs.\n",
"\n",
"Read a training set of drum samples. For each test signal, extract MFCCs, and use `mean` to obtain one MFCC vector per signal.\n",
"\n",
"Train a K-NN classifier using test signals. When training, discard the 0th MFCC coefficient, because it only represents the energy in the frame and does not add any discriminative power. \n",
"\n",
"\n",
"\n",
"For each segment in the test audio signal, feed it into the trained K-NN classifier, and save the label.\n",
"\n",
"Play a \"beep\" for each detected kick drum.\n",
"\n",
"Play a \"beep\" for each detected snare drum.\n",
"\n",
"In addition to the MFCCs, extract the delta-MFCCs. Re-train the classifier, and re-run the classifier over the test audio signal. Do the results change?\n",
"Purpose: To gain an understanding of feature extraction, windowing, MFCCs.\n",
"\n",
"SECTION 1 SEGMENTING INTO EVERY N ms FRAMES\n",
"-------------------------------------------\n",
"\n",
"Segmenting: Chopping up into frames every N seconds\n",
"\n",
"Previously, we've either chopped up a signal by the location of it's onsets (and taking the following 100 ms) or just analyzing the entire file. \n",
"Analyzing the audio file by \"frames\" is another technique for your arsenal that is good for analyzing entire songs, phrases, or non-onset-based audio examples.\n",
"You easily chop up the audio into frames every, say, 100ms, with a for loop. \n",
"\n",
" frameSize = 0.100 * fs; % 100ms\n",
" for i = 1: frameSize : (length(x)-frameSize+1) \n",
" currentFrame = x(i:i+frameSize-1); % this is the current audio frame \n",
" % Now, do your feature extraction here and store the features in some matrix / array\n",
" end\n",
"\n",
"Very often, you will want to have some overlap between the audio frames - taking an 100ms long frame but sliding it 50 ms each time. To do a 100ms frame and have it with 50% overlap, try: \n",
"\n",
" frameSize = 0.100 * fs; % 100ms\n",
" hop = 0.5; % 50%overlap\n",
" for i = 1: hop * frameSize : (length(x)-frameSize+1) \n",
" ...\n",
" end\n",
"\n",
"Note that it's also important to multiple the signal by a window (e.g., Hamming / Hann window) equal to the frame size to smoothly transition between the frames. \n",
"\n",
"SECTION 2 MFCC\n",
"--------------\n",
"\n",
"Load an audio file of your choosing from the audio folder on `/usr/ccrma/courses/mir2012/audio`.\n",
"Use this as an opportunity to explore this collection.\n",
"\n",
"BAG OF FRAMES\n",
"\n",
"Test out MFCC to make sure that you know how to call it. We'll use the CATbox implementation of MFCC.\n",
"\n",
" currentFrameIndex = 1; \n",
" for i = 1: frameSize : (length(x)-frameSize+1)\n",
" currentFrame = x(i:i+frameSize-1) + eps ; % this is the current audio frame\n",
" % Note that we add EPS to prevent divide by 0 errors % Now, do your other feature extraction here \n",
" % The code generates MFCC coefficients for the audio signal given in the current frame.\n",
" [mfceps] = mfcc(currentFrame ,fs)' ; %note the transpose operator!\n",
" features = [MFCC_mean MFCC_delta_mean ]; % In this case, we'll only store the MFCC and delta-MFCC means\n",
" % NOTE: You might want to toss out the FIRST MFCC coefficient and delta-coefficient since it's much larger than \n",
" others and only describes the total energy of the signal.\n",
"\n",
"You can include this code inside of your frame-hopping loop to extract the MFCC-values for each frame. \n",
"\n",
"Once MFCCs per frame have been calculated, consider how they can be used as features for expanding the k-NN classification and try implementing it!\n",
"\n",
"Extract the mean of the 12 MFCCs (coefficients 1-12, do not use the \"0th\" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing and centroid. We should now have 14 features being extracted - this is starting to get \"real world\"! With this simple example (and limited collection of audio slices, you probably won't notice a difference - but at least it didn't break, right?) Try it with the some other audio to truly appreciate the power of timbral classification. "