Commit 4f1fe9d1 authored by Leigh Smith's avatar Leigh Smith

Factored out cross validation to it's own notebook

parent 07109be5
{
"metadata": {
"name": "",
"signature": "sha256:92cb9f5bc7783db462ab80b4382c5ba9e78c9d448ea4751fd09cfdd52648a0b7"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"CROSS VALIDATION\n",
"----------------\n",
"\n",
"You'll need some of this code and information to calculate your accuracy rate on your classifiers.\n",
"\n",
"EXAMPLE\n",
"\n",
"Let's say we have 10-fold cross validation...\n",
"\n",
"1. Divide test set into 10 random subsets.\n",
"2. 1 test set is tested using the classifier trained on the remaining 9.\n",
"3. We then do test/train on all of the other sets and average the percentages. \n",
"\n",
"To achieve the first step (divide our training set into k disjoint subsets), use the function [Kfold](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html) in the scikit.learn cross_validation package.\n",
"\n",
" K-Folds cross validation iterator.\n",
" Provides train/test indices to split data in train test sets. Split dataset into k consecutive folds (without shuffling).\n",
"\n",
" You can visit the scikit.learn documentation to look at all the other options. This code is also posted as a template in \n",
" `/usr/ccrma/courses/mir2014/Toolboxes/crossValidationTemplate.py` "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn import cross_validation\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn import preprocessing\n",
"\n",
"def crossValidateKNN(features, labels):\n",
" \"\"\"\n",
" This code is provided as a template for your cross-validation\n",
" computation. Pass into the variables \"features\", \"labels\" your own data. \n",
"\n",
" As well, you can replace the code in the \"BUILD\" and \"EVALUATE\" sections\n",
" to be useful with other types of Classifiers.\n",
" \"\"\"\n",
" #\n",
" # CROSS VALIDATION \n",
" # The features array is arranged as rows of instances, columns of features in our training set.\n",
" numInstances, numFeatures = features.shape\n",
" numFolds = min(10, numInstances) # how many cross-validation folds do you want - (default=10)\n",
" # divide test set into 10 random subsets\n",
" indices = cross_validation.KFold(numInstances, n_folds = numFolds)\n",
"\n",
" errors = np.empty(numFolds)\n",
" for foldIndex, (train_index, test_index) in enumerate(indices):\n",
" # SEGMENT DATA INTO FOLDS\n",
" print('Fold: %d' % foldIndex) \n",
" print(\"TRAIN: %s\" % train_index)\n",
" print(\"TEST: %s\" % test_index)\n",
" \n",
" # SCALE\n",
" scaler = preprocessing.MinMaxScaler(feature_range = (-1, 1))\n",
" trainingFeatures = scaler.fit_transform(features.take(train_index, 0))\n",
" # BUILD NEW MODEL - ADD YOUR MODEL BUILDING CODE HERE...\n",
" model = KNeighborsClassifier(n_neighbors = 3)\n",
" model.fit(trainingFeatures, labels.take(train_index, 0))\n",
" # RESCALE TEST DATA TO TRAINING SCALE SPACE\n",
" testingFeatures = scaler.transform(features.take(test_index, 0))\n",
" # EVALUATE WITH TEST DATA - ADD YOUR MODEL EVALUATION CODE HERE\n",
" model_output = model.predict(testingFeatures)\n",
" print(\"KNN prediction %s\" % model_output) # Debugging.\n",
" # CONVERT labels(test,:) LABELS TO SAME FORMAT TO COMPUTE ERROR \n",
" labels_test = labels.take(test_index, 0)\n",
" # COUNT ERRORS. matches is a boolean array, taking the mean does the right thing.\n",
" matches = model_output != labels_test\n",
" errors[foldIndex] = matches.mean()\n",
" print('cross validation error: %f' % errors.mean())\n",
" print('cross validation accuracy: %f' % (1.0 - errors.mean()))"
],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
\ No newline at end of file
{ {
"metadata": { "metadata": {
"name": "", "name": "",
"signature": "sha256:fac5244635d1c30aeb93ebb9a63a2dd73149e822761ae77c22152ff7a2bf9170" "signature": "sha256:742424805f9e3dc3a107463a6b7ec768e0c79c45502dfb2ac8397c9cdd9d702b"
}, },
"nbformat": 3, "nbformat": 3,
"nbformat_minor": 0, "nbformat_minor": 0,
...@@ -75,95 +75,8 @@ ...@@ -75,95 +75,8 @@
"\n", "\n",
"Once MFCCs per frame have been calculated, consider how they can be used as features for expanding the k-NN classification and try implementing it!\n", "Once MFCCs per frame have been calculated, consider how they can be used as features for expanding the k-NN classification and try implementing it!\n",
"\n", "\n",
"Extract the mean of the 12 MFCCs (coefficients 1-12, do not use the \"0th\" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing and centroid. We should now have 14 features being extracted - this is starting to get \"real world\"! With this simple example (and limited collection of audio slices, you probably won't notice a difference - but at least it didn't break, right?) Try it with the some other audio to truly appreciate the power of timbral classification. \n", "Extract the mean of the 12 MFCCs (coefficients 1-12, do not use the \"0th\" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing and centroid. We should now have 14 features being extracted - this is starting to get \"real world\"! With this simple example (and limited collection of audio slices, you probably won't notice a difference - but at least it didn't break, right?) Try it with the some other audio to truly appreciate the power of timbral classification. "
"\n",
"\n",
"\n",
"SECTION 3 CROSS VALIDATION\n",
"--------------------------\n",
"\n",
"You'll need some of this code and information to calculate your accuracy rate on your classifiers.\n",
"\n",
"EXAMPLE\n",
"\n",
"Let's say we have 10-fold cross validation...\n",
"\n",
"1. Divide test set into 10 random subsets.\n",
"2. 1 test set is tested using the classifier trained on the remaining 9.\n",
"3. We then do test/train on all of the other sets and average the percentages. \n",
"\n",
"To achieve the first step (divide our training set into k disjoint subsets), use the function [Kfold](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html) in the scikit.learn cross_validation package.\n",
"\n",
" K-Folds cross validation iterator.\n",
" Provides train/test indices to split data in train test sets. Split dataset into k consecutive folds (without shuffling).\n",
"\n",
" You can visit the scikit.learn documentation to look at all the other options. This code is also posted as a template in \n",
" `/usr/ccrma/courses/mir2014/Toolboxes/crossValidationTemplate.py`\n",
" "
] ]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn import cross_validation\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn import preprocessing\n",
"\n",
"def crossValidateKNN(features, labels):\n",
" \"\"\"\n",
" This code is provided as a template for your cross-validation\n",
" computation. Pass into the variables \"features\", \"labels\" your own data. \n",
"\n",
" As well, you can replace the code in the \"BUILD\" and \"EVALUATE\" sections\n",
" to be useful with other types of Classifiers.\n",
" \"\"\"\n",
" #\n",
" # CROSS VALIDATION \n",
" # The features array is arranged as rows of instances, columns of features in our training set.\n",
" numInstances, numFeatures = features.shape\n",
" numFolds = min(10, numInstances) # how many cross-validation folds do you want - (default=10)\n",
" # divide test set into 10 random subsets\n",
" indices = cross_validation.KFold(numInstances, n_folds = numFolds)\n",
"\n",
" errors = np.empty(numFolds)\n",
" for foldIndex, (train_index, test_index) in enumerate(indices):\n",
" # SEGMENT DATA INTO FOLDS\n",
" print('Fold: %d' % foldIndex) \n",
" print(\"TRAIN: %s\" % train_index)\n",
" print(\"TEST: %s\" % test_index)\n",
" \n",
" # SCALE\n",
" scaler = preprocessing.MinMaxScaler(feature_range = (-1, 1))\n",
" trainingFeatures = scaler.fit_transform(features.take(train_index, 0))\n",
" # BUILD NEW MODEL - ADD YOUR MODEL BUILDING CODE HERE...\n",
" model = KNeighborsClassifier(n_neighbors = 3)\n",
" model.fit(trainingFeatures, labels.take(train_index, 0))\n",
" # RESCALE TEST DATA TO TRAINING SCALE SPACE\n",
" testingFeatures = scaler.transform(features.take(test_index, 0))\n",
" # EVALUATE WITH TEST DATA - ADD YOUR MODEL EVALUATION CODE HERE\n",
" model_output = model.predict(testingFeatures)\n",
" print(\"KNN prediction %s\" % model_output) # Debugging.\n",
" # CONVERT labels(test,:) LABELS TO SAME FORMAT TO COMPUTE ERROR \n",
" labels_test = labels.take(test_index, 0)\n",
" # COUNT ERRORS. matches is a boolean array, taking the mean does the right thing.\n",
" matches = model_output != labels_test\n",
" errors[foldIndex] = matches.mean()\n",
" print('cross validation error: %f' % errors.mean())\n",
" print('cross validation accuracy: %f' % (1.0 - errors.mean()))\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
} }
], ],
"metadata": {} "metadata": {}
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment