"import numpy, scipy, matplotlib.pyplot as plt, sklearn, stanford_mir\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[← Back to Index](index.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cross Validation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://en.wikipedia.org/wiki/Cross-validation_(statistics)\">K-fold cross validation</a> is a method for evaluating the performance of a classifier.\n",
"\n",
"For example, with 10-fold cross validation:\n",
"\n",
"1. Divide the data set into 10 random partitions.\n",
"2. Choose one partition as the test set, and train on the other nine partitions.\n",
"3. Repeat until each partition has served as the test set exactly once.\n",
"\n",
"Why is cross validation useful?\n",
"* In K-fold cross validation, the model is evaluated K times, each time on a different partition of the data, so every sample is used for both training and testing.\n",
"* It can be used to tune parameters and to choose the best model and/or features."
]
},
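{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the procedure above (not part of the original lesson), we can run 10-fold cross validation with `sklearn.model_selection.cross_val_score`; the synthetic two-feature data and the choice of `n_neighbors=3` here are illustrative assumptions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: 10-fold cross validation on synthetic two-feature data.\n",
"import numpy as np\n",
"import sklearn.model_selection\n",
"import sklearn.neighbors\n",
"\n",
"# Two well-separated clusters of 10 samples each, standing in for the\n",
"# (zero crossing rate, spectral centroid) features of the drum samples.\n",
"rng = np.random.default_rng(0)\n",
"features = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])\n",
"labels = np.array([0]*10 + [1]*10)\n",
"\n",
"model = sklearn.neighbors.KNeighborsClassifier(n_neighbors=3)\n",
"# cv=10 splits the 20 samples into 10 folds; each fold is the test set once,\n",
"# and the returned array holds one accuracy score per fold.\n",
"scores = sklearn.model_selection.cross_val_score(model, features, labels, cv=10)\n",
"print(scores, scores.mean())"
]
},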
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load some features from ten kick drums and ten snare drums:\n",
"* training_features is a two-dimensional feature matrix containing the zero crossing rate and spectral centroid of each drum sample."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the value of K was chosen arbitrarily, we do not know whether it was the best choice (although in this example we obtain a perfect score anyway...).\n",
"Therefore, testing several other values of K will help to choose the best parameter."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[K=1] Accuracy=1.000\n",
"[K=2] Accuracy=1.000\n",
"[K=3] Accuracy=1.000\n",
"[K=4] Accuracy=1.000\n",
"[K=5] Accuracy=1.000\n"
]
}
],
"source": [
"K_choices = [1,2,3,4,5]\n",
"for k in K_choices:\n",
" model = sklearn.neighbors.KNeighborsClassifier(n_neighbors=k)\n",