"Once we have computed predictions on our test set, which can be compared against its ground-truth labels, there are a number of evaluation methods for understanding how well a classifier performs, particularly when we want to compare different classifiers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simplest evaluation measures to produce are the 'True Positive', 'False Positive' and 'False Negative' counts."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"positive_label = 1 # The label that we call correct.\n",
"\n",
"# An example set of labels for our ground truth\n",
"gt_labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])\n",
"# ...and an example set of labels as predicted by a classifier\n",
"model_output = np.array([1, 0, 0, 1, 1, 0, 1, 1])\n",
"\n",
"true_positives = np.sum((model_output == positive_label) & (gt_labels == positive_label))\n",
"false_positives = np.sum((model_output == positive_label) & (gt_labels != positive_label))\n",
"false_negatives = np.sum((model_output != positive_label) & (gt_labels == positive_label))\n",
"print(true_positives, false_positives, false_negatives)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the true and false positive counts we can compute 'Precision', 'Recall' and 'F-Measure'.\n",
"Precision is the ratio of correctly classified positive examples to everything classified as positive: precision = #TP / (#TP + #FP). Recall is the ratio of correctly classified positive examples to everything that is truly positive: recall = #TP / (#TP + #FN). The F-Measure is the harmonic mean of the two."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, we often want to understand the performance of a single classifier over a range of threshold parameter settings. A \"Receiver Operating Characteristic\" (ROC) curve plots the true positive rate against the false positive rate for different parameter settings, depicting the relative trade-off between true positives (benefits) and false positives (costs) at each parameter value.\n",
"\n",
"Plotting this curve is made easy by scikit-learn's [ROC](http://scikit-learn.org/stable/modules/model_evaluation.html#receiver-operating-characteristic-roc) functions. It is, however, restricted to binary classification (i.e. snare vs. non-snare)."
]
},
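{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of how these measures follow from the counts (the label arrays and variable names below are hypothetical examples, not data from this notebook):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"example_positive_label = 1\n",
"example_gt = np.array([1, 1, 0, 1, 0, 0, 1, 0])    # hypothetical ground truth\n",
"example_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])  # hypothetical classifier output\n",
"\n",
"tp = np.sum((example_pred == example_positive_label) & (example_gt == example_positive_label))\n",
"fp = np.sum((example_pred == example_positive_label) & (example_gt != example_positive_label))\n",
"fn = np.sum((example_pred != example_positive_label) & (example_gt == example_positive_label))\n",
"\n",
"precision = tp / float(tp + fp)  # TP / (TP + FP)\n",
"recall = tp / float(tp + fn)     # TP / (TP + FN)\n",
"f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean\n",
"print(precision, recall, f_measure)"
],
"language": "python",
"metadata": {},
"outputs": []
},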
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from sklearn.metrics import roc_curve\n",
"roc_curve?"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 48
},
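{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of what roc_curve returns, here with hypothetical continuous scores standing in for a classifier that does report confidences (the data and variable names are illustrative only):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import roc_curve\n",
"\n",
"example_gt = np.array([0, 0, 1, 1, 0, 1, 1, 0])  # hypothetical ground truth\n",
"example_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5])  # hypothetical confidences\n",
"\n",
"# roc_curve returns one (fpr, tpr) point per threshold; plotting them gives the ROC curve.\n",
"fpr, tpr, thresholds = roc_curve(example_gt, example_scores, pos_label=1)\n",
"plt.plot(fpr, tpr)\n",
"plt.xlabel('False positive rate')\n",
"plt.ylabel('True positive rate')"
],
"language": "python",
"metadata": {},
"outputs": []
},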
{
"cell_type": "code",
"collapsed": false,
"input": [
"# We can use roc_curve to produce the false and true positive rates at each threshold, given the ground truth labels and normalised scores.\n",
"# The scores represent the classifier's confidence in each classification.\n",
"# Since our classifier does not issue a confidence measure, we use binary scores: 1.0 where the positive label was predicted, 0.0 otherwise.\n",
"scores = (model_output == positive_label) + 0.0\n",
"print(scores)\n",
"\n",
"# We indicate to roc_curve that the value 1 in gt_labels is our positive snare label.\n",
"fpr, tpr, thresholds = roc_curve(gt_labels, scores, pos_label=positive_label)\n",
"The roc_auc_score function computes the area under the receiver operating characteristic (ROC) curve, also denoted AUC or AUROC. This summarises the behaviour of the classifier across all thresholds in a single number."