The ROC curve
ROC stands for receiver operating characteristic. The curve was first developed during World War II for the analysis of radar signals. It is used to assess the performance of a binary classifier that depends on a threshold parameter, as that threshold is varied.
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for each value of the threshold. See the note on the performance metrics for classification for a description of those. In this plane, the point (0, 1) represents perfect classification. The diagonal line TPR = FPR is what a random guesser (a coin flip) would give, so points above it indicate better-than-random classification and points below it indicate worse-than-random classification.
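On a finite sample, the sweep over thresholds can be carried out directly. The following is a minimal Python sketch, assuming each point comes with a real-valued score and a 0/1 label, that both classes are present, and that "score > T" means "classified positive". The function name `roc_points` and the sample data are hypothetical.

```python
import math


def roc_points(scores, labels):
    """Return one (FPR, TPR) pair per threshold, with 'score > T' meaning positive.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start at the highest score (nothing is strictly above it, giving (0, 0))
    # and end at -inf (everything is above it, giving (1, 1)).
    thresholds = sorted(set(scores), reverse=True) + [-math.inf]
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]              # actual classes
pts = roc_points(scores, labels)
```

Lowering the threshold can only move points into the "positive" side, so both rates are non-decreasing along the list and the curve runs from (0, 0) to (1, 1).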
Discussion borrowed from Wikipedia. Let's say that we have a continuous random variable $X$ (the score assigned to a point) and a binary classifier which depends on a threshold $T$, so that $X > T$ yields a "positive" (1) classification and $X \le T$ yields a "negative" (0) classification. The pdf of $X$ is $f_1(x)$ if the point is actually positive and $f_0(x)$ if the point is actually negative, so we can write

$$TPR(T) = \int_T^{\infty} f_1(x)\, dx, \qquad FPR(T) = \int_T^{\infty} f_0(x)\, dx$$

(the last one because above $T$ the point is classified as positive but it is actually negative, so $X$ follows $f_0$). The ROC curve plots $TPR(T)$ versus $FPR(T)$ as a parametric function of $T$. At the same time, the other two metrics which quantify the performance of the classification can be expressed as

$$TNR(T) = \int_{-\infty}^{T} f_0(x)\, dx = 1 - FPR(T), \qquad FNR(T) = \int_{-\infty}^{T} f_1(x)\, dx = 1 - TPR(T)$$

(the last one because below $T$ the point is classified as negative but it is actually positive, so $X$ follows $f_1$).
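To make the integral definitions concrete, here is a minimal Python sketch under an assumed model that is not part of the discussion above: $f_0$ is a unit-variance normal density centred at 0 and $f_1$ is a unit-variance normal density centred at a hypothetical mean `MU = 1.5`. The rates are computed in closed form through the normal CDF (via `math.erf`), checked against a direct midpoint-rule integration, and $T$ is swept to trace the curve parametrically. All function names are illustrative.

```python
import math

MU = 1.5  # hypothetical mean score of actually positive points; negatives have mean 0


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution centred at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def normal_cdf(x, mean):
    """CDF of a unit-variance normal distribution centred at `mean`."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2)))


def integrate(f, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))


def f1(x):  # pdf of X for actually positive points (assumed model)
    return normal_pdf(x, MU)


def f0(x):  # pdf of X for actually negative points (assumed model)
    return normal_pdf(x, 0.0)


def tpr(t):
    return 1 - normal_cdf(t, MU)   # integral of f1 from T to +inf


def fpr(t):
    return 1 - normal_cdf(t, 0.0)  # integral of f0 from T to +inf


def tnr(t):
    return normal_cdf(t, 0.0)      # integral of f0 from -inf to T


def fnr(t):
    return normal_cdf(t, MU)       # integral of f1 from -inf to T


# Sweep T to trace the ROC curve as the parametric pairs (FPR(T), TPR(T)).
thresholds = [-4 + 0.1 * k for k in range(81)]
curve = [(fpr(t), tpr(t)) for t in thresholds]

# Compare closed forms with direct numerical integration at T = 0.5;
# the upper limit 12 stands in for +inf (the tails beyond it are negligible).
T = 0.5
tpr_numeric = integrate(f1, T, 12.0)
fpr_numeric = integrate(f0, T, 12.0)
```

Because $f_1$ is shifted to the right of $f_0$ in this assumed model, $TPR(T) \ge FPR(T)$ for every $T$, so the whole curve lies on or above the random-guess diagonal.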
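The area under the curve (AUC) summarizes the whole curve in a single number between 0 and 1: a perfect classifier has AUC = 1, while the random-guess diagonal has AUC = 0.5. Equivalently, the AUC is the probability that the classifier scores a randomly chosen positive point higher than a randomly chosen negative one. A common use case is to compare different classifiers by their AUCs.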
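The two readings of the AUC can be checked against each other on a small sample. This is a minimal Python sketch, assuming 0/1 labels and the "score > T means positive" rule: `auc_trapezoid` integrates the empirical ROC curve with the trapezoidal rule, and `auc_rank` counts the fraction of (positive, negative) pairs in which the positive point scores higher, with ties counted as 1/2. All names and data are hypothetical.

```python
import math


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping T, with 'score > T' meaning positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True) + [-math.inf]:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points (sorted by FPR)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]              # actual classes
area = auc_trapezoid(roc_points(scores, labels))
ranked = auc_rank(scores, labels)
```

On this sample 8 of the 9 (positive, negative) pairs are ordered correctly (only the positive at 0.6 falls below the negative at 0.7), and both functions give 8/9 up to floating-point rounding.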