Two indices, sensitivity and specificity, are used to evaluate the accuracy of a test that predicts a dichotomous outcome (e.g. a logistic regression model). They describe how well the test discriminates between cases with and without a certain condition.
Sensitivity - the true positive rate, i.e. the proportion of cases that meet a certain condition and are correctly identified by the test as meeting it (e.g. in mammography testing, the proportion of patients with cancer who test positive).
Specificity - the true negative rate, i.e. the proportion of cases that do not meet a certain condition and are correctly identified by the test as not meeting it (e.g. in mammography testing, the proportion of patients without cancer who test negative).
-is a measure of a predictive model calculated as the ratio between the results obtained with and without the predictive model.
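As a minimal sketch of the sensitivity and specificity definitions above, the DATA step below computes both from the four cells of a 2x2 classification table. The counts and the names accuracy, tp, fn, tn and fp are hypothetical, chosen only to illustrate the arithmetic.

```sas
/* Hypothetical counts, invented for illustration only */
data accuracy;
  tp = 80;  /* have the condition, test positive (true positives)   */
  fn = 20;  /* have the condition, test negative (false negatives)  */
  tn = 90;  /* no condition, test negative (true negatives)         */
  fp = 10;  /* no condition, test positive (false positives)        */

  sensitivity = tp / (tp + fn);  /* 80 / 100 */
  specificity = tn / (tn + fp);  /* 90 / 100 */

  /* Write both values to the SAS log */
  put sensitivity= specificity=;
run;
```

With these counts the log shows sensitivity=0.8 and specificity=0.9: the test detects 80% of the cases with the condition and correctly clears 90% of those without it.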
Choosing a Cut-off
The position of the cut-off determines the number of true positives, true negatives, false positives, and false negatives. Moving the cut-off to increase sensitivity, so that more cases with a certain condition are identified, sacrifices specificity, the accuracy in identifying those without the condition. The optimal cut-off value (C) can be estimated by maximizing the index J:
J=MAX(Sensitivity(C) + Specificity(C))
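The maximization of J can be sketched as follows, assuming sensitivity and specificity have already been computed at a handful of candidate cut-offs. The data set cutoffs and its four rows are hypothetical values made up for this illustration, not results from any real test.

```sas
/* Hypothetical sensitivity and specificity at four candidate cut-offs */
data cutoffs;
  input c sensitivity specificity;
  j = sensitivity + specificity;  /* the sum maximized in the text */
  datalines;
0.2 0.95 0.40
0.4 0.85 0.70
0.6 0.65 0.88
0.8 0.30 0.97
;
run;

/* Sort so that the cut-off with the largest J comes first */
proc sort data=cutoffs;
  by descending j;
run;

/* Print only the first row, i.e. the cut-off that maximizes J */
proc print data=cutoffs(obs=1);
run;
```

In this made-up table the sums are 1.35, 1.55, 1.53 and 1.27, so the cut-off 0.4 is selected. Subtracting 1 from the sum gives the conventional form of Youden's index, which selects the same cut-off because the constant does not change where the maximum occurs.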
Receiver Operating Characteristic (ROC) Curve
A Receiver Operating Characteristic (ROC) curve is a graphical representation of the trade-off between the false negative and false positive rates for every possible cut-off. By tradition, the plot shows the false positive rate (1 - specificity) on the X axis and the true positive rate (sensitivity, or 1 - the false negative rate) on the Y axis.1 The accuracy of a test (i.e. its ability to correctly classify cases with a certain condition and cases without the condition) is measured by the area under the ROC curve. An area of 1 represents a perfect test, while an area of 0.5 represents a worthless test. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test, because the true positive rate is high while the false positive rate is low. Statistically, a larger area under the curve means that the test identifies more true positives while minimizing the number or percentage of false positives.
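To make the area under the curve concrete, the sketch below approximates it with the trapezoidal rule over a few ROC points. The data set roc_points and its values are hypothetical, the points are assumed to be sorted by false positive rate and to include the corners (0,0) and (1,1), and this is an illustration of the idea rather than the method any particular procedure uses internally.

```sas
/* Hypothetical ROC points: false positive rate (fpr), true positive rate (tpr) */
data roc_points;
  input fpr tpr;
  datalines;
0.0 0.0
0.1 0.6
0.3 0.8
0.6 0.95
1.0 1.0
;
run;

/* Trapezoidal rule: add the area of each strip between consecutive points */
data auc;
  set roc_points end=last;
  prev_fpr = lag(fpr);  /* previous point's x value */
  prev_tpr = lag(tpr);  /* previous point's y value */
  if _n_ > 1 then area + (fpr - prev_fpr) * (tpr + prev_tpr) / 2;
  if last then put area=;  /* write the total to the SAS log */
run;
```

For these invented points the strips contribute 0.03, 0.14, 0.2625 and 0.39, for a total area of about 0.82, which lies between the worthless value of 0.5 and the perfect value of 1.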
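/* Fit the model; OUTROC= saves the ROC points and ROCEPS=0 keeps every unique predicted value */
ods select parameterestimates association;
proc logistic data=data1;
  model disease/n=age / outroc=roc1 roceps=0;
  output out=outp p=phat;
  ods output association=assoc;
run;

/* Store the c statistic (area under the ROC curve) in the macro variable AREA */
data _null_;
  set assoc;
  if label2='c' then call symput("area",cvalue2);
run;

/* Plot sensitivity against 1 - specificity, with the area in the title */
title "area=&area";
proc gplot data=roc1;
  plot _sensit_*_1mspec_;
run;
quit;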
It is important to use the ROCEPS=0 option in the MODEL statement of PROC LOGISTIC when you fit your model, because this option allows all the unique predicted values to be output to the OUTROC= data set. Otherwise, predicted values that are close to each other may be grouped together, yielding fewer points on the ROC plot.