Using the ROC Curve to Measure Sensitivity & Specificity

by Irina 16. October 2007 11:58


Two indices are used to evaluate the accuracy of a test that predicts dichotomous outcomes (e.g. logistic regression) – sensitivity and specificity. They describe how well a test discriminates between cases with and without a certain condition.

Sensitivity - the true positive rate: the proportion of cases that have a certain condition and are correctly identified by the test as having it, i.e. true positives / (true positives + false negatives) (e.g. in mammography testing, the proportion of patients with cancer who test positive).

Specificity - the true negative rate: the proportion of cases that do not have a certain condition and are correctly identified by the test as not having it, i.e. true negatives / (true negatives + false positives) (e.g. in mammography testing, the proportion of patients without cancer who test negative).
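As a quick illustration, both definitions can be worked through on a 2x2 table of test results. The data step below is a minimal sketch: the counts (tp, fn, tn, fp) and the data set name confusion are made up for the example and do not come from any real study.

```sas
/* Hypothetical 2x2 screening counts; the numbers are illustrative only */
data confusion;
   tp = 90;    /* condition present, test positive */
   fn = 10;    /* condition present, test negative */
   tn = 850;   /* condition absent,  test negative */
   fp = 50;    /* condition absent,  test positive */
   sensitivity = tp / (tp + fn);   /* 90 / 100  */
   specificity = tn / (tn + fp);   /* 850 / 900 */
run;

proc print data=confusion noobs;
   var sensitivity specificity;
   format sensitivity specificity 6.4;
run;
```

With these counts the test finds 90 of the 100 cases that have the condition and clears 850 of the 900 that do not.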

Lift - a measure of a predictive model's effectiveness, calculated as the ratio between the results obtained with and without the predictive model.
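A small worked example of the lift ratio, under the assumption that "results" means the response rate, which is one common reading of the definition. All counts and names (lift_example, rate_with_model, rate_without_model) are hypothetical.

```sas
/* Hypothetical counts: 10,000 cases in total, of which 500 respond.   */
/* The model's 1,000 highest-scored cases contain 200 of those 500.    */
data lift_example;
   rate_with_model    = 200 / 1000;    /* response rate in the model's top group     */
   rate_without_model = 500 / 10000;   /* response rate when cases are picked at random */
   lift = rate_with_model / rate_without_model;   /* 0.20 / 0.05 */
run;

proc print data=lift_example noobs;
run;
```

A lift above 1 means that selecting cases with the model does better than selecting them at random.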


Choosing a Cut-off

The position of the cut-off determines the number of true positives, true negatives, false positives, and false negatives. As you increase sensitivity and identify more of the cases that have a certain condition, you give up accuracy in identifying those without the condition (specificity). The cut-off value (C) that balances the two can be estimated by maximizing the index J:

J = MAX over C of (Sensitivity(C) + Specificity(C))

The same C also maximizes Youden's index, Sensitivity(C) + Specificity(C) - 1, because subtracting a constant does not move the maximum.
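One possible way to carry out this maximization in SAS is sketched here. It assumes the OUTROC= data set roc1 created by the PROC LOGISTIC step shown later in this post already exists. _PROB_, _SENSIT_ and _1MSPEC_ are variables that PROC LOGISTIC writes to that data set; the names youden, specificity and j are introduced only for this example.

```sas
/* Assumes roc1 is the OUTROC= data set written by PROC LOGISTIC */
data youden;
   set roc1;
   specificity = 1 - _1mspec_;      /* roc1 stores 1 - specificity */
   j = _sensit_ + specificity;      /* the index from the text     */
run;

/* Put the cut-off with the largest J first */
proc sort data=youden;
   by descending j;
run;

/* _prob_ on the first row is the estimated cut-off C */
proc print data=youden(obs=1) noobs;
   var _prob_ _sensit_ specificity j;
run;
```

If several cut-offs tie for the largest J, this prints only one of them, so it is worth looking at the first few rows of the sorted data set as well.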

Receiver Operating Characteristic (ROC) Curve

A Receiver Operating Characteristic (ROC) curve is a graphical representation of the trade-off between the false negative and false positive rates for every possible cut-off. By tradition, the plot shows the false positive rate (1 - specificity) on the X axis and the true positive rate (sensitivity, or 1 - the false negative rate) on the Y axis.

The accuracy of a test (i.e. its ability to correctly classify cases with a certain condition and cases without the condition) is measured by the area under the ROC curve. An area of 1 represents a perfect test, while an area of .5 represents a worthless test, one that does no better than chance. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test: the true positive rate is high while the false positive rate is low. Statistically, more area under the curve means that the test identifies more true positives while minimizing the number or percentage of false positives. In the SAS code below, the area is taken from the c statistic in the Association table of PROC LOGISTIC and shown in the plot title.

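  /* Show only the parameter estimates and the association table (contains c) */
  ods select parameterestimates association;
    proc logistic data=data1;
       /* roceps=0 keeps every distinct predicted probability in roc1 */
       model disease/n=age / outroc=roc1 roceps=0;
       output out=outp p=phat;
       ods output association=assoc;
       run;

    /* Store the c statistic (area under the ROC curve) in macro variable &area */
    data _null_;
       set assoc;
       if label2='c' then call symputx("area",cvalue2);
       run;

    /* The title comes after the step boundary so that &area is already set */
    title "area=&area";
    symbol1 i=join v=none;   /* connect the ROC points with a line */

    proc gplot data=roc1;
       plot _sensit_*_1mspec_;
       run;
       quit;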

It is important to use the ROCEPS=0 option in the MODEL statement of PROC LOGISTIC when you fit your model, because it causes every unique predicted value to be written to the OUTROC= data set. Otherwise, predicted values that are very close to each other may be grouped together, yielding fewer points on the ROC plot.
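As a cross-check on the reported area, the area under the plotted points can also be approximated with the trapezoidal rule. This is a sketch only, assuming the roc1 data set from the code above; roc_sorted, auc, prev_x, prev_y and area are names introduced for the example. The fewer points roc1 contains, the coarser this approximation becomes, which is another reason to keep all of them.

```sas
/* Order the ROC points from left to right along the X axis */
proc sort data=roc1 out=roc_sorted;
   by _1mspec_ _sensit_;
run;

data auc;
   set roc_sorted end=last;
   retain prev_x 0 prev_y 0;          /* start from the (0,0) corner */
   /* Add the trapezoid between the previous point and this one */
   area + (_1mspec_ - prev_x) * (_sensit_ + prev_y) / 2;
   prev_x = _1mspec_;
   prev_y = _sensit_;
   if last then do;
      /* Close the curve at the (1,1) corner; adds 0 if already there */
      area + (1 - prev_x) * (1 + prev_y) / 2;
      output;
   end;
   keep area;
run;

proc print data=auc noobs;
run;
```

The printed value can then be compared with the c statistic shown in the plot title.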

About the author

Irina Spivak, Team Leader at G-Stat.

