Tips for model planning:

by Irina, 12 March 2009 09:17

The modeling strategy in general involves three stages:

(1) variable specification

(2) interaction assessment

(3) confounding assessment followed by consideration of precision

Three statistical issues need attention when we build a model: multicollinearity, multiple testing, and influential observations.

 

Multicollinearity

Multicollinearity occurs when one or more of the independent variables in the model can be approximately determined from some of the other independent variables. When there is multicollinearity, the estimated regression coefficients of the fitted model can be highly unreliable: their standard errors become inflated, and small changes in the data can change them substantially. Consequently, any modeling strategy must check for possible multicollinearity at various steps in the variable selection process.

 

Multiple testing

• the more tests we perform, the more likely we are to obtain significant findings by chance, even if there are no real effects

• variable selection procedures may therefore yield an incorrect model because of multiple testing
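The growth of chance findings can be made concrete. Assuming k independent tests, each at level alpha, with every null hypothesis true, the probability of at least one "significant" result is 1 − (1 − alpha)^k. Tests inside a variable selection procedure are not independent, so this is only an illustration of the trend.

```python
def prob_at_least_one_false_positive(alpha: float, k: int) -> float:
    """Chance of >= 1 significant result among k independent tests
    when all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(prob_at_least_one_false_positive(0.05, k), 3))
```

At alpha = 0.05, the chance of at least one spurious finding is about 23% with 5 tests, 40% with 10, and 64% with 20.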

 

Influential observations

• individual observations, e.g., outliers, may strongly influence the regression coefficients

• coefficients may change noticeably if an outlier is dropped from the analysis

 

A hierarchically well-formulated model is a model satisfying the following characteristic: given any variable in the model, all lower-order components of the variable must also be contained in the model. For example, a model containing the product term E×V1 must also contain E and V1.
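The definition can be checked mechanically. In this sketch each model term is represented as a set of variable names (a main effect E is {"E"}, the product E×V1 is {"E", "V1"}); that representation is an assumption for illustration and does not cover terms such as X² whose lower-order component is X.

```python
from itertools import combinations


def term(*names):
    """Hypothetical helper: a model term as a frozenset of variable names."""
    return frozenset(names)


def lower_order_components(t):
    """All non-empty proper subsets of a product term,
    e.g. E*V1*V2 -> E, V1, V2, E*V1, E*V2, V1*V2."""
    items = sorted(t)
    return {
        frozenset(c)
        for size in range(1, len(items))
        for c in combinations(items, size)
    }


def is_hierarchically_well_formulated(model_terms):
    terms = {frozenset(t) for t in model_terms}
    return all(lower_order_components(t) <= terms for t in terms)


hwf = [term("E"), term("V1"), term("E", "V1")]
not_hwf = [term("E"), term("E", "V1")]  # V1 is missing

print(is_hierarchically_well_formulated(hwf))      # True
print(is_hierarchically_well_formulated(not_hwf))  # False
```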

 

The Hierarchical Backward Elimination Approach

The strategy is called hierarchical backward elimination because we work backward from our largest starting model to a smaller final model, and because we treat variables of different orders at different steps. For the terms that are retained at a given stage, the hierarchy principle identifies the lower-order components that must also be retained: if a product term stays in the model, all of its lower-order components stay in every further model, whether or not they are statistically significant.
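The interaction-assessment stage of this strategy can be sketched as follows. The sketch assumes terms are sets of variable names and that a p-value is supplied for each product term; in a real analysis the model would be refitted after every deletion and the p-values would change, so the fixed p-values here are a simplifying assumption, and alpha = 0.05 is only a placeholder.

```python
from itertools import combinations


def term(*names):
    """Hypothetical helper: a model term as a frozenset of variable names."""
    return frozenset(names)


def components(t):
    items = sorted(t)
    return {frozenset(c) for k in range(1, len(items)) for c in combinations(items, k)}


def hierarchical_backward(terms, p_values, alpha=0.05):
    """Assess product terms from the highest order down to order 2.

    Returns (kept_terms, forced): 'forced' holds the lower-order components
    that the hierarchy principle requires because a product term was kept.
    """
    kept = set(terms)
    forced = set()
    max_order = max(len(t) for t in kept)
    for order in range(max_order, 1, -1):
        for t in [t for t in kept if len(t) == order]:
            if t in forced:
                continue  # component of a retained higher-order term: keep it
            if p_values[t] >= alpha:
                kept.discard(t)  # non-significant product term is dropped
            else:
                forced |= components(t)  # hierarchy principle
    return kept, forced


terms = [term("E"), term("V1"), term("V2"), term("E", "V1"), term("E", "V2")]
p = {term("E", "V1"): 0.01, term("E", "V2"): 0.40}  # hypothetical p-values
kept, forced = hierarchical_backward(terms, p)
```

Here E×V2 is dropped, E×V1 is kept, and E and V1 are forced to stay; V2 is not forced and would next be assessed for confounding.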

 

 

How logistic regression may be used to analyze matched data

 

Matching is a procedure carried out at the design stage of a study that compares two or more groups. To match, we select a referent group for our study that is to be compared with the group of primary interest, called the index group. Matching is accomplished by constraining the referent group to be comparable to the index group on one or more risk factors, called “matching factors.”

For example, if the matching factor is age, then matching on age would constrain the referent group to have essentially the same age structure as the index group.

 

The most popular method for matching is called category matching. This involves first categorizing each of the matching factors and then finding, for each case, one or more controls from the same combined set of matching categories.

 

For example, if we are matching on age, race, and sex, we first categorize each of these three variables separately.

For each case, we then determine his or her age–race–sex combination. For instance, the case may be 52 years old, white, and female. We then find one or more controls with the same age–race–sex combination.

 

If our study involves matching, we must decide on the number of controls to be chosen for each case. If we decide to use only one control for each case, we call this one-to-one or pair matching. If we choose R controls for each case (for example, R = 4), we call this R-to-1 matching.
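The category-matching procedure with R controls per case can be sketched in a few lines. The 10-year age bands, the dictionary record layout, random selection among eligible controls, and matching without replacement are all assumptions chosen for illustration.

```python
import random


def age_band(age):
    """Hypothetical 10-year age categories, e.g. 52 -> '50-59'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def matching_key(person):
    # Combined matching category: age band x race x sex.
    return (age_band(person["age"]), person["race"], person["sex"])


def category_match(cases, controls, r=1, seed=0):
    """R-to-1 category matching without replacement.

    Returns (matched_sets, unused_controls). Cases with fewer than r
    eligible controls in their category are left unmatched.
    """
    rng = random.Random(seed)
    pool = {}
    for c in controls:
        pool.setdefault(matching_key(c), []).append(c)
    for group in pool.values():
        rng.shuffle(group)  # pick eligible controls at random
    matched_sets = []
    for case in cases:
        group = pool.get(matching_key(case), [])
        if len(group) >= r:
            chosen = [group.pop() for _ in range(r)]
            matched_sets.append((case, chosen))
    unused = [c for group in pool.values() for c in group]
    return matched_sets, unused


cases = [
    {"id": "k1", "age": 52, "race": "white", "sex": "F"},
    {"id": "k2", "age": 35, "race": "black", "sex": "M"},
]
controls = [
    {"id": "c1", "age": 55, "race": "white", "sex": "F"},
    {"id": "c2", "age": 58, "race": "white", "sex": "F"},
    {"id": "c3", "age": 50, "race": "white", "sex": "F"},
    {"id": "c4", "age": 61, "race": "white", "sex": "F"},
]
sets, unused = category_match(cases, controls, r=2)
```

With R = 2, the 52-year-old white female case gets two controls from the 50-59 band, the second case finds no match, and two controls go unused; such unused controls are the information that matching discards.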

 

The primary advantage of matching over random sampling without matching is that matching can often lead to a more statistically efficient analysis. In particular, matching may lead to a tighter confidence interval, that is, more precision, around the odds or risk ratio being estimated than would be achieved without matching.

 

The major disadvantage of matching is that it can be costly, both in terms of the time and labor required to find appropriate matches and in terms of information lost by discarding available controls that cannot satisfy the matching criteria. In fact, if too much information is lost through matching, matching may actually reduce statistical efficiency.

 

In deciding whether to match or not on a given factor, the safest strategy is to match only on strong risk factors expected to cause confounding in the data.

The analysis of matched data can be carried out using a stratified analysis in which the strata consist of the collection of matched sets.
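One standard stratified estimator, not named in the post, is the Mantel-Haenszel odds ratio, with each matched set as its own stratum. The sketch assumes a binary exposure coded 1/0; for pair-matched data it reduces to the ratio of discordant pairs, the same information McNemar's test uses. The counts below are hypothetical.

```python
def mantel_haenszel_or(matched_sets):
    """Mantel-Haenszel odds ratio treating each matched set as a stratum.

    matched_sets: list of (case_exposures, control_exposures), each a list
    of 1 (exposed) / 0 (unexposed). Raises ZeroDivisionError if no set has
    an unexposed case together with an exposed control.
    """
    num = 0.0
    den = 0.0
    for case_x, control_x in matched_sets:
        a = sum(case_x)          # exposed cases
        c = len(case_x) - a      # unexposed cases
        b = sum(control_x)       # exposed controls
        d = len(control_x) - b   # unexposed controls
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical pair-matched data: 30 pairs with only the case exposed,
# 10 with only the control exposed, plus concordant pairs.
pairs = ([([1], [0])] * 30 + [([0], [1])] * 10
         + [([1], [1])] * 20 + [([0], [0])] * 40)
print(mantel_haenszel_or(pairs))  # 3.0, i.e. 30 / 10
```

Concordant pairs contribute nothing to either sum, which is why only discordant pairs drive the matched odds ratio.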

 

 

Logistic regression can also account for matching in the analysis of data, using a special method called conditional logistic regression.

The software calculates odds ratios in much the same way as McNemar’s test, which uses only the discordant pairs, but the results are “conditioned” on the matched sets. Interpretation of matched odds ratios (MORs) using conditional logistic regression is the same as interpretation of matched odds ratios calculated from tables. A stratified conditional logistic model has the same flexibility as an unconditional model (it can include several covariates and interaction terms), yet still takes into account the correlation structure attributable to matching.
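For a matched set containing one case, conditional logistic regression uses the probability that the case, rather than one of its controls, is the diseased member: exp(βx_case) / Σ_j exp(βx_j) over all members of the set. The set-specific intercepts cancel out of this ratio, which is what conditioning on the matched sets means. The sketch below maximizes this conditional likelihood by Newton-Raphson for a single covariate and 1:M matching; it is an illustration, not what SAS does internally.

```python
import math


def conditional_logit_1d(matched_sets, iterations=50, tol=1e-10):
    """Estimate beta for one covariate from 1:M matched sets.

    matched_sets: list of (x_case, [x_control, ...]). Each set contributes
    exp(beta*x_case) / sum(exp(beta*x_j) for j in the set) to the likelihood.
    Fails (ZeroDivisionError) if no set carries information about beta.
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        info = 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case] + list(x_controls)
            m = max(beta * x for x in xs)  # subtract max for numerical stability
            weights = [math.exp(beta * x - m) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += x_case - mean          # derivative of log-likelihood
            info += mean_sq - mean ** 2     # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical pair-matched data with a binary exposure:
# 30 case-exposed/control-unexposed pairs, 10 the other way round.
pairs = [(1, [0])] * 30 + [(0, [1])] * 10 + [(1, [1])] * 20 + [(0, [0])] * 40
beta = conditional_logit_1d(pairs)
print(math.exp(beta))  # matched odds ratio, approximately 3.0
```

For 1:1 matching with a binary exposure, exp(β) equals the ratio of discordant pairs, so it agrees with the table-based matched odds ratio.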

There is a SAS macro that fits a conditional logistic regression model to matched or finely stratified data using the PHREG procedure:

Phreg macro

 

The following SAS code fits a conditional logistic regression model to matched case-control data.

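proc phreg;
   /* CASE: 1 = case, 0 = control (the value 0 is treated as censored) */
   model time*case(0) = X1 X2 / ties=discrete;
   /* SET identifies each matched set */
   strata set;
run;
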
Here CASE refers to case-control status; the value in parentheses, 0, identifies the controls, which PROC PHREG treats as censored observations. TIME is a dummy variable in this application and should be coded so that all cases and controls have the same nonzero value. X1 and X2 are the independent variables of interest. The variable SET is used in the STRATA statement to uniquely define each matched set, and with all times tied, TIES=DISCRETE makes the fitted likelihood the conditional logistic likelihood.
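The input layout described above (one row per subject, CASE set to 1 or 0, a constant nonzero TIME, and a SET identifier) can be prepared outside SAS. The sketch below writes such a file as CSV; the column names follow the SAS code, while the nested input structure, the TIME value of 1, and the idea of importing the CSV into SAS afterwards (e.g. with PROC IMPORT) are assumptions for illustration.

```python
import csv
import io


def phreg_rows(matched_sets, time_value=1):
    """Flatten matched sets into one row per subject.

    matched_sets: list of (case_covariates, [control_covariates, ...]),
    where each covariates dict holds exactly X1 and X2.
    """
    rows = []
    for set_id, (case_cov, controls) in enumerate(matched_sets, start=1):
        rows.append({"set": set_id, "case": 1, "time": time_value, **case_cov})
        for cov in controls:
            rows.append({"set": set_id, "case": 0, "time": time_value, **cov})
    return rows


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["set", "case", "time", "X1", "X2"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical 1:2 matched data.
data = [
    ({"X1": 1, "X2": 3.2}, [{"X1": 0, "X2": 2.9}, {"X1": 0, "X2": 3.5}]),
    ({"X1": 0, "X2": 4.1}, [{"X1": 1, "X2": 3.8}, {"X1": 0, "X2": 4.0}]),
]
rows = phreg_rows(data)
print(to_csv(rows))
```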

 

 

About the author

Irina Spivak
Team Leader at G-Stat.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    © Copyright 2010
