Relative importance of explanatory variables

by Irina 26. January 2008 13:02

by David Firth

Question: What are the most important variables in regression?

Achen's answer: `Important for what? Without a criterion for importance, the inquiries are meaningless.' He goes on to distinguish three notions of importance, namely:

  1. `theoretical importance' (measured by $\beta_x$)
  2. `level importance' (measured by $\beta_x \mu_x$)
  3. `dispersion importance' (measured by $\beta_x \sigma_x / \sigma_y$)


    Of the last, Achen says: `although almost no one is substantively interested in it, many social scientists use it as their sole importance measure.' He suggests standardization by the (supposed fixed) range of a variable, rather than by its s.d., to achieve comparability across samples.
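The three notions can be compared side by side on a single fitted coefficient. The snippet below is only a sketch in Python (not SAS), assuming the dispersion measure is the usual standardized coefficient, slope times s.d. of x divided by s.d. of y. The function name `importance_measures` and all numbers are hypothetical.

```python
def importance_measures(beta, mean_x, sd_x, sd_y):
    """Return the three importance notions for one explanatory variable."""
    return {
        "theoretical": beta,               # raw regression slope
        "level": beta * mean_x,            # slope times the mean of x
        "dispersion": beta * sd_x / sd_y,  # standardized coefficient (assumed form)
    }


# Hypothetical values, for illustration only.
measures = importance_measures(beta=2.0, mean_x=3.0, sd_x=1.5, sd_y=6.0)
print(measures)
```

A variable can rank high on one notion and low on another, which is why the criterion has to be stated first.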

The QUANTREG Procedure

by Irina 22. October 2007 12:40
The QUANTREG procedure models the effects of covariates on the conditional quantiles of a response variable by means of quantile regression. Quantile regression, which was introduced by Koenker and Bassett (1978), extends the regression model to conditional quantiles of the response variable, such as the median or the 90th percentile. Quantile regression is particularly useful when the rate of change in the conditional quantile, expressed by the regression coefficients, depends on the quantile.
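To see what "fitting a quantile" means, it may help to look at the simplest case with no covariates. Quantile regression in the sense of Koenker and Bassett minimizes an asymmetrically weighted sum of absolute residuals (the "check" loss). With only an intercept, the minimizer is a sample quantile. The sketch below is plain Python rather than SAS, the data are made up, and the brute-force search over data points is for illustration only. It is not how PROC QUANTREG computes its estimates.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weight tau for positive residuals, 1 - tau for negative."""
    return residual * (tau - (1 if residual < 0 else 0))


def total_loss(q, ys, tau):
    """Sum of check losses when q is used as the fitted tau-th quantile."""
    return sum(check_loss(y - q, tau) for y in ys)


def quantile_by_check_loss(ys, tau):
    """Brute force: try every observed value and keep the one with the smallest loss."""
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


# Hypothetical response values; the last one is a gross outlier.
ys = [1.0, 2.0, 3.5, 4.2, 5.1, 6.3, 7.7, 8.8, 9.0, 9.5, 100.0]

median_fit = quantile_by_check_loss(ys, 0.5)
q90_fit = quantile_by_check_loss(ys, 0.9)
print(median_fit, q90_fit)
```

In this toy data set neither fitted value would change if the outlier 100.0 were replaced by 1000.0, which illustrates the kind of robustness the procedure offers.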

  • Quantile regression is also flexible in the sense that it does not involve a link function that relates the variance and the mean of the response variable.
  • Quantile regression also offers a degree of data robustness.
  • Quantile regression cannot be carried out simply by segmenting the unconditional distribution of the response variable and then obtaining least-squares fits for the subsets. This approach leads to disastrous results when, for example, the data include outliers. In contrast, quantile regression uses all of the data for fitting quantiles, even the extreme quantiles.
    /* Fit the 0.9 conditional quantile with resampling-based confidence limits */
    proc quantreg data=trout alpha=0.01 ci=resampling;
        model LnDensity = WDRatio / quantile=0.9
                                    CovB CorrB
                                    seed=12345;
        test WDRatio;
    run;

    /* Fit all quantiles and request a quantile plot of the coefficients */
    ods html;
    ods graphics on;
    proc quantreg data=trout alpha=0.1 ci=resampling;
        model LnDensity = WDRatio / quantile=all seed=12345
                                    plot=quantplot;
    run;
    ods graphics off;
    ods html close;

    /* Fit one model per requested quantile and save the predicted values */
    %macro quantiles(NQuant, Quantiles);
        %local i;
        %do i=1 %to &NQuant;
            proc quantreg data=bmimen ci=none algorithm=interior;
                model logbmi = inveage sqrtage age sqrtage*age
                               age*age age*age*age
                               / quantile=%scan(&Quantiles,&i,",");
                output out=outp&i pred=p&i;
            run;
        %end;
    %mend;
    %let quantiles = %str(.03,.05,.10,.25,.5,.75,.85,.90,.95,.97);
    %quantiles(10,&quantiles);

    Factor Analysis

    by Irina 17. July 2007 10:45

    A Factor is a dimension underlying several variables.

    Analytically, it is a linear combination of the variables: F1 = W1X1 + W2X2 + ..., where F1 is factor 1, Xj are the variables of the study (5 in our example), and Wj are the weights used to combine the individual scores. The various methods of factor analysis are distinguished by the manner in which the weights Wj are determined.

    A Factor score: The score of a respondent on a factor. If we decide to settle with two factors we will have two factor scores for each of the 500 respondents.

    A Factor loading: The correlation between a factor and a variable.

    Labeling Factors: The art of segmentation; it consists of selecting a term which best describes all the variables that load highly on a factor. Factor #1 may be labeled as “price conscious” and factor #2 as “fashion conscious”.

    The proportion of the total variance of a given variable that is accounted for by a factor may be obtained by squaring the loading. In our example, factor #1 explains .9324² = 86.94% of the variance in variable 4.

    proc transpose  data =event_transaction  out=result prefix=event;
    by branch_cust_ip;
    id Event_Costing_Activity_Type_Co;
    var count;
    run;
    
    data result;
    set result;
    
    array events{*} _NUMERIC_ ;
    do i = 1 to dim(events);
    if events{i} = . then events{i} = 0;
    end;
    drop i;
    run;
    
    
    proc factor score data=result method=p rotate=orthomax nfactors=10  outstat=fact_events;
    var  event: ;
    run;
    
    proc score data=personal score=fact_events  out=scores_events;
    var  event: ;
    run;
    
    data scores_events;
        set scores_events;
        array factor{*} Factor1-Factor10;

        /* largest and smallest factor score of each observation */
        max = max(of Factor1-Factor10);
        min = min(of Factor1-Factor10);

        /* index of the factor that attains the maximum / minimum */
        do i = 1 to dim(factor);
            if max = factor{i} then factor_max = i;
            if min = factor{i} then factor_min = i;
        end;
    run;
    

    Principal Component Analysis

    by Irina 5. May 2007 01:49

    The Basics of Principal Component Analysis

    Principal component analysis is appropriate when you have obtained measures on a number of observed variables, believe that there is some redundancy in those variables, and wish to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. In this case, redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. The principal components may then be used as predictor or criterion variables in subsequent analyses.

    Because it is a variable reduction procedure, principal component analysis is similar in many respects to exploratory factor analysis. In fact, the steps followed when conducting a principal component analysis are virtually identical to those followed when conducting an exploratory factor analysis. However, there are significant conceptual differences between the two procedures, and it is important that you do not mistakenly claim that you are performing factor analysis when you are actually performing principal component analysis.

    What is a Principal Component?

    How principal components are computed. Technically, a principal component can be defined as a linear combination of optimally-weighted observed variables. In order to understand the meaning of this definition, it is necessary to first describe how subject scores on a principal component are computed. It is possible to calculate a score for each subject on a given principal component. For example, in the preceding study, each subject would have scores on two components: one score on the satisfaction with supervision component, and one score on the satisfaction with pay component. The subject’s actual scores on the seven questionnaire items would be optimally weighted and then summed to compute their scores on a given component.

    For example, assume that component 1 in the present study was the “satisfaction with supervision” component. You could determine each subject’s score on principal component 1 by using the following fictitious formula:
    C1 = .44 (X1) + .40 (X2) + .47 (X3) + .32 (X4) + .02 (X5) + .01 (X6) + .03 (X7)
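As a quick check of how such a score is put together, the formula can be evaluated for one subject. The sketch below is in Python rather than SAS. The weights are the fictitious ones from the formula, and the seven item responses are invented for illustration.

```python
# Fictitious weights for component 1, taken from the formula above.
weights = [0.44, 0.40, 0.47, 0.32, 0.02, 0.01, 0.03]

# Hypothetical responses of one subject to questionnaire items X1..X7.
responses = [6, 5, 7, 6, 2, 1, 3]

# The component score is the weighted sum of the item scores.
c1 = sum(w * x for w, x in zip(weights, responses))
print(round(c1, 2))
```

Items X1 to X4 dominate the score because their weights are large, while X5 to X7 contribute almost nothing.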

    The SAS System’s PROC FACTOR solves for these weights by using a special type of equation called an eigenequation. The weights produced by these eigenequations are optimal weights in the sense that, for a given set of data, no other set of weights could produce a set of components that are more successful in accounting for variance in the observed variables. The weights are created so as to satisfy a principle of least squares similar (but not identical) to the principle of least squares used in multiple regression.

    Number of components extracted.

    In reality, the number of components extracted in a principal component analysis is equal to the number of observed variables being analyzed. However, in most analyses, only the first few components account for meaningful amounts of variance, so only these first few components are retained, interpreted, and used in subsequent analyses (such as in multiple regression analyses).

    What is meant by “total variance” in the data set?

    The “total variance” in the data set is simply the sum of the variances of the observed variables. Because the variables have been standardized, each has a variance of one, so the total variance in a principal component analysis will always be equal to the number of observed variables being analyzed.
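This is easy to verify numerically. The following sketch (Python standard library only, with made-up data for three variables) standardizes each variable and adds up the variances.

```python
import statistics


def standardize(values):
    """Rescale a variable to mean 0 and (sample) standard deviation 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical observed variables on very different scales.
data = {
    "x1": [2.0, 4.0, 4.0, 5.0, 7.0],
    "x2": [10.0, 12.0, 9.0, 15.0, 11.0],
    "x3": [0.5, 0.1, 0.9, 0.4, 0.6],
}

total_variance = sum(statistics.variance(standardize(col)) for col in data.values())
print(round(total_variance, 6))  # equals the number of variables
```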

    Principal Component Analysis is Not Factor Analysis!

    Both procedures can be performed with the SAS System’s FACTOR procedure, and they sometimes even provide very similar results. But factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables. In contrast, principal component analysis makes no assumption about an underlying causal model. Principal component analysis is simply a variable reduction procedure that (typically) results in a relatively small number of components that account for most of the variance in a set of observed variables.

    What is a communality?

    A communality refers to the percent of variance in an observed variable that is accounted for by the retained components (or factors).
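When the retained components are orthogonal, the communality of a variable is the sum of its squared loadings on those components. A minimal Python sketch, with hypothetical loadings on two retained components:

```python
def communality(loadings):
    """Sum of squared loadings of one variable on the retained (orthogonal) components."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical loadings of one variable on components 1 and 2.
h2 = communality([0.8, 0.3])
print(round(h2, 2))  # share of the variable's variance explained
```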

    SAS Program and Output.

    You may perform a principal component analysis using either the PRINCOMP or FACTOR procedures.
    PROC FACTOR DATA=data-set-name
    PREPLOT PLOT
    SIMPLE
    METHOD=PRIN
    PRIORS=ONE
    MINEIGEN=p
    SCREE
    ROTATE=VARIMAX
    ROUND
    FLAG=desired-size-of-"significant"-factor-loadings ;
    VAR variables-to-be-analyzed ;
    RUN;
    FLAG=desired-size-of-"significant"-factor-loadings causes the printer to flag (with an asterisk) any factor loading whose absolute value is greater than some specified size.
    METHOD=factor-extraction-method specifies the method to be used in extracting the factors or components. The current program specifies METHOD=PRIN to request that the principal axis (principal factors) method be used for the initial extraction. This is the appropriate method for a principal component analysis.
    PREPLOT option will show us a factor plot before rotation.
    PLOT option will show us a factor plot after rotation.
    MINEIGEN=p specifies the critical eigenvalue a component must display if that component is to be retained (here, p = the critical eigenvalue). This option will cause PROC FACTOR to retain and rotate any component whose eigenvalue is p or larger. Negative values are not allowed.
    NFACT=n allows you to specify the number of components to be retained and rotated, where n = the number of components.
    PRIORS=prior-communality-estimates specifies prior communality estimates. Users should always specify PRIORS=ONE to perform a principal component analysis.
    ROTATE=rotation-method specifies the rotation method to be used. The preceding program requests a varimax rotation, which results in orthogonal (uncorrelated) components.
    ROUND causes all coefficients to be multiplied by 100 and rounded to the nearest integer (thus eliminating the decimal point), which in effect limits them to two decimal places.
    SIMPLE requests simple descriptive statistics: the number of usable cases on which the analysis was performed, and the means and standard deviations of the observed variables.

    DETERMINING SAMPLE SIZE

    by Irina 20. April 2007 10:11

    Sample size determination is computed using three inputs:

  • The estimate of the population standard deviation (often obtained from earlier studies )
  • The acceptable level of sampling error
  • The desired confidence level

    Generally, research practitioners utilize the following sequence and inputs in computing sample size:

    1. Survey respondents will split 50/50 in response to dichotomous (e.g. yes/no) questions.

    2. The desired level of confidence will be 95%, i.e. 1.96 standard deviations from the mean, with a possible (acceptable) sampling error of .05.

    Py = Proportion responding “yes”

    Pn = Proportion responding “no”

     Standard error is the acceptable amount of error/confidence interval. In the above case .05/1.96 (about 2 standard deviations), or .0255102. The standard formula for computing the sample size is:

    \[ n = \frac{(P_y)(P_n)}{(\text{Std Error})^2} \]

    So, when the respective values are input, we end up with .25/.0006507 or 384 respondents. This is why a survey sample size of 400 is often recommended. Sample size is important in avoiding Type I or Type II errors.
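The arithmetic can be reproduced in a few lines. This is a sketch in Python (not SAS) of the calculation described above. The function name `sample_size` and its parameter names are illustrative.

```python
def sample_size(p_yes, margin_of_error, z):
    """Sample size for a proportion: (Py * Pn) / (standard error squared)."""
    p_no = 1.0 - p_yes
    std_error = margin_of_error / z  # e.g. .05 / 1.96 = .0255102
    return (p_yes * p_no) / std_error ** 2


# 50/50 split, acceptable error of .05, 95% confidence (z = 1.96).
n = sample_size(p_yes=0.5, margin_of_error=0.05, z=1.96)
print(round(n, 2))  # 384.16
print(round(n))     # 384
```

Because Py * Pn is largest at a 50/50 split, any other assumed split gives a smaller required sample for the same error and confidence level.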

    Type I errors are made by stating that there is a difference between two groups within a population on a given measurement, when in fact there is no difference. Accommodating this potential outcome is where most sample size calculations stop. Often, practitioners simply ignore the possibility of making a Type II error. The sample size typically needed to address Type I errors is 384.

    Type II errors are made by stating that there is no difference between two groups within a population on a given measurement, when in fact there is a difference. While important, many researchers ignore statistical power calculations. In the “real world” tables and canned statistical tools are utilized to determine survey power, due to the complexity of the formulas. The sample size typically needed to address Type II errors is 1,236.

    Confidence level suggests that other samples drawn from the same population will have similar values X% of the time. For most marketing research exercises, confidence levels are set at 95%.

    Confidence interval includes the possible end point values for the entire population. The confidence interval allows for a computed amount of variation from the mean value based on the precision/cost value trade-off.

  • Carl Bergemann

    Sampling

    by Irina 19. April 2007 05:11

    Nonprobability Sampling

    The difference between nonprobability and probability sampling is that nonprobability sampling does not involve random selection and probability sampling does. This does not necessarily mean that nonprobability samples aren't representative of the population, but it does mean that nonprobability samples cannot depend upon the rationale of probability theory. With nonprobability samples, we may or may not represent the population well, and it will often be hard for us to know how well we've done so. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling.

    Accidental, Haphazard or Convenience Sampling

    One of the most common methods of sampling goes under the various titles listed here. I would include in this category the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although nonrepresentative) reading of public opinion.

    Purposive Sampling

    In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking, for instance, Caucasian females between 30 and 40 years old. One of the first things to do is verify that the respondent does in fact meet the criteria for being in the sample. Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible.

    • Modal Instance Sampling

    In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most frequent case, or the "typical" case.

    • Expert Sampling

    Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts." There are actually two reasons you might do expert sampling. First, because it would be the best way to elicit the views of persons who have specific expertise. But the other reason you might use expert sampling is to provide evidence for the validity of another sampling approach you've chosen. The disadvantage is that even the experts can be, and often are, wrong.

    • Quota Sampling

    In quota sampling, you select people nonrandomly according to some fixed quota. There are two types of quota sampling: proportional and nonproportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and that you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample, but not the sixty men, you will continue to sample men, but even if legitimate women respondents come along, you will not sample them because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?

    Nonproportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you're not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population. This method is the nonprobabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
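The proportional quota rule (keep taking respondents as they come along, but turn away anyone whose group is already full) can be sketched as a simple loop. The example is in Python. The function `quota_sample` and the stream of arriving respondents are hypothetical, with the 40 women / 60 men quota taken from the example in the text.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in order of arrival until every group's quota is met.

    respondents: iterable of (respondent_id, group) pairs
    quotas:      dict mapping group -> number of respondents wanted
    """
    remaining = dict(quotas)
    sample = []
    for person_id, group in respondents:
        if remaining.get(group, 0) > 0:
            sample.append((person_id, group))
            remaining[group] -= 1
        # anyone else is turned away: that quota is already met
        if all(count == 0 for count in remaining.values()):
            break
    return sample


# Hypothetical stream: two out of every three people who come along are women.
arrivals = [(i, "man" if i % 3 == 0 else "woman") for i in range(1, 301)]

sample = quota_sample(arrivals, {"woman": 40, "man": 60})
n_women = sum(1 for _, group in sample if group == "woman")
n_men = sum(1 for _, group in sample if group == "man")
print(len(sample), n_women, n_men)
```

Nothing in the loop is random: who ends up in the sample depends entirely on who happens to come along first, which is why this is a nonprobability method.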

    • Heterogeneity Sampling

    We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not identifying the "average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.

    • Snowball Sampling

    In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others who they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them.

    Sampling

    by Irina 16. April 2007 12:15

    Sampling is the process of selecting units (e.g., people, organizations) from a population of interest.


    Probability Sampling

    A probability sampling method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen.

       

    Some Definitions

    • N = the number of cases in the sampling frame
    • n = the number of cases in the sample
    • NCn = the number of combinations (subsets) of n from N
    • f = n/N = the sampling fraction
    • Objective of simple random sampling: To select n units out of N such that each of the NCn possible samples has an equal chance of being selected.
    • Procedure: Use a table of random numbers, a computer random number generator, or a mechanical device to select the sample.
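The definitions above can be tried out with a computer random number generator. The sketch below uses Python's standard library. The frame of 20 numbered cases, the sample size of 5 and the seed are arbitrary choices for illustration.

```python
import math
import random

N = 20                           # number of cases in the sampling frame
n = 5                            # number of cases in the sample
frame = list(range(1, N + 1))    # hypothetical frame: cases numbered 1..N

rng = random.Random(2007)        # fixed seed so the draw can be repeated
sample = rng.sample(frame, n)    # simple random sample without replacement

n_combinations = math.comb(N, n)  # NCn: number of possible samples
f = n / N                         # sampling fraction

print(sample, n_combinations, f)
```

`random.sample` draws without replacement, so every one of the NCn possible subsets is equally likely to be the one selected.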

    Stratified Random Sampling

    Stratified Random Sampling, also sometimes called proportional or quota random sampling, involves dividing your population into homogeneous subgroups and then taking a simple random sample in each subgroup.

    There are several major reasons why you might prefer stratified sampling over simple random sampling. First, it assures that you will be able to represent not only the overall population, but also key subgroups of the population, especially small minority groups. If you want to be able to talk about subgroups, this may be the only way to effectively assure you'll be able to. If the subgroup is extremely small, you can use different sampling fractions (f) within the different strata to randomly over-sample the small group (although you'll then have to weight the within-group estimates using the sampling fraction whenever you want overall population estimates). When we use the same sampling fraction within strata we are conducting proportionate stratified random sampling. When we use different sampling fractions in the strata, we call this disproportionate stratified random sampling. Second, stratified random sampling will generally have more statistical precision than simple random sampling. This will only be true if the strata or groups are homogeneous. If they are, we expect that the variability within-groups is lower than the variability for the population as a whole. Stratified sampling capitalizes on that fact.
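The difference between proportionate and disproportionate stratified sampling, and the weighting needed afterwards, can be sketched as follows. This is Python, not SAS. The two strata, the sampling fractions and the helper names `stratified_sample` and `weighted_mean` are made up for illustration.

```python
import random


def stratified_sample(strata, fractions, seed=None):
    """Draw a simple random sample within each stratum using its own sampling fraction."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = round(len(units) * fractions[name])
        sample[name] = rng.sample(units, k)
    return sample


def weighted_mean(strata, sample):
    """Overall estimate: each stratum mean is weighted by the stratum's population share."""
    total = sum(len(units) for units in strata.values())
    return sum(
        len(strata[name]) / total * (sum(drawn) / len(drawn))
        for name, drawn in sample.items()
    )


# Hypothetical population of 1000 units; the unit's number doubles as its measured value.
strata = {"majority": list(range(900)), "minority": list(range(900, 1000))}

# Same fraction everywhere: proportionate stratified random sampling.
proportionate = stratified_sample(strata, {"majority": 0.1, "minority": 0.1}, seed=1)

# Over-sample the small group: disproportionate stratified random sampling.
disproportionate = stratified_sample(strata, {"majority": 0.1, "minority": 0.5}, seed=1)

estimate = weighted_mean(strata, disproportionate)
print(len(disproportionate["majority"]), len(disproportionate["minority"]), estimate)
```

Without the weighting step, the over-sampled minority stratum would pull the overall estimate toward its own mean.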

    Systematic Sampling

    This is random sampling with a system! From the sampling frame, a starting point is chosen at random, and units are then selected at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8=15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116. If there were 125 houses, 125/8=15.625, so should you take every 15th house or every 16th house? If you take every 16th house, 8*16=128, so there is a risk that the last house chosen does not exist. To overcome this, the random starting point should be between 1 and 13, because the last house chosen is the starting point plus 7*16=112. On the other hand, if you take every 15th house, 8*15=120, so with a starting point between 1 and 15 the last five houses will never be selected. The random starting point should now be between 1 and 20 to ensure that every house has some chance of being selected.
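The 120-house example can be reproduced with a short function. This is a Python sketch for the case where the frame size is a multiple of the sample size. The name `systematic_sample` and its parameters are illustrative.

```python
import random


def systematic_sample(frame_size, n, start=None, rng=random):
    """Pick every k-th unit (k = frame_size // n) after a random start between 1 and k."""
    interval = frame_size // n
    if start is None:
        start = rng.randint(1, interval)
    return [start + i * interval for i in range(n)]


# 8 houses from a street of 120, with the random start fixed at 11 as in the text.
houses = systematic_sample(120, 8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]
```

Calling `systematic_sample(120, 8)` without `start` draws the starting house at random, which is how the method would be used in practice.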

    Cluster (Area) Random Sampling

    In cluster sampling the units sampled are chosen in clusters, close to each other. Examples are households in the same street, or successive items off a production line. The population is divided into clusters, and some of these are then chosen at random. Within each cluster units are then chosen by simple random sampling or some other method. Ideally the clusters chosen should be dissimilar so that the sample is as representative of the population as possible. Clearly this strategy will help us to economize on our mileage, but it has possible disadvantages: units close to each other may be very similar, and so less likely to represent the whole population, and the sampling error is usually larger than in simple random sampling.
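The two stages (first choose clusters at random, then choose units within each chosen cluster) can be sketched like this in Python. The streets, house labels, sample sizes and the function name `cluster_sample` are all hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: simple random sample inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical population: 10 streets (clusters) of 20 households each.
streets = {
    f"street_{s}": [f"s{s}_house_{h}" for h in range(1, 21)]
    for s in range(1, 11)
}

sample = cluster_sample(streets, n_clusters=3, n_per_cluster=5, seed=42)
for street, households in sample.items():
    print(street, households)
```

Only 3 of the 10 streets have to be visited, which is where the saving in travel comes from. The price is that the 15 sampled households come from just three neighbourhoods.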

  • About the author

    Irina Spivak
    Team Leader at G-Stat.



      Disclaimer

      The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

      © Copyright 2012
