Using Panel Data in Stata

by Irina 9. February 2008 11:27
A panel dataset should have data on n cases observed over t time periods, for a total of n × t observations. Data arranged this way, with one row per case per time period, is said to be in long form. Sometimes your data come in what is called wide form instead: one observation per case, with a separate variable for each time period (for example EXPOSURE2000, EXPOSURE2001 and EXPOSURE2002). To analyze such data with Stata's panel-data commands, you first need to convert them to long form. This can be done with Stata's reshape command:
        . reshape long EXPOSURE, i(GROUP) j(year)
        (note: j = 2000 2001 2002)

        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                      200   ->     600
        Number of variables                   4   ->       3
        j variable (3 values)                     ->   year
        xij variables:
             EXPOSURE2000 EXPOSURE2001 EXPOSURE2002   ->   EXPOSURE
        -----------------------------------------------------------------------------

long tells reshape that we want to go from wide to long.

EXPOSURE tells reshape that the stem of the variables to be converted from wide to long is EXPOSURE.

The i(GROUP) option tells reshape that GROUP uniquely identifies the records in the wide format.

The j(year) option tells reshape that the suffix of EXPOSURE (i.e., 2000, 2001 and 2002) should be placed in a new variable called year.

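The reshape wide command puts the data back into wide format, for example reshape wide EXPOSURE, i(GROUP) j(year). While the data are in long form, you can declare the panel structure with xtset GROUP year so that Stata's xt commands know which variable identifies the cases and which identifies time.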
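To make the wide-to-long mapping concrete, here is a minimal plain-Python sketch (not Stata, and not how reshape is implemented internally) that mimics what reshape long and reshape wide do to a small dataset. Rows are dicts; the column names follow the example above, while the function names reshape_long/reshape_wide and all the values are hypothetical, made up for illustration.

```python
# Plain-Python illustration of wide <-> long reshaping.
# Column names follow the Stata example; the data values are invented.

def reshape_long(rows, stub, i, j):
    """Turn each wide row into one row per suffix found after `stub`."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(stub) and col != stub:
                suffix = int(col[len(stub):])  # "EXPOSURE2001" -> 2001
                long_rows.append({i: row[i], j: suffix, stub: value})
    # Order by case identifier, then by time period
    long_rows.sort(key=lambda r: (r[i], r[j]))
    return long_rows

def reshape_wide(rows, stub, i, j):
    """Collapse long rows back into one row per case identifier."""
    wide = {}
    for row in rows:
        wide.setdefault(row[i], {i: row[i]})[f"{stub}{row[j]}"] = row[stub]
    return [wide[key] for key in sorted(wide)]

wide_data = [
    {"GROUP": 1, "EXPOSURE2000": 0.5, "EXPOSURE2001": 0.7, "EXPOSURE2002": 0.6},
    {"GROUP": 2, "EXPOSURE2000": 1.2, "EXPOSURE2001": 1.1, "EXPOSURE2002": 1.4},
]

long_data = reshape_long(wide_data, "EXPOSURE", i="GROUP", j="year")
for row in long_data:
    print(row)
# n cases x t periods: 2 groups x 3 years = 6 long rows
print(len(wide_data), "->", len(long_data))
# Going back to wide recovers the original rows
print(reshape_wide(long_data, "EXPOSURE", i="GROUP", j="year") == wide_data)
```

The same arithmetic explains the Stata output above: 200 groups times 3 years gives 600 observations, and the three EXPOSURE variables collapse into one EXPOSURE variable plus the new year variable.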

Confidence intervals for the predicted values - logistic regression

by Irina 14. April 2007 08:36
Using predict after logistic to get predicted probabilities and confidence intervals is somewhat tricky. The following two commands will give you predicted probabilities:
        . logistic ...
        . predict phat
The following does not give you the standard error of the predicted probabilities:
        . logistic ...
        . predict se_phat, stdp
Despite the name we chose, se_phat does not contain the standard error of phat. What does it contain? The standard error of the predicted index. The index is the linear combination of the estimated coefficients and the values of the independent variables for each observation in the dataset. Suppose we fit the following logistic regression model:
        . logistic y x 
This model estimates the coefficients $b_0$ and $b_1$ of

$$P(y = 1) = \frac{\exp(b_0 + b_1 x)}{1 + \exp(b_0 + b_1 x)}$$

Here the index is $b_0 + b_1 x$. We can get the predicted values of the index and its standard error as follows:
        . logistic y x
        . predict lr_index, xb
        . predict se_index, stdp
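What stdp reports can be sketched by hand. For this one-regressor model the variance of the estimated index is Var(b0) + x² Var(b1) + 2x Cov(b0, b1), the quadratic form (1, x) V (1, x)' in the coefficient covariance matrix V. The Python sketch below assumes hypothetical coefficient and covariance values (in Stata they would come from e(b) and e(V) after logistic); the function name index_and_se is likewise made up for illustration.

```python
import math

# Hypothetical estimates for the index b0 + b1*x; in Stata these
# would come from e(b) and e(V). The numbers are invented.
b0, b1 = -1.5, 0.8
var_b0, var_b1, cov_b0_b1 = 0.09, 0.04, -0.03

def index_and_se(x):
    """Return the linear index b0 + b1*x and its standard error.

    Var(b0 + b1*x) = Var(b0) + x^2 * Var(b1) + 2x * Cov(b0, b1),
    i.e. the quadratic form (1, x) V (1, x)'.
    """
    xb = b0 + b1 * x
    var_xb = var_b0 + x**2 * var_b1 + 2 * x * cov_b0_b1
    return xb, math.sqrt(var_xb)

for x in (0.0, 1.0, 2.0):
    xb, se = index_and_se(x)
    print(f"x={x}: index={xb:.3f}, se(index)={se:.3f}")
```

Note that the standard error changes with x, so each observation gets its own se_index even though all of them share the same coefficients.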
We could transform our predicted value of the index into a predicted probability as follows:
        . gen p_hat = exp(lr_index)/(1+exp(lr_index))
This is just what predict does by default after a logistic regression if no options are specified. Using a similar procedure, we can get a 95% confidence interval for our predicted probabilities by first generating the lower and upper bounds of a 95% confidence interval for the index and then converting these to probabilities:

        . gen lb = lr_index - invnormal(0.975)*se_index
        . gen ub = lr_index + invnormal(0.975)*se_index
        . gen plb = exp(lb)/(1+exp(lb))
        . gen pub = exp(ub)/(1+exp(ub))
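The same two-step procedure, written as a Python sketch under stated assumptions: the predicted index (-0.7) and its standard error (0.26) are hypothetical values for a single observation, and statistics.NormalDist().inv_cdf plays the role of Stata's invnormal(). The helper names invlogit and prob_ci are illustrative, not part of any Stata API.

```python
import math
from statistics import NormalDist

def invlogit(z):
    """exp(z) / (1 + exp(z)): maps an index value to a probability."""
    return math.exp(z) / (1 + math.exp(z))

def prob_ci(xb, se_xb, level=0.95):
    """CI for P(y = 1): build the interval on the index, then transform.

    Mirrors the Stata steps: lb/ub on the index scale using the normal
    quantile (about 1.96 for 95%), then plb/pub via the inverse logit.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lb, ub = xb - z * se_xb, xb + z * se_xb
    return invlogit(lb), invlogit(ub)

# Hypothetical predicted index and standard error for one observation
xb, se_xb = -0.7, 0.26
plb, pub = prob_ci(xb, se_xb)
print(f"p_hat={invlogit(xb):.3f}, 95% CI=({plb:.3f}, {pub:.3f})")
```

Because the inverse logit is nonlinear, the resulting interval is not symmetric around p_hat; it is stretched toward 0.5.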
Generating the confidence interval for the index and then converting its bounds to probabilities is better than estimating the standard error of the predicted probability and building the interval directly from that standard error. The sampling distribution of the predicted index is closer to normal than that of the predicted probability, and because exp(z)/(1+exp(z)) always lies strictly between 0 and 1, the transformed bounds can never fall outside the range of a probability.
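A small numerical comparison illustrates the point. The sketch below assumes that "estimating the standard error of the predicted probability" means the delta method, se(p) ≈ p(1 − p)·se(index), which is one common way to do it; the observation (index −4, standard error 1.5) is hypothetical and chosen to have a small predicted probability.

```python
import math
from statistics import NormalDist

def invlogit(z):
    """exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1 + math.exp(z))

z95 = NormalDist().inv_cdf(0.975)

def delta_method_ci(xb, se_xb):
    """Symmetric CI on the probability scale (delta-method standard error)."""
    p = invlogit(xb)
    se_p = p * (1 - p) * se_xb  # delta-method approximation
    return p - z95 * se_p, p + z95 * se_p

def index_scale_ci(xb, se_xb):
    """CI built on the index scale, then mapped to probabilities."""
    return invlogit(xb - z95 * se_xb), invlogit(xb + z95 * se_xb)

# Hypothetical observation: small predicted probability, imprecise index
xb, se_xb = -4.0, 1.5
print("delta method:", delta_method_ci(xb, se_xb))
print("index scale: ", index_scale_ci(xb, se_xb))
```

Here the symmetric delta-method interval has a negative lower bound, an impossible value for a probability, while the interval transformed from the index scale stays inside (0, 1).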
About the author

Irina Spivak, Team Leader at G-Stat.


      Disclaimer

      The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

      © Copyright 2010
