Categorising a continuous variable into deciles

by Irina 12. May 2007 07:53

This page demonstrates methods of categorising continuous variables into deciles with TERADATA Query.

The formula used to calculate the quantile rank of a value is :

FLOOR ( rank*k/(n+1) )

where rank is the value's rank, k is the number of groups specified with the GROUPS= option, and n is the number of observations having nonmissing values of the ranking variable.

Note that this means that when grouping variables into quintiles, the groups will be assigned values 0,1,2,3, and 4.
In the presence of ties, the default behaviour if that each value is assigned the same rank, which is given by the mean of the corresponding ranks. As such, tied values are always assigned to the same quantile.

The query is:

proc sql;
connect to teradata
(user=me password=aaa tdpid=DWPROD);
create table income_april as
select * from connection to teradata
    (select    a.*,
     1 as indikator
case when income>0 then 1 else 0 end as kod_income,
sum(kod_income)       over (partition by  c_age) as sum_income,
case when kod_income=1 then  RANK( ) OVER (PARTITION BY
c_age,kod_income  ORDER by income) end as   ranker_income
case when kod_income =1 then((ranker_income *9)/(sum_income+1)) +1 else 0 end   as R_L_income,

from dwp_sas.income  a    ) ;
disconnect from teradata;
quit;
 

In this case we want to each group of age create deciles and to separate the group with income=0.

About the author

Irina Spivak Irina Spivak
Team Leader at G-Stat. More...


Send mail Email

Blogroll

    Disclaimer

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

    © Copyright 2012

    Sign in

    eXTReMe Tracker