Categorising a continuous variable into deciles
12. May 2007 07:53This page demonstrates methods of categorising continuous variables into deciles with TERADATA Query.
The formula used to calculate the quantile rank of a value is :
FLOOR ( rank*k/(n+1) )
where rank is the value's rank, k is the number of groups specified with the GROUPS= option, and n is the number of observations having nonmissing values of the ranking variable.
Note that this means that when grouping variables into quintiles, the groups will be assigned values 0,1,2,3, and 4.
In the presence of ties, the default behaviour if that each value is assigned the same rank, which is given by the mean of the corresponding ranks. As such, tied values are always assigned to the same quantile.
The query is:
proc sql;
connect to teradata
(user=me password=aaa tdpid=DWPROD);
create table income_april as
select * from connection to teradata
(select a.*,
1 as indikator
case when income>0 then 1 else 0 end as kod_income,
sum(kod_income) over (partition by c_age) as sum_income,
case when kod_income=1 then RANK( ) OVER (PARTITION BY
c_age,kod_income ORDER by income) end as ranker_income
case when kod_income =1 then((ranker_income *9)/(sum_income+1)) +1 else 0 end as R_L_income,
from dwp_sas.income a ) ;
disconnect from teradata;
quit;
In this case we want to each group of age create deciles and to separate the group with income=0.

Email 