Arrays vs Transpose

by Irina 28. May 2007 11:41
To transpose your data (turning variables into observations or turning observations into variables), you can use either PROC TRANSPOSE or array processing within a DATA step.
           data a;
 	   input id  date total cod flounder haddock perch tuna;
 	   cards;
 	  1  1983	75	3.00	8.20	1.30	 .90	10.00
	  1  1984	80	4.60	8.80	2.60	1.40	11.20
	  1  1985	90	5.00	9.30	4.10	2.20	10.00
	  2  1983	100	6.50	12.20	7.90	4.00	15.70
	  2  1984	125	8.00	9.50	8.90	4.10	15.80
	  2  1985	134	11.20	15.20	22.70	4.90	26.20
	  2  1986	152	24.20	34.90	32.80	10.30	33.30
	  2  1997	177	22.60	41.70	31.90	15.90	58.70
	 ;
	run;

A Simple Transposition:

proc

sort data=a;by id date;run;

proc transpose prefix=total data=a out=b;

by id;

var total;

id date;

run;

Result:

 

 id NAME OF FORMER VARIABLE total1983 total1984 total1985 total1986 total1997
1 1 total 75 80 90 . .
2 2 total 100 125 134 152 177

Transposing Two Variables.

Become more complex to use

proc transpose.And the DATA step has much more flexibility in this case .

Example :

proc sql noprint ;
select max (c) into :dim
from (select count(*) as c from a group id) ;
quit ;

data ll (keep= id date _total: _flounder: ) ;
retain id ;
do _n_ = 1 by 1 until (last.id) ;
set a ;
by id ;
array _total [&dim] ;
array _flounder[&dim];
_total [_n_] = total ;
_flounder [_n_]=flounder;
end ;
run ;

 

Arrays from A to Z

by Irina 16. May 2007 01:19

Arrays from A to Z
by Phil Spector

Arrays are a convenient way of grouping variables,and can be used for simple repetitive tasks, reshaping data sets, and \remembering" values from observation-to-observation. Arrays can be used to allow some traditional matrix-style programming techniques to be used in the data step.

Array Statement: Syntax:

ARRAY name<fnelemg> <$> <<elements <(initial-values)>>;

Examples:
array x x1-x3;
array check{5} _temporary_;
array miss{4} _temporary_ (9 9 99 9);
array dept $ dept1-dept4 ('Sales','Research','Training');
array value{3}; * generates value1, value2 and value3;

  • All variables in an array must have the same type (numeric or character).
  • An array name can't have the same name as a variable .
  • You must explicitly state the number of elements when using
    _temporary_; in other cases SAS figures it out from context,generating new variables if necessary.

    Advanced Features of Arrays :
  • You can specify the range of subscripts in an array with the notation start:finish. For example, the declaration: array income{1997:2000} in1 - in4; would allow you to refer to income{1997}, income{1998}, etc. The functions lbound and hbound will return the lowest and highest indices defined for an array.
  • Array names can be used in RETAIN statements, and, when used with the subscript {*} in PUT or INPUT statements;
  • If an array name coincides with the name of a SAS function, the array will override the function for the duration of the data step.
  • When an array is declared using _temporary_, values of the elements of the array are not set to missing at the beginning of each observation.

    Using Parallel Arrays:

    data new;
    set old;
    array x x1-x10;
    array mval _temporary_ (9 9 9 9 9 99 99 99 99 99);
    do i=1 to dim(x);
    if x{i} = mval{i} then x{i} = .;
    end;
    run;
  • If you don't use _temporary__, you usually need to include a drop statement.
  • There's no limit to the number of parallel arrays you can create.

    Another Example of the array Statement :

    array class class1-class5;
    total = 0;
    do i = 1 to 5 until(total >= 10);
    total = total + class{i};
    end;
    year = i;
    if total lt 10 then year = .;

    Reshaping Data Sets: I. One to Many:

    Consider a data set with 4 variables (x1-x4) stored as follows:
    ID X1 X2 X3 X4
    1 17 19 22 24
    2 18 14 33 16
    3 19 28 31 42
    The goal is to create four observations for each original observation, one for each variable.
    data new;
    set old;
    array xx x1-x4;
    do time=1 to 4;
    x = xx{time};
    output;
    end;
    drop x1-x4;

    Reshaping Data Sets: Example 2:


    Consider the transformed data set from the previous example. Suppose we wish to put it back to it's original form:
    data next;
    set new;
    by id;
    array xx x1-x4;
    retain xx;
    if first.id then do i=1 to 4;xx{i} = .;end;
    xx{time} = x;
    if last.id then output;
    drop x i;
    run;
  • Tags: array

    SAS

    BASKET ANALYSES.

    by Irina 14. April 2007 08:06

    data  PRODUCTS;
    set    PRODUCTS;
    array dich[15]  prod1-prod15;

    array name{15} $ _temporary_ ('p1',....,'p2' );

    do i=1 to 15;
    category=dich [i];
    prod=name[i];
    num_product=i;
    output;
    end;
    keep sd10_numerator category num_product ;
    run;


    data BASKET_ANALYSES;
    set zevet1.for_model(where=(tzvt_activ_form='300'));
    array PR prod1-prod15;
    m=0;
    do i=1 to 15;
    do j=i+1 to 15;
    do k=j+1 to 15;

    comb=PR[i]+PR[j]+PR[k];
    name=compress(vname(aa[i])||'_'||vname(aa[j])||'_'||vname(aa[k]));
    m+1;
    if comb<3 then continue;
    else output;

    end;
    end;
    end;
    run;

    Now we wante to fix for each kind of treatment the most frequent combinations:

    proc freq data=basket_analyses;
    tables name*treatment/out=basket_result;
    run;

    proc sort data=basket_result;
    by treatment descending count;
    run;

    data basket_result;
    set basket_result;
    by  treatment ;
    retain counter ;
    if first.treatment  then counter=1;
    if treatment =lag(treatment) then counter=counter+1;
    output;
    run;

    data frequent_resultes;
    set basket_result;
    if counter  <=4;
    run;

    Transpose with Array

    by Irina 14. April 2007 07:58

    data product_transpose;
    set product_transpose_200110_200201;
    array product

    ms_cards
    salary
    l_miuazim
    miuazim
    num_hk
    sum_hk 
    loan
    misgeret_ashrai
    osher
    pasiv;

    array name{10} $ 16;
    do i=1 to dim(product);
    name[i]=vname(product[i]);
    amount=product [i];
    varname=name[i];
    month_lag=month_lag6;
    output;
    end;
    keep amount varname month_lag;
    run;


    proc sql;
    select distinct
    year into: period
    separated by '+'
    from example;
    quit;

     

    %macro trans;
    %do i=1 %to 2;
    DATA example (DROP=I year x y year1 year2 z);
    ;
    ARRAY years {2} year1- year2;
    ARRAY xs {2} x1-x2;
    ARRAY ys {2} y1-y2;
    DO I=1 TO 2 UNTIL (LAST.b);
    SET dug;
    BY b;
    years {I}=year;
    x_%scan(&period,&i,+)=xs{I};

    xs{I}=x;
    ys{I}=y;
    END;

    run;
    %end;
    %mend;
    %trans;

     

    The way of creating the same distribution

    by Irina 14. April 2007 06:41

    Sometimes We want to create the same distribution .

    We can do it in this way : The ttt limits the commulative distribution of another variable, which we can receive from proc freq )

    DATA  SAME_DISTR;
    SET SAME_DISTR;
    IF TARGET =0 THEN DO;
    ttt=ranuni(31311115)*100;
    if ttt<=5 then kod=200502;
    else if ttt<=28 then kod=200503;
    else if ttt<=40 then kod=200504;
    else if ttt<=52 then kod=200505;
    else if ttt<=62 then kod=200506;
    else if ttt<=71 then kod=200507;
    else if ttt<=80 then kod=200508;
    else if ttt<=88 then kod=200509;
    else if ttt<=93 then kod=200510;
    else if ttt<=100 then kod=200511;
    end;
    if target=1 then kod=100* year( Loan_Value_Date)+month(Loan_Value_Date);
    run;

    This is more wise way to do the same:


    proc freq DATA = required_destribution;
    tables Loan_Value_Date /out = outkod outcum noprint;
    run; data _null_ ; length kod_str pct_str $5000;
    set outkod end=eof;
    retain kod_str pct_str;
    kod_str=compress(kod_str||','||Loan_Value_Date);
    pct_str=compress(pct_str||','||cum_pct);
    if eof then do;
    call symput('a1',substr(pct_str,2));
    call symput('a2',substr(kod_str,2));
    call symput('nn',_n_);
    end;
    run;

    DATA SAME_DISTR;
    SET SAME_DISTR;
    array a1 {&nn} _temporary_ (&a1);
    array a2 {&nn} _temporary_ (&a2);
    IF TARGET =0 THEN DO;
    ttt=ranuni(31311115)*100;
    do i=1 to dim(a1);
    if i=1 then do;
    if ttt<=a1[i] then Loan_Value_Date=a2[i];
    end;
    else do;
    if a1[i-1]<ttt<=a1[i] then Loan_Value_Date=a2[i];
    end;
    end;
    end;

    kod=100* year( Loan_Value_Date)+month(Loan_Value_Date);
    run;

    About the author

    Irina Spivak Irina Spivak
    Team Leader at G-Stat. More...


    Send mail Email

    Authors

    Blogroll

      Disclaimer

      The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

      © Copyright 2010

      Sign in

      eXTReMe Tracker