
I'm writing a program in SAS.

Here's the dataset I have:

id huuse days  
1   0   4  
1   0   3  
1   1   12    
1   1   1  
1   2   15  
2   1   13  
2   0   16  
2   1   18  
2   0   44

For each ID, I want to delete the record if variable huuse ne 1, until I get to the first huuse=1. Then I want to keep that record and all subsequent records for that id, regardless of the value of huuse. So for id=1, I want to delete the first two records, then keep all records for id=1 starting with the 3rd record. For id=2, the first record has huuse=1, so I want to keep all records for id=2.

The data set I want should look like this:

id huuse days  
1   0   4  
1   0   3  
1   1   12  
1   1   1  
1   2   15  
2   1   13  
2   0   16  
2   1   18  
2   0   44  

I tried this code, but it removes all records that have huuse ne 1.

data want;  
set have;  
by id;  
do until (huuse=1);  
if huuse = 1 then LEAVE;            
if huuse ne 1 then DELETE;  
END;  
run;  

I've tried several variations of do loops, but they all do the same thing.

2 Answers

The DATA step is a program with an implicit loop that reads every record of the data set specified in the SET statement. Any program data vector (PDV) variables not coming from the data set are, by default, reset to missing at the top of the implicit loop. You change that behavior by using a RETAIN statement to name the variables that should not be reset.
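To see that reset in isolation, here is a small sketch that reads the question's data and compares a retained variable with a non-retained one. The dataset name DEMO and the variables n_retained and n_plain are hypothetical names chosen only for this illustration.

```sas
/* The question's data, typed in as HAVE */
data have;
  input id huuse days;
  datalines;
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
;
run;

/* Hypothetical demo: a retained variable vs. a non-retained one */
data demo;
  set have;
  retain n_retained 0;          /* RETAIN: value carries over to the next iteration        */
  n_retained = n_retained + 1;  /* so this counts 1, 2, 3, ...                              */
  n_plain = sum(n_plain, 1);    /* not retained: reset to missing at the top of each        */
                                /* iteration, so SUM(., 1) gives 1 on every row             */
run;
```

In DEMO, n_retained runs from 1 to 9, while n_plain is 1 on every row because it goes back to missing before each record is read.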

So, your problem needs a tracking variable that records the state of the condition "Have I seen huuse=1 yet in this group?". Call this variable one_flag:

  • RETAIN one_flag; so you control when its value changes
  • At the start of a BY group one_flag needs to be reset to false (0)
  • When huuse is first seen as 1 set the flag to true (1)

Example:

data want(drop=one_flag);
  set have;
  by id;

  retain one_flag 0;

  if first.id then one_flag = 0;

  if not one_flag and huuse = 1 then one_flag = 1;

  if one_flag then OUTPUT;   * want all rows in group starting at first huuse=1;
run;

You can place the SET and BY statements inside an explicit DO loop, which changes the operating behavior of the program, especially if the explicit loop is terminated according to a LAST.<var> automatic variable. Such a loop is commonly called a DOW loop by SAS programmers, although the term "DOW loop" does not appear in the SAS documentation.

Example:

data want(drop=one_flag);
  do until (last.id);
    set have;
    by id;

    if not one_flag and huuse=1 then one_flag = 1;

    if one_flag then OUTPUT;   * want all rows in group starting at first huuse=1;
  end;
run;

Because the looping is explicit and never returns to the TOP of the program within the loop, there is no need to RETAIN the flag variable or to reset it. Program variables that are not retained are reset automatically at the top of the program, and the top of the program is only reached at the start of each BY group. Learn more about this programming construct in the SGF 2013 paper "The Magnificent DO", Paul M. Dorfman.
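To watch that automatic reset happen, the sketch below runs the same DOW loop but writes every row, keeping one_flag. It assumes the question's data plus the two id=3 rows from the second answer's test table, so that one group starts with huuse=0 right after a group that ended with one_flag=1. The dataset names HAVE_PLUS and TRACE are hypothetical.

```sas
/* Question's data plus the two id=3 rows from the second answer */
data have_plus;
  input id huuse days;
  datalines;
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
3 0 1
3 1 2
;
run;

data trace;
  do until (last.id);   /* one pass of the implicit loop per BY group */
    set have_plus;
    by id;
    /* one_flag is not retained: it is missing at the start of each group */
    if not one_flag and huuse = 1 then one_flag = 1;
    output;             /* write every row so one_flag can be inspected */
  end;
run;
```

On the first id=3 row one_flag is missing even though the id=2 group ended with one_flag=1: that is the reset at the top of the implicit loop. On the second id=3 row it becomes 1.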


2 Comments

Your code looks like it will eliminate the opposite set of records from what was asked. You are deleting once the first 1 is seen instead of outputting once the first 1 is seen.
Yes! Worked perfectly. Thank you not only for the answer, but for the guidance and suggestion for further reading so I can learn more. I'd been looking at another article yesterday and saw the Do loop of Whitlock, but I couldn't understand their explanation. Your explanation was great.

Your source and result are the same :-) But if I understood your question correctly, the solution is quite simple with RETAIN. I added two rows (id=3) to the example to make it clear that I understood correctly.

The code, with an example table:

    data test; 
    id=1;huuse=0;days=4;output;  
    id=1;huuse=0;days=3;output;  
    id=1;huuse=1;days=12;output;    
    id=1;huuse=1;days=1;output;  
    id=1;huuse=2;days=15;output;  
    id=2;huuse=1;days=13;output;  
    id=2;huuse=0;days=16;output;  
    id=2;huuse=1;days=18;output;  
    id=2;huuse=0;days=44;output;
    id=3;huuse=0;days=1;output;
    id=3;huuse=1;days=2;output;
    run;

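    data test_output;
    set test;                 /* assumes rows are grouped by id */
    retain keep_id -1;        /* id of the group currently being kept */
    /* first huuse=1 in a new id: start keeping that id */
    if (keep_id ne id and huuse = 1) then keep_id=id;
    if keep_id = id then output;
    run;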


    /* the results:
    id  huuse   days    keep_id
    1   1   12  1
    1   1   1   1
    1   2   15  1
    2   1   13  2
    2   0   16  2
    2   1   18  2
    2   0   44  2
    3   1   2   3
    */      

1 Comment

Yes, you understood it correctly. And thank you for the solution.
