Vector comparison inside for-loop

Question

The following code finds no-rain periods in a time series that were preceded by rain over a threshold (e.g. 10mm in 2 days [k=48]).

significant_drydowns <- function(rain, threshold=10, k=48) {
    require(zoo)
    require(dplyr)

    # Find drydowns: sequences of 0 rain; give them a group
    starts <- (lag(rain, default=0) > 0) & (rain == 0) 
    groups <- cumsum(starts)
    groups[rain > 0] <- NaN
    groups[groups == 0] <- NaN

    # remove drydowns where previous rain is below threshold
    past_rain <- rollsum(rain, k, fill=0, align='right')
    for (t in which(starts)) {
        if (past_rain[t-1] < threshold) {
            groups[groups == groups[t]] <- NaN
        }
    }

    return(groups)
}

The for loop is really slow, partly because of the == comparison. Is there any way to speed this code up?

Example data:

rain <-c(0,0,0,2,3,0,0,0,1,5,0,0,0,0,0,0,6,1,1,1,0,0,0)
names(rain) <- rain

Example output:

R> significant_drydowns(rain, 5, 5)
  0   0   0   2   3   0   0   0   1   5   0   0   0   0   0   0   6   1   1   1   0   0   0 
NaN NaN NaN NaN NaN   1   1   1 NaN NaN   2   2   2   2   2   2 NaN NaN NaN NaN   3   3   3 
R> significant_drydowns(rain, 7, 4)
  0   0   0   2   3   0   0   0   1   5   0   0   0   0   0   0   6   1   1   1   0   0   0 
 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   3   3   3

So, the group names don't matter, as long as they are unique. Groups are only assigned to dry-downs for which the previous k steps has a sum greater than threshold.

Sven Hohenstein · Accepted Answer · 2014-10-03 15:42:53Z

The following function works without for loops or any function of the *apply family. Furthermore, it does not require additional packages but makes use of base functions only. See the code comments for further details.

significant_drydowns <- function(rain, threshold = 10, k = 48) {
  # all values except the first one
  rain_tail <- rain[-1L]
  # logical index of start of no-rain period
  starts <- head(rain, -1L) > 0 & !rain_tail
  # no-rain groups
  groups <- cumsum(starts)
  # sum of the amount of rain in the last k elements
  # (for e.g., k = 5 the filter is c(1,1,1,1,1), therefore
  # the sum of 5 preceding elements is calculated)
  past_rain <- filter(rain, rep.int(1L, k), sides = 1L)
  # valid groups (previous amount of rain exceeds threshold)
  valid <- past_rain[starts] >= threshold
  # replace `groups` values with NA if 
  #  (a) there is rain or 
  #  (b) this is the first rain or no-rain period
  is.na(groups) <- rain_tail | !groups
  # replace `groups` values with NA if amount of rain is below threshold
  # (here `groups` is used as a numeric index for `valid`)
  is.na(groups) <- !valid[groups]
  # add NA to match length of original vector and set names
  setNames(c(NA_integer_, groups), rain)
}

Some results:

> significant_drydowns(rain, 5, 5)
 0  0  0  2  3  0  0  0  1  5  0  0  0  0  0  0  6  1  1  1  0  0  0 
NA NA NA NA NA  1  1  1 NA NA  2  2  2  2  2  2 NA NA NA NA  3  3  3 
> significant_drydowns(rain, 7, 4)
 0  0  0  2  3  0  0  0  1  5  0  0  0  0  0  0  6  1  1  1  0  0  0 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  3  3  3

Excellent. I haven't come across the filter() function before. Very handy. Only problem is that I also use dplyr, which has its own filter(), but that's a fairly minor problem. Thanks! — naught101
– naught101, Commented Oct 3, 2014 at 13:36
@naught101 You can explicitly access the functions with stats::filter and dplyr::filter. — Sven Hohenstein
– Sven Hohenstein, Commented Oct 3, 2014 at 15:07

Stack Exchange Network

Vector comparison inside for-loop

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Vector comparison inside for-loop

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions