5
\$\begingroup\$

The following code finds no-rain periods in a time series that were preceded by rain over a threshold (e.g. 10mm in 2 days [k=48]).

significant_drydowns <- function(rain, threshold=10, k=48) {
    require(zoo)
    require(dplyr)

    # Find drydowns: sequences of 0 rain; give them a group
    starts <- (lag(rain, default=0) > 0) & (rain == 0) 
    groups <- cumsum(starts)
    groups[rain > 0] <- NaN
    groups[groups == 0] <- NaN

    # remove drydowns where previous rain is below threshold
    past_rain <- rollsum(rain, k, fill=0, align='right')
    for (t in which(starts)) {
        if (past_rain[t-1] < threshold) {
            groups[groups == groups[t]] <- NaN
        }
    }

    return(groups)
}

The for loop is really slow, partly because of the == comparison. Is there any way to speed this code up?

Example data:

rain <-c(0,0,0,2,3,0,0,0,1,5,0,0,0,0,0,0,6,1,1,1,0,0,0)
names(rain) <- rain

Example output:

R> significant_drydowns(rain, 5, 5)
  0   0   0   2   3   0   0   0   1   5   0   0   0   0   0   0   6   1   1   1   0   0   0 
NaN NaN NaN NaN NaN   1   1   1 NaN NaN   2   2   2   2   2   2 NaN NaN NaN NaN   3   3   3 
R> significant_drydowns(rain, 7, 4)
  0   0   0   2   3   0   0   0   1   5   0   0   0   0   0   0   6   1   1   1   0   0   0 
 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   3   3   3

So, the group names don't matter, as long as they are unique. Groups are only assigned to dry-downs for which the previous k steps has a sum greater than threshold.

\$\endgroup\$
0

1 Answer 1

6
\$\begingroup\$

The following function works without for loops or any function of the *apply family. Furthermore, it does not require additional packages but makes use of base functions only. See the code comments for further details.

significant_drydowns <- function(rain, threshold = 10, k = 48) {
  # all values except the first one
  rain_tail <- rain[-1L]
  # logical index of start of no-rain period
  starts <- head(rain, -1L) > 0 & !rain_tail
  # no-rain groups
  groups <- cumsum(starts)
  # sum of the amount of rain in the last k elements
  # (for e.g., k = 5 the filter is c(1,1,1,1,1), therefore
  # the sum of 5 preceding elements is calculated)
  past_rain <- filter(rain, rep.int(1L, k), sides = 1L)
  # valid groups (previous amount of rain exceeds threshold)
  valid <- past_rain[starts] >= threshold
  # replace `groups` values with NA if 
  #  (a) there is rain or 
  #  (b) this is the first rain or no-rain period
  is.na(groups) <- rain_tail | !groups
  # replace `groups` values with NA if amount of rain is below threshold
  # (here `groups` is used as a numeric index for `valid`)
  is.na(groups) <- !valid[groups]
  # add NA to match length of original vector and set names
  setNames(c(NA_integer_, groups), rain)
}

Some results:

> significant_drydowns(rain, 5, 5)
 0  0  0  2  3  0  0  0  1  5  0  0  0  0  0  0  6  1  1  1  0  0  0 
NA NA NA NA NA  1  1  1 NA NA  2  2  2  2  2  2 NA NA NA NA  3  3  3 
> significant_drydowns(rain, 7, 4)
 0  0  0  2  3  0  0  0  1  5  0  0  0  0  0  0  6  1  1  1  0  0  0 
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  3  3  3 
\$\endgroup\$
2
  • \$\begingroup\$ Excellent. I haven't come across the filter() function before. Very handy. Only problem is that I also use dplyr, which has its own filter(), but that's a fairly minor problem. Thanks! \$\endgroup\$ Commented Oct 3, 2014 at 13:36
  • 2
    \$\begingroup\$ @naught101 You can explicitly access the functions with stats::filter and dplyr::filter. \$\endgroup\$ Commented Oct 3, 2014 at 15:07

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.