1

I have a data.frame named sell_tv from which I want to remove rows that have consecutive dates. Its head is shown below. In this particular case I want to remove row 5, because rows 5 and 6 have consecutive dates.

        Date Open High  Low Close  Sell.TV   Buy.TV
1 2015-04-08 2207 2204 2165  2166 4.038113 3.083603
2 2015-03-16 2214 2215 2172  2198 4.041986 3.087017
3 2015-03-05 2343 2364 2320  2324 4.023689 3.081034
4 2015-01-27 2171 2182 2151  2178 4.021998 3.070200
5 2015-01-23 2234 2244 2222  2230 4.032086 3.061206
6 2015-01-22 2278 2282 2242  2246 4.037248 3.095450

I have written the following code for this, but I get this error:

"Error in if (sell_tv$Date[i] == sell_tv$Date[i + 1] + 1) { :   missing value where TRUE/FALSE needed"

Code :

for( i in 1:nrow(sell_tv))
{
  if (sell_tv$Date[i] == sell_tv$Date[i+1] + 1 )
  {
    new_sell<- sell_tv[-i,]  
  }       
  else
  {
    new_sell<- sell_tv[,]
  }
  i= i+1      
}

Thankful for any help!

2
  • I see several problems in your code. First, you don't need to increment i, because R will do that for you (because that's how nice R is ;-)). Second, i+1 does not correspond to an existing row when i is nrow(sell_tv). One way to go, keeping the basis of your code, would be to create a variable that stores the numbers of the rows to be deleted, and then delete them all at once. Commented Apr 9, 2015 at 9:43
  • No need for a loop. It's a one-liner with logical-indexing with the diff operator: sell_tv[ c(9999,diff(sell_tv$Date)) != -1, ] Commented Apr 9, 2015 at 11:43
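
For reference, the error from the question can be reproduced in isolation. This is only a minimal sketch with a made-up two-element Date vector (`d`, `cmp` and `res` are illustrative names, not from the question): indexing one past the end yields NA, and `if` cannot branch on NA.

```r
# Hypothetical two-row example; `d` is an illustrative name
d <- as.Date(c("2015-01-23", "2015-01-22"))

past_end <- d[length(d) + 1]   # NA: there is no third element
cmp <- d[2] == past_end + 1    # comparing against NA gives NA
is.na(cmp)                     # TRUE

# `if (NA)` is what raises "missing value where TRUE/FALSE needed";
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(
  if (cmp) "consecutive" else "not consecutive",
  error = function(e) conditionMessage(e)
)
res
```

Stopping the loop at the second-to-last row means i + 1 never runs past the end.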

3 Answers

1

As I said in my comments, you can keep your loop and save the numbers of the rows that should be deleted in a variable, or you can get the row numbers all at once:

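to_delete <- which(sell_tv$Date[-nrow(sell_tv)] == sell_tv$Date[-1] + 1)  # 5
# setdiff() keeps every row when to_delete is empty, whereas sell_tv[-integer(0), ] would return none
new_sell <- sell_tv[setdiff(seq_len(nrow(sell_tv)), to_delete), ]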

new_sell
#         Date Open High  Low Close  Sell.TV   Buy.TV
# 1 2015-04-08 2207 2204 2165  2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172  2198 4.041986 3.087017
# 3 2015-03-05 2343 2364 2320  2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151  2178 4.021998 3.070200
# 6 2015-01-22 2278 2282 2242  2246 4.037248 3.095450
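
The loop-based variant mentioned at the top of this answer could look roughly like the sketch below. It is not the asker's original code: `sell_tv_demo` and `new_sell_demo` are illustrative names, and only two columns are reproduced to keep the example short. The loop stops one row early so that i + 1 always exists, and the rows are removed in a single step at the end.

```r
# Illustrative data: only the Date column matters for the logic
sell_tv_demo <- data.frame(
  Date = as.Date(c("2015-04-08", "2015-03-16", "2015-03-05",
                   "2015-01-27", "2015-01-23", "2015-01-22")),
  Close = c(2166L, 2198L, 2324L, 2178L, 2230L, 2246L)
)

to_delete <- integer(0)

# Stop at the second-to-last row so that i + 1 is always a valid index
for (i in seq_len(max(nrow(sell_tv_demo) - 1, 0))) {
  if (sell_tv_demo$Date[i] == sell_tv_demo$Date[i + 1] + 1) {
    to_delete <- c(to_delete, i)   # remember the row, delete later
  }
}

# Delete everything at once; the guard matters because
# x[-integer(0), ] selects zero rows rather than all of them
new_sell_demo <- if (length(to_delete) > 0) {
  sell_tv_demo[-to_delete, ]
} else {
  sell_tv_demo
}
new_sell_demo
```

R advances i by itself, so the loop body needs no manual increment.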

data

sell_tv <- structure(list(Date = structure(c(16533, 16510, 16499, 16462, 
16458, 16457), class = "Date"), Open = c(2207L, 2214L, 2343L, 
2171L, 2234L, 2278L), High = c(2204L, 2215L, 2364L, 2182L, 2244L, 
2282L), Low = c(2165L, 2172L, 2320L, 2151L, 2222L, 2242L), Close = c(2166L, 
2198L, 2324L, 2178L, 2230L, 2246L), Sell.TV = c(4.038113, 4.041986, 
4.023689, 4.021998, 4.032086, 4.037248), Buy.TV = c(3.083603, 
3.087017, 3.081034, 3.0702, 3.061206, 3.09545)), .Names = c("Date", 
"Open", "High", "Low", "Close", "Sell.TV", "Buy.TV"), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

0

This solution works whether the Date column of the sell_tv data frame contains unique or duplicate dates.

sell_tv = read.table("myfile.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

print(sell_tv)
#         Date Open High  Low Close  Sell.TV   Buy.TV
# 1 2015-04-08 2207 2204 2165  2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172  2198 4.041986 3.087017
# 3 2015-03-05 2343 2364 2320  2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151  2178 4.021998 3.070200
# 5 2015-01-23 2234 2244 2222  2230 4.032086 3.061206
# 6 2015-01-22 2278 2282 2242  2246 4.037248 3.095450

#add duplicate date
sell_tv[3,1] = "2015-01-23"

print(sell_tv)
#         Date Open High  Low Close  Sell.TV   Buy.TV
# 1 2015-04-08 2207 2204 2165  2166 4.038113 3.083603
# 2 2015-03-16 2214 2215 2172  2198 4.041986 3.087017
# 3 2015-01-23 2343 2364 2320  2324 4.023689 3.081034
# 4 2015-01-27 2171 2182 2151  2178 4.021998 3.070200
# 5 2015-01-23 2234 2244 2222  2230 4.032086 3.061206
# 6 2015-01-22 2278 2282 2242  2246 4.037248 3.095450

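# convert once so the differences are whole calendar days
dates = as.Date(sell_tv$Date)

to_delete = c()

for(i in seq_along(dates)){
  # rows dated exactly one day after dates[i];
  # units = "days" stops difftime() from choosing its own units
  a1 = which(as.numeric(difftime(dates, dates[i], units = "days")) == 1)
  to_delete = c(to_delete, a1)
}

# with an empty to_delete, sell_tv[-to_delete,] would return zero rows
if(length(to_delete) > 0) sell_tv = sell_tv[-unique(to_delete),]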

Output:

print(sell_tv)
        Date Open High  Low Close  Sell.TV   Buy.TV
1 2015-04-08 2207 2204 2165  2166 4.038113 3.083603
2 2015-03-16 2214 2215 2172  2198 4.041986 3.087017
4 2015-01-27 2171 2182 2151  2178 4.021998 3.070200
6 2015-01-22 2278 2282 2242  2246 4.037248 3.095450
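
The same result can probably be obtained without a loop. This is a sketch, assuming the goal is to drop every row whose date is exactly one day after some other date in the column (which is what the loop above does); `demo`, `days`, `is_day_after` and `result` are illustrative names.

```r
# Illustrative data with the duplicated date from this answer
demo <- data.frame(
  Date = c("2015-04-08", "2015-03-16", "2015-01-23",
           "2015-01-27", "2015-01-23", "2015-01-22"),
  Close = c(2166L, 2198L, 2324L, 2178L, 2230L, 2246L),
  stringsAsFactors = FALSE
)

days <- as.numeric(as.Date(demo$Date))   # days since 1970-01-01
is_day_after <- days %in% (days + 1)     # TRUE where the previous calendar day also occurs
result <- demo[!is_day_after, ]          # keeps rows 1, 2, 4 and 6
result
```

Logical indexing also sidesteps the empty-index pitfall: when nothing matches, every row is kept.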


0

Use logical-indexing with the diff operator on Date:

sell_tv[ c(9999,diff(sell_tv$Date)) != -1, ]

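where we just prepend some sentinel value to the output from diff(...), because diff returns one element fewer than there are rows. With the sentinel prepended, the second row of each consecutive pair is dropped (row 6 here); to drop the first row of the pair instead (row 5, as asked in the question), append the sentinel: c(diff(sell_tv$Date), 9999).

If you want to exclude both 'day before or after', then boolean-not the %in% operator: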

sell_tv[ ! (c(9999,diff(sell_tv$Date)) %in% c(-1,+1)), ]
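
To see how the position of the sentinel decides which row of a pair is dropped, here is a small sketch on the dates from the question (`dates`, `gaps`, `keep_prepend` and `keep_append` are illustrative names):

```r
dates <- as.Date(c("2015-04-08", "2015-03-16", "2015-03-05",
                   "2015-01-27", "2015-01-23", "2015-01-22"))

# Day gaps between neighbouring rows: -23 -11 -37 -4 -1
gaps <- as.numeric(diff(dates))

keep_prepend <- c(9999, gaps) != -1   # gap is assigned to the second row of each pair
keep_append  <- c(gaps, 9999) != -1   # gap is assigned to the first row of each pair

which(!keep_prepend)   # 6
which(!keep_append)    # 5
```

Either logical vector can be used directly as the row index, e.g. sell_tv[keep_append, ].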
