I want to extract the rows that have at least one value in columns c1-c10. My data looks like this (my real data has 11 columns in total, counting date and Q; date and Q have values in every row).

 date c1   c2   c3   c4   ...  Q
 1    0.1  NA   NA   NA        300
 2    NA   0.2  1.3  NA        100
 3    NA   NA   NA   NA        200
 4    NA   0.3  NA   0.4       100
 5    NA   1.4  NA   NA        150
 6    NA   NA   NA   NA        200
 7    0.5  0.3  0.5  0.6       100

I want to get this

 date c1   c2   c3   c4   ...  Q
 1    0.1  NA   NA   NA        300
 2    NA   0.2  1.3  NA        100
 4    NA   0.3  NA   0.4       100
 5    NA   1.4  NA   NA        150
 7    0.5  0.3  0.5  0.6       100

I tried this

 datawide2<- datawide1 %>% filter(rowSums(.[2:10]!="NULL")>=1)

But the result contains only the rows that have values in all of the columns c1-c10. So in this case it shows me only

 date c1   c2   c3   c4   ...  Q
 7    0.5  0.3  0.5  0.6       100

Can you help me? I don't know what is missing in what I did.

I searched in other questions but didn't find the answer.

2 Answers

3

As pointed out by @David Arenburg, you are confusing NA with "NULL". Your data frame contains NAs, but your code compares the values against the string "NULL". Your code would work if you test for NA instead:

#Again taking from David in comments
library(dplyr)
df %>% filter(rowSums(!is.na(.[2:5])) > 0) 
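To see why the original condition fails, here is a minimal base R sketch. The data frame `x` and its values are made up for illustration and are not the asker's data. It shows that comparing NA with a string gives NA, that `rowSums()` propagates that NA, and that `is.na()` gives plain counts instead. Since `filter()` keeps only rows where the condition is TRUE, a row whose sum is NA is dropped.

```r
# Hypothetical data: row 1 is partly NA, row 2 is all NA, row 3 is complete
x <- data.frame(c1 = c(0.1, NA, 0.5), c2 = c(NA, NA, 0.3))

# Comparing NA with a string yields NA, not TRUE or FALSE
NA != "NULL"

# rowSums() propagates NA, so every row containing an NA sums to NA
sums_string <- rowSums(x != "NULL")
sums_string

# is.na() always returns TRUE or FALSE, so these sums are plain counts
sums_is_na <- rowSums(!is.na(x))
sums_is_na

# Rows with at least one non-NA value
keep <- sums_is_na > 0
x[keep, ]
```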

This is also a good case to use filter_at

df %>%
  filter_at(vars(c1:c4), any_vars(!is.na(.)))

#  date  c1  c2  c3  c4   Q
#1    1 0.1  NA  NA  NA 300
#2    2  NA 0.2 1.3  NA 100
#3    4  NA 0.3  NA 0.4 100
#4    5  NA 1.4  NA  NA 150
#5    7 0.5 0.3 0.5 0.6 100
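In newer dplyr releases the scoped verbs such as `filter_at()` are marked as superseded. A sketch of the same filter with `if_any()`, assuming dplyr 1.0.4 or later is installed; `df` is rebuilt here from the data section only so that the block runs on its own:

```r
library(dplyr)

# Same example data as in the "data" section of this answer
df <- data.frame(
  date = 1:7,
  c1 = c(0.1, NA, NA, NA, NA, NA, 0.5),
  c2 = c(NA, 0.2, NA, 0.3, 1.4, NA, 0.3),
  c3 = c(NA, 1.3, NA, NA, NA, NA, 0.5),
  c4 = c(NA, NA, NA, 0.4, NA, NA, 0.6),
  Q  = c(300L, 100L, 200L, 100L, 150L, 200L, 100L)
)

# Keep rows where at least one of c1..c4 is not NA
res <- df %>% filter(if_any(c1:c4, ~ !is.na(.x)))
res$date
```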

We could also use base R, using rowSums to find the rows with at least one non-NA value.

cols <- 2:5 # Column indices for c1 to c10; this example only goes up to c4
df[rowSums(!is.na(df[cols])) > 0, ]


#  date  c1  c2  c3  c4   Q
#1    1 0.1  NA  NA  NA 300
#2    2  NA 0.2 1.3  NA 100
#4    4  NA 0.3  NA 0.4 100
#5    5  NA 1.4  NA  NA 150
#7    7 0.5 0.3 0.5 0.6 100
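Positional ranges such as 2:5 are easy to get wrong when the real data has more columns, so it may be safer to pick the columns by name. A minimal base R sketch follows; the helper `keep_any_value()`, the data frame `df_small`, and the assumption that the measurement columns are named `c` followed by digits are all hypothetical, for illustration only.

```r
# Hypothetical small example with the same layout: date, c-columns, Q
df_small <- data.frame(
  date = 1:4,
  c1 = c(0.1, NA, NA, 0.5),
  c2 = c(NA, NA, 0.3, 0.3),
  Q  = c(300L, 200L, 100L, 100L)
)

# Hypothetical helper: keep rows with at least one non-NA value in `cols`
keep_any_value <- function(data, cols) {
  data[rowSums(!is.na(data[cols])) > 0, , drop = FALSE]
}

# Select the measurement columns by name pattern (c1, c2, ...) instead of position
value_cols <- grep("^c[0-9]+$", names(df_small), value = TRUE)

res <- keep_any_value(df_small, value_cols)
res
```

As in the base R output above, the original row names are kept, so the all-NA row simply disappears from the numbering.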

data

df <- structure(list(date = 1:7, c1 = c(0.1, NA, NA, NA, NA, NA, 0.5
), c2 = c(NA, 0.2, NA, 0.3, 1.4, NA, 0.3), c3 = c(NA, 1.3, NA, 
NA, NA, NA, 0.5), c4 = c(NA, NA, NA, 0.4, NA, NA, 0.6), Q = c(300L, 
100L, 200L, 100L, 150L, 200L, 100L)), .Names = c("date", "c1", 
"c2", "c3", "c4", "Q"), class = "data.frame", row.names = c(NA, 
-7L))

3 Comments

Might have some small speed gains with filter(rowSums(is.na(.[2:5])) != 4L)?
@snoram yes, that would work too although I am not sure if that would be faster.
Some more hacky (and seemingly fast) alternatives: has_min <- function(x) !is.na(do.call(pmin, c(x, list(na.rm = TRUE)))); filter(df, has_min(.[2:5]))
1

To understand what happened you can try

df %>% mutate(rowSums(.[2:5]!="NULL"))
date  c1  c2  c3  c4   Q rowSums(.[2:5] != "NULL")
1    1 0.1  NA  NA  NA 300                        NA
2    2  NA 0.2 1.3  NA 100                        NA
3    3  NA  NA  NA  NA 200                        NA
4    4  NA 0.3  NA 0.4 100                        NA
5    5  NA 1.4  NA  NA 150                        NA
6    6  NA  NA  NA  NA 200                        NA
7    7 0.5 0.3 0.5 0.6 100                         4

This will get what you want

df %>% filter(rowSums(.[2:5]!="NULL", na.rm = TRUE)>=1)
date  c1  c2  c3  c4   Q
1    1 0.1  NA  NA  NA 300
2    2  NA 0.2 1.3  NA 100
3    4  NA 0.3  NA 0.4 100
4    5  NA 1.4  NA  NA 150
5    7 0.5 0.3 0.5 0.6 100

2 Comments

There is nothing related to "NULL" here. Your code works only by chance, because NA compared to anything is still NA. You would get the same results by running df %>% filter(rowSums(.[2:5]!="A.Suliman", na.rm = TRUE)>=1) too
@DavidArenburg good to know. Actually, I thought 'NULL' wouldn't work, since NA=='NULL' gives NA and NA==NULL gives logical(0), so it surprised me. Now I get it, thanks.
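The point made in these comments can be checked with a short base R sketch. The data frame `x` is made up for illustration. With `na.rm = TRUE` the NA comparisons are dropped before summing, so the string on the right-hand side makes no difference as long as no value in the data actually equals it.

```r
# Hypothetical data: row 1 is partly NA, row 2 is all NA, row 3 is complete
x <- data.frame(c1 = c(0.1, NA, 0.5), c2 = c(NA, NA, 0.3))

# NA compared with a string is NA; NA compared with NULL is a zero-length logical
NA == "NULL"
NA == NULL

# Any string gives the same counts once the NAs are removed from the sums
via_null  <- rowSums(x != "NULL", na.rm = TRUE)
via_other <- rowSums(x != "any other string", na.rm = TRUE)

# The explicit test for non-missing values gives the same counts
via_is_na <- rowSums(!is.na(x))

identical(via_null, via_other)
all(via_null == via_is_na)
```

Testing with `!is.na()` states the intent directly and does not depend on this coincidence.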
