Subsetting data frame based on columns

Question

I would like to remove certain rows from a data frame based on the values in one column. I have tried a few approaches:

# reads in data
sbc016formants.df <- read.table("file path", sep = "\t", header = FALSE, strip.white = TRUE)

# names columns
names(sbc016formants.df) <- c("fileName", "start", "end", "vowelLabel")

# list of values I want to remove
list16 <- c(615.162, 775.885)

# produces a subset of the data: removes rows whose start value appears in list16
sbc016formants.df <- subset(sbc016formants.df, !start %in% list16)

This produces the following error message for some, but not all, of my data files:

Error in match(x, table, nomatch = 0L) : 
'match' requires vector arguments

I also tried this, based on the second answer in this topic:

sbc002formants.df <- sbc002formants.df[ apply(sbc002formants.df, 1 , function(x) any(unlist(x) %in% list2) ) , ]

And this gets rid of some of the items on the list (list16), but not all. I wanted to use the first answer, but I don't understand the code (I'm not sure what bl is, in the example).

Here is the code to make a reproducible example:

# creates dataframe
fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)

Try sbc016formants.df[!(sbc016formants.df$start %in% list16),]?
– aichao, Commented Sep 25, 2016 at 19:22
I tried to reproduce the error, but I do not get an error message.
– Pieter, Commented Sep 25, 2016 at 19:22
@aichao, this doesn't produce any error message, but it does not do the subsetting, either.
– Lisa, Commented Sep 25, 2016 at 19:35
@aichao's comment does the subsetting, and it works on your example data.
– David Arenburg, Commented Sep 25, 2016 at 19:41
Hmm, it does for me on your reproducible example. Also, I agree with @Pieter that your subset command, which is equivalent, does not produce an error on your reproducible example. So we are forced to conclude that something in your real data differs from your reproducible example.
– aichao, Commented Sep 25, 2016 at 19:43
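
For reference, here is a minimal sketch of the base R indexing suggested in the comments, run on the reproducible example from the question. The name `kept` is only illustrative. Note that indexing returns a new data frame, so the result has to be assigned (or printed) for the change to be visible; whether a missing assignment explains the behaviour reported above is only a guess.

```r
# Reproducible data from the question
sbc016formants.df <- data.frame(
  fileName = c("sbc016", "sbc016", "sbc016", "sbc016"),
  start    = c(1.345, 2.345, 615.162, 775.885),
  end      = c(100.345, 200.345, 715.162, 875.885)
)
list16 <- c(615.162, 775.885)

# Keep only the rows whose start value is NOT in list16.
# `kept` is a hypothetical name; the original data frame is left unchanged.
kept <- sbc016formants.df[!(sbc016formants.df$start %in% list16), ]

print(kept)
```

On this example data, the two rows with start values 615.162 and 775.885 are dropped and the other two remain.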

conor · Accepted Answer · 2016-09-25 20:28:30Z

1

Presuming I understand the question correctly, dplyr should be able to do this easily and efficiently.

fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)

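# one-time installation; skip this line if dplyr is already installed
install.packages("dplyr", dependencies = TRUE)
library(dplyr)

# keep only the rows whose start value is not in list16
sbc016formants.df %>% filter(!start %in% list16)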

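or, for this particular example only (!= compares element by element and recycles list16 down the column, so it happens to work here because of the row order but is not a general replacement for %in%):

sbc016formants.df %>% filter(start != list16)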
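
A note on the two filters above: `%in%` tests set membership, while `!=` compares element by element and recycles the shorter vector. The sketch below uses the same values as the question in a different, made-up row order to show where the two diverge.

```r
# Same start values as in the question, but in a different (hypothetical) order
start  <- c(615.162, 1.345, 775.885, 2.345)
list16 <- c(615.162, 775.885)

# Set membership: TRUE for every value that does not appear anywhere in list16
keep_in <- !start %in% list16

# Element-wise comparison: list16 is recycled as 615.162, 775.885, 615.162, 775.885
keep_neq <- start != list16

print(keep_in)   # FALSE  TRUE FALSE  TRUE
print(keep_neq)  # FALSE  TRUE  TRUE  TRUE
```

With `!=`, the row holding 775.885 is kept because it happens to be compared against 615.162, so `%in%` is the safer choice for this kind of filtering.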


1 Comment

Lisa Over a year ago

This does work, although I'm still not sure why previous solutions failed. Thank you!
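
On why the earlier attempts may have failed: the thread does not settle this, so the following is only a sketch of one possible cause, not a diagnosis. The message "'match' requires vector arguments" can be reproduced when the data frame has no column named `start`, because `subset()` then finds the function `stats::start` instead. The data frame `df` and its `V1`..`V3` column names are hypothetical stand-ins for a file whose columns were never renamed.

```r
# Hypothetical data frame that still has read.table's default column names
df <- data.frame(
  V1 = "sbc016",
  V2 = c(1.345, 615.162),
  V3 = c(100.345, 715.162)
)
list16 <- c(615.162, 775.885)

# There is no `start` column, so the name resolves to the function stats::start,
# and %in% (which calls match) is handed a function rather than a vector
msg <- tryCatch(
  {
    subset(df, !start %in% list16)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick check to run on each file before subsetting
has_numeric_start <- "start" %in% names(df) && is.numeric(df$start)
print(has_numeric_start)
```

Running `str()` on each data frame that triggers the error would show whether the `start` column is present and numeric in that particular file.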
