
I would like to remove certain rows from data based on values in one column. I have tried a few approaches:

#reads in data
sbc016formants.df <- read.table("file path", sep = "\t", header = FALSE, strip.white = TRUE)

# names columns
names(sbc016formants.df) <- c("fileName", "start", "end", "vowelLabel")

# list of values I want to remove
list16 <- c(615.162, 775.885)

# produces a subset of the data: removes rows whose start value is in list16
sbc016formants.df <- subset(sbc016formants.df, !start %in% list16)

which produces this error message for some, but not all of my data files:

Error in match(x, table, nomatch = 0L) : 
'match' requires vector arguments

I also tried this, based on the second answer in this topic:

sbc002formants.df <- sbc002formants.df[ apply(sbc002formants.df, 1 , function(x) any(unlist(x) %in% list2) ) , ]

This gets rid of some of the items on the list (list16), but not all. I wanted to use the first answer, but I don't understand the code (I'm not sure what bl is in the example).
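One possible reason the apply() attempt only partly works, offered as a sketch rather than a diagnosis of the real files: apply() first converts the data frame to a matrix, and with a character column present every value becomes a string, so the numeric columns no longer compare as numbers. As written, that call also has no `!`, so it keeps the matching rows instead of dropping them. The data below is the reproducible example from this question; `m`, `keep`, and `cleaned` are names introduced here for illustration.

```r
# Reproducible example data from the question
df <- data.frame(
  fileName = c("sbc016", "sbc016", "sbc016", "sbc016"),
  start    = c(1.345, 2.345, 615.162, 775.885),
  end      = c(100.345, 200.345, 715.162, 875.885)
)
list16 <- c(615.162, 775.885)

# apply() works on as.matrix(df); with mixed column types this is a
# character matrix, so the numbers are compared as formatted strings
m <- as.matrix(df)
print(is.character(m))
print(m[, "start"])   # inspect how the numbers were formatted

# Comparing the numeric column directly avoids the coercion
keep <- !(df$start %in% list16)
cleaned <- df[keep, ]
print(cleaned)
```

Testing the single column of interest, rather than every cell of every row, also avoids accidentally matching a value that happens to appear in a different column such as end.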

Here is the code to make a reproducible example:

# creates dataframe
fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)
  • try sbc016formants.df[!(sbc016formants.df$start %in% list16),]? Commented Sep 25, 2016 at 19:22
  • I tried to reproduce the error but I do not get an error message Commented Sep 25, 2016 at 19:22
  • @aichao, this doesn't produce any error message, but it does not do the subsetting, either. Commented Sep 25, 2016 at 19:35
  • @aichao's comment does the subsetting, and it works on your example data. Commented Sep 25, 2016 at 19:41
  • Hmm, it does for me on your reproducible example. Also, I agree with @Pieter that your subset command, which is equivalent, does not produce an error on your reproducible example. So, we are forced to conclude that there is something with your data that is not the same as your reproducible example. Commented Sep 25, 2016 at 19:43
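One way to look for such a difference, sketched under an assumption the thread does not confirm: if a failing file ends up without a column actually named start, subset() falls back to the base function stats::start(), and %in% (which calls match()) then fails because a function is not a vector. The data frame `df_missing` and its column name `V2` are hypothetical, made up to reproduce that situation.

```r
# Hypothetical data frame that lacks a column called "start"
# (for example, because the names were never assigned for this file)
df_missing <- data.frame(
  fileName = c("sbc016", "sbc016"),
  V2       = c(1.345, 615.162)
)
list16 <- c(615.162, 775.885)

# Basic checks worth running on any file that fails
str(df_missing)
has_start <- "start" %in% names(df_missing)
print(has_start)

# With no "start" column, the name resolves to the function stats::start(),
# so the %in% test inside subset() raises an error; capture its message
result <- tryCatch(
  subset(df_missing, !start %in% list16),
  error = function(e) conditionMessage(e)
)
print(result)
```

Indexing with sbc016formants.df$start instead of a bare start inside subset() makes this kind of problem easier to spot, because a missing column then yields NULL rather than silently picking up an unrelated object.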

1 Answer


Presuming I understand the question correctly, dplyr should be able to do this easily and efficiently.

fileName <- c("sbc016", "sbc016", "sbc016", "sbc016")
start <- c(1.345, 2.345, 615.162, 775.885)
end <- c(100.345, 200.345, 715.162, 875.885)
sbc016formants.df <- data.frame(fileName, start, end)

# list of what I want to get rid of
list16 <- c(615.162, 775.885)

# run once if dplyr is not yet installed
install.packages("dplyr", dependencies = TRUE)
library(dplyr)
sbc016formants.df %>% filter(!start %in% list16)

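or, comparing against each value explicitly (a bare start != list16 would recycle list16 element by element, and only appears to work here because of the row order):

sbc016formants.df %>% filter(start != list16[1], start != list16[2])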
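As a small illustration of why a membership test with %in% is safer than comparing a column to a multi-element vector with !=, the sketch below uses the same four start values in a different, hypothetical row order. It uses base R only; `elementwise` and `membership` are names introduced here.

```r
# Same values as the example, but in a hypothetical different row order
start  <- c(615.162, 1.345, 775.885, 2.345)
list16 <- c(615.162, 775.885)

# != recycles list16, so row i is compared only with one of the two values:
# 615.162, 775.885, 615.162, 775.885
elementwise <- start != list16

# %in% checks every row against the whole set of values
membership <- !(start %in% list16)

print(elementwise)
print(membership)
```

Here the element-wise comparison would keep the row with 775.885, because that row happens to be compared with 615.162, while the membership test drops both unwanted rows regardless of order.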

1 Comment

This does work, although I'm still not sure why previous solutions failed. Thank you!
