
I have a number of dataframes (imported from CSV) that all have the same structure. I would like to loop through these dataframes and keep only two of their columns.

The loop below does not seem to work. Any ideas why? Ideally I would like to do this with a loop, as I am trying to get better at using them.

frames <- ls()

for (frame in frames) {
  frame <- subset(frame, select = c("Col_A", "Col_B"))
}

Cheers in advance for any advice.

  • 1) it is list, not ls; 2) you need to supply the indices to frame, as in frames[[frame]] <- subset(...); 3) "frame in frames" doesn't make sense since you just created a null list with frames <- list()-- it should be like for (frame in 1:5) Commented May 2, 2014 at 15:36
  • well, unless you intended for the list to be ls, which I would not recommend Commented May 2, 2014 at 15:40
  • The ls() part is working fine: 'frames' contains the names of all the dataframes to be operated on. However, the loop gives the error 'argument "subset" is missing, with no default' Commented May 2, 2014 at 16:06
  • You could use argument colClasses to only read the columns you want. Details here. Commented May 2, 2014 at 16:20
  • Suggestion: rm(list = ls()) followed by frames <- lapply(files, read.table), where files is a vector of file names Commented May 2, 2014 at 16:39
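
The list-based suggestion in the comments above could look roughly like the sketch below. It assumes the CSVs share the columns Col_A and Col_B; the temporary files and the names `tmp`, `files` and `frames` are made up so that the example is self-contained.

```r
# Create two small CSV files in a temporary directory (illustrative data only)
tmp <- file.path(tempdir(), "frames_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:2) {
  demo <- data.frame(Col_A = 1:3, Col_B = letters[1:3], Col_C = runif(3))
  write.csv(demo, file.path(tmp, paste0("file", i, ".csv")), row.names = FALSE)
}

# Read every CSV into one named list instead of separate workspace objects
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read.csv)
names(frames) <- basename(files)

# Keep only the two columns of interest in each data frame
frames <- lapply(frames, function(df) df[, c("Col_A", "Col_B"), drop = FALSE])
```

Each element of `frames` can then be reached by file name, for example frames[["file1.csv"]].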

2 Answers

1

For anyone interested, I used Richard Scriven's idea of reading the dataframes in as one object, with a function added that records which file each dataframe was imported from. This allowed me to then use the plyr package to manipulate the data:

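library(plyr)

## Full paths of the CSV files to import (TEESMDIR is defined elsewhere)
dataframes <- list.files(path = TEESMDIR, full.names = TRUE)

## Define a function to add the filename to the dataframe
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename  # record which file the rows came from
  ret
}

## llply returns a list with one data frame per file
## (ldply would instead row-bind everything into a single data frame)
list_dataframes <- llply(dataframes, read_csv_filename)

## Keep only the columns of interest in each data frame
selection <- llply(list_dataframes, subset, select = c(var1, var3))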

0

The basic problem is that ls() returns a character vector of the names of the objects in your environment, not the objects themselves. Inside your loop, frame is therefore just a string, so subset() is applied to a character value rather than to a dataframe. To get and replace an object using a character variable containing its name, you can use the get()/assign() functions. You could rewrite your loop as

frames <- ls()

for (frame in frames){ 
    assign(frame, subset(get(frame), select = c("Col_A","Col_B")))
}
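
One caveat with the loop above is that ls() also returns the names of objects that are not dataframes, and subset() would fail on those. A hedged sketch of one way to guard against that, using mget() and Filter(); the environment `env` and the objects in it are hypothetical and exist only to keep the example self-contained.

```r
# A separate environment stands in for the workspace (hypothetical example data)
env <- new.env()
assign("df1", data.frame(Col_A = 1:2, Col_B = 3:4, Col_C = 5:6), envir = env)
assign("df2", data.frame(Col_A = 7:8, Col_B = 9:10, Col_C = 11:12), envir = env)
assign("note", "not a data frame", envir = env)

# ls() gives names only; mget() fetches the objects behind those names
objs <- mget(ls(env), envir = env)

# Keep just the data frames, then update each one in place by name
frame_names <- names(Filter(is.data.frame, objs))
for (nm in frame_names) {
  assign(nm, subset(get(nm, envir = env), select = c("Col_A", "Col_B")), envir = env)
}
```

Here only df1 and df2 are modified; the character object `note` is skipped.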

3 Comments

While this does directly answer your question, I would not recommend this strategy for working with multiple data.frames. If they are similar, reading them into a common list is a much better idea.
assign can be dangerous as well.
Thanks. It seems difficult to work with data in this way with R, and the consensus is to read all CSVs in as one object. The problem with this approach is where pre-processing needs to occur on the data before bulk manipulation can occur.
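
The pre-processing concern in the last comment does not necessarily rule out a list, since individual elements can be modified by name before the bulk step. A minimal sketch with made-up data; the names `frames`, `site_a` and `site_b`, and the unit correction itself, are assumptions for illustration only.

```r
# Hypothetical named list of data frames with the same structure
frames <- list(
  site_a = data.frame(Col_A = c(1, 2), Col_B = c(10, 20), Col_C = c("x", "y")),
  site_b = data.frame(Col_A = c(3, 4), Col_B = c(3000, 4000), Col_C = c("z", "w"))
)

# Pre-process a single element by name (here: an assumed unit correction)
frames$site_b$Col_B <- frames$site_b$Col_B / 100

# Then apply the bulk manipulation to every element
frames <- lapply(frames, subset, select = c("Col_A", "Col_B"))
```

Because the list is named, one-off fixes stay readable, and the bulk step remains a single lapply() call.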
