I am trying to create subsets of three original data frames (data_A, data_B, data_C) based on the value of a variable that is shared across them (workhours). The threshold values used to create the subsets are the same for every data frame. I want the subsets to be named Dataset_1 to Dataset_11 for subsets of data_A, Dataset_12 to Dataset_22 for subsets of data_B, and Dataset_23 to Dataset_33 for subsets of data_C.

Right now I have the following solution:

for (i in 1:11){
  assign(paste0("Dataset_",i), subset(data_A, workhours>=(0+(i-1)*5)))
}

for (i in 12:22){
  assign(paste0("Dataset_",i), subset(data_B, workhours>=(0+(i-12)*5)))
}

for (i in 23:33){
  assign(paste0("Dataset_",i), subset(data_C, workhours>=(0+(i-23)*5)))
}

This works fine. However, is it possible to use a single loop instead of three?

EDIT:

solution:

for (i in 1:11){
    assign(paste0("Dataset_",i), subset(data_A, workhours>=((i-1)*5)))
    assign(paste0("Dataset_",i+11), subset(data_B, workhours>=((i-1)*5)))
    assign(paste0("Dataset_",i+22), subset(data_C, workhours>=((i-1)*5)))
}

Another solution can be found below.

  • On your working example: (1) the 0+ in all loops doesn't do anything; (2) is it possible that in the first loop it should be i-1 instead of i-11? Commented Jan 28, 2020 at 9:24
  • that is correct :) Commented Jan 28, 2020 at 9:35

3 Answers

I think you can use lapply over a list of the data frames, and inside it use split with findInterval to split each data frame into multiple data frames, one per 5-hour interval of workhours.

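# Collect the data frames in a list and define the interval boundaries
bob <- list(data_A, data_B, data_C)
values <- seq(0, 50, 5)

# Split each data frame by interval, then flatten into one list of data frames
temp <- unlist(lapply(bob, function(x) 
              split(x, findInterval(x$workhours, values))), recursive = FALSE)
names(temp) <- paste0('Dataset_', 1:33)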
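To see what findInterval and split do here, a minimal sketch on made-up data may help. The data frame `toy` and its values are assumptions for illustration only; they are not part of the question.

```r
# Toy data; `toy` and its values are invented for illustration.
toy <- data.frame(id = 1:6, workhours = c(0, 4, 5, 12, 49, 50))
values <- seq(0, 50, 5)

# findInterval() gives, for each workhours value, the index k of the
# interval [values[k], values[k + 1]) that contains it.
bins <- findInterval(toy$workhours, values)
print(bins)

# split() returns one data frame per bin that actually occurs in the data,
# so every row ends up in exactly one piece.
pieces <- split(toy, bins)
print(names(pieces))
```

Two things follow from this. The pieces are disjoint intervals, which is not the same as the cumulative `workhours >= threshold` subsets in the question. And empty bins produce no piece at all, so naming the result with `1:33` only works if every data frame has rows in all 11 intervals.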

It is better to keep the data in a list instead of polluting the global environment. However, if you still need them as separate data frames, you can use list2env.

list2env(temp, .GlobalEnv)
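If the cumulative subsets from the question are what is needed (each subset keeps all rows with workhours at or above a threshold), a nested lapply is one possible way to build them in a list. This is only a sketch: `make_toy`, `sources`, `thresholds` and `subsets` are hypothetical names, and the random data stands in for the real data_A, data_B and data_C.

```r
# Stand-in data; replace `sources` with list(data_A, data_B, data_C).
set.seed(1)
make_toy <- function(n) data.frame(workhours = sample(0:60, n, replace = TRUE))
sources <- list(make_toy(40), make_toy(40), make_toy(40))
thresholds <- seq(0, 50, 5)  # the 11 cut-offs used in the question

# For each data frame, build the 11 subsets with workhours >= cutoff,
# then flatten the result into a single list of data frames.
# Note: unlike subset(), bracket indexing does not drop rows where
# workhours is NA; the toy data contains no NAs.
subsets <- unlist(
  lapply(sources, function(d)
    lapply(thresholds, function(cutoff)
      d[d$workhours >= cutoff, , drop = FALSE])),
  recursive = FALSE
)
names(subsets) <- paste0("Dataset_", seq_along(subsets))

length(subsets)               # 3 data frames x 11 thresholds = 33
row_counts <- sapply(subsets, nrow)  # one call covers every subset
```

Because each threshold always yields a subset (possibly with zero rows), the list has exactly 33 elements and the names line up with the numbering in the question. Keeping them in one list also means a single sapply or lapply call can work on all of them.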

2 Comments

Hi Ronak, could you elaborate a bit on why it's not a good idea to pollute the global environment?
@Jeroen because lists are easier to manage. In the above case you need to manage only one object, i.e. temp, instead of Dataset_1, Dataset_2, Dataset_3, etc.

# One loop: the same 11 thresholds (0, 5, ..., 50) applied to each data frame
for (i in 1:11){
    assign(paste0("Dataset_",i), subset(data_A, workhours>=((i-1)*5)))
    assign(paste0("Dataset_",i+11), subset(data_B, workhours>=((i-1)*5)))
    assign(paste0("Dataset_",i+22), subset(data_C, workhours>=((i-1)*5)))
}
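A variation on the single loop, sketched under the assumption that the data frames are first put in a list: the loop runs over all 33 target indices and derives the source data frame and the threshold from the index. The names `sources`, `n_cut`, `src` and `step` are hypothetical, and the random data frames below are stand-ins for the real ones.

```r
# Stand-in data frames; use the real data_A, data_B, data_C instead.
set.seed(42)
data_A <- data.frame(workhours = sample(0:60, 30, replace = TRUE))
data_B <- data.frame(workhours = sample(0:60, 30, replace = TRUE))
data_C <- data.frame(workhours = sample(0:60, 30, replace = TRUE))

sources <- list(data_A, data_B, data_C)
n_cut <- 11  # number of thresholds per data frame (0, 5, ..., 50)

for (k in seq_len(length(sources) * n_cut)) {
  src  <- (k - 1) %/% n_cut + 1  # which data frame: 1, 2 or 3
  step <- (k - 1) %% n_cut       # position within that data frame: 0..10
  assign(paste0("Dataset_", k),
         subset(sources[[src]], workhours >= step * 5))
}
```

With this layout, adding a fourth data frame only means extending `sources`; the index arithmetic stays the same.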

Comments

Try

       j in names(bob)

Right now, you are looping over the whole list in the j loop.
