How to use a for loop to extract columns from a data frame

Question

I am trying to use a for loop to extract columns from a dataframe (named table1) and create a new dataframe (named smalldata) with the latter having only these 3 columns (ID1, ID2, ID3). I have included my code below which does not work.

for (i in 1:3) {
  idlist[[i]] <- table1$ID[i]
}
smalldata <- do.call(cbind, idlist)
View(smalldata)

Can [i] be used with $ in a dataframe to extract these columns in the for loop?

Edit: The reason for doing a loop is that my column names are sequentially named. For example, I have ID1-ID100, EVENT1-EVENT100, EXP1-EXP100. What I want to do in this example is create 100 datasets. First, I want to pull ID1, EVENT1, EXP1, create a dataset, and export it. Then I want to pull ID2, EVENT2, EXP2 and export, and so on. Any additional input is appreciated.

smalldata <- table1[1:3] achieves what you want without loops
– Allan Cameron, Commented May 20, 2022 at 19:58
Hi Allan. Thanks. Without going through additional subsequent steps, I need this to be done through a loop. I want to extract the columns from a big data frame with many variables, and I want to repeat this process a few times for other variables that are sequentially ordered as well (example: EVENT1, EVENT2, ...)
– Jay, Commented May 20, 2022 at 20:12
There are lots of ways to do this without loops. In general, one should avoid loops in R where possible and use vectorised expressions. These mean that looping type operations are done much more quickly in the underlying compiled code. If you have a more complex problem than the one above, please feel free to edit your question so we can have a look at efficient solutions.
– Allan Cameron, Commented May 20, 2022 at 20:31
As well as being quicker, it promotes thinking in R terms: you have structured data, and you let R use the information already inherent in the structures to take care of all the finding, selecting, changing, and looping (often far more efficiently) under the hood. But to my mind, the important issue is keeping imperative step-by-step programming to a minimum and letting functional and even O-O verbs do the imperative work. Until you make this cognitive change, you miss most of the gains R is designed to give.
– John Garland, Commented May 20, 2022 at 20:51
Thanks John and Allan. I am a SAS programmer with some R experience. I'll keep this in mind.
– Jay, Commented May 21, 2022 at 12:51

jordan · Accepted Answer · 2022-05-22 17:19:34Z

If you must do it with a for loop, you could work off this:

new <- list()      # construct as list -- data.frames are fancy lists
cols <- c(1, 5, 3) # use a vector of column indices
for (i in seq_along(cols)) {
  # append the list at each column
  new[[i]] <- mtcars[, cols[i], drop = FALSE]
}

new <- as.data.frame(new)      # make list into data.frame
identical(new, mtcars[, cols]) # check that this produces the same thing
#> [1] TRUE
head(new)
#>                    mpg drat disp
#> Mazda RX4         21.0 3.90  160
#> Mazda RX4 Wag     21.0 3.90  160
#> Datsun 710        22.8 3.85  108
#> Hornet 4 Drive    21.4 3.08  258
#> Hornet Sportabout 18.7 3.15  360
#> Valiant           18.1 2.76  225
str(new)
#> 'data.frame':    32 obs. of  3 variables:
#>  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ disp: num  160 160 108 258 360 ...

Created on 2022-05-20 by the reprex package (v2.0.1)
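
On the literal question of combining `[i]` with `$`: `table1$ID[i]` is read as element `i` of `table1$ID`, not as the column `ID1`, `ID2`, and so on, because `$` takes a literal name. A minimal sketch of one way around this is to build the name with `paste0()` and extract with `[[`. The `table1` below is a hypothetical stand-in, since the real data is not shown in the question.

```r
# Hypothetical stand-in for the asker's table1 (the real data is not shown)
table1 <- data.frame(ID1 = 1:3, ID2 = 4:6, ID3 = 7:9, other = letters[1:3])

idlist <- list()                  # initialise the list before the loop
for (i in 1:3) {
  col_name <- paste0("ID", i)     # "ID1", "ID2", "ID3"
  # [[ accepts a computed name; $ only accepts a literal one
  idlist[[col_name]] <- table1[[col_name]]
}

smalldata <- as.data.frame(idlist)  # named list of columns -> data frame
```

Naming the list elements as they are added means the resulting data frame keeps the original column names.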

Edits

With more information, the code below should work. However, for loops don't seem necessary here; the apply family of functions seems good enough. If a for loop is necessary for your process, hopefully the combination of these will be enough to get you what you need.

data <- Reduce(
  cbind,
  lapply(
    1:20,
    function(i) {
      out <- data.frame(
        id = order(runif(5)),
        event = runif(5) < .5,
        other_col = runif(5)
      )
      colnames(out) <- paste0(colnames(out), i)
      out
    }
  )
)

# just a quick peek
str(data[, c(1:3, 9:12, 21:24)])
#> 'data.frame':    5 obs. of  11 variables:
#>  $ id1       : int  3 2 1 4 5
#>  $ event1    : logi  FALSE FALSE TRUE TRUE FALSE
#>  $ other_col1: num  0.617 0.951 0.511 0.185 0.667
#>  $ other_col3: num  0.6856 0.0524 0.5786 0.9265 0.2291
#>  $ id4       : int  4 2 1 5 3
#>  $ event4    : logi  TRUE TRUE FALSE FALSE FALSE
#>  $ other_col4: num  0.0849 0.8345 0.8465 0.1958 0.2534
#>  $ other_col7: num  0.656 0.353 0.604 0.973 0.381
#>  $ id8       : int  2 3 5 4 1
#>  $ event8    : logi  TRUE FALSE FALSE TRUE TRUE
#>  $ other_col8: num  0.646 0.693 0.534 0.624 0.625

result <- lapply(1:20, function(i) {
  # make pattern (must have letters before number)
  pattern <- paste0("[a-z]", i, "$") 
  
  # find the column indices that match the pattern
  ind <- grep(pattern, colnames(data))
  
  # extract those columns (drop = FALSE keeps a data frame even for a single match)
  res <- data[, ind, drop = FALSE]
  
  # optional: rename columns
  colnames(res) <- sub(paste0(i, "$"), "", colnames(res))
  res
})

head(result)
#> [[1]]
#>   id event other_col
#> 1  3 FALSE 0.6174577
#> 2  2 FALSE 0.9509916
#> 3  1  TRUE 0.5107370
#> 4  4  TRUE 0.1851543
#> 5  5 FALSE 0.6670226
#> 
#> [[2]]
#>   id event other_col
#> 1  3  TRUE 0.8261719
#> 2  4 FALSE 0.4171351
#> 3  1  TRUE 0.5640345
#> 4  5  TRUE 0.6825371
#> 5  2 FALSE 0.4381013
#> 
#> [[3]]
#>   id event  other_col
#> 1  4 FALSE 0.68559712
#> 2  3 FALSE 0.05241906
#> 3  2 FALSE 0.57857342
#> 4  1  TRUE 0.92649458
#> 5  5  TRUE 0.22908630
#> 
#> [[4]]
#>   id event  other_col
#> 1  4  TRUE 0.08491369
#> 2  2  TRUE 0.83452439
#> 3  1 FALSE 0.84650621
#> 4  5 FALSE 0.19578470
#> 5  3 FALSE 0.25342999
#> 
#> [[5]]
#>   id event other_col
#> 1  4 FALSE 0.8912857
#> 2  1 FALSE 0.1261470
#> 3  3 FALSE 0.7962369
#> 4  5  TRUE 0.3911494
#> 5  2 FALSE 0.6041862
#> 
#> [[6]]
#>   id event other_col
#> 1  4  TRUE 0.8987728
#> 2  2  TRUE 0.2830371
#> 3  5 FALSE 0.6696249
#> 4  3 FALSE 0.6249742
#> 5  1 FALSE 0.4754757

Created on 2022-05-22 by the reprex package (v2.0.1)
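
If an explicit for loop with an export step is still wanted, the same idea can be sketched as below. This is only an illustration under assumptions: `table1` is a small hypothetical stand-in using the ID/EVENT/EXP prefixes from the question, and the output directory and file names (`set1.csv`, ...) are made up for the example.

```r
# Hypothetical wide table standing in for the asker's data
table1 <- data.frame(
  ID1 = 1:2, EVENT1 = c(TRUE, FALSE),  EXP1 = c(0.1, 0.2),
  ID2 = 3:4, EVENT2 = c(FALSE, FALSE), EXP2 = c(0.3, 0.4),
  ID3 = 5:6, EVENT3 = c(TRUE, TRUE),   EXP3 = c(0.5, 0.6)
)

out_dir   <- tempdir()                 # assumption: export to a temporary directory
prefixes  <- c("ID", "EVENT", "EXP")   # column name stems from the question
n_sets    <- 3                         # would be 100 for the real data
out_files <- character(n_sets)

for (i in seq_len(n_sets)) {
  cols <- paste0(prefixes, i)                  # e.g. "ID2", "EVENT2", "EXP2"
  smalldata <- table1[, cols, drop = FALSE]    # select the i-th set by name
  out_files[i] <- file.path(out_dir, paste0("set", i, ".csv"))
  write.csv(smalldata, out_files[i], row.names = FALSE)
}
```

Selecting by constructed names avoids the regular expression entirely, which may be simpler when the prefixes are known in advance.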

Thanks. I added more explanation in my edits of what I'm trying to do with a loop.
I've added another example for pulling out columns based on the "id" appended at the end of the column name. The regular expression may have to be adapted a bit for your specific column names. Not sure a loop is really needed but happy to update the second example if necessary.

Evan Cutler Anway · Accepted Answer · 2022-05-21 14:43:17Z

After seeing your edits, here's an answer that doesn't directly answer your question but does solve your problem. In short, I reshape your data into a long format and then export each group.

df_main <- data.frame(
  id = 1:26, # you need a row ID so you can unpivot
  ID1 = sample(letters, 26),
  event1 = sample(1:26),
  ID2 = sample(letters, 26),
  event2 = sample(1:26)
)

library(tidyr)

df_pivot <- df_main |> 
  pivot_longer(
    # pivot every column except the row id
    cols = c(everything(), -id), names_to = c("type", "number"), 
    # transform values into lists so characters and integers can be in the same column
    names_pattern = "([A-z]+)(\\d+)", values_transform = as.list
  ) |> 
  pivot_wider(names_from = type, values_from = value)

library(dplyr)

df_nested <- df_pivot |> 
  group_by(number) |> 
  nest()

library(purrr)

export_data <- function(number, data) {
  # write.xlsx for exporting, maybe
  # could include the number in the file name
  print(number)
  print(data)
}

df_nested |> 
  with(
    walk2(number, data, export_data)
  )
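
The `export_data()` placeholder above only prints. A hedged sketch of a version that actually writes files could look like this. It assumes CSV output via base R's `write.csv()`, a made-up file name pattern (`group_<number>.csv`), an extra `dir` argument that is not in the original, and that every list-column cell holds exactly one value, as produced by `values_transform = as.list`.

```r
export_data <- function(number, data, dir = tempdir()) {
  # write.csv cannot handle list-columns, so flatten them first
  # (assumes each cell of a list-column holds exactly one value)
  data[] <- lapply(data, function(col) if (is.list(col)) unlist(col) else col)

  # include the group number in the file name (hypothetical naming scheme)
  path <- file.path(dir, paste0("group_", number, ".csv"))
  write.csv(data, path, row.names = FALSE)
  invisible(path)
}

# Toy input with list-columns, mimicking one nested group
grp <- data.frame(id = 1:3)
grp$ID    <- list("a", "b", "c")
grp$event <- list(3L, 1L, 2L)

path <- export_data("1", grp)
```

Because `dir` has a default, the function can still be called with two arguments, as `walk2(number, data, export_data)` does.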

Old:

Sounds like a good use case for dplyr::select

library(dplyr)

# character vector of column names
vec_column_names <- c("Species", "Petal.Width")

df_small <- iris |> 
  select(all_of(vec_column_names))

# or a vector of column positions
df_small <- iris |> 
  select(1:3)

Thanks Evan. I added additional info on what I am trying to achieve.
Thanks again Evan. I definitely see your strategy. This batch of code gives me an error about combining character and integers.
The values_transform = as.list argument in pivot_longer should make both of those columns the same type: a list.
