I want to select a column from an R data frame inside a function, for example:

my_new_df <- function(input_1, input_2) {
  df <- input_1
  col <- noquote(input_2)
  df_new <- df$col
  return(df_new)
}

my_new_df(mtcars, "mpg")
-> NULL

Could you please explain why "$" does not work inside a function? Thanks

3
  • Try df_new <- df[ input2 ] Commented Oct 22, 2018 at 12:36
  • Error in [.data.frame(df, input2) : object 'input2' not found Commented Oct 22, 2018 at 13:46
  • noquote only controls how a string is printed on the console, not how it’s treated in code. You may be confusing it with the concept of quasi-quotations, which deal with the computation of unevaluated expressions. Commented Oct 22, 2018 at 15:27
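A minimal sketch of the point made in the last comment, using only base R and the built-in mtcars data. It shows that noquote() changes how a string prints but not what it is, and that "$" takes its right-hand side literally as a name instead of evaluating it. The variable names `col` and `col_name` are illustrative.

```r
col <- noquote("mpg")

# noquote() only adds a class used for printing; the underlying value is
# still the character string "mpg"
is.character(col)                 # TRUE
identical(unclass(col), "mpg")    # TRUE

# `$` does not evaluate `col`; it looks for a column literally named "col",
# which mtcars does not have
is.null(mtcars$col)               # TRUE

# `[[` evaluates its argument, so the string stored in the variable is used
col_name <- "mpg"
identical(mtcars[[col_name]], mtcars$mpg)   # TRUE
```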

3 Answers 3

1

Sorry I was sloppy in my comment so let's put it all together:

(1) If you want one column as a vector in return, use Soeren D.'s solution

(2) If you want a versatile solution that gives you the choice of getting a vector, or a data.frame with one or more columns back, use gpier's solution

(3) If you want one column as a data.frame in return, use

my_new_df <- function(input_1, input_2) 
{
    df <- input_1
    df_new <- df[ input_2 ]
    return(df_new)
}

You need the line return(df_new) only if you want to display the result on screen. If your goal is just to assign it to another variable, you could omit it.
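The three cases listed above differ mainly in which indexing operator is used. As a rough comparison on the built-in mtcars data (the variable name `col` is illustrative):

```r
col <- "mpg"

# single-bracket indexing with one argument keeps the data.frame structure
class(mtcars[col])                   # "data.frame"

# double-bracket indexing extracts the column itself
class(mtcars[[col]])                 # "numeric"

# row/column indexing drops to a vector for a single column by default ...
class(mtcars[, col])                 # "numeric"

# ... unless drop = FALSE is given
class(mtcars[, col, drop = FALSE])   # "data.frame"
```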

3 Comments

Your explanation of what return does is incorrect (in reality it does nothing here). In proper R style, the function would be written simply as {input_1[input_2]} (optionally omitting the redundant braces, and, of course, choosing appropriate parameter names).
@Konrad Rudolph - I would have expected that return() does nothing as you say, but try it out - define two functions, one with the line return(df_new) and one without. At least on my machine, the first one prints the column on screen, the other doesn't. That is true only when the return value is not assigned to another variable. Arguably such a function is not needed in the first place, but that's how the OP got started.
This has nothing to do with return. It’s purely due to the fact that assignment returns its result invisibly so if the last expression in your function is an assignment, then the function also returns an invisible result — same as writing invisible(df_new). But there’s no difference between return(df_new) and plain df_new. Or, while we’re at it, without the assignment (i.e. just having df[input_2] as the last expression).
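The visibility behaviour discussed in these comments can be checked with base R's withVisible(), which reports whether a value would be auto-printed at top level. A small sketch; the function names `f_assign`, `f_return`, and `f_plain` are made up for illustration.

```r
# last expression is an assignment: the value is returned invisibly
f_assign <- function(df, col) {
  df_new <- df[col]
}

# explicit return() of the assigned value: visible
f_return <- function(df, col) {
  df_new <- df[col]
  return(df_new)
}

# no assignment, no return(): also visible
f_plain <- function(df, col) df[col]

withVisible(f_assign(mtcars, "mpg"))$visible   # FALSE
withVisible(f_return(mtcars, "mpg"))$visible   # TRUE
withVisible(f_plain(mtcars, "mpg"))$visible    # TRUE

# the returned values themselves are the same in all three cases
identical(f_assign(mtcars, "mpg"), f_return(mtcars, "mpg"))   # TRUE
identical(f_plain(mtcars, "mpg"), f_return(mtcars, "mpg"))    # TRUE
```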
1

You can use the [[ operator to select a column based on the name.

my_new_df <- function(input_1, input_2) {
  df <- input_1
  df_new <- df[[input_2]]
  return(df_new)
}

Edit: df[["input_2"]] is equivalent to df$input_2; both return a vector, not a data.frame. If returning a data.frame is required, please refer to the other answer by gpier.

3 Comments

This gives him a vector, new_df sounds as if he wants a data.frame
@vaettchen: That is correct, but so does $. I've edited the answer accordingly.
They’re equivalent but (unfortunately) not exactly equal.
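One difference the last comment may be alluding to (this is an assumption, the comment does not say) is partial matching: on a base data.frame, "$" accepts an unambiguous prefix of a column name, while "[[" matches exactly by default. A short sketch on mtcars:

```r
# `$` partially matches: "mp" is an unambiguous prefix of "mpg"
identical(mtcars$mp, mtcars$mpg)                        # TRUE

# `[[` requires an exact name by default and returns NULL otherwise
is.null(mtcars[["mp"]])                                 # TRUE

# partial matching can be requested explicitly with exact = FALSE
identical(mtcars[["mp", exact = FALSE]], mtcars$mpg)    # TRUE
```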
1

You could do this:

my_new_df <- function(input_1, input_2) {
  col <- which(colnames(input_1)%in%input_2)
  df_new <- input_1[,col]
  return(df_new)
}
my_new_df(data.frame(mtcars), "mpg")
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

EDIT: if you'd like to keep the data.frame format you could use drop=FALSE

my_new_df <- function(input_1, input_2) {
  col <- which(colnames(input_1)%in%input_2)
  df_new <- input_1[,col, drop=FALSE]
  return(df_new)
}
my_new_df(data.frame(mtcars), "mpg")
                     mpg
Mazda RX4           21.0
Mazda RX4 Wag       21.0
Datsun 710          22.8
Hornet 4 Drive      21.4
Hornet Sportabout   18.7
Valiant             18.1
Duster 360          14.3
Merc 240D           24.4
Merc 230            22.8
Merc 280            19.2
Merc 280C           17.8
Merc 450SE          16.4
Merc 450SL          17.3
Merc 450SLC         15.2
Cadillac Fleetwood  10.4
Lincoln Continental 10.4
Chrysler Imperial   14.7
Fiat 128            32.4
Honda Civic         30.4
Toyota Corolla      33.9
Toyota Corona       21.5
Dodge Challenger    15.5
AMC Javelin         15.2
Camaro Z28          13.3
Pontiac Firebird    19.2
Fiat X1-9           27.3
Porsche 914-2       26.0
Lotus Europa        30.4
Ford Pantera L      15.8
Ferrari Dino        19.7
Maserati Bora       15.0
Volvo 142E          21.4

This would work with several column names as well.

my_new_df <- function(input_1, input_2) {
  col <- which(colnames(input_1)%in%input_2)
  df_new <- input_1[,col, drop=FALSE]
  return(df_new)
}
my_new_df(data.frame(mtcars), c("mpg", "cyl"))
                     mpg cyl
Mazda RX4           21.0   6
Mazda RX4 Wag       21.0   6
Datsun 710          22.8   4
Hornet 4 Drive      21.4   6
Hornet Sportabout   18.7   8
Valiant             18.1   6
Duster 360          14.3   8
Merc 240D           24.4   4
Merc 230            22.8   4
Merc 280            19.2   6
Merc 280C           17.8   6
Merc 450SE          16.4   8
Merc 450SL          17.3   8
Merc 450SLC         15.2   8
Cadillac Fleetwood  10.4   8
Lincoln Continental 10.4   8
Chrysler Imperial   14.7   8
Fiat 128            32.4   4
Honda Civic         30.4   4
Toyota Corolla      33.9   4
Toyota Corona       21.5   4
Dodge Challenger    15.5   8
AMC Javelin         15.2   8
Camaro Z28          13.3   8
Pontiac Firebird    19.2   8
Fiat X1-9           27.3   4
Porsche 914-2       26.0   4
Lotus Europa        30.4   4
Ford Pantera L      15.8   8
Ferrari Dino        19.7   6
Maserati Bora       15.0   8
Volvo 142E          21.4   4
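Two properties of the which(... %in% ...) approach are worth knowing: columns come back in the data frame's own order, not the requested order, and unknown names are silently ignored. The sketch below compares it with indexing by name directly; `by_match` and `by_name` are hypothetical helper names, not part of the answer.

```r
# the approach from this answer
by_match <- function(df, cols) df[, which(colnames(df) %in% cols), drop = FALSE]

# indexing with the character vector itself
by_name <- function(df, cols) df[, cols, drop = FALSE]

# order of the result
colnames(by_match(mtcars, c("cyl", "mpg")))   # "mpg" "cyl" (data frame order)
colnames(by_name(mtcars, c("cyl", "mpg")))    # "cyl" "mpg" (requested order)

# behaviour for a name that does not exist
ncol(by_match(mtcars, "nope"))                # 0, no error is raised
res <- try(by_name(mtcars, "nope"), silent = TRUE)
inherits(res, "try-error")                    # TRUE, "undefined columns selected"
```

Which behaviour is preferable depends on whether a misspelled column name should fail loudly or be skipped.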

1 Comment

Please don’t use the variables T and F, use the literal constants TRUE and FALSE. The former can be overridden by user code.
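To see why the comment recommends the literal constants, here is a brief sketch: T is an ordinary variable that can be reassigned, whereas TRUE is a reserved word.

```r
# T is just a variable bound to TRUE, so user code can mask it
T <- FALSE
isTRUE(T)                 # FALSE

# removing the masking binding restores the base definition
rm(T)
isTRUE(T)                 # TRUE

# assigning to the reserved word TRUE is an error
res <- try(eval(quote(TRUE <- FALSE)), silent = TRUE)
inherits(res, "try-error")   # TRUE
```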
