
I am looking to run multiple ANOVAs in R, so I was hoping to write a function.

df = iris

run_anova <- function(var1,var2,df) {
  fit = aov(var1 ~ var1 , df)
  return(fit)
}

In the iris dataset, the column names are "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

Assuming that I want to use these columns in the equations, how do I pass them into the run_anova function? I have tried passing them in as strings

run_anova("Sepal.Width", "Petal.Length", df)

That doesn't work; this message appears: "In storage.mode(v) <- "double" :"

run_anova(Sepal.Width, Petal.Length, df)

When I pass them in without quotes, I get a "not found" error. How can I pass the names of the df columns into the function?

Many thanks in advance for your help.

  • I would try building a formula from the variable-name strings: fit = aov(as.formula(paste(var1, "~", var2)), df) (Commented Feb 24, 2020 at 2:46)
  • Or reformulate(var2, var1) for short. (Commented Feb 24, 2020 at 3:10)
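As a minimal sketch of what these two comments suggest (the names fo_paste and fo_reform are only for illustration), both approaches should build the same formula from the column-name strings:

```r
# Column names held as strings, as in the question
var1 <- "Sepal.Width"
var2 <- "Petal.Length"

# Option 1: paste the pieces together and convert to a formula
fo_paste <- as.formula(paste(var1, "~", var2))

# Option 2: reformulate(termlabels, response) builds response ~ termlabels
fo_reform <- reformulate(var2, response = var1)

# Both can be passed straight to aov()
fit_paste  <- aov(fo_paste, data = iris)
fit_reform <- aov(fo_reform, data = iris)
```

Either formula object can then be used inside run_anova in place of the hard-coded var1 ~ var1.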

2 Answers


1) Use reformulate to create the formula. The do.call is needed only so that the Call: line in the output shows the actual formula and data set; if you don't care about that, you can use the shorter version shown in (3).

run_anova <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  do.call("aov", list(fo, substitute(df)))
}

run_anova("Sepal.Width", "Petal.Length", iris)

giving

Call:
   aov(formula = Sepal.Width ~ Petal.Length, data = iris)    

Terms:
                Petal.Length Residuals
Sum of Squares      5.196047 23.110887
Deg. of Freedom            1       148

Residual standard error: 0.3951641
Estimated effects may be unbalanced

2) Although the use of eval is discouraged, an alternative which also gives nice output is:

run_anova2 <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  eval.parent(substitute(aov(fo, df)))
}

run_anova2("Sepal.Width", "Petal.Length", iris)

3) If you don't care about the Call line in the output being nice then this simpler code can be used:

run_anova3 <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  aov(fo, df)
}

run_anova3("Sepal.Width", "Petal.Length", iris)

giving:

Call:
   aov(formula = fo, data = df)
...etc...
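Since the question is about running multiple ANOVAs, here is a rough sketch of how the simple version in (3) could be applied to several columns at once. Using Species as the explanatory variable and the names responses and fits are my own choices for illustration, not something from the question:

```r
# Simple version from (3), repeated so this block is self-contained
run_anova3 <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  aov(fo, df)
}

# Hypothetical choice: every numeric iris column as a response, Species as the factor
responses <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# One aov fit per response; each element of responses is passed as var1
fits <- lapply(responses, run_anova3, var2 = "Species", df = iris)
names(fits) <- responses

# Inspect one of the fits
summary(fits[["Sepal.Width"]])
```

The simple version is used here because the do.call/substitute variant in (1) relies on the name of the data argument as written by the caller, which may not behave as expected inside lapply.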

1 Comment

This is really helpful, thank you so much for taking the time to respond

An alternative is to use rlang's quasi-quotation syntax:

df = iris

library(rlang)
run_anova <- function(var1, var2, df) {
    var1 <- parse_expr(quo_name(enquo(var1)))
    var2 <- parse_expr(quo_name(enquo(var2)))
    eval_tidy(expr(aov(!!var1 ~ !!var2, data = df)))
}

This allows you to use either strings or unquoted expressions for var1 and var2:

run_anova("Sepal.Width", "Petal.Length", df)
run_anova(Sepal.Width, Petal.Length, df)

Both expressions return the same result.
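A possibly shorter variant of the same idea, offered only as a sketch: it assumes rlang's ensym(), which captures an argument given either as a string or as a bare symbol, and new_formula() to assemble the formula. The function name run_anova_sym is made up for illustration.

```r
library(rlang)

# Hypothetical variant: accepts column names as strings or as bare symbols
run_anova_sym <- function(var1, var2, df) {
  lhs <- ensym(var1)   # string or symbol -> symbol
  rhs <- ensym(var2)
  aov(new_formula(lhs, rhs), data = df)
}

fit_str <- run_anova_sym("Sepal.Width", "Petal.Length", iris)
fit_sym <- run_anova_sym(Sepal.Width, Petal.Length, iris)

# The two fits should have the same coefficients
all.equal(coef(fit_str), coef(fit_sym))
```

As with the answer's version, the Call: line will not show the original column names, since aov() only sees the expressions used inside the function.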

4 Comments

@G.Grothendieck True; but results are the same. The advantage here is that we can use quoted and unquoted expressions.
@G.Grothendieck Yes I understood your comment. It shows data = df in the call element of the return list but it does return the correct aov with whatever data.frame you gave for the df argument of run_anova. The only difference between the answers is that the call element is different. Results are the same.
@G.Grothendieck PS. To make things consistent with your results, I have swapped var1 and var2; I had originally var2 ~ var1, you have var1 ~ var2 (OP unfortunately had a typo with var1 ~ var1).
@G.Grothendieck That's not really what OP was asking though. OP tried passing column names as strings and as unquoted expressions. My answer allows him to do both (or rather: either).
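The point made in these comments, that only the call element differs while the fitted results are the same, can be checked with a small base R sketch. The function names with_call and plain_call are made up for illustration; they correspond to the do.call version and the simple version from the first answer:

```r
# do.call version: records the actual formula and data name in the call
with_call <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  do.call("aov", list(fo, substitute(df)))
}

# Simple version: the call records the local names fo and df
plain_call <- function(var1, var2, df) {
  fo <- reformulate(var2, var1)
  aov(fo, df)
}

fit_a <- with_call("Sepal.Width", "Petal.Length", iris)
fit_b <- plain_call("Sepal.Width", "Petal.Length", iris)

# The recorded calls differ ...
fit_a$call
fit_b$call

# ... but the estimated coefficients should agree
all.equal(coef(fit_a), coef(fit_b))
```

Which version to prefer depends on whether anything downstream reads the call element, for example update() or printed output.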
