
I'm quite new to R and have been learning from the resources available on the internet. I've run into an issue: I have a vector (a) with the variables "1", "2", and "3". I want to use the count function to generate a new data frame with the categories of each of those variables and their frequencies. The function I want to use in a loop is this:

b <- count(mydata, var1)

However, when I use the loop below:

for (i in (a)) {
    'j' <- count(mydata[, i])
    print (j)
}

The loop runs, but the frequencies saved in j are only those of the categorical variable "var 3". Can someone help me with this code, please?

TIA!

  • In general, the idea is not to use a loop here but to pivot your table into long form, group by the former column names, and count the result: mydata %>% pivot_longer(all_of(a), names_to = 'Var', values_to = 'Value') %>% group_by(Var) %>% count(Value). Commented Aug 13, 2021 at 17:14

1 Answer


In R there are generally better ways to process data than loops. In your particular case, the “straightforward” way fails because the idea of the “tidyverse” is to have the data in tidy format (I highly recommend you read this article; it’s somewhat long, but its explanation is fundamental for any kind of data processing, even beyond the tidyverse). But, from the perspective of your code, your data is spread across multiple columns (wide format) rather than being in a single column (long format).

The other issue is that count (like many other tidyverse functions) expects an unevaluated column name; it does not accept a column name stored as a string in a variable. akrun’s answer shows how you can work around this (using tidy evaluation and the bang-bang operator !!), but that workaround isn’t necessary here.
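For completeness, here is a minimal sketch of that kind of workaround, assuming a recent dplyr and a small made-up data frame standing in for mydata. Instead of the bang-bang operator it uses the `.data` pronoun, another tidy-evaluation tool that dplyr documents for looking up a column whose name is held in a string. The helper name `count_column` is hypothetical.

```r
library(dplyr)

# Made-up example data standing in for the question's `mydata`
mydata <- data.frame(
  var1 = c("a", "b", "a", "c"),
  var2 = c("x", "x", "y", "y"),
  var3 = c("p", "q", "q", "q")
)

# Hypothetical helper: count the categories of one column given its name as a string.
# `.data[[col]]` tells dplyr to look the string up as a column of `df`.
count_column <- function(df, col) {
  count(df, .data[[col]])
}

res <- count_column(mydata, "var1")
res
```

Here `res` has one row per category of var1 (a, b, c) with their frequencies in the column `n`.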

Instead of using a loop, the usual solution is to first bring your data into long form using pivot_longer (from the tidyr package).

After that, you can perform a single count on your data:

library(dplyr)  # count(), %>%
library(tidyr)  # pivot_longer()

result <- mydata %>%
    pivot_longer(all_of(a), names_to = 'Var', values_to = 'Value') %>%
    count(Var, Value)
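A self-contained version of this pipeline, run on made-up data (three character columns var1 to var3, which are assumptions, not the asker's real data), might look like this. Because the question asked for a separate data frame per variable, the sketch also splits the long result by `Var` with base R's `split()`.

```r
library(dplyr)
library(tidyr)

# Made-up example data: three categorical columns in wide format
mydata <- data.frame(
  var1 = c("a", "b", "a", "c"),
  var2 = c("x", "x", "y", "y"),
  var3 = c("p", "q", "q", "q")
)
a <- c("var1", "var2", "var3")

# One row per (original column, value) pair, then a single count over both
result <- mydata %>%
  pivot_longer(all_of(a), names_to = 'Var', values_to = 'Value') %>%
  count(Var, Value)

# If one data frame per variable is still wanted, split the long result by `Var`
counts_per_var <- split(result, result$Var)
counts_per_var$var3
```

The single `count(Var, Value)` replaces the whole loop; splitting afterwards is optional and only needed if separate tables are really required.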

Some comments regarding your current approach:

  1. Be wary of cryptic variable names: what are i, j and a? Use concise but descriptive variable names. There are some conventions where i and j are used but, if so, they almost exclusively refer to index variables in a loop over vector indices. Using them differently is therefore quite misleading.
  2. There’s generally no need to put parentheses around a variable name in R (the parentheses that belong to a function call, or to the for (…) header itself, are a different matter). That is, instead of for (i in (a)) it’s conventional to write for (i in a).
  3. Don’t put quotes around your variable names! R happens to accept the code 'j' <- … but since quotes normally signify string literals, its use here is incredibly misleading, and additionally doesn’t serve a purpose.
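If a loop is still preferred, a minimal sketch applying the points above might look like the following. It also addresses why only the last column ("var 3") remained in j: each iteration overwrote the same variable, so here every result is stored under its column name in a list instead. The data and the names `column_names` and `counts_by_column` are illustrative, and the sketch assumes a recent dplyr so the `.data` pronoun can look up a column name held in a string.

```r
library(dplyr)

# Made-up example data standing in for the question's `mydata`
mydata <- data.frame(
  var1 = c("a", "b", "a", "c"),
  var2 = c("x", "x", "y", "y"),
  var3 = c("p", "q", "q", "q")
)
column_names <- c("var1", "var2", "var3")

# Descriptive names, no parentheses around the vector, no quoted assignment target
counts_by_column <- list()
for (column_name in column_names) {
  # Store each result under its own name instead of overwriting one variable
  counts_by_column[[column_name]] <- count(mydata, .data[[column_name]])
}

counts_by_column$var2
```

An equivalent loop-free form would be `lapply(setNames(column_names, column_names), function(col) count(mydata, .data[[col]]))`, though the pivot_longer approach above remains the more idiomatic choice.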