I have 3 different datasets (from a longitudinal study): v1, v2 and v3. Each of them has a "gender" variable.

I'd like to plot the count of each gender from each dataset in the same graph (shown as points connected by lines), i.e. the x axis will be "v1", "v2" and "v3", and the y axis will be the count by gender.

I know I can manually create a dataset including the values I need, but I'm wondering if there is a better way? Thank you!

The sample datasets:

a <- c("boy", "girl")
v1 <- data.frame(gender=rep(a, times=c(11,9)))
v2 <- data.frame(gender=rep(a, times=c(8,8)))
v3 <- data.frame(gender=rep(a, times=c(6,4)))

5 Comments

  • I think you'll probably have to do the data manipulation yourself. If you give a short minimal reproducible example I could take a whack at it. Commented Oct 6, 2021 at 19:18
  • Because there are three datasets, I think it's a bit complicated to make a reproducible sample. But I can make three fake datasets. I added the code. Commented Oct 6, 2021 at 19:31
  • ;) I mean, you could have added a variable gender to the mre data ;) Commented Oct 6, 2021 at 19:44
  • I have. Sorry, I don't know how to make my question clearer. I think the issue is that I don't have a time variable in my datasets, but I'm trying to plot the count of genders over time. What I did was build a subset in long format. But I'm wondering if I can just grab the variables from the different datasets to make the plot (then there would only be the gender variable, while ggplot needs x and y. That's the issue). Commented Oct 6, 2021 at 19:57
  • I took the liberty of modifying your MRE a little bit to make it easier to handle (as you did it, each object was a data frame with a single column with a weird name, and the names were different across the data sets). Commented Oct 6, 2021 at 20:20

1 Answer

Summarize data set:

library(tidyverse)
dd <- bind_rows(lst(v1, v2, v3), .id="dataset") %>%
    count(dataset, gender)
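
To make that step concrete, here is a minimal sketch using the sample data from the question. The intermediate name `named` is only for illustration: `lst()` names each list element after the object passed in, and `.id = "dataset"` turns those names into a column, which is what supplies the x variable that the original data frames lack.

```r
library(dplyr)
library(tibble)

# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# lst() auto-names the elements "v1", "v2", "v3"
named <- lst(v1, v2, v3)

# Stack the data frames, recording the source name in `dataset`,
# then count rows per dataset/gender combination
dd <- bind_rows(named, .id = "dataset") %>%
    count(dataset, gender)

print(dd)
```

With this sample data, `dd` has six rows (one per dataset and gender) with the counts in column `n`.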

Plot:

ggplot(dd, aes(x=dataset, y=n, colour=gender)) + 
   geom_point() + 
   geom_line(aes(group=gender))

It's conceivable that you could do the count() step within ggplot in a sensible way (using stat_count(), which is what's used internally by geom_bar()), but this seems pretty straightforward. (If you did use stat_count() you'd probably have to repeat it for the geom_point() and geom_line() geoms ... something I keep meaning to do is to write a geom_linespoints that will draw both points and lines, with the same set of position/stats/etc.)
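
For what it's worth, a rough sketch of that stat_count() variant, assuming the sample data from the question (the names `raw` and `p` are just for illustration). The data frames are still stacked with `.id`, but the counting is left to ggplot, and `stat = "count"` does have to be repeated for both geoms:

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
a <- c("boy", "girl")
v1 <- data.frame(gender = rep(a, times = c(11, 9)))
v2 <- data.frame(gender = rep(a, times = c(8, 8)))
v3 <- data.frame(gender = rep(a, times = c(6, 4)))

# Stack the raw rows; no count() step here
raw <- bind_rows(list(v1 = v1, v2 = v2, v3 = v3), .id = "dataset")

# stat = "count" tallies rows per x value within each group,
# so y does not need to be mapped explicitly
p <- ggplot(raw, aes(x = dataset, colour = gender, group = gender)) +
    geom_point(stat = "count") +
    geom_line(stat = "count")

print(p)
```

This should draw the same picture as the count() version; whether it is any clearer is a matter of taste.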

1 Comment

Thank you! The .id option helps a lot. At least this will be much easier than using -dplyr- and -pivot- to do a lot of transformation before I'm able to plot.