
I am trying to plot 2 lines based on 2 variables using ggplot2 in R. Here is a piece from the complete Framingham data set that I am using:

df2 = read.table(text = " number smoker   BMI   sex
98      No 27.73   Men
99      No 24.35   Men
100     No 25.60   Men
101    Yes 24.33   Men
102    Yes 27.54   Men
299     No 24.62 Women
300     No 31.02 Women
301    Yes 21.68 Women
302    Yes 19.66 Women
303    Yes 26.64 Women", sep = "", header = TRUE)

I tried the following in ggplot and got a graph that I did not intend.

ggplot(df2, aes(smoker, BMI, color=sex)) + geom_line() + geom_point()

I want there to be two lines, one for Men and one for Women. I want the point in each of the smoker categories to represent the mean for that sex group.

Any idea how to do this with this data set? The examples I found on Stack Overflow worked with other data sets, but I could not adapt them to mine.

  • facet_grid might be helpful. Commented Feb 21, 2015 at 23:01

2 Answers


The images of your charts helped a lot in understanding what you are trying to do. Using ddply with summarize from the plyr package does the same calculation as tapply but returns the result in a data frame that ggplot can use directly. Allowing for the different data used in the two examples, the code below seems to reproduce your chart in R:

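 library(plyr)
 library(ggplot2)
 # Mean BMI for each sex/smoker combination, returned as a data frame
 df3 <- ddply(df2, .(sex, smoker), summarize, BMI_mean = mean(BMI))
 # group = sex draws one line per sex across the discrete smoker axis
 ggplot(df3, aes(smoker, BMI_mean, color = sex, group = sex)) + geom_line() +
       scale_x_discrete("Current Sig Smoker Y/N") +
       labs(y = "Mean Body Mass Index (kg/(M*M))", color = "SEX")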
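
As an alternative sketch, ggplot2 can compute the group means itself with stat_summary, so no intermediate data frame is needed. This assumes ggplot2 3.3.0 or later (older versions call the argument fun.y rather than fun). The df2 sample from the question is repeated so the block runs on its own, and the object name p is only for illustration.

```r
library(ggplot2)

# Sample rows from the question, repeated so the block is self-contained
df2 <- read.table(text = " number smoker   BMI   sex
98      No 27.73   Men
99      No 24.35   Men
100     No 25.60   Men
101    Yes 24.33   Men
102    Yes 27.54   Men
299     No 24.62 Women
300     No 31.02 Women
301    Yes 21.68 Women
302    Yes 19.66 Women
303    Yes 26.64 Women", header = TRUE)

# stat_summary(fun = mean) collapses each sex/smoker group to its mean BMI;
# group = sex joins the two means of each sex with a single line
p <- ggplot(df2, aes(smoker, BMI, color = sex, group = sex)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point")
p
```

The trade-off is that the means only exist inside the plot. If you also need them as numbers, the ddply version is probably the better fit.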


1 Comment

This is great! I see that ddply is much more direct than using tapply and then building the data frame. Thank you so much for your help!

I found a way to do it, but I am still looking for a smarter way if anyone can assist.

df3 <- with(df2, tapply(BMI, list(smoker, sex), mean))
smoker <- c("No", "Yes", "No", "Yes")
sex <- c("Men", "Men", "Women", "Women")
BMI <- c(df3[1,1], df3[2,1], df3[1,2], df3[2,2])
df4 <- data.frame(smoker, sex, BMI)
ggplot(df4, aes(smoker, BMI, color=sex)) + geom_line(aes(group=sex)) + geom_point()
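
One possibly smarter way, sketched here with base R only for the summarising step: aggregate() with a formula returns the group means already in long format, so the data frame does not have to be rebuilt by hand. The name df_means is made up for this example, and df2 is the sample from the question, repeated so the block runs on its own.

```r
library(ggplot2)

# Sample rows from the question, repeated so the block is self-contained
df2 <- read.table(text = " number smoker   BMI   sex
98      No 27.73   Men
99      No 24.35   Men
100     No 25.60   Men
101    Yes 24.33   Men
102    Yes 27.54   Men
299     No 24.62 Women
300     No 31.02 Women
301    Yes 21.68 Women
302    Yes 19.66 Women
303    Yes 26.64 Women", header = TRUE)

# One row per sex/smoker combination; the BMI column now holds the group mean
df_means <- aggregate(BMI ~ sex + smoker, data = df2, FUN = mean)

# group = sex connects the No and Yes means of each sex with one line
p <- ggplot(df_means, aes(smoker, BMI, color = sex, group = sex)) +
  geom_line() +
  geom_point()
p
```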

