I have a data frame (see below) that shows sales by region by year. The final column holds the sum of all the sales in the region over the three-year period (2014 to 2016).

I am new to R and would like to use ggplot to create a SINGLE scatter plot to analyze the data. The x-axis would be the years and the y-axis would be sales.

Ideally, each region would have its own line with points (other than a few NAs) in 2013, 2014, 2015, and 2016. I would then like to color each line based on its region. The sum column should not appear on the plot. Any ideas?

df <- structure(list(Region = structure(1:6, 
                                  .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", 
                                             "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"), 
                                  class = "factor"), 
               "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76), 
               "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38), 
               "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58), 
               "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11), 
               "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72)), 
          row.names = c(NA, 6L), class = "data.frame") 
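On the six rows shown, Total Sales equals the sum of the 2014, 2015 and 2016 columns with NAs skipped, so the 2013 column is not part of it. A small base-R check of that, rebuilding the sample data with data.frame() for brevity (the factor is simplified to the six regions present, which does not affect the sums):

```r
# The question's sample data, rebuilt with data.frame().
# check.names = FALSE keeps "2016" and "Total Sales" as column names.
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

# Sum of 2014-2016 for each region, skipping NAs
three_year <- rowSums(df[, c("2014", "2015", "2016")], na.rm = TRUE)

# all.equal() compares with a small numeric tolerance; unname() drops the
# row names that rowSums() attaches so only the values are compared
isTRUE(all.equal(unname(three_year), df[["Total Sales"]]))
#> [1] TRUE
```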

  • Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use str(), head() or screenshot)? You can use the reprex and datapasta packages to assist you with that. See also Help me Help you & How to make a great R reproducible example? Commented Oct 13, 2018 at 4:42
  • I tried to attach a picture, but it says I do not have permission. Is the data not visible? Commented Oct 13, 2018 at 4:47
  • Please read the links I posted above. A picture or screenshot is not helpful because we can't copy and paste it into our R session. Commented Oct 13, 2018 at 4:50
  • structure(list(Region = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"), class = "factor"), `2016` = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76), `2015` = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38), `2014` = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58), `2013` = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11), `Total Sales` = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72)), row.names = c(NA, 6L), class = "data.frame") Commented Oct 13, 2018 at 4:54
  • Very sorry, I completely misunderstood your comment. Alas, it is difficult to hide being a nooby! Does what I provided in my previous comment work? If not, I will try another way. Commented Oct 13, 2018 at 4:55

1 Answer

Your data is in wide format, so it's easier to work with in ggplot after converting it to long format. Here I use tidyr::gather() to do that.

library(dplyr)
library(tidyr)
library(ggplot2)

# Drop "Total Sales" first so it doesn't end up as a "Year" on the x-axis,
# then gather the year columns into Year (key) / Sales (value) pairs
df_long <- df %>%
  select(-`Total Sales`) %>%
  gather(Year, Sales, -Region)
df_long
#>    Region Year     Sales
#> 1       A 2016   8758.82
#> 2       B 2016  25559.89
#> 3       C 2016  30848.02
#> 4       D 2016   8696.99
#> 5       E 2016   3621.12
#> 6       F 2016   5468.76
#> 7       A 2015  26521.67
#> 8       B 2015  89544.93
#> 9       C 2015  92825.55
#> 10      D 2015  28916.40
#> 11      E 2015  14004.54
#> 12      F 2015  16618.38
#> 13      A 2014        NA
#> 14      B 2014        NA
#> 15      C 2014 199673.73
#> 16      D 2014  37108.09
#> 17      E 2014  16909.87
#> 18      F 2014  20610.58
#> 19      A 2013  27605.35
#> 20      B 2013        NA
#> 21      C 2013  78794.31
#> 22      D 2013  31824.75
#> 23      E 2013  17990.21
#> 24      F 2013  17307.11
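In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape with it, assuming a current tidyr and dplyr are installed (df_long2 is just an illustrative name):

```r
library(dplyr)
library(tidyr)

# The question's sample data (factor simplified to the six regions present)
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

# Drop the sum column, then pivot every remaining column except Region
# into Year (from the column names) / Sales (from the cell values)
df_long2 <- df %>%
  select(-`Total Sales`) %>%
  pivot_longer(-Region, names_to = "Year", values_to = "Sales")

# Rows come out grouped by region (A 2016, A 2015, ...) rather than by year,
# which makes no difference to ggplot
df_long2
```

The result is a tibble with the same 24 Region/Year/Sales rows, so it can be passed to the ggplot() calls below unchanged.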

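Plot: specify color = Region and group = Region inside aes() so ggplot knows how to color each region and which points to join with lines. Year is a character (discrete) variable here, so without group = Region every point would be its own group and geom_line() would have nothing to connect.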

ggplot(df_long, aes(x = Year, y = Sales, color = Region, group = Region)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = 'Dark2') +
  theme_classic(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).
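An alternative sketch, not part of the original answer: converting Year to an integer makes the x-axis continuous. Region (via color) is then the only discrete variable in aes(), so ggplot groups by it automatically and group = Region can be dropped. The breaks = 2013:2016 setting is an assumed choice to put one tick on each year.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The question's sample data (factor simplified to the six regions present)
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

df_long <- df %>%
  select(-`Total Sales`) %>%
  gather(Year, Sales, -Region) %>%
  mutate(Year = as.integer(Year))   # "2016" -> 2016L, so x is continuous

# Lines are grouped by the discrete color aesthetic (Region)
p <- ggplot(df_long, aes(x = Year, y = Sales, color = Region)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = 2013:2016) +  # ticks on whole years only
  scale_color_brewer(palette = "Dark2") +
  theme_classic(base_size = 12)

p
```

The NA values still produce "Removed ... rows containing missing values" warnings, as in the discrete version.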

You can also use facet_grid() to give each region its own panel; scales = 'free_y' lets each panel use its own y-axis range:

ggplot(df_long, aes(x = Year, y = Sales, group = Region)) +
  geom_point() +
  geom_line() +
  facet_grid(Region ~., scales = 'free_y') +
  theme_bw(base_size = 12)
#> Warning: Removed 3 rows containing missing values (geom_point).
#> Warning: Removed 2 rows containing missing values (geom_path).
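facet_grid(Region ~ .) stacks all six panels in one column. A sketch of the facet_wrap() variant, which instead lays the panels out in a grid (by default two rows of three for six regions); the rest of the plot is unchanged from the facet_grid() version:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The question's sample data (factor simplified to the six regions present)
df <- data.frame(
  Region = factor(c("A", "B", "C", "D", "E", "F")),
  "2016" = c(8758.82, 25559.89, 30848.02, 8696.99, 3621.12, 5468.76),
  "2015" = c(26521.67, 89544.93, 92825.55, 28916.4, 14004.54, 16618.38),
  "2014" = c(NA, NA, 199673.73, 37108.09, 16909.87, 20610.58),
  "2013" = c(27605.35, NA, 78794.31, 31824.75, 17990.21, 17307.11),
  "Total Sales" = c(35280.49, 115104.82, 323347.3, 74721.48, 34535.53, 42697.72),
  check.names = FALSE
)

df_long <- df %>%
  select(-`Total Sales`) %>%
  gather(Year, Sales, -Region)

# One panel per region, wrapped into a grid; each panel gets its own y range
p_wrap <- ggplot(df_long, aes(x = Year, y = Sales, group = Region)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Region, scales = "free_y") +
  theme_bw(base_size = 12)

p_wrap
```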

Created on 2018-10-12 by the reprex package (v0.2.1.9000)