1. Introduction
ggplot is one of the most famous libraries in R, and I use it very often in my daily workflow. But there are three topics I have seldom touched before: legends, labels and font size.
One reason is that they are not a necessity in our plots. But I believe they are good to have in our back pocket. Besides, in some business situations, they are very useful.
In this article, we will use the open dataset nycflights13::flights as an example.
For those who are not familiar with ggplot or tidyverse: ggplot is included in tidyverse, and because we will work with dates, we also load lubridate.
library(tidyverse)
library(lubridate)

(flights <- nycflights13::flights %>%
  filter(month %in% c(1), carrier %in% c("AA", "UA", "DL")))
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest air_time distance  hour minute           time_hour
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>  <chr> <chr>    <dbl>    <dbl> <dbl>  <dbl>              <dttm>
 1  2013     1     1      517            515         2      830            819        11      UA   1545  N14228    EWR   IAH      227     1400     5     15 2013-01-01 05:00:00
 2  2013     1     1      533            529         4      850            830        20      UA   1714  N24211    LGA   IAH      227     1416     5     29 2013-01-01 05:00:00
 3  2013     1     1      542            540         2      923            850        33      AA   1141  N619AA    JFK   MIA      160     1089     5     40 2013-01-01 05:00:00
 4  2013     1     1      554            600        -6      812            837       -25      DL    461  N668DN    LGA   ATL      116      762     6      0 2013-01-01 06:00:00
 5  2013     1     1      554            558        -4      740            728        12      UA   1696  N39463    EWR   ORD      150      719     5     58 2013-01-01 05:00:00
 6  2013     1     1      558            600        -2      753            745         8      AA    301  N3ALAA    LGA   ORD      138      733     6      0 2013-01-01 06:00:00
 7  2013     1     1      558            600        -2      924            917         7      UA    194  N29129    JFK   LAX      345     2475     6      0 2013-01-01 06:00:00
 8  2013     1     1      558            600        -2      923            937       -14      UA   1124  N53441    EWR   SFO      361     2565     6      0 2013-01-01 06:00:00
 9  2013     1     1      559            600        -1      941            910        31      AA    707  N3DUAA    LGA   DFW      257     1389     6      0 2013-01-01 06:00:00
10  2013     1     1      559            600        -1      854            902        -8      UA   1187  N76515    EWR   LAS      337     2227     6      0 2013-01-01 06:00:00
# ... with 11,111 more rows
2. Legend
2.1 Use at least one aesthetic to make a legend appear
If we want a legend, the easiest way is to map at least one extra aesthetic.
That means that in aes() we not only give x= and y=, but also specify another argument such as color=.
# example 1
flights %>%
  mutate(date=make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))
It may be natural to write one long pipeline like the one above while exploring, but to keep things clear, I split the code into the two steps below. The result is the same.
# it may be natural to write pipelines like the one above while exploring,
# but before everything goes too crazy, let us do it separately.
(delay <- flights %>%
  mutate(date=make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)))

delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))
Now let's have a look at other arguments.
For example, this is how linetype= affects the legend (example 2).
The good thing is that aesthetics can be mixed and matched (example 3).
# example 2
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, linetype=carrier))

# example 3
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier, linetype=carrier))
2.2 Manually change legend behaviour
We often want to change the colors, the order or the position of a legend. How can we do that?
Most of these can be changed with scale_color_manual() or theme().
# change color
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier)) +
  scale_color_manual(values=c("black", "red", "orange"))
To make the code clearer, I store the common part of the plot in a variable and add to it below. The result is the same.
# or, equally
base <- delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

base + scale_color_manual(values=c("black", "red", "orange"))
I often find changing colors more useful than the other options, because some companies have their own brand colors.
Besides that, people often ask to change the legend order. For example, "our company" on the first line, our archrival on the second line, and so on.
# change legend order
base +
  scale_color_manual(values=c("black", "red", "orange"),
                     breaks=c("DL", "UA", "AA"))
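One thing to watch when reordering: as far as I know, recent ggplot2 versions (3.3.0 and later) match an unnamed values= vector to breaks= in order, so changing the legend order can also change which carrier gets which color. A minimal sketch of one way around this is to name the colors by carrier code. The names carrier_colors and p_named are my own, and the colors are just the ones used above.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in this article
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Hypothetical color choice: the names tie each color to a carrier code
carrier_colors <- c(AA = "black", DL = "red", UA = "orange")

# breaks= now only controls the legend order, not the color assignment
p_named <- ggplot(delay) +
  geom_line(aes(x = date, y = delay, color = carrier)) +
  scale_color_manual(values = carrier_colors,
                     breaks = c("DL", "UA", "AA"))
p_named
```

With a named vector, AA stays black no matter where it appears in the legend.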
In the examples above, the legend text is created by ggplot and comes from the original dataset. We can change that if we need to.
# change legend text
base +
  scale_color_manual(values=c("black", "red", "orange"),
                     name="Airlines",  # can set to NULL
                     breaks=c("DL", "UA", "AA"),
                     labels=c("Delta", "United", "American"))
The position of the legend can be changed with theme(), as shown below.
# change legend position
base +
  theme(legend.position="bottom",
        legend.title=element_blank())  # legend.title cannot be set to NULL
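Besides "bottom", legend.position also accepts other keywords such as "top", "left", "right" and "none". The short sketch below moves the legend to the top and then hides it completely. The names p_top and p_none are my own.

```r
library(tidyverse)
library(lubridate)

# Rebuild the data and the base plot used in this article
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

base <- ggplot(delay) +
  geom_line(aes(x = date, y = delay, color = carrier))

# Put the legend above the plot
p_top <- base + theme(legend.position = "top")

# Hide the legend entirely, e.g. when the lines are labelled some other way
p_none <- base + theme(legend.position = "none")

p_top
p_none
```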
2.3 For a solid shape (like a bar chart), use fill instead
For filled shapes we map fill= and use scale_fill_manual() instead. All the other arguments are the same as above.
base2 <- flights %>%
  group_by(carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_col(aes(x=carrier, y=delay, fill=carrier))

base2 + scale_fill_manual(values=c("black", "red", "orange"))
3. Label
3.1 Add Pure Number
For Python, I wrote an article about how to add labels to a matplotlib-based plot. That is neither easy nor straightforward.
But ggplot is so well designed that we can easily do what we want.
Most of the time we only need geom_text().
# add number
base <- delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

base + geom_text(aes(x=date, y=delay, label=round(delay, 0)))
This method works for bar charts and other solid plots as well.
If we want some offset to make the numbers clearer, we can use nudge_y=.
# it also works for bar chart
base2 <- flights %>%
  group_by(carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_col(aes(x=carrier, y=delay, fill=carrier))

# use nudge_y=... to do some offset
base2 + geom_text(aes(x=carrier, y=delay, label=round(delay, 0)), nudge_y=0.5)
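Going back to the line chart, labelling every single point can get crowded. A minimal sketch of one option is geom_text()'s check_overlap argument, which skips labels that would overlap one drawn earlier, together with vjust to lift the text above the line. The name p_sparse is my own, and the vjust value is an arbitrary choice.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in this article
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

p_sparse <- ggplot(delay, aes(x = date, y = delay)) +
  geom_line(aes(color = carrier)) +
  # check_overlap = TRUE drops labels that would overlap an earlier label;
  # vjust = -0.5 moves each label slightly above its point
  geom_text(aes(label = round(delay, 0)), check_overlap = TRUE, vjust = -0.5)
p_sparse
```

Which labels survive depends on the drawing order and the plot size, so this is better for a quick look than for a final report.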
3.2 Add Number with a small backboard
geom_label() works just like geom_text(), but it draws a small box behind the text.
# add number with little backboard
base + geom_label(aes(x=date, y=delay, label=round(delay, 0)))

base2 + geom_label(aes(x=carrier, y=delay, label=round(delay, 0)), nudge_y=0.5)
3.3 Dummy variables
geom_text() and geom_label() also work with dummy variables. That means we can create a series in which most values are NA and only the special ones have values.
With this method we can easily put emphasis on a special point without showing all the labels at once.
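len <- length(delay$delay)
# every label is NA except the last one
null_labels <- rep(NA, len-1)
# use index len (not len-1) so the value matches the last point it is drawn on
null_labels <- c(null_labels, round(delay$delay[len]))
base + geom_label(aes(x=date, y=delay, label=null_labels))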
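The same idea can also be written with the label stored as a column of the data, so the NA pattern travels with the rows instead of relying on a separate vector of the right length. The sketch below is my own variation, not the approach above: it labels the last day of every carrier (three labels instead of one), and the names labelled, point_label and p_last are made up for illustration.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in this article
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Keep a label only on each carrier's last date; everything else is NA
labelled <- delay %>%
  group_by(carrier) %>%
  mutate(point_label = if_else(date == max(date), round(delay), NA_real_)) %>%
  ungroup()

p_last <- ggplot(labelled, aes(x = date, y = delay, color = carrier)) +
  geom_line() +
  # na.rm = TRUE silences the warning about rows with missing labels;
  # show.legend = FALSE keeps the label glyph out of the legend
  geom_label(aes(label = point_label), na.rm = TRUE, show.legend = FALSE)
p_last
```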
4. Font size (and colors)
Font size and color can be changed with the theme() functions.
4.1 Font size can be changed all together
# use theme_*()
base + theme_grey(base_size = 17)
4.2 Or we can change font size individually
# use theme()
base +
  theme(axis.title.x=element_text(color="orange", size=17),
        axis.text.x=element_text(color="blue", size=17))
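One caveat, as far as I know: theme() styles the axis, legend and title text, but not the labels drawn by geom_text() or geom_label(). Those take their own size= argument, which is measured in millimetres rather than points. The sketch below sets both. The name p_sized and the size values are my own arbitrary choices.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in this article
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

p_sized <- ggplot(delay, aes(x = date, y = delay)) +
  geom_line(aes(color = carrier)) +
  # size = 5 (in mm) controls the labels drawn by geom_text()
  geom_text(aes(label = round(delay, 0)), size = 5, check_overlap = TRUE) +
  # base_size = 17 (in pt) controls axis, legend and title text
  theme_grey(base_size = 17)
p_sized
```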