Posts List

Labelling dataviz

I still remember how hard it was to learn {ggplot2} after only knowing a little about R. Sure, the plots seemed pretty. But compared to the ways I had used R before, {ggplot2}’s syntax seemed almost counter-intuitive. Its pipe-like + workflow, building plots layer by layer, was like nothing I had ever used before. Not to mention, I was unfamiliar with central terms of art like “geoms” and “aesthetics”. But then again…the plots were really pretty.

Tick marks, variable names, and ggplot2

A popular workflow in R uses {dplyr} to group_by() and then summarise() variables. It’s an intuitive and easy way to aggregate and describe data, especially along multiple dimensions. The cost of being both powerful and user-friendly, however, is its arguably inconvenient default method for assigning names to summarized values. As the code below illustrates, users can provide their own names when using summarize().

    library(dplyr)

    ## explicitly named summarize variable
    mtcars %>%
      group_by(cyl) %>%
      summarize(mpg = mean(mpg))
    #> # A tibble: 3 x 2
    #>     cyl   mpg
    #>   <dbl> <dbl>
    #> 1     4  26.