I think it's fair to say that most academics who learn R do so while being trained in or
applying quantitative research methods. As a consequence, knowledge of R among academics tends to be limited to core
(base) R packages (R Core Team, 2018) and a small handful of speciality statistical packages, e.g., {lavaan}, {lme4},
{MASS}, {car}, etc. With this in mind, the goal of this post is to provide an overview of three things to know beyond
base R.
Recently I was asked if I could add to {rtweet} some basic functions for converting Twitter data
into network data objects. I thought this was a reasonable request and a good opportunity for me to learn more about
network analysis. But the task of converting Twitter data into network-friendly objects is something that has, at least
for me, been really slow and inefficient. So, for the past several weeks, I've been slowly working toward what I
believe is a simple but efficient solution.
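To make the task concrete, here is a minimal base R sketch of the general idea: turning tweet-level data into an edge list of who mentions whom. This is not the {rtweet} implementation. The toy data frame and the column names `screen_name` and `mentions` are assumptions made up for illustration.

```r
# Toy stand-in for tweet data; the column names are illustrative assumptions.
tweets <- data.frame(
  screen_name = c("alice", "bob", "carol"),
  stringsAsFactors = FALSE
)
# List-column: the accounts mentioned in each tweet (possibly none).
tweets$mentions <- list(c("bob", "carol"), character(0), "alice")

# Repeat each author once per account they mention, then flatten the
# mentions so the two vectors line up as (from, to) pairs.
n_mentions <- lengths(tweets$mentions)
edges <- data.frame(
  from = rep(tweets$screen_name, times = n_mentions),
  to = unlist(tweets$mentions),
  stringsAsFactors = FALSE
)

edges
```

An edge list like this is the form that most network packages accept as input, so it is a reasonable intermediate target. Tweets with no mentions simply contribute no rows.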
This post describes how to download and perform a basic local install of R and RStudio. The
instructions should work for both macOS and Windows users. Although not required, installation tends to work best when
operating systems are up to date. At the time of writing, this means R and RStudio work best with macOS High Sierra and
Windows 10.

R vs RStudio

R is a statistical computing language/environment. It is distinct from RStudio, which is an
integrated development environment (IDE), or high-powered graphical user interface (GUI), optimized for working with the
R language.
I have long been a fan of R-bloggers, a content aggregating site focused on blog posts about R. It
serves a useful purpose and has considerable reach. But in the first version of this blog post, I actually wrote a
lengthy critique of the site where I concluded with a not-so-blunt suggestion that R-bloggers wasn't as good as it
should be. In retrospect, and after a pleasant exchange about a draft of the post with Tal Galili (the creator and
operator of R-bloggers), I can confidently say my post was overly nit-picky and unrealistic in my expectations for a
benevolent blog-aggregating site like R-bloggers.
I still remember how hard it was to learn {ggplot2} after only knowing a little about R. Sure, the
plots seemed pretty. But compared to the ways I had used R before, {ggplot2}'s syntax seemed almost counter-intuitive.
Its pipe-like + workflow, building a plot layer by layer, was like nothing I had ever used before. Not to mention, I was
unfamiliar with central terms of art like “geoms” and “aesthetics”. But then again… the plots were really pretty.
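For readers who have not seen that layer-by-layer style, here is a small sketch using the built-in mtcars data. The choice of variables and geoms is mine, picked only to show where "aesthetics" and "geoms" appear in the syntax.

```r
library(ggplot2)

# Start with the data and the aesthetics: which variables map to x and y.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # First layer: a geom that draws one point per car.
  geom_point() +
  # Second layer: a geom that adds a fitted linear trend line.
  geom_smooth(method = "lm")

# Printing the object renders the plot.
p
```

Each + adds one more layer to the same plot object, so a plot can be built up and inspected one step at a time.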
A popular workflow in R uses {dplyr} to group_by() and then summarise() variables. It's an
intuitive and easy way to aggregate and describe data, especially along multiple dimensions. The cost of being both
powerful and user-friendly, however, is its arguably inconvenient default method for assigning names to summarized
values. As the code below illustrates, users can provide their own names when using summarize().

## explicitly named summarize variable
mtcars %>%
  group_by(cyl) %>%
  summarize(mpg = mean(mpg))
#> # A tibble: 3 x 2
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.