I just got back from rOpenSci’s Unconf, which was an absolutely amazing experience! I’m working on a full write-up of the conference, but while I was there I got to hang out with Vanderbilt University Biostatistics PhD student Lucy D’Agostino McGowan. Lucy is the co-founder of R-Ladies Nashville, the co-founder of the fantastic Live Free or Dichotomize blog, and the co-creator of papr, a Tinder-like Shiny app for bioRxiv preprints.

Lucy is notorious for using emojis in her Git commit messages, so I wanted to find out which emojis she uses most often. I combined the magic of the GitHub API and the tidyverse to find out!


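# Packages used throughout this post
library(gh)         # GitHub API client
library(git2r)      # clone() and repository()
library(tidyverse)  # purrr, dplyr, ggplot2
library(stringr)    # str_extract(), str_extract_all(), str_split()
library(tidytext)   # unnest_tokens()
library(widyr)      # pairwise_count()

# Get all of Lucy's personal repos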
lucy_repos <- gh("/users/LucyMcGowan/repos", .limit = Inf) %>%
  map_chr(~ .x$name)

# Get all of Lucy's organizations
lucy_orgs <- gh("/users/LucyMcGowan/orgs", .limit = Inf) %>%
  map_chr(~ .x$login)

# Get all of the repos that belong to Lucy's organizations
# Then filter out repos that Lucy hasn't contributed to
lucy_orgs_repos <- map(lucy_orgs,
                       ~ gh(paste0("/orgs/", .x, "/repos"), .limit = Inf)) %>%
  map(function(org) {
    map_chr(org, ~ .x$full_name)
  }) %>%
  unlist() %>%
  keep(~ "LucyMcGowan" %in% (
    gh(paste0("/repos/", .x, "/contributors"), .limit = Inf) %>%
      map_chr(function(user) {
        ifelse(is.list(user), user$login, "")
      })))

# Combine Lucy's personal repos and her org repos
lucy_repos <- setdiff(lucy_repos, str_extract(lucy_orgs_repos, "[^/]+$")) %>%
  paste0("LucyMcGowan/", .) %>%
  c(lucy_orgs_repos)

# Download all of the repos Lucy contributes to
local_repos <- map_chr(str_split(lucy_repos, "/"), paste, 
                       collapse = .Platform$file.sep) %>%
  paste0("lucy", .Platform$file.sep, .)
walk(local_repos, dir.create, recursive = TRUE)

walk2(lucy_repos, local_repos,
      ~ clone(paste0("https://github.com/", .x, ".git"),
              local_path = .y))

# Get all commits from Lucy's repos
all_commits <- map(local_repos, repository) %>%
  # Newer git2r releases (S3 classes) use as.data.frame(.x) here instead
  map(~ as(.x, "data.frame")) %>%
  bind_rows()

# Filter out all commits except those made by Lucy
lucy_commits <- all_commits %>%
  filter(author %in% c("Lucy", "Lucy D'Agostino McGowan", "LucyMcGowan"))

# Get frequency of emoji use across commits
emoji_tbl <- lucy_commits$message %>%
  str_extract_all("[\\uD83C-\\uDBFF\\uDC00-\\uDFFF]+") %>% unlist() %>%
  str_split("") %>% unlist() %>% 
  table() %>% sort(decreasing = TRUE) %>%
  as.data.frame()

colnames(emoji_tbl) <- c("Emoji", "Frequency")

# Plot emoji use!
ggplot(emoji_tbl, aes(Emoji, Frequency)) +
  geom_col() +
  theme_light(base_size = 30)
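
If you want to check the counting step without cloning anything, here is a minimal sketch on a made-up vector. The name `toy_emojis` and its contents are hypothetical stand-ins, not Lucy's data; the pipeline is the same `table()`, `sort()`, `as.data.frame()` chain used above.

```r
# Hypothetical toy data standing in for emojis pulled out of commit messages
toy_emojis <- c("🐓", "🙈", "🐓", "💁", "🙈", "🐓")

# Tabulate, sort from most to least frequent, and convert to a data frame
toy_tbl <- as.data.frame(sort(table(toy_emojis), decreasing = TRUE))
colnames(toy_tbl) <- c("Emoji", "Frequency")

print(toy_tbl)
```

The `Emoji` column comes back as a factor whose levels follow the sorted order, which is why the bars in the plot are drawn from most to least frequent.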



Strong showings by the see-no-evil monkey, the rooster, and the classic information desk woman. Next I wondered: What words in commit messages co-occur most often with certain emojis? Tidytext to the rescue!


lucy_tokens <- lucy_commits %>%
  mutate(Emoji = str_extract_all(message, 
                  "[\\uD83C-\\uDBFF\\uDC00-\\uDFFF]+")) %>%
  select(message, sha, Emoji) 

lucy_tokens <- lucy_tokens %>%
  select(-Emoji) %>%
  unnest_tokens(word, message) %>%
  left_join(lucy_tokens, by = "sha") %>%
  mutate(Emoji = map_chr(Emoji, ~ ifelse(length(.x) > 0, .x[[1]], ""))) %>%
  filter(!(Emoji %in% c("", "-")))

emo_shas <- lucy_tokens %>%
  select(sha, Emoji) %>%
  distinct() %>%
  rename(word = Emoji)
lucy_tokens %>%
  select(sha, word) %>%
  bind_rows(emo_shas) %>%
  pairwise_count(word, sha, sort = TRUE) %>%
  filter(item1 %in% unique(lucy_tokens$Emoji)) %>%
  group_by(item1) %>%
  slice(1:2) %>%
  ungroup() %>%
  filter(nchar(item2) > 2) %>%
  arrange(desc(n))

Emoji Word Count
🐛 fix 5
🙈 spell 4
🙈 correctly 4
👖 add 3
💅🏼 add 3
🙊 update 3
🙊 ignore 3
🌻 some 2
🎉🐓 first 2
🎉🐓 commit 2
🐓 add 2
🐓 and 2
👆🏼 make 2
👆🏼 clickable 2
👌🏼 add 2
👌🏼 pca 2
👷 fix 2
👷 update 2
💁🏻 update 2
💁🏻 commit 2
💅🏼 update 2
🤓 fix 2
🙆🏻 add 2
🙆🏻 about 2
🚒 update 2
🚒 recommender 2
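
To make the co-occurrence counts above less mysterious, here is a small sketch of what `pairwise_count()` does on a toy input. The `toy_tokens` table and its three commits are hypothetical; as in the code above, each emoji is treated as one more token belonging to its commit.

```r
library(tibble)
library(widyr)

# Hypothetical commits: one row per (commit sha, token) pair
toy_tokens <- tribble(
  ~sha, ~word,
  "a1", "🐛",
  "a1", "fix",
  "a1", "typo",
  "b2", "🐛",
  "b2", "fix",
  "c3", "🙈",
  "c3", "spell"
)

# Count how many commits each pair of tokens shares
toy_pairs <- pairwise_count(toy_tokens, word, sha, sort = TRUE)

# Keep only the pairs that start with the bug emoji
bug_pairs <- toy_pairs[toy_pairs$item1 == "🐛", ]
print(bug_pairs)
```

Here the bug emoji and "fix" appear together in two commits, while the bug emoji and "typo" share only one, so "fix" ranks first for that emoji.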

If you want to add emojis to your own commit messages, Lucy has a great tip:

Update: Scooped!

It seems that statistician and wonderful R blogger Maëlle Salmon had the same idea before me and wrote a very similar blog post!