The broom Where it Happens / When are these #rcatladies gonna rise up?
I was raised listening to musicals and I’ve occasionally performed in a few, so naturally over the past year I’ve been listening to Hamilton on repeat, to the point where the image of a five-pointed star is burned into my iPhone screen. I’ve also been wanting to take the tidytext R package for a spin after seeing what its creators, Julia Silge and David Robinson, have been able to do with it, not to mention that The Economist uses it!
Hamilton is a particularly good musical to analyze because the show is sung-through, meaning that the entirety of the plot is contained in the lyrics. We can get the lyrics by scraping a lyrics website with rvest. Let’s start by scraping a list of web pages that contain the lyrics for each song:
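The post doesn’t name the lyrics site or show its markup, so here’s a hedged sketch of this first step using rvest against a made-up index page. The inline HTML, the `ul.tracklist` selector, and the `lyrics.example.com` base URL are all placeholders. On a real site you’d pass the index page’s URL to `read_html()` and adjust the CSS selector to match its markup.

```r
library(rvest)

# Made-up index page standing in for the real site's song list
# (the actual site, URL, and markup are not given in the post).
index_html <- paste0(
  "<html><body><ul class='tracklist'>",
  "<li><a href='/hamilton/alexander-hamilton'>Alexander Hamilton</a></li>",
  "<li><a href='/hamilton/my-shot'>My Shot</a></li>",
  "<li><a href='/hamilton/satisfied'>Satisfied</a></li>",
  "</ul></body></html>"
)

# read_html() accepts a URL or a literal HTML string; on a real site
# you would pass the index page's URL here instead.
index_page <- read_html(index_html)

# Grab every link inside the (hypothetical) track list.
links <- html_elements(index_page, "ul.tracklist a")
song_titles <- html_text2(links)
song_paths <- html_attr(links, "href")

# Turn relative paths into absolute URLs against a placeholder base.
song_urls <- xml2::url_absolute(song_paths, "https://lyrics.example.com/")
song_urls
```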
Now I can scrape each individual page for the lyrics:
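Here’s a minimal sketch of the per-page step, again with rvest and made-up markup. It assumes each song page keeps its lyrics in a `div.lyrics` element with one line per `<br>`, and `scrape_song_lines()` is a hypothetical helper name. `html_text2()` renders `<br>` tags as newlines, so splitting on `"\n"` yields one lyric line per row.

```r
library(rvest)

# Hypothetical helper: pull the lyric lines for one song out of a parsed page.
# Assumes (not confirmed by the post) that lyrics live in <div class="lyrics">
# with one line per <br>.
scrape_song_lines <- function(page, song) {
  # html_text2() renders <br> as "\n", so each lyric line ends up on its own line.
  text <- html_text2(html_element(page, "div.lyrics"))
  lines <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  lines <- lines[lines != ""]  # drop blank lines
  data.frame(line = seq_along(lines), text = lines, song = song,
             stringsAsFactors = FALSE)
}

# Made-up page containing a few lines from "Satisfied".
page_html <- paste0(
  "<html><body><div class='lyrics'>",
  "So so so so this is what it feels like to match wits with someone at your level.<br>",
  "What the hell is the catch?<br>",
  "It's the feeling of freedom of seeing the light.<br>",
  "It's Ben Franklin with the key and the kite.<br>",
  "You see it right?",
  "</div></body></html>"
)

satisfied <- scrape_song_lines(read_html(page_html), "satisfied")
satisfied
```

To cover the whole show you could map the helper over a vector of song page URLs and `rbind` the results, for example with `do.call(rbind, Map(function(u, s) scrape_song_lines(read_html(u), s), urls, song_names))`, where `urls` and `song_names` are placeholders.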
| Lyric | Song |
|---|---|
| So so so so this is what it feels like to match wits with someone at your level. | satisfied |
| What the hell is the catch? | satisfied |
| It's the feeling of freedom of seeing the light. | satisfied |
| It's Ben Franklin with the key and the kite. | satisfied |
| You see it right? | satisfied |
Okay, we’ve got the lyrics in a data frame! Now we can tidy it even further so that each row holds a single “token” (essentially a single word), ready for sentiment analysis.
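tidytext does this step with `unnest_tokens()`. To keep the idea visible, here’s a dependency-free base R stand-in. The name `tokenize_lines()` is mine, not tidytext’s. It lowercases each lyric, normalizes curly apostrophes, and splits on anything that isn’t a letter, digit, or apostrophe, giving one row per word. It only roughly mirrors `unnest_tokens()`’s default word tokenizer and won’t match it on every edge case.

```r
# Base R stand-in for tidytext::unnest_tokens(word, text): one row per word.
# tokenize_lines() is a hypothetical helper, not part of tidytext.
tokenize_lines <- function(df) {
  # Normalize curly apostrophes to straight ones, then lowercase.
  text <- tolower(gsub("\u2019", "'", df$text, fixed = TRUE))
  # Split on runs of anything that isn't a letter, digit, or apostrophe.
  words <- strsplit(text, "[^a-z0-9']+")
  n <- lengths(words)
  out <- data.frame(
    song = rep(df$song, n),
    line = rep(df$line, n),
    word = unlist(words, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  # A line that starts with punctuation yields a leading "" token; drop it.
  out <- out[out$word != "", , drop = FALSE]
  rownames(out) <- NULL
  out
}

lyrics <- data.frame(
  song = "satisfied",
  line = 1:2,
  text = c("What the hell is the catch?",
           "It\u2019s Ben Franklin with the key and the kite."),
  stringsAsFactors = FALSE
)

tokens <- tokenize_lines(lyrics)
head(tokens, 8)
```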
Now that each song is tokenized we can start exploring the data. Let’s take a look at the most common words in the show:
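Counting is then a matter of tallying the `word` column. Here’s a base R sketch, where `count_words()` is a hypothetical helper and both the toy tokens and the tiny stop-word list are made up. tidytext also ships a `stop_words` data frame whose `word` column you could pass instead.

```r
# Hypothetical helper: word frequencies, most common first, after dropping stop words.
count_words <- function(tokens, stop_words = character()) {
  kept <- tokens$word[!tokens$word %in% stop_words]
  counts <- sort(table(kept), decreasing = TRUE)
  data.frame(word = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy tokens and a tiny made-up stop-word list.
tokens <- data.frame(
  word = c("wait", "for", "it", "wait", "for", "it", "wait", "burr", "sir", "burr"),
  stringsAsFactors = FALSE
)

top_words <- count_words(tokens, stop_words = c("for", "it"))
top_words
```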
I like how this forms a little song of its own: “I’m da Hamilton! Time? Wait! Don’t Burr, you’re shot sir! Hey Alexander!” Next let’s examine which pairs of words appear together most often in a verse:
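One way to count pairs of words that appear in the same verse: take each verse’s unique words, enumerate every unordered pair with `combn()`, and tally the pairs across verses. This is a base R sketch of that idea (the widyr package’s `pairwise_count()` is a tidy alternative). It assumes the tokens carry a `verse` column, and the toy data is made up.

```r
# Hypothetical helper: how many verses each unordered pair of distinct words shares.
pair_counts <- function(tokens) {
  verses <- split(tokens$word, tokens$verse)
  # For each verse, list every unordered pair of its unique words (one pair per row).
  pairs <- do.call(rbind, lapply(verses, function(words) {
    words <- sort(unique(words))
    if (length(words) < 2) return(NULL)
    t(combn(words, 2))
  }))
  if (is.null(pairs)) {
    return(data.frame(item1 = character(), item2 = character(), n = integer()))
  }
  counts <- as.data.frame(table(item1 = pairs[, 1], item2 = pairs[, 2]),
                          responseName = "n", stringsAsFactors = FALSE)
  counts <- counts[counts$n > 0, ]  # table() also lists pairs that never occur
  counts <- counts[order(-counts$n, counts$item1, counts$item2), ]
  rownames(counts) <- NULL
  counts
}

# Toy tokens spread over three verses.
tokens <- data.frame(
  verse = c(1, 1, 1, 2, 2, 3, 3, 3),
  word = c("alexander", "hamilton", "my", "alexander", "hamilton",
           "my", "shot", "shot"),
  stringsAsFactors = FALSE
)

word_pairs <- pair_counts(tokens)
word_pairs
```

Deduplicating within a verse first means a word repeated several times in one verse (like “shot” above) still counts that verse only once.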
“Alexander” and “Hamilton” are no surprise, but we can also see how the songs “My Shot,” “Non-Stop,” and “The Reynolds Pamphlet” dominate these common word pairings. We can visualize words that commonly appear with other words across different verses as a network:
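Under the hood, a network like this is just the pair counts filtered down to the stronger links. A hedged sketch: keep pairs that co-occur at least `min_n` times (the threshold is an arbitrary assumption) and store them as a symmetric adjacency matrix weighted by count. `cooccurrence_matrix()` is a hypothetical helper, and the counts below are toy numbers, not results from the show.

```r
# Hypothetical helper: keep pairs seen at least `min_n` times and store them as
# a symmetric adjacency matrix weighted by co-occurrence count.
cooccurrence_matrix <- function(pairs, min_n = 2) {
  edges <- pairs[pairs$n >= min_n, , drop = FALSE]
  nodes <- sort(unique(c(edges$item1, edges$item2)))
  adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
                dimnames = list(nodes, nodes))
  idx <- cbind(match(edges$item1, nodes), match(edges$item2, nodes))
  adj[idx] <- edges$n                           # item1 -> item2
  adj[idx[, c(2, 1), drop = FALSE]] <- edges$n  # item2 -> item1 (undirected)
  adj
}

# Toy pair counts (made up, not results from the show).
pairs <- data.frame(
  item1 = c("shot", "running", "coming", "coming", "wait"),
  item2 = c("throwing", "time", "home", "going", "burr"),
  n = c(5L, 4L, 3L, 2L, 1L),
  stringsAsFactors = FALSE
)

adj <- cooccurrence_matrix(pairs, min_n = 2)
rowSums(adj > 0)  # each word's number of neighbours (its degree)
```

From there, something like igraph’s `graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)` gives a graph object that igraph or ggraph can lay out and draw.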
It’s interesting to see how many of the major lyrical themes show up in this network, including how “shot <-> throwing” is related to “time <-> running.” Also notice how the theme of “coming/going home” links George Washington and Thomas Jefferson.
Now let’s perform the sentiment analysis. Sentiment spans a positive/negative axis, which we’ll map over the course of the show. We’ll evaluate sentiment on a song-by-song basis. A LOESS smoothing line is a good way to track the general sentiment of the show:
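The post doesn’t say which lexicon produced its scores. Assuming an AFINN-style lexicon, where each word carries an integer score and a song’s sentiment is the sum over its words, the per-song scoring and LOESS smoothing can be sketched in base R like this. The lexicon values and the ten toy “songs” are illustrative only. `get_sentiments("afinn")` in tidytext is one way to load a real lexicon.

```r
# Made-up, AFINN-style lexicon: one integer score per word (illustrative values only).
lexicon <- data.frame(
  word = c("love", "good", "happy", "bad", "dead"),
  score = c(3L, 3L, 3L, -3L, -3L),
  stringsAsFactors = FALSE
)

# Toy tokens for ten "songs" in running order.
words_by_song <- list(
  c("love", "good"), "bad", c("happy", "love"), c("dead", "bad"), "the",
  "good", "happy", c("bad", "bad"), "love", "dead"
)
tokens <- data.frame(
  song_number = rep(seq_along(words_by_song), lengths(words_by_song)),
  word = unlist(words_by_song),
  stringsAsFactors = FALSE
)

# Sum word scores per song; songs with no lexicon words score 0 instead of vanishing.
song_sentiment <- function(tokens, lexicon) {
  scored <- merge(tokens, lexicon, by = "word")
  songs <- sort(unique(tokens$song_number))
  totals <- tapply(scored$score, factor(scored$song_number, levels = songs), sum)
  totals[is.na(totals)] <- 0
  data.frame(song_number = songs, sentiment = as.numeric(totals))
}

scores <- song_sentiment(tokens, lexicon)

# LOESS smooth of sentiment against running order (span = 0.75 is loess()'s default).
fit <- loess(sentiment ~ song_number, data = scores, span = 0.75)
scores$smoothed <- predict(fit)
scores
```

Keeping zero-score songs keeps the x-axis aligned with the show’s running order. `span` controls how wiggly the curve is; if you plot with ggplot2, `geom_smooth(method = "loess")` fits the same kind of curve for you.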
The first three peaks show the first major positive events in Hamilton’s life: meeting his friends, becoming Washington’s secretary, and then meeting and marrying Eliza. The fourth peak, around the 15th song, encompasses the events of the Revolutionary War, and the peak after the 20th song coincides with “Non-Stop,” one of the most energetic songs in the show. Around the 35th song there’s a big peak, influenced by the very high sentiment score given to “One Last Time,” but it quickly falls off with “Hurricane,” the 36th song. The last peak shows Hamilton and Eliza rebuilding their relationship, cresting with “It’s Quiet Uptown,” before the show descends into the Hamilton-Burr duel.
Text and sentiment analysis seem to capture several of the major themes and movements in Hamilton’s plot, but under close examination the sentiment scores of certain songs don’t make sense in the context of the show. For example, although I think of “My Shot” as a positive song, it has a sentiment score of -32, while “One Last Time,” which is more melancholy, has a score of 41. It would be interesting to get musical theater scholars and computational sentiment experts to work together to “validate” particular sentiment values and techniques against a consensus of literary and theatrical meaning. It would also be very interesting to analyze the sentiment of the music in each song and see how it correlates with the sentiment of the words.
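One plausible contributor, assuming the scores are sums of per-word lexicon values: a bag-of-words score ignores word order, so negation and context don’t register at all. Here’s a toy illustration with made-up scores and a hypothetical `score_line()` helper, not a diagnosis of the actual “My Shot” result.

```r
# Made-up per-word scores (illustrative only, not from any real lexicon).
lexicon <- c(satisfied = 2, bad = -3)

# Bag-of-words score: sum the scores of the words that appear, ignoring order.
score_line <- function(line, lexicon) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  sum(lexicon[words], na.rm = TRUE)  # words missing from the lexicon count as 0
}

score_line("I will never be satisfied", lexicon)  # 2: "never" doesn't flip the sign
score_line("Not bad at all", lexicon)             # -3: "not" is ignored too
```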
Thank you again to Julia Silge and David Robinson for building these tools and for providing fantastic examples for their use.