First import the data:
legos <- read.csv("https://raw.githubusercontent.com/seankross/lego/master/data-tidy/legosets.csv", stringsAsFactors = FALSE)
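The URL above needs a network connection. Here is a minimal offline sketch that parses a few columns from two rows shown in the head() output below, passing them as a string through read.csv()'s text argument (which it forwards to read.table()). It also shows what stringsAsFactors = FALSE does: text columns stay character vectors instead of becoming factors. Since R 4.0.0 this is already the default, but setting it explicitly keeps the code working on older versions. The variable names csv_text and sample_sets are only for illustration.

```r
# Offline sketch: a few columns from two rows of legosets.csv, parsed from a string
csv_text <- "Item_Number,Name,Year,Pieces
10246,Detective's Office,2015,2262
10581,Ducks,2015,13"

sample_sets <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Name is a character column, not a factor
str(sample_sets)
```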
We can look at the top of the data frame using the head() function:
head(legos)
## Item_Number Name Year Theme Subtheme
## 1 10246 Detective's Office 2015 Advanced Models Modular Buildings
## 2 10247 Ferris Wheel 2015 Advanced Models Fairground
## 3 10248 Ferrari F40 2015 Advanced Models Vehicles
## 4 10249 Toy Shop 2015 Advanced Models Winter Village
## 5 10581 Ducks 2015 Duplo Forest Animals
## 6 10582 Animals 2015 Duplo Forest Animals
## Pieces Minifigures Image_URL
## 1 2262 6 http://images.brickset.com/sets/images/10246-1.jpg
## 2 2464 10 http://images.brickset.com/sets/images/10247-1.jpg
## 3 1158 NA http://images.brickset.com/sets/images/10248-1.jpg
## 4 898 NA http://images.brickset.com/sets/images/10249-1.jpg
## 5 13 1 http://images.brickset.com/sets/images/10581-1.jpg
## 6 39 2 http://images.brickset.com/sets/images/10582-1.jpg
## GBP_MSRP USD_MSRP CAD_MSRP EUR_MSRP Packaging Availability
## 1 132.99 159.99 199.99 149.99 Box Retail - limited
## 2 149.99 199.99 229.99 179.99 Box Retail - limited
## 3 69.99 99.99 119.99 89.99 Box LEGO exclusive
## 4 59.99 79.99 NA 69.99 Box LEGO exclusive
## 5 9.99 9.99 12.99 9.99 Box Retail
## 6 16.99 19.99 24.99 19.99 Box Retail
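By default head() returns the first six rows. Its n argument changes how many rows you get, and tail() does the same from the bottom of the data frame. A short sketch on a small data frame with made-up values:

```r
# A small hypothetical data frame, just to show head() and tail()
sets <- data.frame(
  Year = 2011:2015,
  Pieces = c(120, 450, 38, 2262, 13)
)

head(sets, n = 2)  # first 2 rows
tail(sets, n = 2)  # last 2 rows
```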
Now let’s look at the number of sets that came out each year by using the table() function:
lego_table <- table(legos$Year)
lego_table
##
## 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985
## 31 4 38 23 24 4 4 8 6 28 15 6 3 16 28
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## 36 60 21 44 35 54 53 77 67 75 123 126 164 193 244
## 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
## 261 286 275 239 247 217 187 228 275 306 364 417 409 434 417
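To see what table() does on its own, here is a sketch with a handful of made-up theme labels. It finds the distinct values, sorts them, and counts how many times each one occurs, just as it did for the years above.

```r
# Hypothetical theme labels; table() sorts the distinct values
# and counts how often each one appears
themes <- c("Duplo", "Technic", "Duplo", "City", "Technic", "Duplo")
theme_table <- table(themes)
theme_table
```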
We can turn this table into a data frame pretty easily. The names() function will extract the years from the table, while the as.vector() function will extract the set counts from the table.
years <- names(lego_table)
counts <- as.vector(lego_table)
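It helps to check what these two functions return. In the sketch below (with illustrative years), names() gives the table's labels as character strings, not numbers, which is why the years still need converting with as.numeric(). as.vector() drops the labels and returns the counts as a plain integer vector.

```r
# A table built from a few illustrative release years
year_table <- table(c(1999, 2001, 1999, 2000))

years_chr <- names(year_table)      # labels, stored as character strings
counts_int <- as.vector(year_table) # counts, as an unnamed integer vector

class(years_chr)   # "character"
class(counts_int)  # "integer"
```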
Now let’s make a new data frame. We’ll convert years, which is a vector of strings, into a vector of numbers using the as.numeric() function:
lego_sets_per_year <- data.frame(
year = as.numeric(years),
counts = counts
)
head(lego_sets_per_year)
## year counts
## 1 1971 31
## 2 1972 4
## 3 1973 38
## 4 1974 23
## 5 1975 24
## 6 1976 4
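For comparison, base R can also convert a one-dimensional table straight into a data frame with as.data.frame(). This is an alternative to the names()/as.vector() approach above, sketched here on made-up years. Passing stringsAsFactors = FALSE matters: if the year column were a factor, as.numeric() would return the factor's internal codes (1, 2, 3, ...) instead of the years themselves.

```r
# Illustrative table of release years
year_table <- table(c(1999, 2001, 1999, 2000))

# One row per category: a label column (Var1) and a count column (Freq)
per_year <- as.data.frame(year_table, stringsAsFactors = FALSE)
names(per_year) <- c("year", "counts")

# The labels are character strings, so convert them to numbers
per_year$year <- as.numeric(per_year$year)
per_year
```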
Let’s make another data frame where we can store the number of sets released per decade:
decades <- seq(1970, 2010, by = 10)
decades
## [1] 1970 1980 1990 2000 2010
lego_sets_per_decade <- data.frame(
decade = decades,
counts = rep(0, times = length(decades))
)
Now we can write a for loop that will calculate the number of sets produced per decade and store that number in the data frame we just created.
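for(d in decades){
  # Row positions of years on or after the start of decade d
  rows_on_or_after_start <- which(lego_sets_per_year$year >= d)
  # Row positions of years before the start of the next decade
  rows_before_next <- which(lego_sets_per_year$year < d + 10)
  # See explanation of intersect() below
  rows <- intersect(rows_on_or_after_start, rows_before_next)
  # Store the decade's total in the matching row of lego_sets_per_decade
  insert_in_row <- which(lego_sets_per_decade$decade == d)
  lego_sets_per_decade[insert_in_row, 2] <- sum(lego_sets_per_year$counts[rows])
}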
lego_sets_per_decade
## decade counts
## 1 1970 142
## 2 1980 257
## 3 1990 967
## 4 2000 2459
## 5 2010 2347
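A loop is not the only way to do this. One loop-free alternative (not the approach used in this section) maps each year to the start of its decade with the modulo operator %% and lets tapply() sum the counts within each group. The sketch uses only the ten years 1975 to 1984, copied from the table() output above, so its totals cover partial decades.

```r
# Years and set counts for 1975-1984, copied from the table() output above
year <- 1975:1984
counts <- c(24, 4, 4, 8, 6, 28, 15, 6, 3, 16)

# Round each year down to the start of its decade: 1977 -> 1970, 1984 -> 1980
decade <- year - year %% 10

# Sum the counts within each decade group
decade_totals <- tapply(counts, decade, sum)
decade_totals
```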
Let’s visualize this data frame as a bar plot:
barplot(lego_sets_per_decade$counts, names.arg = lego_sets_per_decade$decade)
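If the heights are stored in a named vector, barplot() uses the names as bar labels automatically, so names.arg can be left out. The sketch below copies the decade totals printed above and adds axis labels with xlab and ylab. It draws to a null device via pdf(NULL) only so that it also runs without a screen; in an interactive session you can drop the pdf(NULL) and dev.off() lines.

```r
# Decade totals from the output above, as a named vector
decade_counts <- c("1970" = 142, "1980" = 257, "1990" = 967,
                   "2000" = 2459, "2010" = 2347)

pdf(NULL)  # null graphics device: nothing is written or displayed
bar_midpoints <- barplot(decade_counts, xlab = "Decade", ylab = "Sets released")
dev.off()
```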
The intersect() function takes two vectors as arguments and returns the elements that are in both vectors:
intersect(c(2, 4, 6, 8, 10), c(1, 2, 4, 8, 16))
## [1] 2 4 8
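Applied to two sets of row positions, intersect() gives the same result as combining the two logical conditions with & and calling which() once. A small sketch with made-up years:

```r
years <- c(1968, 1971, 1975, 1979, 1980, 1985)

# Each condition turned into row positions separately
rows_a <- which(years >= 1970)
rows_b <- which(years < 1980)

# Positions that satisfy both conditions
intersect(rows_a, rows_b)

# Same positions, combining the conditions before calling which()
which(years >= 1970 & years < 1980)
```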
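The for loop above can be written more concisely by combining both conditions with the & operator inside which(), so intersect() is no longer needed:
for(d in decades){
  # Row positions of years in the range [d, d + 10)
  rows <- which(lego_sets_per_year$year >= d & lego_sets_per_year$year < d + 10)
  insert_in_row <- which(lego_sets_per_decade$decade == d)
  lego_sets_per_decade[insert_in_row, 2] <- sum(lego_sets_per_year$counts[rows])
}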
lego_sets_per_decade
## decade counts
## 1 1970 142
## 2 1980 257
## 3 1990 967
## 4 2000 2459
## 5 2010 2347
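The same computation can also be written without an explicit loop. Below is a sketch using vapply(), which applies a function to each decade and checks that every result is a single number. The yearly counts are copied from the table() output earlier in this section, so the result should match the loops' output (142, 257, 967, 2459, 2347).

```r
# Yearly set counts for 1971-2015, copied from the table() output above
year <- 1971:2015
counts <- c(31, 4, 38, 23, 24, 4, 4, 8, 6,
            28, 15, 6, 3, 16, 28, 36, 60, 21, 44,
            35, 54, 53, 77, 67, 75, 123, 126, 164, 193,
            244, 261, 286, 275, 239, 247, 217, 187, 228, 275,
            306, 364, 417, 409, 434, 417)
decades <- seq(1970, 2010, by = 10)

# For each decade start d, sum the counts of years in [d, d + 10)
decade_totals <- vapply(
  decades,
  function(d) sum(counts[year >= d & year < d + 10]),
  numeric(1)
)
decade_totals
```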