First, import the data:

legos <- read.csv("https://raw.githubusercontent.com/seankross/lego/master/data-tidy/legosets.csv", stringsAsFactors = FALSE)

We can look at the top of the data frame using the head() function:

head(legos)
##   Item_Number               Name Year           Theme          Subtheme
## 1       10246 Detective's Office 2015 Advanced Models Modular Buildings
## 2       10247       Ferris Wheel 2015 Advanced Models        Fairground
## 3       10248        Ferrari F40 2015 Advanced Models          Vehicles
## 4       10249           Toy Shop 2015 Advanced Models    Winter Village
## 5       10581              Ducks 2015           Duplo    Forest Animals
## 6       10582            Animals 2015           Duplo    Forest Animals
##   Pieces Minifigures                                          Image_URL
## 1   2262           6 http://images.brickset.com/sets/images/10246-1.jpg
## 2   2464          10 http://images.brickset.com/sets/images/10247-1.jpg
## 3   1158          NA http://images.brickset.com/sets/images/10248-1.jpg
## 4    898          NA http://images.brickset.com/sets/images/10249-1.jpg
## 5     13           1 http://images.brickset.com/sets/images/10581-1.jpg
## 6     39           2 http://images.brickset.com/sets/images/10582-1.jpg
##   GBP_MSRP USD_MSRP CAD_MSRP EUR_MSRP Packaging     Availability
## 1   132.99   159.99   199.99   149.99       Box Retail - limited
## 2   149.99   199.99   229.99   179.99       Box Retail - limited
## 3    69.99    99.99   119.99    89.99       Box   LEGO exclusive
## 4    59.99    79.99       NA    69.99       Box   LEGO exclusive
## 5     9.99     9.99    12.99     9.99       Box           Retail
## 6    16.99    19.99    24.99    19.99       Box           Retail

Now let’s look at the number of sets that came out each year by using the table() function:

lego_table <- table(legos$Year)
lego_table
## 
## 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 
##   31    4   38   23   24    4    4    8    6   28   15    6    3   16   28 
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 
##   36   60   21   44   35   54   53   77   67   75  123  126  164  193  244 
## 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 
##  261  286  275  239  247  217  187  228  275  306  364  417  409  434  417

We can turn this table into a data frame pretty easily. The names() function will extract the years from the table, while the as.vector() function will extract the set counts from the table.

years <- names(lego_table)
counts <- as.vector(lego_table)
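
As a small illustration of what these two functions return, here is a sketch on a toy vector. The name toy_years and its values are made up for this example and are not part of the LEGO data. Note that names() returns strings, not numbers.

```r
# Toy data standing in for legos$Year (hypothetical values)
toy_years <- c(2001, 2001, 2002, 2003, 2003, 2003)
toy_table <- table(toy_years)

# names() gives the distinct values as a character vector
toy_names <- names(toy_table)
# as.vector() drops the table attributes and keeps the plain counts
toy_counts <- as.vector(toy_table)

class(toy_names)   # "character"
toy_names
toy_counts
```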

Now let’s make a new data frame. We’ll convert years, which is a vector of strings, to a vector of numbers using the as.numeric() function:

lego_sets_per_year <- data.frame(
  year = as.numeric(years),
  counts = counts
)

head(lego_sets_per_year)
##   year counts
## 1 1971     31
## 2 1972      4
## 3 1973     38
## 4 1974     23
## 5 1975     24
## 6 1976      4
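
A possible shortcut, shown here only as a sketch on made-up toy data, is to hand the table directly to as.data.frame(). The default column names it produces (Var1 and Freq) are then renamed, and the year column still has to be converted from strings to numbers.

```r
# Toy table standing in for lego_table (hypothetical values)
toy_table <- table(c(2001, 2001, 2002, 2003, 2003, 2003))

# stringsAsFactors = FALSE keeps the years as strings instead of factors
toy_df <- as.data.frame(toy_table, stringsAsFactors = FALSE)

# The default column names are Var1 and Freq; rename them
names(toy_df) <- c("year", "counts")

# Convert the year strings to numbers, as in the approach above
toy_df$year <- as.numeric(toy_df$year)
toy_df
```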

Let’s make another data frame where we can store the number of sets released per decade:

decades <- seq(1970, 2010, by = 10)
decades
## [1] 1970 1980 1990 2000 2010
lego_sets_per_decade <- data.frame(
  decade = decades,
  counts = rep(0, times = length(decades))
)

Now we can write a for loop that will calculate the number of sets produced per decade and store that number in the data frame we just created.

for(d in decades){
  # Rows whose year falls at or after the start of decade d
  rows_at_or_after_start <- which(lego_sets_per_year$year >= d)
  # Rows whose year falls before the start of the next decade
  rows_before_next_decade <- which(lego_sets_per_year$year < d + 10)
  
  # See explanation of intersect() below
  rows <- intersect(rows_at_or_after_start, rows_before_next_decade)
  
  insert_in_row <- which(lego_sets_per_decade$decade == d)
  lego_sets_per_decade[insert_in_row, 2] <- sum(lego_sets_per_year$counts[rows])
}

lego_sets_per_decade
##   decade counts
## 1   1970    142
## 2   1980    257
## 3   1990    967
## 4   2000   2459
## 5   2010   2347

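Let’s visualize this data frame as a bar plot. Keep in mind that the data run from 1971 to 2015, so the 1970 and 2010 bars cover fewer than ten years each: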

barplot(lego_sets_per_decade$counts, names.arg = lego_sets_per_decade$decade)

The intersect() function takes two vectors as arguments and returns the elements that are in both vectors:

intersect(c(2, 4, 6, 8, 10), c(1, 2, 4, 8, 16))
## [1] 2 4 8
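
To connect this back to the loop, here is a minimal sketch on a toy vector of years (toy_year is a made-up name with made-up values). Each call to which() returns row positions, and intersect() keeps only the positions that satisfy both conditions.

```r
# Toy years (hypothetical values), checked against the decade starting in 1980
toy_year <- c(1978, 1979, 1980, 1985, 1989, 1990)

# Positions of years at or after 1980
at_or_after <- which(toy_year >= 1980)
# Positions of years before 1990
before_next <- which(toy_year < 1990)

# Positions that appear in both vectors, i.e. years in the 1980s
in_decade <- intersect(at_or_after, before_next)
in_decade
toy_year[in_decade]
```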

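Another approach to the for loop above is to combine both conditions with the & operator inside a single call to which(), which removes the need for intersect():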

for(d in decades){
  rows <- which(lego_sets_per_year$year >= d & lego_sets_per_year$year < d + 10)
  
  insert_in_row <- which(lego_sets_per_decade$decade == d)
  lego_sets_per_decade[insert_in_row, 2] <- sum(lego_sets_per_year$counts[rows])
}

lego_sets_per_decade
##   decade counts
## 1   1970    142
## 2   1980    257
## 3   1990    967
## 4   2000   2459
## 5   2010   2347
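
The same totals can also be computed without a loop. The following is only a sketch of one alternative, not the method used above: it rounds each year down to its decade and sums the counts within each group using tapply(). The toy data frame is a small, hand-picked set of rows, and the names toy, decade_of_year and per_decade are introduced just for this example.

```r
# A few rows in the shape of lego_sets_per_year (toy subset for illustration)
toy <- data.frame(
  year = c(1971, 1979, 1980, 1985, 1990, 1999, 2000),
  counts = c(31, 6, 28, 28, 35, 193, 244)
)

# Round each year down to the start of its decade, e.g. 1985 becomes 1980
decade_of_year <- floor(toy$year / 10) * 10

# Sum the counts within each decade; the result is named by decade
per_decade <- tapply(toy$counts, decade_of_year, sum)
per_decade
```

The result is a named vector rather than a data frame, so names() and as.vector() could be used on it in the same way as on a table.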
