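The dplyr package provides functions for manipulating data frames. You can install it by running install.packages("dplyr") in your R console. After installation, load the package with library(dplyr). Loading it prints a message noting that several dplyr functions mask (take precedence over) functions with the same names in the stats and base packages: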

```
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
```
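Because of this masking, a bare filter() call now refers to dplyr::filter(). The following short sketch shows how the :: operator still reaches either version explicitly. The vector x and data frame df are made up for illustration:

```r
library(dplyr)

# After library(dplyr), a bare filter() call finds dplyr::filter() first.
# The :: operator selects a specific package's version explicitly.
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; here, a centered 3-point moving sum
moving_sum <- stats::filter(x, rep(1, 3))

# dplyr::filter() keeps the rows of a data frame that satisfy a condition
df <- data.frame(x = x)
big_rows <- dplyr::filter(df, x > 3)

print(as.numeric(moving_sum))  # NA 6 9 12 NA
print(big_rows)
```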

Load the Lego set data, which you have worked with before:

```r
legos <- read.csv("https://raw.githubusercontent.com/seankross/lego/master/data-tidy/legosets.csv", stringsAsFactors = FALSE)
```
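The stringsAsFactors = FALSE argument keeps text columns as character vectors instead of converting them to factors. Since R 4.0.0, this is the default. The following sketch uses a made-up two-row CSV held in a string and read.csv()'s text argument, so it shows the difference without downloading anything:

```r
# A tiny CSV held in a string, standing in for the remote legosets.csv
# (made-up rows, for illustration only)
csv_text <- "Item_Number,Theme,Pieces
1001,City,40
1002,Space,95"

as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factors$Theme)  # "factor"
class(as_strings$Theme)  # "character"
```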

The Basics

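You can make the legos data frame “behave” a little more nicely by wrapping it with the tbl_df() function, which converts it to a tibble. A tibble prints only the first ten rows and the columns that fit on the screen, along with each column's type, so it won't flood the console. (tbl_df() is deprecated as of dplyr 1.0.0, and as_tibble() is its replacement.)

```r
legos <- tbl_df(legos)  # in dplyr >= 1.0.0, use as_tibble(legos) instead
```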
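To see what the conversion changes, here is a sketch using the built-in mtcars data set, chosen only because it ships with R. It calls tibble::as_tibble(), the replacement for tbl_df(). A tibble is still a data frame, so every function that accepts a data frame also accepts a tibble:

```r
library(dplyr)

# mtcars ships with R; converting it shows what a tibble adds
cars_tbl <- tibble::as_tibble(mtcars)

class(cars_tbl)  # "tbl_df" "tbl" "data.frame"
print(cars_tbl)  # shows the first 10 rows plus each column's type
```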

Now let’s use dplyr’s functions to rearrange the legos data frame. The arrange() function re-orders the rows of a data frame by the values in one or more columns, in ascending order by default:

```r
# Order `legos` by the number of pieces
arrange(legos, Pieces)
```
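The following sketch runs on a made-up four-row data frame, where sets is a hypothetical stand-in for legos. It shows ascending order, descending order with desc(), and a second sort column that breaks ties explicitly:

```r
library(dplyr)

# A small made-up data frame standing in for legos
sets <- data.frame(
  Item_Number = c("A", "B", "C", "D"),
  Pieces      = c(120, 15, 480, 15)
)

fewest_first <- arrange(sets, Pieces)        # fewest pieces first
most_first   <- arrange(sets, desc(Pieces))  # most pieces first
# Sets with equal Pieces are ordered by Item_Number, descending
tie_broken   <- arrange(sets, Pieces, desc(Item_Number))
```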
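The next steps build a smaller data frame, legos_simple, that holds each set's price per piece (PPP):

- select() keeps only the named columns.
- filter() keeps the rows that satisfy a condition.
- mutate() adds a new column computed from existing ones.
- desc() makes arrange() sort in descending order.

The last filter() keeps only sets that cost less than $0.75 per piece. The plot then shows how those prices per piece are distributed:

```r
# Keep only the set number, piece count, and US price
legos_simple <- select(legos, Item_Number, Pieces, USD_MSRP)
# Drop sets without a US price
legos_simple <- filter(legos_simple, !is.na(USD_MSRP))
# Add the price per piece (PPP) in US dollars
legos_simple <- mutate(legos_simple, PPP = USD_MSRP / Pieces)
# Sort from the highest to the lowest price per piece
legos_simple <- arrange(legos_simple, desc(PPP))
# Keep sets under $0.75 per piece; rows with NA PPP are dropped as well
legos_simple <- filter(legos_simple, PPP < .75)
# Strip plot: jitter x around 0 so overlapping points separate
plot(jitter(rep(0, length(legos_simple$PPP))), legos_simple$PPP,
     xlim = c(-.05, .05))
```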
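You can try the same five steps on a tiny made-up data frame so that each intermediate result is easy to check by eye. Here, toy is hypothetical, with column names matching those in legos:

```r
library(dplyr)

# Made-up sets, including a missing price, to mimic the legos columns
toy <- data.frame(
  Item_Number = c("S1", "S2", "S3", "S4"),
  Pieces      = c(100, 40, 10, 400),
  USD_MSRP    = c(10, NA, 12, 60),
  Theme       = c("City", "City", "Duplo", "Technic")
)

step1 <- select(toy, Item_Number, Pieces, USD_MSRP)  # Theme is dropped
step2 <- filter(step1, !is.na(USD_MSRP))             # S2 has no price
step3 <- mutate(step2, PPP = USD_MSRP / Pieces)      # 0.10, 1.20, 0.15
step4 <- arrange(step3, desc(PPP))                   # S3, S4, S1
step5 <- filter(step4, PPP < .75)                    # S4, S1
```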

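You can chain the same steps with the pipe operator %>%, which passes the result of each call as the first argument of the next one. This version also uses two operators from the magrittr package, which must be loaded separately. %<>% pipes legos_simple through the chain and assigns the result back to it. %T>%, the “tee” operator, calls print() for its side effect and then passes its input along unchanged. Because legos_simple was already filtered above, both print() calls show the same 5,088 rows:

```r
library(magrittr)  # provides %<>% and %T>% (dplyr provides only %>%)

legos_simple %<>%
  select(Item_Number, Pieces, USD_MSRP) %>%
  filter(!is.na(USD_MSRP)) %>%
  filter(!is.na(Pieces)) %>%
  mutate(PPP = USD_MSRP / Pieces) %>%
  arrange(desc(PPP)) %>%
  tbl_df() %T>%
  print() %>%        # print the sorted table, then continue the chain
  filter(PPP < .75) %>%
  print()            # print the final table
```

```
## Source: local data frame [5,088 x 4]
## 
##    Item_Number Pieces USD_MSRP       PPP
##          (chr)  (int)    (dbl)     (dbl)
## 1        10544     40    29.99 0.7497500
## 2         5947     40    29.99 0.7497500
## 3       852535     24    17.99 0.7495833
## 4         8519     20    14.99 0.7495000
## 5        71000      4     2.99 0.7475000
## 6        30132      4     2.99 0.7475000
## 7         8804      4     2.99 0.7475000
## 8         8321     27    20.00 0.7407407
## 9         4966     95    69.99 0.7367368
## 10        9130    125    91.00 0.7280000
## ..         ...    ...      ...       ...
```

```
## Source: local data frame [5,088 x 4]
## 
##    Item_Number Pieces USD_MSRP       PPP
##          (chr)  (int)    (dbl)     (dbl)
## 1        10544     40    29.99 0.7497500
## 2         5947     40    29.99 0.7497500
## 3       852535     24    17.99 0.7495833
## 4         8519     20    14.99 0.7495000
## 5        71000      4     2.99 0.7475000
## 6        30132      4     2.99 0.7475000
## 7         8804      4     2.99 0.7475000
## 8         8321     27    20.00 0.7407407
## 9         4966     95    69.99 0.7367368
## 10        9130    125    91.00 0.7280000
## ..         ...    ...      ...       ...
```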
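Here is a minimal sketch of the three operators on a hypothetical three-row data frame, separate from the legos data:

```r
library(dplyr)
library(magrittr)  # %T>% and %<>% come from magrittr, not dplyr

toy <- data.frame(Pieces = c(100, 10, 400), USD_MSRP = c(10, 12, 60))

# %>% passes the left-hand result as the first argument of the next call
ppp <- toy %>% mutate(PPP = USD_MSRP / Pieces) %>% arrange(desc(PPP))

# %T>% runs print() for its side effect, then passes its *input* onward
cheap <- ppp %T>% print() %>% filter(PPP < .75)

# %<>% pipes toy forward and assigns the result back to toy
toy %<>% filter(Pieces > 50)
```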

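Essential functions

The essential dplyr functions to learn are arrange(), desc(), distinct(), filter(), mutate(), group_by(), the join functions (such as left_join()), n(), rename(), select(), slice(), summarize(), bind_rows(), and bind_cols().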
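The following sketch exercises the functions from this list that the section has not yet demonstrated. It uses two small made-up data frames, sets and themes, which are hypothetical. left_join() stands in for the join family; inner_join(), right_join(), and full_join() take the same arguments:

```r
library(dplyr)

# Made-up sets and themes, for illustration only
sets <- data.frame(
  Item_Number = c("S1", "S2", "S3", "S3"),  # S3 appears twice
  Theme       = c("City", "City", "Duplo", "Duplo"),
  Pieces      = c(100, 250, 10, 10)
)
themes <- data.frame(
  Theme = c("City", "Duplo"),
  Age   = c("5+", "1.5+")
)

unique_sets <- distinct(sets)  # drop the duplicated S3 row

# One summary row per Theme: n() counts the rows in each group
by_theme <- unique_sets %>%
  group_by(Theme) %>%
  summarize(n_sets = n(), total_pieces = sum(Pieces))

renamed   <- rename(unique_sets, Set = Item_Number)       # new_name = old_name
first_two <- slice(unique_sets, 1:2)                      # rows by position
with_age  <- left_join(unique_sets, themes, by = "Theme") # adds the Age column
stacked   <- bind_rows(first_two, first_two)              # stack rows
side_by   <- bind_cols(first_two,
                       data.frame(USD_MSRP = c(9.99, 24.99)))  # add columns
```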