dplyr & magrittr
The dplyr package provides functions for manipulating data frames. You can install it by entering install.packages("dplyr") into your R console. After installation, use library(dplyr) to load the package.
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
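The message above means that, once dplyr is loaded, a bare call to filter() or lag() resolves to dplyr's version rather than the one in stats. The originals remain reachable through their namespace prefix. A minimal sketch, using a made-up vector x that is not part of the Lego data:

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5)   # hypothetical example data

# stats::filter() is the time-series linear filter that dplyr::filter() masks.
# With three equal weights it computes a centered moving average of width 3.
smoothed <- stats::filter(x, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that satisfy a condition.
big <- dplyr::filter(data.frame(x = x), x > 3)

smoothed
big
```

Writing the prefix explicitly is only needed when you want the masked function; everywhere else in this section filter() means dplyr::filter().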
Retrieve the Lego set data since you’re familiar with it:
legos <- read.csv("https://raw.githubusercontent.com/seankross/lego/master/data-tidy/legosets.csv", stringsAsFactors = FALSE)
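You can make the legos data frame “behave” a little more nicely, for example by printing only its first few rows, by wrapping it with the tbl_df() function. (Recent dplyr releases deprecate tbl_df() in favor of as_tibble().) I’ll do this quickly: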
legos <- tbl_df(legos)
#kable(head(legos))
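To see what the wrapping changes, here is a small sketch on a hypothetical three-row data frame rather than the real Lego data. It uses as_tibble() from the tibble package, which recent dplyr releases recommend in place of tbl_df(); the column values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the Lego data
df <- data.frame(Item_Number = c("a1", "b2", "c3"),
                 Pieces = c(40L, 24L, 4L),
                 stringsAsFactors = FALSE)

wrapped <- tibble::as_tibble(df)

class(df)        # a plain data frame
class(wrapped)   # "tbl_df" and "tbl" are added in front of "data.frame"

# The wrapped object is still a data frame, so the usual tools keep working
is.data.frame(wrapped)
nrow(wrapped)
```

Only the class vector and the printing behavior differ; the columns and values are untouched.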
Let’s use dplyr’s functions to rearrange the legos data frame. You can re-order the rows of a data frame using the arrange() function:
# Order `legos` by the number of pieces
#arrange(legos, Pieces)
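Because the full data set prints a lot of rows, a toy example may make the effect of arrange() easier to see. The data frame toy below is hypothetical; only the column names are borrowed from the Lego data.

```r
library(dplyr)

# Hypothetical three-row data frame
toy <- data.frame(Item_Number = c("a1", "b2", "c3"),
                  Pieces = c(40L, 4L, 24L),
                  stringsAsFactors = FALSE)

ascending  <- arrange(toy, Pieces)         # fewest pieces first
descending <- arrange(toy, desc(Pieces))   # most pieces first

ascending$Item_Number
descending$Item_Number
```

arrange() returns a new data frame and leaves toy itself unchanged, so the result has to be assigned if you want to keep it.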
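# Keep only the columns needed to compute price per piece
legos_simple <- select(legos, Item_Number, Pieces, USD_MSRP)
# Drop sets with no listed US price
legos_simple <- filter(legos_simple, !is.na(USD_MSRP))
# PPP: price per piece in US dollars
legos_simple <- mutate(legos_simple, PPP = USD_MSRP / Pieces)
# Sort from most to least expensive per piece
legos_simple <- arrange(legos_simple, desc(PPP))
# Keep sets that cost less than $0.75 per piece
legos_simple <- filter(legos_simple, PPP < .75)
# Jittered strip plot of the remaining PPP values
plot(jitter(rep(0, length(legos_simple$PPP))), legos_simple$PPP,
     xlim = c(-.05, .05))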
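The same five steps can be traced on a tiny data frame, which makes it easier to check what each verb removes or adds. This is only a sketch: toy and its values are invented, and the intermediate names step1 to step5 are hypothetical.

```r
library(dplyr)

# Hypothetical four-row data frame with one missing price
toy <- data.frame(Item_Number = c("a1", "b2", "c3", "d4"),
                  Name = c("w", "x", "y", "z"),
                  Pieces = c(40L, 4L, 100L, 10L),
                  USD_MSRP = c(29.99, NA, 9.99, 19.99),
                  stringsAsFactors = FALSE)

step1 <- select(toy, Item_Number, Pieces, USD_MSRP)  # drops the Name column
step2 <- filter(step1, !is.na(USD_MSRP))             # drops b2 (no price)
step3 <- mutate(step2, PPP = USD_MSRP / Pieces)      # adds price per piece
step4 <- arrange(step3, desc(PPP))                   # highest PPP first
step5 <- filter(step4, PPP < .75)                    # drops d4 (about 2.00 per piece)

step5
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes it possible to chain them.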
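library(magrittr)  # provides %<>% and %T>%; dplyr re-exports only %>%

# %<>% pipes legos_simple through the chain and assigns the result back to it
legos_simple %<>%
  select(Item_Number, Pieces, USD_MSRP) %>%
  filter(!is.na(USD_MSRP)) %>%
  filter(!is.na(Pieces)) %>%
  mutate(PPP = USD_MSRP / Pieces) %>%
  arrange(desc(PPP)) %>%
  tbl_df() %T>%  # %T>% runs print() for its side effect, then passes the data on
  print() %>%
  filter(PPP < .75) %>%
  print()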
## Source: local data frame [5,088 x 4]
##
## Item_Number Pieces USD_MSRP PPP
## (chr) (int) (dbl) (dbl)
## 1 10544 40 29.99 0.7497500
## 2 5947 40 29.99 0.7497500
## 3 852535 24 17.99 0.7495833
## 4 8519 20 14.99 0.7495000
## 5 71000 4 2.99 0.7475000
## 6 30132 4 2.99 0.7475000
## 7 8804 4 2.99 0.7475000
## 8 8321 27 20.00 0.7407407
## 9 4966 95 69.99 0.7367368
## 10 9130 125 91.00 0.7280000
## .. ... ... ... ...
## Source: local data frame [5,088 x 4]
##
## Item_Number Pieces USD_MSRP PPP
## (chr) (int) (dbl) (dbl)
## 1 10544 40 29.99 0.7497500
## 2 5947 40 29.99 0.7497500
## 3 852535 24 17.99 0.7495833
## 4 8519 20 14.99 0.7495000
## 5 71000 4 2.99 0.7475000
## 6 30132 4 2.99 0.7475000
## 7 8804 4 2.99 0.7475000
## 8 8321 27 20.00 0.7407407
## 9 4966 95 69.99 0.7367368
## 10 9130 125 91.00 0.7280000
## .. ... ... ... ...
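The three magrittr operators used in that chain can be seen in isolation on a plain numeric vector. This is a minimal sketch with made-up values; str() stands in for print() because it returns NULL, which makes the effect of the tee operator visible.

```r
library(magrittr)

x <- c(4, 9, 16)   # hypothetical example data

# %>% passes the left-hand side as the first argument of the right-hand side
roots <- x %>% sqrt()

# %T>% runs the right-hand side for its side effect only and then returns
# the left-hand side, so sum() receives x rather than the NULL from str()
total <- x %T>% str() %>% sum()

# %<>% pipes x into sqrt() and assigns the result back to x
x %<>% sqrt()

roots
total
x
```

In the Lego chain the two printed tables are identical because legos_simple had already been filtered to PPP < .75 before the pipeline ran, so the second filter() removes nothing.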
Key dplyr functions: arrange(), desc(), distinct(), filter(), mutate(), group_by(), joins, n(), rename(), select(), slice(), summarize(), bind_rows(), bind_cols()
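Several functions in this list, such as group_by(), summarize(), and n(), are not demonstrated above. As a hedged sketch of how they fit together, the example below counts sets and averages piece counts per group. The data frame and its Theme column are hypothetical and are not taken from the Lego file.

```r
library(dplyr)

# Hypothetical data: five sets in two themes
toy <- data.frame(Theme = c("City", "City", "Duplo", "Duplo", "Duplo"),
                  Pieces = c(40L, 60L, 4L, 10L, 16L),
                  stringsAsFactors = FALSE)

by_theme <- toy %>%
  group_by(Theme) %>%                  # one group per distinct Theme
  summarize(Sets = n(),                # n() counts the rows in each group
            Mean_Pieces = mean(Pieces))

by_theme
```

summarize() collapses each group to a single row, so by_theme has one row per theme.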