R has four basic data structures: vectors, lists, matrices, and data frames.
Vectors contain data of a single type in one dimension. You can create vectors by using the c() function.
c(1, 2, 3)
## [1] 1 2 3
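Because a vector holds a single type, mixing types inside c() makes R convert every element to one common type. A minimal sketch of that behavior (the names mixed and nums are only illustrative):

```r
# Numbers combined with a string all become character strings
mixed <- c(1, "two", 3)
class(mixed)
## [1] "character"

# Logical values combined with numbers become 1 (TRUE) and 0 (FALSE)
nums <- c(TRUE, 2, FALSE)
nums
## [1] 1 2 0
```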
Because all elements of a vector share one type, operators are applied element by element, so the results are predictable. A single value is combined with every element of the vector.
1 + c(1, 2, 3)
## [1] 2 3 4
c(1, 2, 3) + c(4, 5, 6)
## [1] 5 7 9
c(1, 2, 3) * c(4, 5, 6)
## [1] 4 10 18
TRUE & c(TRUE, FALSE)
## [1] TRUE FALSE
TRUE | c(TRUE, FALSE)
## [1] TRUE TRUE
c(TRUE, FALSE) & c(TRUE, FALSE)
## [1] TRUE FALSE
c(FALSE, TRUE) | c(TRUE, FALSE)
## [1] TRUE TRUE
As you can see, the & and | operators work with vectors the way && and || work with individual values.
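For comparison, here is a short sketch of && and || applied to single values next to & applied to longer vectors. The double forms return exactly one logical value, which is why they are commonly used in if conditions:

```r
# && and || take single values and return a single value
TRUE && FALSE
## [1] FALSE
TRUE || FALSE
## [1] TRUE

# & compares element by element and returns one result per element
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
## [1]  TRUE FALSE FALSE
```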
You can extract individual values from vectors by “indexing” them using single brackets ([]). Indexing in R starts at 1, so the first element is at position 1.
a_vector <- c(6, 7, 8)
a_vector[1]
## [1] 6
a_vector[2]
## [1] 7
a_vector[3]
## [1] 8
To quickly create a sequence of integers, use the : operator:
1:7
## [1] 1 2 3 4 5 6 7
-4:2
## [1] -4 -3 -2 -1 0 1 2
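As a small aside, the sketch below shows that : also counts downward when the first number is larger, and that the result is stored as integers, while numbers typed into c() are stored as the more general numeric type:

```r
# Counting down works the same way as counting up
5:1
## [1] 5 4 3 2 1

# Sequences made with : are integers; values typed into c() are numeric
class(1:7)
## [1] "integer"
class(c(1, 2, 3))
## [1] "numeric"
```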
To create a vector that repeats a value several times, use the rep() function:
rep(5, times = 4)
## [1] 5 5 5 5
rep(3:4, times = 3)
## [1] 3 4 3 4 3 4
rep(3:4, times = 3, each = 2)
## [1] 3 3 4 4 3 3 4 4 3 3 4 4
rep(TRUE, times = 3)
## [1] TRUE TRUE TRUE
rep("four times!", times = 4)
## [1] "four times!" "four times!" "four times!" "four times!"
To create a sequence of numbers, use the seq() function:
seq(1, 5)
## [1] 1 2 3 4 5
seq(1, 2, by = 0.1)
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
seq(1, 2, length.out = 5)
## [1] 1.00 1.25 1.50 1.75 2.00
You can index vectors by using another vector to specify which values you want to extract. For example:
hundred <- 100:110
hundred[c(1, 3, 4)]
## [1] 100 102 103
hundred[5:7]
## [1] 104 105 106
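The index vector does not have to be sorted or unique, and it can also be a logical vector. A brief sketch of both forms, reusing the hundred vector from the example above:

```r
hundred <- 100:110

# Positions can be repeated or given in any order
hundred[c(3, 1, 1)]
## [1] 102 100 100

# A logical vector of the same length keeps the positions marked TRUE
hundred[hundred > 107]
## [1] 108 109 110
```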
You can also combine vectors by using c():
onetwothree <- c(1, 2, 3)
fourfivesix <- c(4, 5, 6)
c(onetwothree, fourfivesix)
## [1] 1 2 3 4 5 6
It’s also possible to change a value in a vector by re-assigning it:
onetwothree
## [1] 1 2 3
onetwothree[2] <- 100
onetwothree
## [1] 1 100 3
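Re-assignment also works with an index vector, which changes several values in one step. A minimal sketch, using a hypothetical vector named scores:

```r
scores <- c(6, 7, 8)

# Replace the first and third values at the same time
scores[c(1, 3)] <- c(60, 80)
scores
## [1] 60  7 80
```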
A matrix is like a two-dimensional vector. Matrices contain data of a single type. You can create a matrix by using the matrix() function. By default the values fill the matrix column by column; set byrow = TRUE to fill it row by row.
matrix(1:9, nrow = 3)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
matrix(1:9, nrow = 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
Operators work in a predictable way:
mat <- matrix(2:5, nrow = 2)
1 + mat
##      [,1] [,2]
## [1,]    3    5
## [2,]    4    6
2 * mat
##      [,1] [,2]
## [1,]    4    8
## [2,]    6   10
Notice that multiplying two matrices with * multiplies them element by element; it does not perform matrix multiplication according to matrix algebra:
mat * mat
##      [,1] [,2]
## [1,]    4   16
## [2,]    9   25
To multiply matrices using linear algebra, use the %*% operator:
mat %*% mat
##      [,1] [,2]
## [1,]   16   28
## [2,]   21   37
You can index a matrix with the syntax a_matrix[row, column].
big_mat <- matrix(1:9, nrow = 3)
big_mat[1, 2]
## [1] 4
You can also get a row or a column of a matrix as a vector by leaving out the row or column index:
big_mat[1,]
## [1] 1 4 7
big_mat[,2]
## [1] 4 5 6
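The sketch below checks that a single extracted row really is a plain vector rather than a matrix. It also shows the drop = FALSE argument of the bracket operator, which is not covered in the text above and keeps the result as a one-row matrix:

```r
big_mat <- matrix(1:9, nrow = 3)

# dim() reports the number of rows and columns
dim(big_mat)
## [1] 3 3

# A single row is returned as a vector, not a matrix
is.matrix(big_mat[1, ])
## [1] FALSE

# drop = FALSE keeps the result as a matrix with one row
dim(big_mat[1, , drop = FALSE])
## [1] 1 3
```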
Indexing with vectors also works:
big_mat[1:2, 1:2]
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
big_mat[3, 2:3]
## [1] 6 9
big_mat[c(1, 3), c(1, 3)] # The four "corners" of the matrix
##      [,1] [,2]
## [1,]    1    7
## [2,]    3    9
You can also re-assign values in a matrix:
big_mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
big_mat[2, 2] <- 47
big_mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2   47    8
## [3,]    3    6    9
You can even re-assign an entire row or column:
big_mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2   47    8
## [3,]    3    6    9
big_mat[, 2] <- c(12, 13, 14)
big_mat[1,] <- c(39, 40, 41)
big_mat
##      [,1] [,2] [,3]
## [1,]   39   40   41
## [2,]    2   13    8
## [3,]    3   14    9
Lists are the most adaptable data structures in R. You can create a list with the list() function. In its simplest form, a list is a one-dimensional data structure that can contain multiple types:
list1 <- list(1, TRUE, "three")
list1
## [[1]]
## [1] 1
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] "three"
Lists can be indexed with double brackets:
list1[[2]]
## [1] TRUE
list1[[3]]
## [1] "three"
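Single brackets also work on lists, but they return a shorter list rather than the element itself. A minimal sketch of the difference, using the same list1 as above:

```r
list1 <- list(1, TRUE, "three")

# Single brackets return a list of length one
class(list1[2])
## [1] "list"

# Double brackets return the element stored at that position
class(list1[[2]])
## [1] "logical"
```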
It’s also possible to name each element in a list:
list2 <- list(first = 85, second = "a word")
list2
## $first
## [1] 85
##
## $second
## [1] "a word"
You can access named list elements with the syntax a_list[["name"]] or a_list$name:
list2[["first"]]
## [1] 85
list2$second
## [1] "a word"
You can add named elements to existing lists with a similar syntax:
list2[["third"]] <- FALSE
list2$fourth <- "four"
list2
## $first
## [1] 85
##
## $second
## [1] "a word"
##
## $third
## [1] FALSE
##
## $fourth
## [1] "four"
The most useful thing about lists is that an element of a list can be anything: a single value, a vector, a matrix, or even another list:
list3 <- list() # This creates an empty list
list3$a_vector <- c("I", "am", "a", "vector")
list3$a_matrix <- matrix(1:8, nrow = 4)
list3$a_list <- list(one = "1", two = "2", three = "3")
list3
## $a_vector
## [1] "I" "am" "a" "vector"
##
## $a_matrix
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
##
## $a_list
## $a_list$one
## [1] "1"
##
## $a_list$two
## [1] "2"
##
## $a_list$three
## [1] "3"
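To reach a value inside a nested element, the indexing forms can be chained one after another. A short sketch that rebuilds the same list3 and pulls a value out of each element:

```r
list3 <- list()
list3$a_vector <- c("I", "am", "a", "vector")
list3$a_matrix <- matrix(1:8, nrow = 4)
list3$a_list <- list(one = "1", two = "2", three = "3")

# Fourth value of the vector stored in the list
list3$a_vector[4]
## [1] "vector"

# Row 2, column 2 of the matrix stored in the list
list3$a_matrix[2, 2]
## [1] 6

# Elements of the inner list, using either syntax
list3$a_list$two
## [1] "2"
list3[["a_list"]][["three"]]
## [1] "3"
```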
A data frame is a two-dimensional data structure that can contain multiple types of data. You can create a data frame using the function data.frame(). Unlike a list, whose elements can be any data structure, each column of a data frame is normally a vector of a single type, and all columns have the same number of rows.
df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4),
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)
df
##   numbers truths strings
## 1       9   TRUE    Mary
## 2      10   TRUE   had a
## 3      11   TRUE  little
## 4      12   TRUE   lamb.
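A quick way to see that each column keeps its own type is to inspect the columns one at a time. This sketch rebuilds the same df and checks two column types and the overall size:

```r
df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4),
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)

# Each column has its own type
class(df$numbers)
## [1] "integer"
class(df$strings)
## [1] "character"

# Number of rows and number of columns
nrow(df)
## [1] 4
ncol(df)
## [1] 3
```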
Indexing data frames shares some similarities with matrices and lists:
df$numbers
## [1] 9 10 11 12
df$truths
## [1] TRUE TRUE TRUE TRUE
df[2,2]
## [1] TRUE
df[1,]
##   numbers truths strings
## 1       9   TRUE    Mary
df[2,]
##   numbers truths strings
## 2      10   TRUE   had a
df[,3]
## [1] "Mary" "had a" "little" "lamb."
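The other indexing forms shown earlier for lists, vectors, and matrices carry over as well. A hedged sketch with the same df: double brackets select a column by name, a logical condition selects rows, and values can be re-assigned in place:

```r
df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4),
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)

# List-style access to a column by name
df[["numbers"]]
## [1]  9 10 11 12

# Rows where numbers is greater than 10, keeping only the strings column
df[df$numbers > 10, "strings"]
## [1] "little" "lamb."

# Re-assign a single value inside a column
df$truths[2] <- FALSE
df$truths
## [1]  TRUE FALSE  TRUE  TRUE
```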