Data Structures

R has four basic data structures: vectors, lists, matrices, and data frames.

Vectors

Vectors contain data of a single type in one dimension. You can create a vector with the c() function.

c(1, 2, 3)

## [1] 1 2 3
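
To illustrate what "a single type" means in practice, here is a small sketch of what happens when values of different types are passed to c(). The names mixed_chr and mixed_num are made up for this example.

```r
# Mixing types in c() converts every value to one common type
mixed_chr <- c(1, "two", TRUE)
mixed_chr
## [1] "1"    "two"  "TRUE"
class(mixed_chr)
## [1] "character"

# Logical values become numbers when combined with numbers
mixed_num <- c(1, TRUE, FALSE)
mixed_num
## [1] 1 1 0
```

R does this conversion silently, so it is worth checking class() when a vector does not behave the way you expect.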

Because every element of a vector has the same type, operators apply to vectors element by element and give predictable results.

1 + c(1, 2, 3)

## [1] 2 3 4

c(1, 2, 3) + c(4, 5, 6)

## [1] 5 7 9

c(1, 2, 3) * c(4, 5, 6)

## [1]  4 10 18

TRUE & c(TRUE, FALSE)

## [1]  TRUE FALSE

TRUE | c(TRUE, FALSE)

## [1] TRUE TRUE

c(TRUE, FALSE) & c(TRUE, FALSE)

## [1]  TRUE FALSE

c(FALSE, TRUE) | c(TRUE, FALSE)

## [1] TRUE TRUE

As you can see, the & and | operators work element by element on vectors, the way && and || work with individual values.
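
Element-wise logical operators are often combined with comparisons, which also work element by element. The following is a minimal sketch; the vector x is made up for this example, and any() and all() are base R functions that collapse a logical vector into a single value.

```r
x <- c(1, 5, 10)

# Comparisons return one logical value per element
x > 4
## [1] FALSE  TRUE  TRUE

# & combines two logical vectors element by element
(x > 4) & (x < 8)
## [1] FALSE  TRUE FALSE

# any() and all() reduce a logical vector to a single TRUE or FALSE
any(x > 4)
## [1] TRUE
all(x > 4)
## [1] FALSE
```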

You can extract individual values from vectors by “indexing” them using single brackets ([]).

a_vector <- c(6, 7, 8)
a_vector[1]

## [1] 6

a_vector[2]

## [1] 7

a_vector[3]

## [1] 8
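
A few related behaviors are worth knowing when indexing. The sketch below reuses the same a_vector and shows that positions start at 1, that a negative index drops an element, and that asking for a position past the end returns NA rather than an error.

```r
a_vector <- c(6, 7, 8)

# length() reports how many elements a vector has
length(a_vector)
## [1] 3

# A negative index returns everything except that position
a_vector[-1]
## [1] 7 8

# An index past the end gives NA instead of an error
a_vector[4]
## [1] NA
```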

Quickly creating vectors

To quickly create a sequence of integers, use the : operator:

1:7

## [1] 1 2 3 4 5 6 7

-4:2

## [1] -4 -3 -2 -1  0  1  2

To create a vector that repeats a value several times, use the rep() function:

rep(5, times = 4)

## [1] 5 5 5 5

rep(3:4, times = 3)

## [1] 3 4 3 4 3 4

rep(3:4, times = 3, each = 2)

##  [1] 3 3 4 4 3 3 4 4 3 3 4 4

rep(TRUE, times = 3)

## [1] TRUE TRUE TRUE

rep("four times!", times = 4)

## [1] "four times!" "four times!" "four times!" "four times!"

To create a sequence of numbers, use the seq() function:

seq(1, 5)

## [1] 1 2 3 4 5

seq(1, 2, by = 0.1)

##  [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

seq(1, 2, length.out = 5)

## [1] 1.00 1.25 1.50 1.75 2.00
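
Sequences can also count downward or be reversed. This is a short sketch using base R functions; the name countdown is made up for this example.

```r
# A negative step counts down
countdown <- seq(10, 1, by = -3)
countdown
## [1] 10  7  4  1

# seq_len(n) creates the integers from 1 to n
seq_len(4)
## [1] 1 2 3 4

# rev() reverses any vector
rev(1:4)
## [1] 4 3 2 1
```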

More about vectors

You can index vectors by using another vector to specify which values you want to extract. For example:

hundred <- 100:110
hundred[c(1, 3, 4)]

## [1] 100 102 103

hundred[5:7]

## [1] 104 105 106
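
The indexing vector does not have to hold positions. As a sketch of two other common forms, the example below indexes the same hundred vector with a logical condition and with negative positions.

```r
hundred <- 100:110

# A logical vector keeps the elements where the condition is TRUE
hundred[hundred > 107]
## [1] 108 109 110

# which() returns the positions where the condition is TRUE
which(hundred > 107)
## [1]  9 10 11

# Negative positions drop elements; here the first eight are removed
hundred[-(1:8)]
## [1] 108 109 110
```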

You can also combine vectors by using c():

onetwothree <- c(1, 2, 3)
fourfivesix <- c(4, 5, 6)
c(onetwothree, fourfivesix)

## [1] 1 2 3 4 5 6

It’s also possible to change a value in a vector by re-assigning it:

onetwothree

## [1] 1 2 3

onetwothree[2] <- 100
onetwothree

## [1]   1 100   3
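
Re-assignment works with any kind of index, not just a single position. The sketch below starts from a fresh copy of onetwothree and changes several values at once, first by position and then by condition.

```r
onetwothree <- c(1, 2, 3)

# Replace the first and third values in one step
onetwothree[c(1, 3)] <- c(10, 30)
onetwothree
## [1] 10  2 30

# Replace every value that satisfies a condition
onetwothree[onetwothree > 5] <- 0
onetwothree
## [1] 0 2 0
```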

Matrices

A matrix is like a two-dimensional vector. Matrices contain data of a single type. You can create a matrix by using the matrix() function. By default the values fill the matrix column by column; set byrow = TRUE to fill it row by row.

matrix(1:9, nrow = 3)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

matrix(1:9, nrow = 3, byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
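
To check the shape of a matrix, base R provides dim(), nrow(), and ncol(), and t() swaps rows and columns. The matrix m below is made up for this example.

```r
m <- matrix(1:6, nrow = 2)

# dim() returns the number of rows, then the number of columns
dim(m)
## [1] 2 3
nrow(m)
## [1] 2
ncol(m)
## [1] 3

# t() transposes the matrix
t(m)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
```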

Arithmetic operators work element by element, just as they do with vectors:

mat <- matrix(2:5, nrow = 2)
1 + mat

##      [,1] [,2]
## [1,]    3    5
## [2,]    4    6

2 * mat

##      [,1] [,2]
## [1,]    4    8
## [2,]    6   10

Notice that multiplying two matrices with * multiplies them element by element. It does not perform matrix multiplication according to matrix algebra:

mat * mat

##      [,1] [,2]
## [1,]    4   16
## [2,]    9   25

To multiply matrices using linear algebra, use the %*% operator:

mat %*% mat

##      [,1] [,2]
## [1,]   16   28
## [2,]   21   37
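
Matrix multiplication requires the number of columns of the left matrix to match the number of rows of the right matrix. As a sketch, the example below multiplies two non-square matrices; the names A and B are made up for this example.

```r
A <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns
B <- matrix(1:6, nrow = 3)  # 3 rows, 2 columns

# (2 x 3) times (3 x 2) gives a 2 x 2 result
A %*% B
##      [,1] [,2]
## [1,]   22   49
## [2,]   28   64

# Reversing the order gives a 3 x 3 result
dim(B %*% A)
## [1] 3 3
```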

You can index a matrix with the syntax a_matrix[row,column].

big_mat <- matrix(1:9, nrow = 3)
big_mat[1, 2]

## [1] 4

You can also get a whole row or column of a matrix as a vector by leaving the column or row index blank:

big_mat[1,]

## [1] 1 4 7

big_mat[,2]

## [1] 4 5 6
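
If you would rather keep the result as a matrix instead of a plain vector, the drop = FALSE argument of the bracket operator does that. A minimal sketch, where row_vec and row_mat are names made up for this example:

```r
big_mat <- matrix(1:9, nrow = 3)

# By default a single row is simplified to a vector, which has no dimensions
row_vec <- big_mat[1, ]
dim(row_vec)
## NULL

# drop = FALSE keeps the result as a matrix with one row
row_mat <- big_mat[1, , drop = FALSE]
row_mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
dim(row_mat)
## [1] 1 3
```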

Indexing with vectors also works:

big_mat[1:2, 1:2]

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5

big_mat[3, 2:3]

## [1] 6 9

big_mat[c(1, 3), c(1, 3)] # The four "corners" of the matrix

##      [,1] [,2]
## [1,]    1    7
## [2,]    3    9

You can also re-assign values in a matrix:

big_mat

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

big_mat[2, 2] <- 47
big_mat

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2   47    8
## [3,]    3    6    9

You can even re-assign an entire row or column:

big_mat

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2   47    8
## [3,]    3    6    9

big_mat[, 2] <- c(12, 13, 14)
big_mat[1,] <- c(39, 40, 41)
big_mat

##      [,1] [,2] [,3]
## [1,]   39   40   41
## [2,]    2   13    8
## [3,]    3   14    9

Lists

Lists are the most adaptable data structures in R. You can create a list with the list() function. In its simplest form, a list is a one-dimensional data structure that can contain multiple types:

list1 <- list(1, TRUE, "three")
list1

## [[1]]
## [1] 1
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] "three"

Lists can be indexed with double brackets:

list1[[2]]

## [1] TRUE

list1[[3]]

## [1] "three"
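
Single brackets also work on lists, but they behave differently from double brackets. The sketch below uses the same list1 to show that single brackets return a smaller list, while double brackets return the element itself.

```r
list1 <- list(1, TRUE, "three")

# Single brackets return a list containing the selected elements
class(list1[2])
## [1] "list"

# Double brackets return the element itself
class(list1[[2]])
## [1] "logical"

# Single brackets can select several elements at once
length(list1[2:3])
## [1] 2
```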

It’s also possible to name each element in a list:

list2 <- list(first = 85, second = "a word")
list2

## $first
## [1] 85
## 
## $second
## [1] "a word"

You can access named list elements with the syntax a_list[["name"]] or a_list$name:

list2[["first"]]

## [1] 85

list2$second

## [1] "a word"

You can add named elements to existing lists with a similar syntax:

list2[["third"]] <- FALSE
list2$fourth <- "four"
list2

## $first
## [1] 85
## 
## $second
## [1] "a word"
## 
## $third
## [1] FALSE
## 
## $fourth
## [1] "four"
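
Elements can be removed as well as added. The sketch below starts from a fresh list2, uses names() to check which elements exist, and removes an element by assigning NULL to it.

```r
list2 <- list(first = 85, second = "a word")
list2$third <- FALSE

# names() lists the element names; %in% checks whether a name is present
"third" %in% names(list2)
## [1] TRUE

# Assigning NULL removes an element from the list
list2$third <- NULL
names(list2)
## [1] "first"  "second"
length(list2)
## [1] 2
```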

The most useful thing about lists is that an element of a list can be anything: a single value, a vector, a matrix, or even another list.

list3 <- list() # This creates an empty list
list3$a_vector <- c("I", "am", "a", "vector")
list3$a_matrix <- matrix(1:8, nrow = 4)
list3$a_list <- list(one = "1", two = "2", three = "3")
list3

## $a_vector
## [1] "I"      "am"     "a"      "vector"
## 
## $a_matrix
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
## 
## $a_list
## $a_list$one
## [1] "1"
## 
## $a_list$two
## [1] "2"
## 
## $a_list$three
## [1] "3"
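
To reach a value inside a nested structure, chain the indexing operators from the outside in. The sketch below rebuilds list3 in a single list() call, which is equivalent to adding the elements one at a time.

```r
list3 <- list(
  a_vector = c("I", "am", "a", "vector"),
  a_matrix = matrix(1:8, nrow = 4),
  a_list = list(one = "1", two = "2", three = "3")
)

# Fourth value of the vector stored in the list
list3$a_vector[4]
## [1] "vector"

# Row 2, column 2 of the matrix stored in the list
list3$a_matrix[2, 2]
## [1] 6

# An element of the list stored inside the list
list3$a_list$two
## [1] "2"
list3[["a_list"]][["three"]]
## [1] "3"
```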

Data Frames

A data frame is a two-dimensional data structure that can contain multiple types of data. You can create a data frame using the function data.frame(). Each column of a data frame holds a single type, and all columns have the same length. Unlike a list, whose elements can be other data structures, each cell of a data frame typically holds a single value.

df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4), 
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)
df

##   numbers truths strings
## 1       9   TRUE    Mary
## 2      10   TRUE   had a
## 3      11   TRUE  little
## 4      12   TRUE   lamb.
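
A few base R functions help to inspect a data frame before working with it. The following sketch uses the same df and checks its dimensions, column names, and column types.

```r
df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4),
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)

# Number of rows, then number of columns
dim(df)
## [1] 4 3

# Column names
names(df)
## [1] "numbers" "truths"  "strings"

# Each column has its own type
class(df$numbers)
## [1] "integer"
class(df$strings)
## [1] "character"
```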

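Indexing data frames shares some similarities with matrices and lists. The $ operator extracts a column by name, as with a list, and the [row, column] syntax works as it does with a matrix: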

df$numbers

## [1]  9 10 11 12

df$truths

## [1] TRUE TRUE TRUE TRUE

df[2,2]

## [1] TRUE

df[1,]

##   numbers truths strings
## 1       9   TRUE    Mary

df[2,]

##   numbers truths strings
## 2      10   TRUE   had a

df[,3]

## [1] "Mary"   "had a"  "little" "lamb."
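
Logical indexing and re-assignment carry over to data frames too. As a sketch, the example below filters rows of the same df with a condition and then adds a new column; the column name doubled is made up for this example.

```r
df <- data.frame(numbers = 9:12, truths = rep(TRUE, 4),
                 strings = c("Mary", "had a", "little", "lamb."),
                 stringsAsFactors = FALSE)

# Keep only the rows where the condition is TRUE
df[df$numbers > 10, ]
##   numbers truths strings
## 3      11   TRUE  little
## 4      12   TRUE   lamb.

# Columns can be selected by name in the second position
df[df$numbers > 10, "strings"]
## [1] "little" "lamb."

# Assigning to a new name with $ adds a column
df$doubled <- df$numbers * 2
df$doubled
## [1] 18 20 22 24
```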