R is wonderful at generating data and conducting simulations of events. In my opinion there are four main functions for running simulations:

  1. sample()
  2. replicate()
  3. The r*() family of functions.
  4. The *apply() family of functions.

sample()

If we wanted to simulate flipping a coin or rolling a die, sample() is our go-to function. The sample() function in its leanest form can take any vector as an argument:

sample(1:6)
## [1] 6 4 1 3 5 2

Called this way, sample() returns a random permutation of the vector: it picks elements at random without replacement until none are left. To simulate rolling a die, we set the size argument to the number of rolls and set replace = TRUE so that every roll is drawn from all six faces.

sample(1:6, size = 1, replace = TRUE)
## [1] 6

This simulates rolling a single die once. To simulate multiple dice rolls we can increase the size argument:

sample(1:6, size = 8, replace = TRUE)
## [1] 1 4 4 6 5 1 2 5

As you can see, the numbers 1, 4, and 5 are each rolled twice, because we are sampling elements from the vector with replacement.
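As a minimal sketch of the same idea at a larger scale, the block below rolls a die 600 times and tabulates the faces, then flips a biased coin using the prob argument of sample(). The seed value, the number of rolls, the 70/30 bias, and the variable names are illustrative assumptions, not part of the original notes.

```r
# Fix the random seed so the simulated rolls are reproducible.
set.seed(42)  # arbitrary seed, chosen only for illustration

# Roll a fair six-sided die 600 times.
rolls <- sample(1:6, size = 600, replace = TRUE)

# Count how often each face came up; each count should be near 100.
roll_counts <- table(factor(rolls, levels = 1:6))
print(roll_counts)

# Flip a biased coin 10 times: heads ("H") with probability 0.7.
flips <- sample(c("H", "T"), size = 10, replace = TRUE, prob = c(0.7, 0.3))
print(flips)
```

Calling set.seed() first is optional, but without it every run produces different rolls.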

replicate()

The replicate() function takes two arguments: the number of times to evaluate an R expression, and the expression itself. For example, to compute the mean of 100 rolls of a die you could do the following:

mean(sample(1:6, size = 100, replace = TRUE))
## [1] 3.5

Now imagine that you want to compute 100 means of 100 die rolls. You could accomplish this task using replicate():

replicate(100, mean(sample(1:6, size = 100, replace = TRUE)))
##   [1] 3.58 3.19 3.51 3.39 3.46 3.38 3.75 3.49 3.27 3.73 3.47 3.68 3.35 3.26
##  [15] 3.47 3.74 3.38 3.60 3.33 3.38 3.73 3.49 3.65 3.42 3.56 3.22 3.58 3.60
##  [29] 3.54 3.60 3.37 3.47 3.39 3.49 3.58 3.61 3.39 3.52 3.59 3.69 3.70 3.37
##  [43] 3.47 3.17 3.59 3.51 3.55 3.32 3.43 3.58 3.61 3.66 3.71 3.26 3.37 3.39
##  [57] 3.51 3.43 3.72 3.47 3.34 3.56 3.71 3.36 3.69 3.30 3.19 3.31 3.78 3.56
##  [71] 3.46 3.79 3.22 3.66 3.34 3.39 3.35 3.54 3.22 3.45 3.46 3.82 3.42 3.41
##  [85] 3.53 3.35 3.42 3.52 3.19 3.51 3.49 3.51 3.60 3.47 3.55 3.52 3.30 3.72
##  [99] 3.53 3.55
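One way to check a simulation like this is to store the replicated means and summarize them. The sketch below assumes 1000 replications and an arbitrary seed; the variable name roll_means is hypothetical. For a fair die the expected value is 3.5 and the standard deviation of a single roll is sqrt(35/12), so the mean of 100 rolls should have a standard deviation of about 0.17.

```r
# Fix the seed so the results can be reproduced.
set.seed(1)  # arbitrary seed, chosen only for illustration

# Compute the mean of 100 die rolls, 1000 times over.
roll_means <- replicate(1000, mean(sample(1:6, size = 100, replace = TRUE)))

# The means should center near the expected value of a fair die, 3.5.
print(mean(roll_means))

# Their spread should be near sqrt(35 / 12) / sqrt(100), about 0.17.
print(sd(roll_means))
print(sqrt(35 / 12) / sqrt(100))
```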

The r*() functions

r*() refers to functions like rnorm() and rbinom(), which generate random values from their respective distributions. These functions are discussed in more depth in the statistical functions notes. For a list of the distributions that come with R, enter help("Distributions") into the R console.

Below I’ll illustrate how you could use rbinom() to simulate trials of flipping a fair coin. The rbinom() function needs three arguments: the number of trials you wish to perform (n), the number of coins to flip per trial (size), and the probability of the coin landing on heads (prob). In the example below I’ll simulate performing 10 trials, each trial flipping 5 coins, and the probability of heads will be 50%:

rbinom(10, 5, .5)
##  [1] 1 3 4 1 3 5 0 3 1 1

As you can see, rbinom() returns a vector in which each element is the number of coins that landed on heads in that trial.
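To see how the simulated counts relate to the underlying distribution, the following sketch compares the share of trials with 0 to 5 heads against the exact binomial probabilities from dbinom(). The 10000 trials, the seed, and the variable names are assumptions made for illustration.

```r
# Fix the seed so the results can be reproduced.
set.seed(2024)  # arbitrary seed, chosen only for illustration

n_trials <- 10000  # number of simulated trials
n_coins  <- 5      # coins flipped per trial

# Number of heads in each trial of 5 fair coin flips.
heads <- rbinom(n_trials, size = n_coins, prob = 0.5)

# Share of trials that produced 0, 1, ..., 5 heads.
simulated <- as.vector(table(factor(heads, levels = 0:n_coins))) / n_trials

# Exact binomial probabilities for the same outcomes.
theoretical <- dbinom(0:n_coins, size = n_coins, prob = 0.5)

# Show both side by side, one column per number of heads.
comparison <- rbind(simulated, theoretical)
colnames(comparison) <- 0:n_coins
print(round(comparison, 3))
```

With this many trials the two rows should agree to roughly two decimal places.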

The *apply() functions
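
As a minimal sketch of how an *apply() function can drive a simulation, the block below uses sapply() to repeat the die-roll experiment for several sample sizes and records how much the sample mean varies at each size. The sample sizes, the 500 replications, the seed, and the variable names are assumptions made for illustration.

```r
# Fix the seed so the results can be reproduced.
set.seed(7)  # arbitrary seed, chosen only for illustration

# Numbers of die rolls per experiment to compare.
sample_sizes <- c(10, 100, 1000)

# For each sample size, simulate 500 sample means and measure their spread.
spread_by_size <- sapply(sample_sizes, function(n) {
  sd(replicate(500, mean(sample(1:6, size = n, replace = TRUE))))
})
names(spread_by_size) <- sample_sizes

# The spread of the sample mean shrinks as the number of rolls grows.
print(spread_by_size)
```

Here sapply() calls the anonymous function once per sample size and collects the results into a numeric vector.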

