Simple Linear Regression

For our exploration of linear modeling we’re going to use a dataset called trees that comes packaged with R. This dataset contains the girth, height, and volume of 31 trees. Use help("trees") for more information about this dataset. Let’s examine the relationship between the girth and the volume of these trees by plotting them:

plot(trees$Girth, trees$Volume, xlab = "Girth", ylab = "Volume", main = "Trees")

At first glance it looks like there’s a relationship! You could imagine drawing a straight line through most of the points on this graph, and even the points that you missed would be close to the line. This “straightness” is called a linear relationship. R provides functions for modeling linear relationships and other kinds of relationships between data. The simplest of these functions is the lm() function. To use the lm() function you need to provide a formula, which represents a relationship between variables, and a data frame whose columns supply the variables named in the formula. I’ll use lm() now to construct a model of the relationship between tree volume and girth:

tree_model1 <- lm(Volume ~ Girth, data = trees)
tree_model1
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Coefficients:
## (Intercept)        Girth  
##     -36.943        5.066

The model itself is an advanced data structure that resembles a list. When I examine the tree_model1 variable on the R console it prints the y-axis intercept of the regression line and the slope of the regression line. Let’s draw the regression line using abline().

plot(trees$Girth, trees$Volume, xlab = "Girth", ylab = "Volume", main = "Trees")
abline(tree_model1)
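
Because the model resembles a list, the intercept and slope that get printed can also be pulled out of it directly. The lines below are a small sketch of one way to do that; the variable name model_coefs is just an illustrative choice, and the model is refit so the snippet runs on its own.

```r
# Refit the model so this snippet is self-contained.
tree_model1 <- lm(Volume ~ Girth, data = trees)

# names() lists the components stored in the list-like model object.
names(tree_model1)

# coef() returns a named numeric vector: the intercept and the slope.
model_coefs <- coef(tree_model1)
model_coefs

# The same values are available with list-style access.
tree_model1$coefficients

# Individual coefficients can be selected by name.
intercept <- unname(model_coefs["(Intercept)"])
slope <- unname(model_coefs["Girth"])
```

These two numbers are what abline() uses to place the regression line on the plot.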

# Use the model to predict volume for trees with girths of 11 through 14.
predictions1 <- predict(tree_model1, data.frame(Girth = 11:14))
predictions1
##        1        2        3        4 
## 18.78096 23.84682 28.91267 33.97853
# Redraw the scatterplot and regression line, then mark the predictions in red.
plot(trees$Girth, trees$Volume, xlab = "Girth", ylab = "Volume", main = "Trees")
abline(tree_model1)
points(11:14, predictions1, col = "red")
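
The red points fall exactly on the regression line because, for a model like this one, predict() evaluates intercept + slope * girth. Here is a quick sketch that checks this by hand; the names new_girths, by_hand, and from_predict are made up for the illustration.

```r
# Refit the model so this snippet is self-contained.
tree_model1 <- lm(Volume ~ Girth, data = trees)

new_girths <- 11:14

# Evaluate the line by hand from the fitted coefficients.
b <- coef(tree_model1)
by_hand <- b["(Intercept)"] + b["Girth"] * new_girths

# Ask predict() for the same girths.
from_predict <- predict(tree_model1, data.frame(Girth = new_girths))

# Compare the two results, ignoring names.
all.equal(unname(by_hand), unname(from_predict))
```

With the rounded coefficients printed earlier, the first prediction is roughly -36.943 + 5.066 * 11, which is about 18.78.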

# Plot the residuals of the model with a reference line at zero.
plot(tree_model1$residuals, ylim = c(-10, 10))
abline(h = 0, col = "red")
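
A residual is the observed value minus the value the model fits for that observation, so the points in this plot show how far each tree’s volume sits above or below the regression line. The sketch below recomputes the residuals by hand as a check; manual_residuals is an illustrative name.

```r
# Refit the model so this snippet is self-contained.
tree_model1 <- lm(Volume ~ Girth, data = trees)

# Observed volume minus the volume fitted by the model for each tree.
manual_residuals <- trees$Volume - fitted(tree_model1)

# These should match the residuals stored in the model object.
all.equal(unname(manual_residuals), unname(residuals(tree_model1)))

# The smallest and largest residuals show the vertical extent of the plot.
range(residuals(tree_model1))
```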

Multiple Linear Regression

# Plot volume against height, a second candidate predictor.
plot(trees$Height, trees$Volume, xlab = "Height", ylab = "Volume", main = "Trees")

# Model volume using both girth and height as predictors.
tree_model2 <- lm(Volume ~ Girth + Height, data = trees)
tree_model2
## 
## Call:
## lm(formula = Volume ~ Girth + Height, data = trees)
## 
## Coefficients:
## (Intercept)        Girth       Height  
##    -57.9877       4.7082       0.3393
# Predict volume for three new trees, each described by a girth and a height.
predictions2 <- predict(tree_model2, data.frame(Girth = c(12, 14, 16), 
                                                Height = c(70, 73, 76)))
predictions2
##        1        2        3 
## 22.25785 32.69193 43.12600
# Show the predictions against girth, with the line from the first model for reference.
plot(trees$Girth, trees$Volume, xlab = "Girth", ylab = "Volume", main = "Trees")
points(c(12, 14, 16), predictions2, col = "red")
abline(tree_model1)

# Show the same predictions against height.
plot(trees$Height, trees$Volume, xlab = "Height", ylab = "Volume", main = "Trees")
points(c(70, 73, 76), predictions2, col = "red")
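
With two predictors, each prediction is the intercept plus one term per variable: intercept + (Girth coefficient * girth) + (Height coefficient * height). The following sketch checks that arithmetic against predict(); new_trees and by_hand are illustrative names.

```r
# Refit the model so this snippet is self-contained.
tree_model2 <- lm(Volume ~ Girth + Height, data = trees)

# The same three hypothetical trees used for the predictions.
new_trees <- data.frame(Girth = c(12, 14, 16), Height = c(70, 73, 76))

# Combine the coefficients with the predictor values by hand.
b <- coef(tree_model2)
by_hand <- b["(Intercept)"] +
  b["Girth"] * new_trees$Girth +
  b["Height"] * new_trees$Height

# Compare with predict(), ignoring names.
all.equal(unname(by_hand), unname(predict(tree_model2, new_trees)))
```

Using the rounded coefficients, the first tree works out to roughly -57.9877 + 4.7082 * 12 + 0.3393 * 70, or about 22.26.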

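# Compare residuals: the first model in black, the second model in blue.
plot(tree_model1$residuals, ylim = c(-10, 10))
# points() adds to the existing plot, so it takes no ylim argument.
points(tree_model2$residuals, col = "blue")
abline(h = 0, col = "red")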
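
The residual plot can be hard to judge by eye, so one rough way to summarize it is to add up the squared residuals for each model. This is only a sketch of a quick comparison, and rss1 and rss2 are illustrative names. Adding a predictor to a least squares model cannot increase this sum, so a smaller value for the second model is expected and does not prove on its own that the model is better.

```r
# Refit both models so this snippet is self-contained.
tree_model1 <- lm(Volume ~ Girth, data = trees)
tree_model2 <- lm(Volume ~ Girth + Height, data = trees)

# Residual sum of squares: smaller means the fitted values sit closer to the data.
rss1 <- sum(residuals(tree_model1)^2)
rss2 <- sum(residuals(tree_model2)^2)

c(girth_only = rss1, girth_and_height = rss2)
```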

