A Minimal Tutorial
Unlike Jeff Leek I do most of my software development on a MacBook Pro, but every once in a while I would like to be able to access a shell on Ubuntu without too much hassle. One of the main reasons I want to be able to access Ubuntu is because I test several software packages using Travis, which uses fresh Ubuntu instances every time my tests are run. Over the years I’ve tried using VirtualBox and Vagrant, but recently I’ve discovered that Docker absolutely wins this competition in terms of ease of use.
Docker is a sophisticated software project built on the Go programming language for creating distributed and networked web applications, but it’s also wonderful for just accessing an Ubuntu bash terminal. To get started with Docker you should first install it (you want the Community Edition), then pull up your command line.
Docker has several abstractions, but there are only two we need to worry about right now: images and containers. A Docker image is the blueprint for the mini computer-within-a-computer that Docker will provide us. The Docker organization has a website called Docker Hub which allows people to distribute their own Docker images. In this case we’re about to get an Ubuntu image from the official Ubuntu page on Docker Hub. A Docker container is mini computer actively running in your computer. Each container starts off as a live, running instance of a Docker image.
We can use docker pull
on the command line to get a Docker image. In order to
pull a specific image we need to provide the name of the Docker Hub repository
(in this case ubuntu
) and the tag for a specific image, which corresponds to
a version of Ubuntu. Let’s get Ubuntu 14.04 since it’s commonly used on Travis:
docker pull ubuntu:14.04
## 14.04: Pulling from library/ubuntu
## Digest: sha256:6a3e01207b899a347115f3859cf8a6031fdbebb6ffedea6c2097be40a298c85d
## Status: Downloaded newer image for ubuntu:14.04
We can look at all of the Docker images that we have available using
docker images
:
docker images
## REPOSITORY TAG IMAGE ID CREATED SIZE
## ubuntu 14.04 c69811d4e993 2 weeks ago 188MB
Now that we’ve downloaded a Docker image we can start up our first container:
docker run -it ubuntu:14.04
You should now get a shell prompt that is inside your Docker container! Open up a second terminal so we can look at all the currently running containers on our machine:
docker ps
## CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
## 4313a3f3f11d ubuntu:14.04 "/bin/bash" 2 minutes ago Up 27 seconds serene_albattani
Switching back to our Dockerized Ubuntu shell let’s make a small change by creating a simple text file:
cd
echo 'My first container!' > readme.txt
Now let’s stop the container with exit
, which will return us to the shell on
our host computer.
exit
We can look at all Docker containers (including stopped containers) by adding
the -a
flag to docker ps
:
docker ps -a
## CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
## 4313a3f3f11d ubuntu:14.04 "/bin/bash" 3 minutes ago Exited (0) 7 seconds ago serene_albattani
Let’s restart the container with the docker start
command using the -ai
flags
and providing the ID of the stopped container:
docker start -ai 4313a3f3f11d
We’re back in the Ubuntu shell! Let just check to make sure the file we created is still there:
cd
cat readme.txt
## My first container!
Looking good!
While we’re in our Ubuntu shell let’s create some changes to our
container. For example we might want to be able to download files in our
container, so let’s install curl
:
sudo apt-get update
sudo apt-get install curl
All I’ve done is install curl
, but you could imagine installing many more
tools and creating some files. In order to save the state of your live container
you need to create a new image. In this way Docker works like Pokemon Red and
Blue: if you mess up during “the game” (your Ubuntu session) you can reset your
instance by exit
ing the Ubuntu shell and spinning up a new container from a
saved image. In the host (non-Docker) shell let’s save this container as a new
image, which is essentially a fresh Ubuntu installation plus curl
, which we
have installed.
We can create an image from a container with docker commit
with
the ID for a container and a name for our new image:
# Get the container ID from `docker ps`
docker commit 4313a3f3f11d ubuntu-curl
In this case I called the image ubuntu-curl
, but you can call it whatever you
want. Let’s check docker images
to see our new image:
# Get the container ID from `docker ps`
docker images
## REPOSITORY TAG IMAGE ID CREATED SIZE
## ubuntu-curl latest 32847817ea58 2 minutes ago 220MB
## ubuntu 14.04 c69811d4e993 2 weeks ago 188MB
Now starting an Ubuntu container with curl
already installed is as easy as
docker run -it ubuntu-curl
.
As you can see in the table above our new ubuntu-curl
image is 220 MB, which
is sort of large to keep around just to avoid the step of installing curl
in
the future. We can delete an image with docker rmi
and the image name:
docker rmi ubuntu-curl
If we wanted to delete a container (so that it can’t be restarted in the future)
we can delete it with docker rm
and the ID of the container (which you can
get from docker ps -a
):
docker rm 4313a3f3f11d
Perhaps you’ve caught on to the pattern that Docker commands mimic standard
Unix commands, in which case it will come as no surprise to you that docker cp
allows you to get files in and out of containers. Let’s start up a brand new
container and create a file:
docker run -it ubuntu:14.04
# (Now we're in the Dockerized Ubuntu shell)
cd
echo "very important data" > some-big-file.dat
pwd
## /root
Now let’s exit
the container and go back to our host shell where we can use
docker cp
, the ID of the container, the path to the file we want to copy
to our host, and the path of the copied file on our host:
exit
# (Now we're in the host shell)
docker ps -a
## CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
## a6c228e05fca ubuntu:14.04 "/bin/bash" 2 minutes ago Exited (0) 29 seconds ago musing_borg
docker cp a6c228e05fca:/root/some-big-file.dat ~/
cd
cat some-big-file.dat
## very important data
It worked! Now let’s copy a file from our host into the container:
echo "- jhudatascience.org" > list-of-cool-websites.txt
echo "- shanellis.github.io" >> list-of-cool-websites.txt
echo "- pgbovine.net" >> list-of-cool-websites.txt
docker cp list-of-cool-websites.txt a6c228e05fca:/root/
docker start -ai a6c228e05fca
Now we should be in the Ubuntu shell. Did our file make it?
cd
cat list-of-cool-websites.txt
## - jhudatascience.org
## - shanellis.github.io
## - pgbovine.net
Success! One thing to keep in mind: you don’t have to stop a container in order
to copy files into or out of it, the container can be running and you can use
another terminal to do docker cp
.
One of my favorite Docker features is that you can work in a container and then easily share an image of that container. If you’ve been doing significant amounts of system configuration or software development and you want to share your carefully crafted environment with somebody else then Docker seems to be the perfect tool for you. I’m very excited to use this feature while teaching, since configuring computing environments often takes up a significant amount of valuable teaching time.
To start sharing your images you should create your own account on
Docker Hub. Once you have an account you can log in
on the terminal with docker login
, which will ask you for your Docker Hub
username and password. Logging in on the command line allows you to create and
update images on your Docker Hub profile.
After logging in let’s create a new image which features our list of cool websites:
docker commit a6c228e05fca cool-websites
After creating the image that we want to share, we need to tag
the image with
the name that we want the image to have on Docker Hub. This name should be in
the format of [Docker Hub username]/[name of image]
. My Docker Hub username
is seankross, so I’ll tag this image as seankross/cool-websites
:
docker tag cool-websites seankross/cool-websites
Now we can use the image tag to push
the image onto Docker Hub:
docker push seankross/cool-websites
After a minute of uploading (depending on the speed of your internet connection)
your image should now be
on Docker Hub! Other Docker
users can now easily get a copy of your uploaded image with docker pull
. For
example if you wanted to use my cool-websites
image, then you could use the
command docker pull seankross/cool-websites
to get the image.
With any of the Docker sub-commands (images
, tag
, rm
, etc) it’s not always
easy to remember what options are available. Thankfully adding the --help
flag
after any command will list that command’s options in a friendly and
understandable way (relative to other command line help programs). To list all
of the Docker sub-commands that are available you can use docker --help
, but
also check out docker run --help
and docker start --help
.
Like I briefly discussed before, I teach data science and programming courses often, and correctly configuring a computing environment for any specific programming task can be a challenge. Even configuring my own working environment can be time-consuming, but pre-made Docker images have really helped my setup workflow.
For example the Jupyter project has a Docker Hub account where they have several useful images. If you want to get started on a data science project, it can be as easy as running the following:
# Download the image
docker pull jupyter/datascience-notebook
# Run the notebook
docker run -it --rm -p 8888:8888 jupyter/datascience-notebook
The console will then provide you with a URL that you can copy and paste into
a web browser where your Jupyter notebook will be live! The container will
automatically destroy itself after it stops, but to save the container you can
remove the --rm
flag from the command above.
If - like me - RStudio is more your flavor of data science computing environment and you want a fresh instance then you should look at the Rocker project, a collaboration between R legends Carl Boettiger and Dirk Eddelbuettel. Rocker provides several Docker images which launch pre-configured RStudio instances. My favorite vintage among these images is rocker/tidyverse which includes all of RStudio & co’s Tidyverse packages. You can pull the image down and start it up with the following commands:
# Download the image
docker pull rocker/tidyverse
# Start up RStudio
docker run -d -p 8787:8787 rocker/tidyverse
After starting the container just navigate to localhost:8787
in your web
browser and log in with rstudio
as the default username and password. Once
you’re finished using RStudio you need to stop the container using docker stop
with the appropriate container ID, which you can find with docker ps
.
There’s a much bigger Docker world out there, but after getting into Docker a little bit I realized how all of its features could have big implications for doing reproducible science! If you’re doing data science with Docker I would love to hear about your experiences.