<img src="https://www.docker.com/sites/default/files/Whale%20Logo332_5.png" alt="Docker whales are friends!" style="display:block; margin-left:auto; margin-right:auto;"/>
Intro
If you don’t know about Rocker yet, you should. It is a slick collection of Docker images built specifically for the R community. They run RStudio Server out of the box and allow anyone to quickly spin up and share reproducible analysis environments.
The main motivation for this post is the Astrea Open Data Challenge, which requires a container as part of the result submission. Initially the challenge had only one option, a Python-focused Anaconda image, so I reached out to the organizers and convinced them to add a comparable option for R users: the rocker/rstudio image.
Since I asked for the alternative R environment, I wanted to write up step-by-step instructions for getting Rocker set up on macOS. This walkthrough assumes you have done the first two parts of the Getting Started with Docker tutorial and have a working knowledge of git (nothing fancy).
Recommended: A GitHub repo for files
This part is separate from, but parallel to, the Docker container setup. Docker is meant to contain everything about the computing environment, but not really scripts and data, so it’s good practice to keep those in a separate folder system. That could be local only, but in the name of version control and open source, we are using GitHub to track and publish our code.
If you want to do this part, go to GitHub.com and make a new repository; I called mine “Hauncher”. Then grab the Clone with HTTPS link and run the following line in a new Terminal on your computer:
git clone https://github.com/nathancday/hauncher.git ~/Desktop/hauncher
This will create a cloned copy of the new repository as a new directory “hauncher” sitting on your Desktop. This is where we will be saving all of the files we use inside of the Docker container.
Get the image/container
Images are the recipes used to build containers. Containers are the actual system processes that run your code.
I highly recommend using the rocker/ropensci image over the base RStudio image because it has a bunch of the required system-level libraries and common R packages pre-installed, which makes life a lot easier.
docker pull rocker/ropensci
This takes a minute the first time you pull, so go get a cup of coffee or something.
If you really want to start from scratch and use the base image rocker/rstudio, you can install external dependencies in your container with the instructions here, but fair warning: it’s a PIA.
Run that image as a container
Copy the following command, replace the -v /local/path:/container/path section with your own paths, and run it in your terminal:
docker run -d -p 8787:8787 -v /Users/nathanday/Desktop/hauncher/:/home/rstudio/hauncher rocker/ropensci
Docker requires absolute paths, so you can’t abbreviate your home directory with ~/; it’s gotta be /Users/your_name/.
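If typing out the full path feels error-prone, one option is to let the shell build it for you. The sketch below is only an illustration: it assumes the repo was cloned to Desktop/hauncher under your home directory and that you are using the rocker/ropensci image pulled earlier. It prints the command rather than running it, so you can check the path first.

```shell
# Hypothetical location: adjust to wherever you cloned the repo.
project_dir="$HOME/Desktop/hauncher"

# The shell expands $HOME to an absolute path before Docker ever sees it.
mount_arg="${project_dir}:/home/rstudio/hauncher"

# Print the command instead of running it, so the path can be checked first.
echo "docker run -d -p 8787:8787 -v ${mount_arg} rocker/ropensci"
```

Once the printed path looks right, paste the command back into the terminal to actually start the container.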
Now head to localhost:8787 in your favorite browser to see your new container in action.
You should be greeted with a login screen; enter “rstudio” for both the username and password. And voilà, you’re in RStudio. You should see two folders, kitematic and hauncher, in your file explorer pane.
Install packages / mod your container
Let’s make a new R script to install any extra packages we want and save it into our hauncher directory. My script is called install.R and is just this:
install.packages("pacman")
Save your container
Before you close the running container go back to the terminal and list which containers are running with:
docker ps
Find the container ID you want to save and copy the whole thing or remember the first few characters (two is usually good enough for me) and run this line, substituting your full or partial id for “54”.
docker commit -m "first commit; install.R" 54 nathancday/hauncher:latest
Note: The container ID will change each time you run an image, because it identifies that particular running container (much like a process ID) rather than being a permanent attribute of the image.
Using latest as the tag name is nice because you won’t need to type out the :$TAG suffix for pull or push commands. But use other tag names for more descriptive saves.
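Leaving the tag off works because Docker treats a missing tag as latest. As a rough illustration only, the rule can be sketched in plain shell. Note that with_default_tag is a hypothetical helper written for this post, not a Docker command, and the v1 tag is just an example name.

```shell
# Hypothetical helper: mimic how Docker fills in a missing tag.
# Only the part after the last "/" is checked, so a registry host
# with a port (e.g. localhost:5000/image) is not mistaken for a tag.
with_default_tag() {
  case "${1##*/}" in
    *:*) printf '%s\n' "$1" ;;          # tag already given, keep it
    *)   printf '%s:latest\n' "$1" ;;   # no tag, fall back to latest
  esac
}

with_default_tag nathancday/hauncher      # prints nathancday/hauncher:latest
with_default_tag nathancday/hauncher:v1   # prints nathancday/hauncher:v1
```

So a commit or push to nathancday/hauncher and one to nathancday/hauncher:latest refer to the same image, while a descriptive tag such as v1 is kept as its own save point.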
Now run:
docker images
This lists the available local images, and you should see the new image you just saved.
Publish to Docker Hub
The last Docker thing to do is push your updated image to Docker Hub, so other people can access it:
docker push nathancday/hauncher
This is considerably faster than the original pull for us, because we are only uploading the modifications.
Now anyone can start with the exact same environment as us, with the same pull/run commands. Isn’t that great?
Update GitHub
The final step for a totally synced, containerized, reproducible analysis is to update our remote GitHub repo with the file changes we made locally; here it’s just the new install.R file.
Back to the terminal we go:
git status
This should show our new file as “untracked”.
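If you want to see what “untracked” looks like without touching your real repo, here is a minimal sketch in a throwaway repository. The temp directory and the stand-in install.R are assumptions for the demo only. It uses git status --porcelain, the short script-friendly form of git status, which flags untracked files with ?? and newly staged files with A.

```shell
# Throwaway repo in a temp directory (an assumption for this demo only).
scratch="$(mktemp -d)"
cd "$scratch" || exit 1
git init -q

# Stand-in for the script we saved from RStudio.
echo 'install.packages("pacman")' > install.R

before="$(git status --porcelain)"   # untracked files are flagged with ??
git add install.R
after="$(git status --porcelain)"    # newly staged files are flagged with A

echo "before add: $before"
echo "after add:  $after"
```

The same two states show up in your real hauncher repo: install.R starts out untracked and becomes staged once it has been added.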
All we need to do is add it, commit it, and push it:
git add install.R
git commit -m "install.R file"
git push
And that’s it!
Wrap up
I hope this helps you get going with Docker for reproducible projects. If you run into any problems, let me know and I will do my best to help debug.
You are welcome to follow along with our project’s progress on our GitHub repo and Docker image. We will be keeping our code open from start to finish, so updates are coming as we build our models in the next few weeks. Have fun rocker-ing!!!
TLDR
The code for you to rock right away:
docker run -d -p 8787:8787 -v /Users/you/Desktop/Dir:/home/rstudio/Dir rocker/ropensci
# run as daemon
# served at 'localhost:8787', use your browser
# mount local/dir:container/dir
If you’ve changed a running container and want to save it (and push to your Hub):
# lists running containers, get CONTAINER_ID of interest
docker ps
# commit kinda like git
docker commit CONTAINER_ID user_name/repo_name # default tag is 'latest'
# must be logged in to docker hub to push it
docker push user_name/repo_name
If you want to reproduce our code so far:
# clone our GitHub repo
git clone https://github.com/nathancday/hauncher.git /Local/Path/Desktop/hauncher
# run our Docker image
docker run -d -p 8787:8787 -v /Local/Path/Desktop/hauncher:/home/rstudio/hauncher nathancday/hauncher