Using Amazon EC2 to run Bayesian Models

Chris Che-Castaldo & Ben Weinstein

March 29, 2016

Table of Contents

I. Introduction

II. Installing and configuring Git LFS locally

III. Using SSH with your Git repository

IV. Configuring and launching an Amazon EC2 instance

V. Accessing an Amazon EC2 instance in the browser

VI. Accessing an Amazon EC2 instance via SSH

VII. Cloning a Git repository on your Amazon EC2 instance

VIII. Installing and configuring the AWS Command Line Interface (CLI)

IX. Putting it all together

X. Final workflow

I. Introduction

While a pleasant side effect of running large Bayesian models that strain your laptop to its breaking point for hours on end is ample free time for other worldly pursuits, there comes a point where it is necessary to don one’s big person pants and move all this heavy computation to the cloud. The goals of this tutorial are to (1) configure the hardware and software of an Amazon EC2 instance and launch it, (2) connect to this instance in order to clone a Git repository, run an R script (hereafter called GenericEC2Run.R), and save the R output, (3) publish these changes to your project’s remote Git repository as a new branch, and (4) terminate the instance. In our case, GenericEC2Run.R will be an R Markdown file that runs a JAGS model, but it could be anything you wish. Let’s get started!

II. Installing and configuring Git LFS locally

You will need Git LFS if you plan to commit any files ≥ 100 MB in size. The MCMC output from models with a large number of parameters can easily exceed this size, making LFS a must. This section describes installing Git LFS locally. Later on we describe how to install Git LFS on your EC2 instance, as it does not come pre-installed with the AMI we will be working with. Remember, Git LFS is a subscription service, and you first need to set it up on your GitHub account. If you don’t need Git LFS, you can skip this section and the later code where we install Git LFS on our Amazon EC2 instance.
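To see whether a working tree already holds files at or above that size, a small search like the following can help. It is only a sketch: `find_large_files` is a hypothetical helper name introduced here, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes, which may differ slightly from the limit your Git host enforces.

```shell
# Hypothetical helper: list files of at least a given size, skipping Git's
# own metadata directory.
#   $1: directory to search
#   $2: minimum size in bytes (assumed default: 100 * 1024 * 1024)
find_large_files() {
  min_bytes="${2:-104857600}"
  # "-size +Nc" matches files larger than N bytes, so N = min_bytes - 1
  # selects files of min_bytes or more.
  find "$1" -type f ! -path '*/.git/*' -size +"$((min_bytes - 1))"c
}

# Search the current directory using the default threshold.
find_large_files .
```

Any file this prints is a candidate for Git LFS tracking before its first commit.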

Getting Git LFS to work correctly on a Mac can be confusing, since Git is typically installed in /usr/bin, while Homebrew installs Git LFS in /usr/local/bin. Unfortunately, both Git and Git LFS need to be in the same directory to work. In addition, you need to set your PATH so that your shell looks in these directories when git is invoked. For a brief foray into PATH enlightenment see Chris Bednarski’s blog post. Lastly, you need to commit a .gitattributes file to your Git repository specifying which files should be handled using Git LFS. Got all that? Make sure you have not committed any large files (≥ 100 MB) before doing this! Here are the steps.

  1. Install Homebrew from the terminal.

    /usr/bin/ruby -e "$(curl -fsSL
  2. Install Git using Homebrew.

    brew install git
  3. Install Git LFS using Homebrew.

    brew install git-lfs
  4. Make sure that Git and Git LFS are both located in the same directory, /usr/local/bin.

  5. Add /usr/local/bin to your path.

    echo 'export PATH="/usr/local/bin:/usr/local/sbin:$HOME/bin:$PATH"' >> ~/.bash_profile
  6. Type which git in the terminal to make sure your shell defaults to the git installation in /usr/local/bin.

  7. If using Tower, select this version of Git under Preferences, Git Config.

  8. Before you do any commits involving large files, either create or amend the .gitattributes file to associate a file type with Git LFS. In our case this will be all .rda files, so we type:

    git lfs track "*.rda"
  9. Commit this change to .gitattributes and push it to the remote Git repository.
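
The steps above can be sanity-checked with a short script along these lines. It is a sketch rather than a definitive test: `tool_dir` is a hypothetical helper introduced here, and the check assumes you run it from the root of your repository in a shell that has already picked up the new PATH.

```shell
# Hypothetical helper (not part of Git or Homebrew): print the directory that
# a command resolves to on the current PATH, or "missing" if it is not found.
tool_dir() {
  tool_path="$(command -v "$1" 2>/dev/null)" || { echo "missing"; return 0; }
  dirname "$tool_path"
}

git_dir="$(tool_dir git)"
lfs_dir="$(tool_dir git-lfs)"
echo "git resolves to:     $git_dir"
echo "git-lfs resolves to: $lfs_dir"

if [ "$git_dir" != "missing" ] && [ "$git_dir" = "$lfs_dir" ]; then
  echo "OK: both commands resolve to the same directory"
else
  echo "Check your PATH: the two commands are in different directories or missing"
fi

# After running git lfs track for .rda files, .gitattributes is expected to
# hold a line such as:  *.rda filter=lfs diff=lfs merge=lfs -text
if [ -f .gitattributes ]; then
  grep -n 'filter=lfs' .gitattributes || echo "no LFS patterns tracked yet"
fi
```

If the two directories differ, revisit the PATH step and open a new terminal window before checking again.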

III. Using SSH with your Git repository

We will now generate an SSH key pair (two files - a public key that you share with the world and a private key that you keep safe) and associate it with our GitHub account. This will allow us to clone our Git repository on an EC2 instance without having to type in a username and password or (worse yet) put a password in cleartext in a script.
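
To get a feel for the two files before touching ~/.ssh, you can generate a throwaway key pair in a temporary directory. This is a sketch only: the comment string "demo-key" and the temporary location are placeholders chosen here, and the key it creates is not meant to be registered anywhere.

```shell
# Create a scratch directory so that nothing in ~/.ssh is overwritten.
key_dir="$(mktemp -d)"

# -q silences progress output, -N "" sets an empty passphrase, and
# -f chooses the private key file; the public key gets a ".pub" suffix.
ssh-keygen -q -t rsa -b 4096 -C "demo-key" -N "" -f "$key_dir/id_rsa"

# Two files appear: the private key (id_rsa) and the public key (id_rsa.pub).
ls -l "$key_dir"

# The first field of the public key file names the key type.
cut -d ' ' -f 1 "$key_dir/id_rsa.pub"
```

Once you have looked at the files, the scratch directory can be removed with rm -rf "$key_dir".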

  1. In the terminal, create an SSH key, supplying your email address as the argument to -C.

    ssh-keygen -t rsa -b 4096 -C
  2. Save the key to the default directory, ~/.ssh.

  3. Skip entering a passphrase (press Enter at both prompts).

  4. Check that the public and private keys are in ~/.ssh by going to the directory and typing ls -l id_rsa*. You should see two files, the public key named id_rsa.pub and the private key named id_rsa.

    -rw-r--r--  1 coldwater  staff  3243 Mar 15 10:19 id_rsa
    -rw-r--r--  1 coldwater  staff   743 Mar 15 10:19 id_rsa.pub
  5. From the terminal, make sure this private key is not publicly viewable.

    chmod 600 ~/.ssh/id_rsa
  6. Check that this worked by typing ls -l id_rsa* again. Notice that the private key can now be read and written only by its owner, while the public key can still be read by everyone.

    -rw-------  1 coldwater  staff  3243 Mar 15 10:19 id_rsa
    -rw-r--r--  1 coldwater  staff   743 Mar 15 10:19 id_rsa.pub
  7. Go to the settings under your GitHub account and then click SSH keys and New SSH key.