Build Your City in the Cloud
Recently I’ve been using several of Amazon’s Web Services for computing, storage, and turning text into voice and the experience has been great! With AWS you can easily rent a super-computer or archive some of the huge files you have laying around. Many of these services are accessible through a series of R packages written by Thomas Leeper and his colleagues as part of the Cloudyr project. Their packages are well designed and easy to use, however the initial setup of configuring AWS credentials can be confusing for folks who have never used AWS before.
So let’s get started setting you up with Amazon Web Services. First you’ll need to log into your Amazon account by signing into https://aws.amazon.com/. If you already have an Amazon account that you use to buy daily necessities then you can log in with that account. After providing some basic information you should see the AWS Console. This page has a search bar that looks like this:
Type “IAM” into the search bar and click on the first result, Amazon’s Identity and Access Management system. On the left side of the IAM dashboard you should see a menu item called “Users”
Click on “Users” and then you should see yet another dashboard which will list the users you create. If you’re new to AWS then you shouldn’t see any user names listed! Click on the blue “Add user” button to create a new user:
In the user name field just enter a name that you can easily remember, and make sure to check the check box to give this user programmatic access:
After you’ve checked the box, you can click Next: Permissions, then you should see the screen below.
On this page you first need to select how you’re going to assign AWS permissions
to this user. Click Attach existing policies directly so that you can
specify which services you’re going to use. In this case, we’re going to take
aws.s3 package for a spin, so type “s3” into the search bar in the middle
of the page:
Amazon S3 is just a big hard drive that you can access over the internet (like a programmable Dropbox). Check the box next to AmazonS3FullAccess so that you can create and destroy files in Amazon S3. If you wanted to use a different web service, you could search for the name of that service here to provide even more access to this user. When you’re finished, click Next: Review. On the following screen click Create user in order to get your credentials. You should now see this screen:
In order to use AWS in R you need two strings: the access key ID and the secret access key. Click show under the secret access key heading in order to see the secret access key. Copy and paste each key into a file for safe keeping. Never share these keys since anyone with them can use your AWS account (the keys in this screenshot are fake). After you leave this page you will never be able to retrieve the secret access key, so make sure you’ve recorded it. Don’t worry if you lose your secret access key though, you can just create a new user to get a new key. After you have both keys you can click the Close button. You’re now ready to start using AWS in R!
Now let’s test out our big cloud hard drive with the
aws.s3 package which you
can install with
install.packages("aws.s3"). To use the package you’ll need to
set up three environmental variables like I have below by specifying the
access key ID, and the secret access key which we just set up, plus the AWS
region which specifies the Amazon server farm geographically closest to you. You
can get a list of all regions
for all services here,
and you click here
to see which S3 regions are available. Load up the library and set up your keys
and region like so:
Let’s create a fresh CSV file that we can store on Amazon’s servers:
Now let’s upload this file to S3. Amazon organizes S3 into buckets, which
are like named cloud hard drives. You have to come up with a hard drive name
that nobody else has used before, so we’re going to introduce some randomness
to come up with a unique bucket name. The
put_bucket() function creates the
Now we can put files in our bucket using
put_object() by specifying which
bucket we want to use:
We can then pull files out of the bucket given the name of the file with the
save_object() function. Let’s make sure the file we pulled out of the bucket
is the same as the file we put in!
##  TRUE
Looks like it works! Now let’s delete the hard drive we created since we won’t need to use it anymore.
Amazon Web Services has a pretty generous free tier, so as long as you always clean up after yourself during these experiments like these you shouldn’t incur any charges. Have fun playing in the clouds!