Python in the Cloud

Adrian Klaver

Issue #210, October 2011

A basic introduction to using the Python boto library to interact with AWS services and resources.

This article explores using the boto library to work with resources in the Amazon Web Services (AWS) cloud. For those wondering, the name boto refers to a species of freshwater dolphin found in the Amazon River. boto is maintained by Mitch Garnaat, who works on the Eucalyptus team at Canonical. The library covers many, though not all, of the services offered by AWS (see the boto Web site for a current listing).

But, before I get into the use and care of boto, let me take a little diversion. It probably goes without saying, but you need to set up an AWS account if you want to play along. If you already have an Amazon account, you just need to register with Amazon at the AWS site (see Resources) to set up your AWS account. Otherwise, you need to set up an Amazon account first. As part of the setup for your AWS account, you will be issued some credentials (you will be using those later). Note: if you are new to AWS, check out the free tier offer (see Resources). It allows you to kick the tires at, for the most part, no charge. For this article, I try to stick to the offer's restrictions as much as possible.

With the AWS setup out of the way, let's move on to installing boto. At the time of this writing, the current version is 2.0b4, which is what I use in this article. boto is available from the Python Package Index (PyPI), and you can install it with easy_install boto. Remember, if you want it to be a system-wide library, run the install as a superuser. You also can go to either PyPI or the boto site, download a tarball, and then run python setup.py install. The PyPI site has the latest version only; the boto site has a variety of versions available.

Now that the housekeeping's done, it's time to play. As mentioned above, boto allows you to access many of the AWS services—in fact, many more than I have space to cover here. This article covers the Amazon Machine Image (AMI), Elastic Block Store (EBS), Elastic Compute Cloud (EC2), Simple Storage Service (S3) and Simple Notification Service (SNS). Briefly, an AMI is a virtual machine image, an EBS volume is a virtual hard drive, EC2 is the cloud in which an AMI/EBS combo runs, S3 is key/object storage, and SNS is a messaging service from the cloud. In all these cases, account information is needed for boto to access AWS. You'll need the AWS Access Key ID and the AWS Secret Access Key. They were created when you signed up, but if you did not record them, don't fret. They are available on your AWS account page under Security Credentials.

To make things easier and more secure, there are options that keep the credentials out of your code entirely. The first is to create AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and set them to the respective keys. The other is to create a .boto file in your home directory and, using ini style, set the keys there (Listing 1).
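As a sketch, a minimal .boto file follows this pattern (the key values shown are placeholders, not real credentials):

    [Credentials]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY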

Before going further, I should point out that AWS has a Web-based management tool for most of its services. It is called the AWS Management Console, and you can reach it by going to the AWS site and clicking on the Account tab at the top of the page (I refer to and use this tool in what follows). The management console also is handy for confirming that the boto code actually is doing what it is supposed to be doing. Just remember to click the Refresh button at the top of the console when in doubt.

What follows is a basic look at the interaction between boto and AWS. The code is kept simple, both to conserve space and to illustrate basic principles. To get started, I need to create an AMI of my own to work with. The easiest way to do that is to grab an existing public image and make it private. I use the Ubuntu images maintained by Canonical. The best place to find what is available is the Alestic site (see Resources). It has a handy table labeled “Ubuntu and Debian AMIs for Amazon EC2” with tabs for the AWS regions. Click on the region you want, and then select the appropriate image. For this article, I selected the Ubuntu 10.04 LTS 64-bit EBS AMI in the US-East-1 region with an AMI ID of ami-2ec83147 (the ID may differ as new images are built; see the EBS/S3 AMI sidebar). Clicking on the arrow next to the image takes me to the AWS Management Console (after login) to start an instance from the image.

To make use of the free tier, I selected the micro instance. At this point, there is an instance of a public image running on my account. To make a private image, I could use the management console by right-clicking on the instance and selecting Create Image, but where is the fun in that? Let's use boto to do it instead (Listing 2). It's simple. Import the boto convenience function connect_ec2, and note the lack of access credentials in the connection code. In this case, they are in a .boto file in my home directory. Then, I use create_image() to create and register a private AMI using the running instance (i-c1315eaf), launched from the management console, with the name lj_test. The create_image function returns the AMI ID—in this case, ami-7eb54d17.
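A minimal sketch of that step looks like this (the instance ID and AMI name are the ones mentioned above):

    from boto import connect_ec2

    # Credentials are picked up from ~/.boto (or the environment
    # variables), so none appear in the connection call.
    con = connect_ec2()

    # Create and register a private AMI from the running instance.
    image_id = con.create_image('i-c1315eaf', 'lj_test')
    print(image_id)   # e.g., ami-7eb54d17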

The next step is to put the image to use by launching an instance or instances of it. For this, see the line in Listing 2 with con.run_instances(); a sketch of the call appears after the parameter descriptions below. The AMI previously created is used as the template for the instances.

The min_count and max_count parameters need some explaining. AWS has a default limit of 20 instances that can be running per region per account (this can be increased by filling out a request form). This is where the min/max count comes into play. AWS launches as many instances as it can, up to max_count; if even min_count cannot be satisfied, the request fails. So, if I already had 18 instances running in a region and set min_count to 2 and max_count to 5, two new instances would be launched.

The key_name parameter names an SSH keypair generated by Amazon; it is used for SSH access to the instance(s).

The instance_type is set to the EC2 micro instance, and it is slotted to run in the us-east-1d availability zone by the placement parameter. Let's take a brief side trip on AWS availability zones. Availability zones are relative to your account. In other words, my us-east-1d may not represent the same physical availability zone as your us-east-1d.

The final parameter, disable_api_termination=True, is a lock-out mechanism. It prevents the termination of an instance through API calls. To get rid of an instance with this protection in force, you first have to change the attribute to False and then issue a termination command. See the modify_instance_attribute() line for how to undo the termination protection on a running instance.
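Pulling those parameters together, a sketch of the launch call looks like the following (the keypair name lj_kp is a hypothetical placeholder; the AMI ID is the one created earlier):

    from boto import connect_ec2

    con = connect_ec2()

    # Launch one to two micro instances from the private AMI in
    # us-east-1d, with termination protection turned on.
    reservation = con.run_instances('ami-7eb54d17',
                                    min_count=1,
                                    max_count=2,
                                    key_name='lj_kp',  # hypothetical keypair
                                    instance_type='t1.micro',
                                    placement='us-east-1d',
                                    disable_api_termination=True)

    # Later, to allow termination again, flip the attribute back
    # on the launched instance:
    inst_id = reservation.instances[0].id
    con.modify_instance_attribute(inst_id, 'disableApiTermination', False)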

Assuming a successful launch, run_instances() returns a boto reservation class that represents the reservation created by AWS. That class can be used to find the instances that are running, provided you capture it and iterate over it right then. A more generic approach is to use the get_all_instances() function to return a list of reservation classes matching the criteria given. In Listing 2, I use the filters parameter to limit my search to instances created from the AMI created above that are actually running.

So, where do you find what filters are available? The boto documentation does not list the filters, but instead points you at the source, which is the AWS developer documentation for each resource (see Resources). Drilling down in that documentation gets you to an API Reference where the options to the various actions are spelled out. For this particular case, the information is in the EC2 API (see Resources) under Actions/DescribeInstances. There is not a complete one-to-one correspondence between the boto function names and the AWS API names, but it is close enough to figure out.

Having run the function, I now have a list of reservations. I iterate through the reservation classes in the list, each of which yields an instance list containing instance classes. Those, in turn, are iterated over, and various instance attributes are pulled out (Figure 1).
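A sketch of that search and loop, assuming the connection and AMI ID from earlier (the attributes printed are a sampling; Figure 1 shows the sort of output this produces):

    from boto import connect_ec2

    con = connect_ec2()

    # Limit the search to running instances built from the private AMI.
    reservations = con.get_all_instances(
        filters={'image-id': 'ami-7eb54d17',
                 'instance-state-name': 'running'})

    # Each reservation holds a list of instance classes.
    for res in reservations:
        for inst in res.instances:
            print('%s %s %s %s' % (inst.id, inst.state,
                                   inst.placement, inst.public_dns_name))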

Figure 1. EC2 Instance Information

Having created an AMI and launched instances using that image, now what? How about backing up the information contained in the instance? AWS has a feature whereby it is possible to create a snapshot of an EBS volume. What is nice about this is that the snapshots are incremental in nature. Take a snapshot on day one, and it represents the state of the volume at that time. Take a snapshot the next day, and it represents only the differences between the two days. Furthermore, snapshots, even though they represent an EBS volume, are stored as S3 data. The plus here is that although monthly charges for EBS are based on the size of the volume, used or not, S3 charges are based on the space actually used. So, if you have an EBS volume of 40GB, you are charged for 40GB each month. Assuming only half of that is used, the snapshot will accrue charges of roughly 20GB each month. The final feature of a snapshot is that it is possible to create an EBS volume from it that, in turn, can be used to create a new AMI. This makes it relatively easy to return to some point in the past.

To make the snapshot procedure easier, I create a tag on the instance that has the EBS volume I want to snapshot, as well as on the volume itself. AWS supports user-defined tags on many of its resources. The create_tags() function is a generic one that applies a tag to the requested resource(s). The first parameter is a list of resource IDs; the second is a dictionary where the tag name is the key and the tag value is the dictionary value. Knowing the Name tag, I use get_all_volumes() with a filter to retrieve a volume class. I then use the volume class to create the snapshot, giving the snapshot a Description equal to the volume's Name tag plus the string Snap. Though create_snapshot() returns a snapshot class fairly quickly, the snapshot itself may not finish processing for some time. This is where the SNS service comes in handy.
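A sketch of the tag-and-snapshot sequence, assuming the earlier connection (the volume ID vol-1a2b3c4d is a hypothetical placeholder):

    from boto import connect_ec2

    con = connect_ec2()

    # Tag the instance and its EBS volume with the same Name tag.
    # vol-1a2b3c4d is a hypothetical volume ID.
    con.create_tags(['i-c1315eaf', 'vol-1a2b3c4d'], {'Name': 'lj_test'})

    # Use the tag as a filter to fetch the volume class...
    vol = con.get_all_volumes(filters={'tag:Name': 'lj_test'})[0]

    # ...and snapshot it, with a Description of the Name tag plus 'Snap'.
    snap = vol.create_snapshot(description='lj_test Snap')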

SNS is exactly that, a simple way of sending out notifications. The notifications can go out as e-mail, e-mail in JSON format, HTTP/HTTPS or to the AWS Simple Queue Service (SQS). I don't cover SQS in depth here; just know it is a more robust, full-featured way of sending messages from AWS.

The first step is to set up a notification topic. The easiest way is to use the SNS tab in the AWS management console. A topic is just an identifier for a particular notification. Once a topic is created, subscriptions can be tied to the topic. The subscriptions do not take effect until they are confirmed. In the case of subscriptions going to e-mail (what I use here), a confirmation e-mail is sent out with a confirmation link. Once the subscription is confirmed, it is available for use.

As I alluded to earlier, I demonstrate monitoring the snapshot creation process with a notification upon completion (Listing 2). The function check_snapshot() takes a snapshot class returned by create_snapshot and checks in on its progress at regular intervals. Note the snap.update(). The AWS API is Web-based and does not maintain a persistent connection. Unless the snapshot returns a status of “completed” immediately, the while loop will run forever without the update() method to refresh the status.
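A sketch of such a polling function, assuming a snapshot class like the one returned above (the 30-second interval is an arbitrary choice):

    import time

    def check_snapshot(snap, interval=30):
        # Poll until the snapshot reports a status of 'completed'.
        # Without snap.update(), the status attribute would never
        # change from its initial value and the loop would run forever.
        while snap.status != 'completed':
            time.sleep(interval)
            snap.update()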

Once the snapshot is completed, a message is constructed using the snap attributes. The message then is published to the SNS topic, which triggers the subscription, and an e-mail should show up shortly. The ARN shown stands for Amazon Resource Name. It is created when the SNS topic is set up and represents the system address for the topic. Note that the simple part of SNS is that there is no delivery confirmation. The AWS API will throw an error if it can't do its part (send the message), but failures on the receiving end are not reported. That's why one of the sending options for SNS is SQS. SQS will hold and resend a message either for a set time period or until it receives confirmation of receipt and a delete request, whichever comes first.
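A sketch of the publish step, assuming the snap object from the earlier sketch (the topic ARN is a hypothetical placeholder; substitute the ARN shown for your own topic in the SNS tab of the management console):

    from boto import connect_sns

    sns = connect_sns()

    # Hypothetical topic ARN; yours is created with the topic.
    topic_arn = 'arn:aws:sns:us-east-1:123456789012:lj_snap'

    # Build a message from the snapshot attributes and publish it,
    # which fires the confirmed e-mail subscription.
    msg = 'Snapshot %s of volume %s completed (started %s)' % (
        snap.id, snap.volume_id, snap.start_time)
    sns.publish(topic_arn, msg, subject='Snapshot complete')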

So, there you have it—a basic introduction to using the Python boto library to interact with AWS services and resources. Needless to say, this barely scratches the surface. For more information and to keep current, I recommend the blog at the Alestic site for general AWS information and Mitch Garnaat's blog for boto and AWS information.

Adrian Klaver (adrian.klaver@gmail.com), having found Python, is on a never-ending quest to explore just how far it can take him.