Introduction to OpenStack

Tom Fifield

Issue #235, November 2013

You've probably heard of OpenStack. It's that cloud software that's getting a lot of attention from big names in the IT industry and from major users like CERN, Comcast and PayPal. But did you know that it's more than that? It's also the fastest-growing Open Source community in the world, and a very interesting collaboration among technology vendors.

OpenStack is really, truly open. If you want to test it out, you can stop reading this article (come back afterward; we'll still be here), visit status.openstack.org/release, and get a real-time list of features that are under development. Select one, follow it to the code review system, and give your comments. This is an example of Open Development, just one of the “Four Opens” on which OpenStack was founded. Of course, you likely already know that OpenStack is fully released under the Open Source Apache 2 license, no bits reserved, but did you also know about the principle of “Open Design”?

The software is released on a six-month cycle, and at the start of each cycle, we host a Design Summit for the contributors and users to gather and plan out the roadmap for the next release. The Design Summit has become part of an increasingly large conference that also hosts workshops for newcomers, inspiring keynotes and some fairly amazing user stories. However, somewhere tucked away are rooms with chairs arranged in semicircular layouts, ensconcing dozens of developers engaged in robust discussion and taking notes on a collaborative document displayed on a projector. This is where the roadmap of OpenStack is determined, where pathways for implementing features are aligned, and where people volunteer to make it a reality. This is the Open Design process, and we welcome your participation.

OpenStack may have started with just two organizations, NASA and Rackspace, but now there are hundreds, and with each additional member, the community grows stronger and more cohesive. Every morning the OpenStack developer awakes to a torrent of e-mail from the discussions in other countries, and the momentum is best described as “intense”.

Overall, OpenStack's standout feature is this strong community. It's extremely diverse, composed of people from very different technical backgrounds (from Python developers to packagers to translators) and very different philosophical backgrounds (from free software evangelists to hard-core capitalists). It's also very widespread, with the OpenStack Foundation claiming membership from 130 countries as of October 2013. So, dear reader, chances are there is a place for you.

Becoming a Contributor

During a 12-month period, OpenStack typically has more than 1,000 software developers contributing patches. Despite this, we always need more.

You can find detailed instructions on how to get started on the wiki (wiki.openstack.org/HowToContribute), but let me run through a couple of key aspects:

  • OpenStack uses Launchpad for bug tracking and GitHub for code hosting, but neither of those is the place to contribute code. That's right: no GitHub pull requests are accepted. Don't panic yet; this is for a good reason: all patches to OpenStack go through an extensive code review and testing process (https://wiki.openstack.org/wiki/Gerrit_Workflow).

  • Every code change in OpenStack is seen by at least three people (the owner and two of the core reviewers for the project), but often many more, and test cases and PEP8 compliance are required. The patches also are run through the continuous-integration system, which effectively builds a new cloud for every code change submitted, ensuring that the change interacts as expected with all other parts of OpenStack. As a result of this quest for quality, many have found that contributing has improved their Python coding skills.
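
To give you a feel for what reviewers expect, here's a minimal, hypothetical sketch (the function and test are made up, not real OpenStack code): a small PEP8-compliant function accompanied by a matching unit test.

    # A hypothetical example of the style reviewers look for: a small,
    # PEP8-compliant function accompanied by a unit test.
    import unittest


    def filter_active(instances):
        """Return only the instances whose status is ACTIVE."""
        return [i for i in instances if i.get('status') == 'ACTIVE']


    class FilterActiveTestCase(unittest.TestCase):

        def test_filters_out_inactive_instances(self):
            instances = [{'status': 'ACTIVE'}, {'status': 'ERROR'}]
            self.assertEqual([{'status': 'ACTIVE'}],
                             filter_active(instances))


    if __name__ == '__main__':
        unittest.main()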

Of course, not everyone is a Python developer. However, don't worry; there's still a place for you to work with us to change the face of cloud computing:

  • Documentation: in many cases, the easiest way to become an OpenStack contributor is to participate in the documentation efforts. It requires no coding, just a willingness to read and understand the systems that you're writing about. Because the documentation is treated like code, helping with it will teach you the mechanics you need to make contributions to OpenStack itself. Visit https://wiki.openstack.org/wiki/Documentation/HowTo for more information.

  • Translation: if you know how to write in another language, translations of OpenStack happen through an easy-to-use Web interface (https://www.transifex.com/projects/p/openstack). For help, contact the Internationalization team (https://wiki.openstack.org/wiki/I18nTeam).

  • File bugs: OpenStack is one of the few projects you will work on that loves to hear your angry rants when something does not work. If you notice something off, consider filing a bug report with details of your environment and what you experienced; docs.openstack.org/trunk/openstack-ops/content/upstream_openstack.html explains how.

  • Asking questions: a StackOverflow-style board for questions about OpenStack is available at ask.openstack.org. Feel free to use it to ask yours, and if you have the ability, stick around and try to answer someone else's (or at least vote on the ones that look good).

  • Evangelism: do you think OpenStack is pretty cool? Help us out by telling your friends; we'd really appreciate it. You can find some materials to help at openstack.org/marketing, or join the marketing mailing list to find out about some cool events to attend.

  • User groups: join your local user group (https://wiki.openstack.org/wiki/OpenStackUserGroups). They're in about 50 countries so far. Attend to learn, or volunteer to speak. We'd love to have you.

Navigating the Ecosystem: Where Does Your OpenStack Journey Begin?

One of the unique aspects of OpenStack as an open-source project is that there are many different levels at which you can begin to engage with it. You don't have to do everything yourself.

Starting with Public Clouds, you don't even need to have an OpenStack installation to start using it. Today, you can swipe your credit card at eNovance, HP, Rackspace and others, and just start migrating your applications.

Of course, for many people, the enticing part of OpenStack is building their own private clouds, and there are several ways to do that. Perhaps the simplest of all is an appliance-style solution. You purchase a “thing”, unbox it, plug in the power and the network, and it just is an OpenStack cloud.

However, hardware choice is important for many applications, so if that applies to you, consider that there are several software distributions available. You can, of course, get enterprise-supported OpenStack from Canonical, Red Hat and SUSE, but take a look also at some of the specialized distributions, such as those from Rackspace, Piston, SwiftStack or Cloudscaling.

If you want someone to help guide you through the decisions from the hardware up to your applications, perhaps adding in a few features or integrating components along the way, consider contacting one of the system integrators with OpenStack experience like Mirantis or Metacloud.

Alternatively, if your preference is to build your own OpenStack expertise internally, a good way to kick-start that might be to attend or arrange a training session. The OpenStack Foundation recently launched a Training Marketplace (www.openstack.org/marketplace/training), where you can look for events nearby. There's also a community training effort (https://wiki.openstack.org/wiki/Training-manuals) underway to produce open-source content for training.

To derive the most from the flexibility of the OpenStack framework, you may elect to build a DIY solution, in which case we strongly recommend getting a copy of the OpenStack Operations Guide (docs.openstack.org/ops), which discusses many of the decisions you will face along the way. There's also a new OpenStack Security Guide (docs.openstack.org/sec), which is an invaluable reference for hardening your installation.

DIY-ing Your OpenStack Cloud

If, after careful analysis, you've decided to construct OpenStack yourself from the ground up, there are a number of areas to consider.

Storage

One of the most fundamental underpinnings of a cloud platform is the storage on which it runs. In general, when you select storage back ends, ask the following questions:

  • Do my users need block storage?

  • Do my users need object storage?

  • Do I need to support live migration?

  • Should my persistent storage drives be contained in my compute nodes, or should I use external storage?

  • What is the platter count I can achieve? Do more spindles result in better I/O despite network access?

  • Which one results in the best cost-performance scenario I'm aiming for?

  • How do I manage the storage operationally?

  • How redundant and distributed is the storage? What happens if a storage node fails? To what extent can it mitigate my data-loss disaster scenarios?

  • Which plugin do I use for block storage?

For many new clouds, the object storage and persistent/block storage are great features that users want. However, with OpenStack, you're not forced to use either if you want a simpler deployment.

Many parts of OpenStack are pluggable, and one of the best examples of this is Block Storage, which you can configure to use storage from a long list of vendors (Coraid, EMC, GlusterFS, Hitachi, HP, IBM, LVM, NetApp, Nexenta, NFS, RBD, Scality, SolidFire, Windows Server and Zadara).
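
To give a flavor of that pluggability, here is a highly simplified, hypothetical sketch of the kind of driver interface Block Storage builds on. Real Cinder drivers implement many more operations (snapshots, cloning, attach and detach), but each vendor back end plugs in by providing methods of this shape:

    # A simplified, hypothetical sketch of a Block Storage driver.
    # Real Cinder drivers implement many more operations; this shows
    # only the shape of the plugin point.
    class ExampleVolumeDriver(object):
        """Minimal shape of a Cinder-style volume driver."""

        def create_volume(self, volume):
            # Allocate volume['size'] GB of storage on the back end.
            raise NotImplementedError()

        def delete_volume(self, volume):
            # Release the backing storage.
            raise NotImplementedError()

        def initialize_connection(self, volume, connector):
            # Return the connection information (for example, iSCSI
            # target details) that Compute needs to attach the volume.
            raise NotImplementedError()

Deployers then select a driver in the Block Storage configuration, and the rest of OpenStack is none the wiser about which vendor sits underneath.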

Network

If this is the first time you are deploying a cloud infrastructure in your organization, after reading this section, your first conversations should be with your networking team. Network usage in a running cloud is vastly different from traditional network deployments, and it has the potential to be disruptive at both a connectivity and a policy level.

For example, you must plan the number of IP addresses that you need for both your guest instances and your management infrastructure. Additionally, you must research and discuss cloud network connectivity through proxy servers and firewalls.
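
The arithmetic is simple but easy to forget. Here's a back-of-envelope sketch; every number in it is a made-up assumption, so substitute your own:

    # Hypothetical back-of-envelope IP address planning; all of the
    # numbers here are assumptions -- substitute your own.
    max_instances = 500          # peak concurrent guest instances
    floating_ips = 100           # publicly routable addresses
    compute_nodes = 40           # one management address each
    other_infrastructure = 15    # controllers, storage, network gear

    guest_ips = max_instances + floating_ips   # one fixed IP per guest
    management_ips = compute_nodes + other_infrastructure

    print("Guest networks need at least %d addresses" % guest_ips)
    print("Management network needs at least %d addresses" % management_ips)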

One of the first choices you need to make is between the “legacy” nova-network and OpenStack Networking (aka Neutron). Nova-network is a much simpler way to deploy a network, but it does not have the full software-defined networking features of Neutron, and it will be deprecated after 12–18 months.

Object Storage's network patterns might seem unfamiliar at first. Consider these main traffic flows:

  • Among object, container and account servers.

  • Between those servers and the proxies.

  • Between the proxies and your users.

Object Storage is very “chatty” among servers hosting data—even a small cluster does megabytes/second of traffic, which is predominantly “Do you have the object?”/“Yes I have the object.” Of course, if the answer to the aforementioned question is negative or times out, replication of the object begins.

Consider the scenario where an entire server fails, and 24TB of data needs to be transferred “immediately” to remain at three copies—this can put significant load on the network.

Another oft-forgotten fact is that when a new file is being uploaded, the proxy server must write out as many streams as there are replicas, multiplying the network traffic. For a three-replica cluster, 10Gbps in means 30Gbps out. Combining this with the high-bandwidth demands of replication described above is why we recommend that your private network have significantly higher bandwidth than your public one needs. Oh, and OpenStack Object Storage communicates internally with unencrypted, unauthenticated rsync for performance, so you do want the private network to be private.
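
To make those numbers concrete, here's a quick, hypothetical calculation of the replication and write-amplification loads just described (the cluster figures are assumptions):

    # Hypothetical Swift network load estimates; adjust to your cluster.
    replicas = 3
    failed_node_tb = 24.0       # data held by the failed node, in TB
    private_net_gbps = 10.0     # replication network bandwidth

    # Best case: re-replicating 24TB over a saturated 10Gbps link.
    # TB -> terabits -> gigabits, then divide by the link speed.
    seconds = failed_node_tb * 8 * 1000 / private_net_gbps
    print("Re-replication takes at least %.1f hours" % (seconds / 3600.0))

    # Write amplification: the proxy writes one stream per replica.
    public_in_gbps = 10.0
    print("%.0fGbps in means %.0fGbps out of the proxies"
          % (public_in_gbps, public_in_gbps * replicas))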

The remaining point on bandwidth is the public-facing portion. The swift-proxy service is stateless, which means that you can easily add more proxies and use HTTP load-balancing methods to share bandwidth and availability among them.

More proxies mean more bandwidth, if your storage can keep up.

“Cloud Controller”

To achieve maximum scalability via a shared-nothing/distributed-everything architecture, OpenStack does not have the concept of a “cloud controller”. Indeed, one of the biggest decisions deployers face is exactly how to segregate out all of the “central” services, such as the API endpoints, schedulers, database servers and the message queue.

For best results, you'll need to acquire some metrics on how the cloud will be used, although with a proper automated configuration-management system, it will be possible to scale as operational experience is gained. Key questions to consider include:

  • How many instances will run at once?

  • How many compute nodes will run at once?

  • How many users will access the API?

  • How many users will access the dashboard?

  • How many nova-api services do you run at once for your cloud?

  • How long does a single instance run?

  • Does your authentication system also verify externally?

Whereas choosing the size of compute-node hardware depends mainly on the types of virtual machines to be run, sizing the “central service” machines can be more difficult. Contrast two clouds, each running 1,000 virtual machines: one is used mainly for long-running Web sites, while in the other the average instance lifetime is more akin to an hour. With so much churn, the latter certainly will need a more heavyset API/database/message queue.
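
A rough, hypothetical sketch of that contrast makes the point (the lifetimes are assumptions):

    # Hypothetical churn comparison for two clouds, each running
    # 1,000 virtual machines; the lifetimes are assumptions.
    vms = 1000.0
    web_lifetime_hours = 24.0 * 30   # long-running Web sites, ~a month
    batch_lifetime_hours = 1.0       # short-lived instances

    # In steady state, each slot is recycled once per lifetime, and a
    # recycle is roughly one delete plus one boot through the API,
    # scheduler, database and message queue.
    web_ops = vms / web_lifetime_hours * 2
    batch_ops = vms / batch_lifetime_hours * 2

    print("Web-site cloud:   ~%.0f control-plane operations/hour" % web_ops)
    print("High-churn cloud: ~%.0f control-plane operations/hour" % batch_ops)

Three operations an hour versus two thousand: same instance count, wildly different control-plane load.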

Scaling

Given that “scalability” is a key word in OpenStack's mission, it's no surprise that there are several methods dedicated to assisting the expansion of your cloud by segregating it—in addition to the natural horizontal scaling of all components.

The first two are aimed at very large, multisite deployments. Compute cells are designed to allow running the cloud in a distributed fashion without having to use more complicated technologies or being invasive to existing nova installations. Hosts in a cloud are partitioned into groups called cells. Cells are configured in a tree. The top-level cell (API cell) has a host that runs the API service, but no hypervisors. Each child cell runs all of the other typical services found in a regular installation, except for the API service. Each cell has its own message queue and database service, and it also runs the cell service, which manages the communication between the API cell and child cells.

This allows for a single API server to be used to control access to multiple cloud installations. Introducing a second level of scheduling (the cell selection), in addition to the regular nova-scheduler selection of hosts, provides greater flexibility to control where virtual machines are run.

Contrast this with regions. Regions have a separate API endpoint per installation, allowing for a more discrete separation. Users wishing to run instances across sites have to select a region explicitly. However, the additional complexity of running a new service is not required.

Alternatively, you can use availability zones, host aggregates or both to partition a compute deployment.

Availability zones enable you to arrange OpenStack Compute hosts into logical groups, and they provide a form of physical isolation and redundancy from other availability zones, such as by using separate power supplies or network equipment.

You define the availability zone in which a specified Compute host resides locally on each server. An availability zone is commonly used to identify a set of servers that have a common attribute. For instance, if some of the racks in your data center are on a separate power source, you can put servers in those racks in their own availability zone. Availability zones also can help separate different classes of hardware.

When users provision resources, they can specify from which availability zone they would like their instance to be built. This allows cloud consumers to ensure that their application resources are spread across disparate machines to achieve high availability in the event of hardware failure.
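
As an illustration, requesting a particular zone at boot time with the python-novaclient library might look like this sketch; the credentials, image, flavor and zone names are all hypothetical:

    # Hypothetical sketch using python-novaclient; the credentials,
    # image, flavor and zone names are all made up.
    from novaclient.v1_1 import client

    nova = client.Client('demo', 'secret', 'demo-project',
                         'http://keystone.example.com:5000/v2.0')

    nova.servers.create(name='webserver-1',
                        image=nova.images.find(name='ubuntu-12.04'),
                        flavor=nova.flavors.find(name='m1.small'),
                        availability_zone='rack-power-a')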

Host aggregates, on the other hand, enable you to partition OpenStack Compute deployments into logical groups for load balancing and instance distribution. You can use host aggregates to partition an availability zone further. For example, you might use host aggregates to partition an availability zone into groups of hosts that either share common resources, such as storage and network, or have a special property, such as trusted computing hardware.

A common use of host aggregates is to provide information for use with the compute scheduler. For example, you might use a host aggregate to group a set of hosts that share specific images.
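
Here's a similar hypothetical python-novaclient sketch: create an aggregate, add hosts and tag it with metadata that, with the appropriate scheduler filter enabled, can be matched against flavor extra specs. Again, all names are made up:

    # Hypothetical sketch with python-novaclient: group hosts into an
    # aggregate and tag it with metadata for the scheduler to match.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin-project',
                         'http://keystone.example.com:5000/v2.0')

    # The second argument is an optional availability zone to associate.
    aggregate = nova.aggregates.create('fast-storage', 'rack-power-a')
    nova.aggregates.add_host(aggregate, 'compute-01')
    nova.aggregates.add_host(aggregate, 'compute-02')
    nova.aggregates.set_metadata(aggregate, {'ssd': 'true'})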

Another very useful feature for scaling is Object Storage Global Clusters. This adds the concept of a Region and allows for scenarios like having one of the three replicas in an off-site location. Check the SwiftStack blog for more information (swiftstack.com/blog/2012/09/16/globally-distributed-openstack-swift-cluster).

Summary

If you've looked at the storage options, determined which types of storage you want and how they will be implemented, planned the network carefully (taking into account the different ways to deploy it and how it will be managed), acquired metrics to design your cloud controller and then considered how to scale your cluster, you probably are now an OpenStack expert, in which case we'd encourage you to customize your deployment and share your findings with the community.

After learning about scalability in computing from particle physics experiments like ATLAS at the Large Hadron Collider, Tom led the creation of the NeCTAR Research Cloud. Tom is currently harnessing his passion for large-scale distributed systems by focusing on their most important part, people, as Community Manager at the OpenStack Foundation. Tom is a co-author of the OpenStack Operations Guide, a documentation core reviewer and a contributor in a small way to Compute (Nova), Object Storage (Swift), Block Storage (Cinder), Dashboard (Horizon) and Identity Service (Keystone).