At the Forge

Multitenant Sites

Reuven M. Lerner

Issue #248, December 2014

One server and one program can service many sites. Here's an introduction to “multitenant” applications.

For some time now, there has been tremendous growth in the world of Web applications. It's quite amazing to see what you can do just via a Web browser: not only can you buy just about anything, but a growing number of sites also offer “software as a service”, often abbreviated as SaaS. The idea is that in exchange for a monthly service fee, you get access to a service. Many thousands of such services exist, taking care of everything from Git repositories (for example, GitHub and Bitbucket) and e-mail services (for example, AWeber and MailChimp) to invoicing, time tracking, calendars, e-commerce and e-learning. You name it.

As Web developers, you can create your own SaaS applications. That's right—with little more than a Linux box, a database, a programming language and a Web framework, you're positioned to create a new SaaS application. With a good idea, some hard work and good marketing, you'll be on your way to having a successful business.

There are numerous models for how SaaS can work. Sometimes, you have a user name on a system, and you're simply interacting with your view of the world. But sometimes, an SaaS app gives you what appears to be an entirely new domain. So if I get an account on SuperDuperSaas.com, everything I do will be under lerner.SuperDuperSaas.com.

Programs allowing for this are known as “multitenant” applications. It's possible, of course, that each new subdomain involves the rollout of a new virtual machine. But there also are ways that you can make a single computer, with a single instance of the application, provide the same illusion of an infinite number of domains. Moreover, doing so is not nearly as difficult as you might think.

In this article, I look at several techniques that make it possible for you to create and maintain such multitenant applications. These techniques can be used in an SaaS product or any other application in which the software can and should respond differently to a variety of hostnames or domain names.

It's All Thanks to HTTP

HTTP, the Hypertext Transfer Protocol, is so ubiquitous that most people barely give it any thought. Even I, who work on Web applications nearly every day, rarely think about it. However, multitenant applications owe their existence to a change made to HTTP during the Web's early, rapid growth.

The first version of HTTP that I encountered, back in 1993, was described as version 0.9. That version was a much simpler protocol than the one we know today: it supported only the GET action, with no request headers and no version number. You could connect to an HTTP server on port 80, and say:

GET /

The server would, if all went well, send the contents of its home page (typically formatted with HTML) back to the HTTP client. At that point, the connection would close.

Although HTTP 0.9 worked well for many simple cases, the explosive growth of the Web meant that it wasn't good enough for many complex ones. One particularly common, and particularly painful, case was that of Web hosting companies: HTTP 0.9 required that each Web site have its own IP address. If you set up a Linux-based server with a single IP address but multiple hostnames, it wasn't possible for the HTTP server to distinguish between them.

This changed as HTTP evolved. HTTP 1.0 added request headers, clients began sending a “Host” header along with the action and pathname, and HTTP 1.1 went on to require it. Now, a simple request looked like:

GET / HTTP/1.0
Host: lerner.co.il

The first line changed, such that it incorporated the version number of HTTP that was being used. This was done so as to have backward compatibility with HTTP 0.9 clients. The second line was defined to be the first of several “request headers”, name-value pairs that could be sent from the client to the server.

These request headers have grown in scope through the years, and now include everything from the hostname to cookies to content type to caching information. But for my purposes in this article, the most important part of this request was the “Host” request header. Given that a server now could distinguish between different hosts, even on the same IP address, it was possible to have a single server provide Web hosting capabilities for any number of different domains and hostnames.

In other words, it was now possible to have the same Web server provide hosting to CompanyA.com and CompanyB.com, without either knowing of or seeing each other. The Web server would know to route requests for CompanyA.com to one directory of programs and HTML files, and CompanyB.com to a second, completely separate directory of programs and HTML files.
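
A minimal sketch of that routing decision, in plain Ruby, might look like the following. It pulls the “Host” header out of the raw text of a request and maps it to a directory. The method names, the DOCUMENT_ROOTS table and the paths under /var/www are all illustrative assumptions, not the configuration of any particular HTTP server, and the sketch ignores details such as a port number in the header:

```ruby
# Hypothetical mapping from hostname to document root; the paths are made up.
DOCUMENT_ROOTS = {
  'companya.com' => '/var/www/companya',
  'companyb.com' => '/var/www/companyb'
}.freeze

# Extract the value of the Host header from the raw text of an HTTP request.
# Returns nil if the client sent no Host header (as an HTTP 0.9 client would).
def host_header(raw_request)
  raw_request.each_line do |line|
    name, value = line.chomp.split(':', 2)
    return value.strip.downcase if value && name.casecmp?('host')
  end
  nil
end

# Choose the directory from which to serve this request.
def document_root_for(raw_request)
  DOCUMENT_ROOTS[host_header(raw_request)]
end

request = "GET / HTTP/1.0\r\nHost: CompanyA.com\r\n\r\n"
puts document_root_for(request)   # prints /var/www/companya
```

A request without a “Host” header yields nil here, which is exactly the situation in which an HTTP 0.9 server could not tell its sites apart.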

This might be obvious to anyone who knows about domains, hostnames and DNS, but from the perspective of the server, it didn't matter if it had to distinguish CompanyA.com from CompanyB.com, or abc.CompanyA.com from def.CompanyA.com. That is, different hostnames within the same domain were treated similarly to different domains. True, DNS and HTTP server configuration files made it easier to send *.CompanyA.com to the same location, but at the end of the day, your HTTP server sees different hostnames and, thus, can react differently.

“Virtual hosts”, as they became known, shared an IP address and a computer, and so from the perspective of a programmer or IT manager, they were all under the same umbrella. From the perspective of the outside world, these were completely different Web sites. Perhaps they shared an IP address, and thus a hosting provider, but that was the only thing they had in common.

Multitenant

Today, it's trivial to service different hostnames under the same HTTP server. As I indicated previously, you simply tell Apache (or nginx, or whatever HTTP server you use) that the two hosts exist in different directories, and that they should be treated differently. With such a configuration in place, there is no connection whatsoever between the different hostnames. This actually makes it easier to move Web sites from one machine to another. You scoop up the virtual host's configuration file and move it to another machine, along with the programs and static assets—that is, HTML files and images.

Indeed, a huge industry of cheap, on-demand Web hosting perhaps has made this the most common way servers are allocated and used. Even my own personal server has five to ten different virtual hosts on it at any given time, between personal projects and demos of client applications.

A multitenant application turns this idea on its head. Rather than using a single server, with a single IP address, to service a large number of different applications, each with its own hostname, you will have many different hostnames serviced by a single instance of the same application. That is, you'll have both CompanyA.com and CompanyB.com point not only to the same IP address, but also to the same instance of your Web application.

This might sound strange, until you consider that because modern versions of HTTP always pass a “Host” header, and because all of the HTTP request headers are available to a Web application, you can write a single application that will work on multiple hosts. Consider that BigCompany.com has two different divisions, one on each coast, and a separate Web site for each division. The site should be completely identical in both cases, except that the contact phone number and address should reflect the coast that the user has reached.

You can use the “Host” request header in an “if” statement inside the application, and thus display the information that is appropriate. This is a classic example of multitenant sites, although it's certainly not the most complex of them.
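
As a rough illustration of that “if” statement, here is a sketch in plain Ruby. The hostnames east.bigcompany.com and west.bigcompany.com, the method name and all of the contact details are invented for the example:

```ruby
# Hypothetical contact details for the two divisions; all values are invented.
def contact_info(host)
  if host == 'east.bigcompany.com'
    { phone: '212-555-0100', address: '1 East Street, New York' }
  elsif host == 'west.bigcompany.com'
    { phone: '415-555-0100', address: '1 West Street, San Francisco' }
  else
    # Fallback for any hostname the application doesn't recognize.
    { phone: '800-555-0100', address: 'Head office' }
  end
end

puts contact_info('west.bigcompany.com')[:phone]   # prints 415-555-0100
```

Everything else in the application can stay the same; only the data returned by this one method depends on the hostname.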

Multitenant with Sinatra

Let's implement the above scenario using Sinatra, a very small and lightweight Web application framework written in the Ruby language. In the July 2014 issue of LJ, I covered a similarly small framework, known as Flask, written in Python. Such frameworks often are perfect for simple sites and example code.

There are a number of ways that you can create a Sinatra application. My preference is to do so in a directory, along with a Gemfile and a config.ru file. This took less than five minutes for me to set up on my own computer. First, I created a directory called “multiatf”. In that directory, I created a file called “Gemfile”, which is where I will name the Ruby gems I'll be using for this application:

source 'https://rubygems.org'
gem "sinatra", :require => "sinatra/base"
gem 'shotgun'

The first line says that I want to retrieve gems from Rubygems.org, the official and standard location. The second line says that I want to use the “sinatra” gem, but that Bundler should load it by requiring “sinatra/base” rather than “sinatra”. Finally, I name the “shotgun” gem, which provides for automatic reloading of Sinatra apps, precisely the sort of thing I want when I'm developing an application.

Before continuing, I then run bundle install, which ensures that all of the gems named in the Gemfile have been installed. It creates a file named “Gemfile.lock”, which lists the precise names and versions of each gem I'll be using in my application. This list includes those gems I have named explicitly and those upon which my named gems depend. It is worth taking a look at Gemfile.lock sometime; it may well give you insights into how your Sinatra and Rails applications work.

Next, I write a “config.ru” file, sometimes known as a “rackup file”, which tells Rack—Ruby's standard interface between HTTP servers and applications—where my application's code is located and how to execute it. The file looks like this:

require 'bundler'

Bundler.require

require './multiatf.rb'
run Sinatra::Application

The first line loads the “bundler” gem. Bundler is an increasingly indispensable gem in the Ruby world, in that it manages the versions of your gems for you, ensuring that they will not require conflicting versions of a gem. After loading Bundler, you then use the “require” class method, which loads the gems named in your Gemfile, at the versions recorded in Gemfile.lock.

Next, the “require” statement reads a Ruby file named “multiatf.rb” in the current directory. That is the actual application code, and it's the file I will be writing and modifying most of all. Loading it means that Ruby will read the contents of the code. In the case of my Sinatra app, that means taking the various “get” and “post” declarations and turning them into the appropriate routing map, such that the appropriate code block is executed for each URL.

Then, once the application has been loaded, config.ru passes Sinatra::Application to Rack's “run” method. That is what gets the application up and running.
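
To make Rack's role a bit more concrete: a Rack application is any Ruby object that responds to “call”, accepts a hash describing the request and returns a three-element array of status, headers and body. The sketch below is a hand-written stand-in for the kind of object that config.ru hands to “run”. It is not Sinatra's own code, and the HelloApp name is an assumption made for the example. Rack passes the “Host” request header to the application under the “HTTP_HOST” key:

```ruby
# A hand-rolled Rack-style application (illustrative only; Sinatra builds
# something far more capable behind the scenes).
class HelloApp
  # env is the hash that Rack builds from the incoming HTTP request.
  def call(env)
    host = env.fetch('HTTP_HOST', 'unknown')
    body = "Hello from server '#{host}'"
    [200, { 'content-type' => 'text/plain' }, [body]]
  end
end

# In a config.ru file, this object would be handed to Rack with:
#   run HelloApp.new
# Here it is called directly, with a hand-built request hash.
status, _headers, body = HelloApp.new.call({ 'HTTP_HOST' => 'atf1' })
puts status       # prints 200
puts body.first   # prints Hello from server 'atf1'
```

Because the interface is this small, the same application object can be run by any Rack-compatible HTTP server.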

The final step in putting the application together is the multiatf.rb file. Here it consists of very little code, although in a real application it could grow quite large:

require 'sinatra'

get '/' do
  "Hello from server '#{request.host}'"
end

The first line loads the Sinatra code. Next is something that looks vaguely like a method definition, but isn't. Rather, it tells Sinatra that if someone makes a request to the / URL, it should return a string. In this case, the string isn't static, but rather contains a dynamic portion, including the value of “request.host”. As you can imagine, this value will vary according to the hostname you are using.

To start this up on my development machine, I ran:

shotgun multiatf.rb

This produces output telling me that Shotgun is now running my application on port 9393, using Ruby's built-in WEBrick server. I can now go to my Web browser and load http://localhost:9393, and because of the get '/' declaration in my Sinatra file, the associated block is executed. I get a nice message telling me:

Hello from server 'localhost'

But, what if it isn't localhost? What if I go to another server name? For example, I added the following two lines to my /etc/hosts file:

127.0.0.1 atf1
127.0.0.1 atf2

In other words, when I tell my Web browser to go to host “atf1”, it'll go to 127.0.0.1, and it will send, in the “Host” HTTP request header, the server name “atf1”. The output then will be:

Hello from server 'atf1'

The same will be true for “atf2”.
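
One detail is worth knowing when testing on a nonstandard port such as 9393: the browser sends the port as part of the header (for example, “Host: atf1:9393”), whereas request.host reports only the name. A rough sketch of that normalization in plain Ruby follows. The method name is hypothetical, this is not Rack's actual implementation, and lowercasing the name is my own addition, on the grounds that hostnames are case-insensitive:

```ruby
# Reduce a raw Host header value to a bare, lowercase hostname.
# Simplified: strips a trailing ":port" and ignores other edge cases.
def hostname_from_header(value)
  value.strip.downcase.sub(/:\d+\z/, '')
end

puts hostname_from_header('atf1:9393')   # prints atf1
puts hostname_from_header('ATF2')        # prints atf2
```

Normalizing the value once, in one place, means the rest of the application can compare hostnames with simple string equality.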

Showing Different Content

Thus, you've seen how you can have different output, based on the value of the server name. This seemingly simple fact opens the door to the entire world of multitenant systems. For example, you could imagine a company doing business under a variety of names, which would want to have the same Web application running, but showing the current domain name. All you have to do is change your strings, or your templates, to reflect the current hostname.

In many cases, showing different hostnames isn't enough. You may want to show a different business name or a different address. In order for that to happen, you'll need some additional data. The best and most scalable way to do this is a relational database, but you can simulate one with a Ruby hash that will be good enough for the purposes of this article.

In this case, let's define the hash such that it contains two keys, one for each of the hosts to recognize. Then, let's pull out the company's name from the hash, based on the key.

I thus change multiatf.rb to read as follows:

require 'sinatra'

hosts = {'atf1' => {name: 'First ATF site',
                    address: '111 Main Street'},
         'atf2' => {name: 'Second ATF site',
                    address: '222 Elm Street'}
        }

get '/' do
  "Welcome to '#{hosts[request.host][:name]}', located at '#{hosts[request.host][:address]}'!"
end

The idea here is simple, but the effects are profound. This is how each domain can appear different, even though the application behind it is the same. You can imagine going even further than this, pulling in a different CSS stylesheet to an HTML page based on the hostname, or having it show different pictures.
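
Note that the route above assumes every request arrives under a known hostname. A request for any other name (including plain localhost) raises an error, because the hash lookup returns nil. One possible way to handle that, and to pick a stylesheet per hostname, is sketched here in plain Ruby. The DEFAULT_SITE entry, the method names and the /css/ path layout are all assumptions made for the example:

```ruby
# The same tenant table as in multiatf.rb, plus a hypothetical fallback entry.
HOSTS = {
  'atf1' => { name: 'First ATF site',  address: '111 Main Street' },
  'atf2' => { name: 'Second ATF site', address: '222 Elm Street' }
}.freeze

DEFAULT_SITE = { name: 'Unknown site', address: 'no address on file' }.freeze

# Look up the tenant for a hostname, falling back to the default entry.
def site_for(host)
  HOSTS.fetch(host, DEFAULT_SITE)
end

# Build a per-tenant stylesheet path (the /css/<host>.css layout is assumed).
def stylesheet_for(host)
  HOSTS.key?(host) ? "/css/#{host}.css" : '/css/default.css'
end

puts site_for('atf1')[:name]        # prints First ATF site
puts site_for('localhost')[:name]   # prints Unknown site
puts stylesheet_for('atf2')         # prints /css/atf2.css
```

Whether an unknown hostname should get a default page, a redirect or an error is a design decision; the fallback shown here is only one option.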

If you are using a relational database, you can enter each new tenant site in a table, giving each a unique ID number. You then can use that ID as a foreign key, adding (for example) this “site_id” value in a table describing merchandise. For example, one of my clients manages about 30 different sites, each with its own set of real-estate offerings. These 30 sites are actually running on a single Web application, with a single database. However, based on the hostname through which the user enters the site, the software displays a different set of properties. This has made the site and the software easy to manage, scale and grow. Each time a new site needs to be added, the biggest task is to update the SSL certificate, such that it includes the new hostname. Otherwise, the system works automatically, with the (nontechnical) company managers able to create new sites within several minutes, merely by filling out an HTML form. That form allows them to add a new entry into the “sites” table. The hostname is used to look up the site ID, whose value is then used to display properties.
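
The lookup described above can be sketched without a real database. In the following plain-Ruby example, two arrays of structs stand in for the “sites” table and a table of properties; the struct names, method names and every row of data are invented for illustration, and a real application would issue SQL queries (roughly those shown in the comments) instead:

```ruby
# In-memory stand-ins for two database tables (all rows are invented).
Site     = Struct.new(:id, :hostname)
Property = Struct.new(:id, :site_id, :title)

SITES = [
  Site.new(1, 'atf1'),
  Site.new(2, 'atf2')
].freeze

PROPERTIES = [
  Property.new(10, 1, 'Two-bedroom apartment'),
  Property.new(11, 1, 'Corner office'),
  Property.new(12, 2, 'Beach house')
].freeze

# Roughly: SELECT id FROM sites WHERE hostname = ?
def site_id_for(hostname)
  site = SITES.find { |s| s.hostname == hostname }
  site && site.id
end

# Roughly: SELECT * FROM properties WHERE site_id = ?
def properties_for(hostname)
  id = site_id_for(hostname)
  PROPERTIES.select { |prop| prop.site_id == id }
end

puts properties_for('atf1').map(&:title).inspect
# prints ["Two-bedroom apartment", "Corner office"]
```

Adding a tenant then amounts to adding one row to the sites table; no code changes are needed, which matches the experience described above.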

Next month, I'll continue with this topic, discussing not only how you can have the same site produce similar content, but how you can configure it such that different users can manage their own sites, without interfering with the overall software and functionality.

Reuven M. Lerner is a Web developer, consultant and trainer. He recently completed his PhD in Learning Sciences from Northwestern University. You can read his blog, Twitter feed and newsletter at lerner.co.il. Reuven lives with his wife and three children in Modi'in, Israel.