UpFront

diff -u: What's New in Kernel Development

Zack Brown

Issue #210, October 2011

Linus Torvalds has decided at last to release Linux version 3.0. For a long time, it seemed as though he might never bump the major version number, because there was no longer any meaning associated with that number. And, indeed, his explanation for the change amounted to little more than: Linux is entering its third decade, so why not?

Along with the version bump, Linus has decided to do away with the whole three-numbered versioning system that the kernel has used since time immemorial. So from now on, it'll just be 3.0, 3.1, 3.2 and so on.

This is great news for the stable tree maintainers, who were getting tired of version numbers like 2.6.38.4, which, as Willy Tarreau said, look more like IP numbers than version numbers.

But, with the kernel going from a three-numbered system to a two-numbered system, a lot of support scripts are breaking. It's Linux's own Y2K bug. Everyone thought 2.6 was going to be the prefix for the rest of time and wrote their scripts accordingly. So along with the version number changes, a lot of fixes are going into the various support scripts.

As soon as the rumor started floating around, Joe Pranevich emerged from a seven-year absence to announce the “Wonderful World of Linux 3.0” at www.kniggit.net/wwol30. It covers the vast array of changes that occurred throughout the 2.6 time frame, leading up to 3.0.

Matt Domsch announced that Dell was discontinuing the digest forms of the linux-kernel and linux-scsi mailing lists. Although this would affect a few hundred subscribers, he said that changes at the hardware and software level of their mail servers meant that certain features wouldn't be re-implemented, and digests were one of those.

Dan Rosenberg initiated a fascinating discussion about a particular security problem: how to deal with attackers whose attacks depend on knowing where in RAM the vulnerable parts of the kernel are located.

His original idea was to have the system migrate the kernel to a new randomized memory location during boot. But over the course of discussion, it turned out there were many hard problems that would have to be solved in that case.

For one thing, it wasn't always clear where to get enough entropy for random number generation—an important issue if you want to relocate the kernel to a random place in RAM. The 64-bit kernel also loads into memory differently from the 32-bit kernel, so Dan's code would have to handle the two cases separately. If the kernel were in a random location, oops report generation would have to be adjusted so that the memory references still made sense to someone reading them. Even more dangerous was the fact that other parts of the system already would be in memory at the time the kernel was being relocated, and there was a real danger of clobbering those parts, which would kill the system. Hibernation also was an issue, because the existing hibernation code in the kernel made assumptions about the awakening system that Dan's code violated.

Eventually, it became clear that although Dan's goal was a good one—making it more difficult to predict where in RAM the vulnerable parts of the kernel could be found—there were just too many technical difficulties to make it feasible in the way he was planning to do it.

Linus Torvalds and H. Peter Anvin each came up with alternative approaches that might be easier to implement, while still accomplishing essentially the same goal.

Linus' idea was to relink the kernel binary with a random offset, so that the kernel would end up at a randomized location in RAM.

H. Peter's idea was more radical. He wanted to convert the core kernel code into a set of kernel modules. At that point, the init code could load the various modules anywhere it wanted, even in noncontiguous RAM. So, he set out to implement that in the syslinux bootloader.

Although no clear direction emerged for what would ultimately go into the kernel, it seems as though a number of good ideas will be pursued. Almost certainly, the kernel's location in RAM will be randomized in some way, before too long.

ClearOS

Shawn Powers

Issue #210, October 2011

All-in-one Linux-based network servers aren't a new concept. Distributions like ClarkConnect have been around for many years and fit their niche quite well. Lately, however, there seems to be a new batch of all-in-one solutions that offer a similar business model.

A couple months ago, we reviewed Untangle, which is a commercial distribution offering a feature-limited free version. Recently, one of our readers, Tracy Holz, pointed me to a similar project, ClearOS. Although Untangle is largely a firewall and network services system, ClearOS attempts to do more. Using a combination of open-source and commercial tools, it can be a one-stop server platform for many networks.

ClearOS has a unique modular system that seamlessly delivers local server applications and cloud-based services to end users. You can purchase appliance devices or install ClearOS on an existing server. Much like Untangle, ClearOS's free features are limited, but it doesn't feel crippled if you stick to just the free stuff.

The features and add-ons are too numerous to list here, but if you're looking for a commercially backed all-in-one server solution for your network, check out ClearOS: www.clearfoundation.com. Tell 'em Tracy sent you.

Non-Linux FOSS

Shawn Powers

Issue #210, October 2011

Many Windows or Macintosh users are perfectly happy to download their podcasts with iTunes or something similar. Here at Linux Journal, however, we like to offer open-source alternatives. Enter Juice. Juice is a cross-platform, open-source application for downloading podcasts.

Juice is fast, efficient and very feature-rich. Our favorite feature is the built-in directory with thousands of podcast feeds from which to choose. Add things like auto cleanup, centralized feed management and accessibility options, and you have an awesome tool for getting your audio information fix. Check it out for Windows, Mac OS or Linux at juicereceiver.sourceforge.net.

Google Plus

Shawn Powers

Issue #210, October 2011

The early years of the 21st century forever will be known as the age of social media. I don't know if that's something we should be proud of, but nonetheless, here we are. During the past decade, we've seen things like Friendster, Pownce, Twitter, Wave, Facebook, Tumblr, Buzz, Gowalla, Brightkite, Foursquare, Loopt, Plurk, Identi.ca, LinkedIn, Yammer and now Google Plus.

Google hasn't had a great track record when it comes to social networking, with both Wave and Buzz being largely unsuccessful. Google Plus, or G+, seems to be its most appealing offer so far. At the time of this writing, it's still very early in the beta stages, but it already seems to have a cleaner and simpler interface than its direct competitor: Facebook.

Google offers unique features like group video chats called “hangouts” and “circles” of friends to help organize your following/followers. G+'s integration with other Google services may be the kill shot. Gmail, Picasa, YouTube and Blogger easily can be integrated directly by Google, making it simple for those folks already using Google apps to get their Plus on. Is the third time a charm for Google, or will G+ be another unfortunate carcass in the pile of outdated social media platforms? Only time will tell.

Kickstarter for Open-Source Projects?

Shawn Powers

Issue #210, October 2011

The Web site www.kickstarter.com is an interesting place. Basically, it's a site that allows people to invest in various projects, giving creators real money to develop an idea. Those ideas vary from film-making to programming video games, but the concept is the same regardless of the project.

What is the motivation for investing in someone's idea? That's the beauty; it depends on the project. Maybe it's an M.C. Frontalot album you want to see created, so you give money to the project so the album is produced. Perhaps it's a video game you'd really like to play, so you give money to the developer to make the game. Perhaps the developer gives a copy of the game to all investors. Perhaps not. There are no rules, just collaboration.

Recently, we've seen open-source projects use Kickstarter, and it seems like a great idea. If you see a program idea you like, send money, and if the creators reach their funding goal, they'll create the program. Because it's open source, the benefit is obvious: you get to use the program when it's complete.

Granted, it's not a perfect system. It certainly would be possible to abuse it. It seems that actually funding open-source developers is a good idea though. Perhaps this method of funding is a fad, or maybe it's the start of something great—paying developers to develop free software. If it works, it seems like everyone wins.

Big-Box Science

Joey Bernard

Issue #210, October 2011

A few months ago, I wrote a piece about how you can use MPI to run a parallel program over a number of machines that are networked together. But more and more often, your plain-old desktop has more than one CPU. How best can you take advantage of the power at your fingertips? Running a parallel program on a single machine is called shared-memory parallel programming. Several options are available for shared-memory programming; the most common are pthreads and openMP. This month, I take a look at openMP and how you can use it to get the most out of your box.

openMP is a specification, which means you end up actually using an implementation. It is implemented as an extension to a compiler, so in order to use it in your code, you simply need to add a compiler flag; there is no linking in of external libraries. openMP directives are added to your program as special comments (in FORTRAN) or pragmas (in C/C++). This means that if you try to compile your program with a compiler that doesn't understand openMP, it should still compile fine. The directives simply are ignored, and you end up with a single-threaded program. Implementations of openMP are available for C/C++ and FORTRAN.

The most basic concept in openMP is that only sections of your code are run in parallel, and for the most part, these sections all run the same code. Outside of these sections, your program will run single-threaded. The most basic parallel section is defined by:

#pragma omp parallel

in C/C++, or:

!$OMP PARALLEL

in FORTRAN. This is called a parallel openMP pragma. Almost all of the other pragmas that you are likely to use are built off this.
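
To make that concrete, here is a minimal, self-contained sketch (my own, not from the article) of a program with one basic parallel section. The calls to omp_get_thread_num() and omp_get_num_threads() are standard openMP runtime functions, used here only to show which thread is running:

#include <stdio.h>
#include <omp.h>   /* declarations for the omp_* runtime calls */

int main(void)
{
   /* Code outside the parallel section runs single-threaded. */
   printf("before the parallel section\n");

#pragma omp parallel
   {
      /* Each thread executes this block once. */
      printf("hello from thread %d of %d\n",
             omp_get_thread_num(), omp_get_num_threads());
   }

   printf("after the parallel section\n");
   return 0;
}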

The most common pragma you will see is the parallel loop. In C/C++, this refers to a for loop. In FORTRAN, this is a do loop. (For the rest of this piece, I stick to C/C++ as examples. There are equivalent FORTRAN statements you can find in the specification documentation.) A C/C++ loop can be parallelized with:


#pragma omp parallel for
for (i=0; i<max; i++) {
   do_something();
   area += i;
   do_something_else();
}

The pragma tells the openMP subsystem that you want to create a parallel section defined by the for loop. What happens is that the defined number of threads get created, and the work of the loop gets divided among these threads. So, for example, if you had a quad-core CPU and had to go through 100 iterations in this for loop, each CPU core gets 25 iterations of the loop to do. So, this for loop should take approximately one-fourth the time it normally takes.
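
If you want to watch that division of work happen, here is another small illustrative sketch (again, mine, not from the article) that prints which thread handles which iteration:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int i;

#pragma omp parallel for
   for (i = 0; i < 100; i++) {
      /* The loop variable i is automatically private to each
         thread; with four threads, each one typically handles a
         contiguous chunk of roughly 25 iterations. */
      printf("iteration %d handled by thread %d\n",
             i, omp_get_thread_num());
   }
   return 0;
}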

Does this work with all for loops? No, not necessarily. In order for the openMP subsystem to be able to divide up the for loop, it needs to know how many iterations are involved. This means you can't use any statements that would change the number of iterations of the for loop, such as “break” or “return” in C/C++. Both of these drop you out of the for loop before it finishes all of the iterations. You can use a “continue” statement, however. All that does is jump over the remaining code in the current iteration and drop you at the beginning of the next one. Because this preserves the iteration count, it is safe to use.

By default, all of the variables in your program are shared. Thus, when you enter a parallel section, like the parallel for loop above, every thread has access to all of the variables that exist in your program. Although this is very convenient, it is also very, very dangerous. If you look back at my short example, the work is being done by the line:

area += i;

You can see that the variable area is being read from and written to. What happens now if you have several threads, all trying to do this at the same time? It is not very pretty—think car pile-up on the freeway. Imagine that the variable area starts with a value of zero. Then, your program starts the parallel for loop with five threads and they all read in the initial value of zero. Then, they each add their value of i and save it back to memory. This means that only one of these five actually will be saved, and the rest essentially will be lost. So, what can you do? In openMP, there is the concept of a critical section. A critical section is a section of your code that's protected so that only one thread can execute it at a time. To fix this issue, you could place the area incrementing within a critical section. It would look like this:


#pragma omp parallel for
for (i=0; i<max; i++) {
   do_something();
#pragma omp critical
   area += i;
   do_something_else();
}

Remember that in C, a code block is defined by either a single line or a series of lines wrapped in curly braces. So in the above example, the critical section applies to the one line area += i;. If you wanted it to apply to several lines of code, it would look like this:


#pragma omp parallel for
for (i=0; i<max; i++) {
   do_something();
#pragma omp critical
   {
   area += i;
   do_something_else();
   }
}

This leads us to a more subtle way that multiple threads can abuse shared variables. What if you have a nested for loop, and you want to parallelize the outside loop? Consider:


#pragma omp parallel for
for (i=0; i<max1; i++) {
   for (j=0; j<max2; j++) {
      do_something();
   }
}

In this case, every thread has access to the shared variable j. They will all be reading from and writing to it at completely random times, and you will end up with either more or fewer than max2 iterations of the inner loop happening. What you actually want is for each thread to run the entire inner loop on its own during each iteration of the outside loop. What is the solution? Luckily, the openMP specification has the concept of a private variable. A private variable is one where each thread gets its own private copy to work with. To privatize a variable, you simply add a private() option to the parallel for pragma:

#pragma omp parallel for private(j)

If you have more than one variable that needs to be privatized, you can add them to the same private() option, comma-separated. By default, these new private copies act just like regular variables in C code on Linux. This means their initial values are whatever junk happens to be in those memory locations. If you want to make sure that each copy starts with the value the original variable had on entering the parallel section, you can add the firstprivate() option. Again, you enter the variables you want treated this way in a comma-separated list. As an example that doesn't really do anything useful, this would look like:


a = 10;
#pragma omp parallel for private(j) firstprivate(a)
for (i=0; i<max1; i++) {
   for (j=0; j<max2; j++) {
      a += i;
      do_something(a*j);
   }
}

So, you have a program. Now what? The first step is to compile it. Because it is an extension to the compiler itself, you need to add an option to your compilation command. For gcc, it would simply be -fopenmp. You do need to be careful about the compiler version you are using and what it supports. The openMP specification is up to version 3.0 right now, with support varying across the gcc versions. If you want to look at the support in detail, check the main gcc page at gcc.gnu.org. The latest versions are starting to include support for version 3.0 of openMP.
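
As a quick sketch (myprog.c is just a placeholder name), the whole compilation command for gcc looks like this:

gcc -fopenmp -o myprog myprog.c

For a program that uses only the pragmas (and none of the omp_* runtime calls), leaving out -fopenmp still produces a working, single-threaded binary, as described earlier.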

Once you have it compiled, you need to run it. If you simply run it at the command line, without doing anything else, your program will check your machine and see how many CPUs you have (a dual-core processor looks like two CPUs, in case you were wondering). It then will go ahead and use that number as the number of threads to use in any parallel sections. If you want to set the number of threads that should be used explicitly, you can set it using an environment variable. In bash, you would use this to set four threads:

export OMP_NUM_THREADS=4

You can set more threads than you have CPUs. Because they are actual threads of execution, Linux has no problem scheduling them on the available CPUs. Just remember that if you have more threads than available CPUs, you will see a slowdown in the execution speed of your code, because the threads constantly will be context-switching on and off the CPUs.

Why would you do this? Well, when you are testing a new piece of code, you may have bugs that don't present themselves until you reach a certain number of threads. So, in testing scenarios, it may make sense to run with a large number of threads and a small input data set. The ideal situation is to be the only process running on the machine and running one thread for each CPU. This way, you maximize usage and minimize swapping.

All of this has been only the briefest introduction. I haven't covered generic parallel sections, functional parallelism, loop scheduling or any of the other more-advanced topics. The specifications are at www.openmp.org, along with links to tons of tutorials and other examples. Hopefully, this introduction has given you some ideas to try and a small taste of what may be possible. I will leave you with one last hint. If you want to start to play with parallel programs without having to think about it, add the gcc option -ftree-parallelize-loops=n, where n is the number of threads to use. This asks the compiler to analyze your code and see whether it can parallelize any sections on its own. It won't be able to catch all of the sections that can be parallelized, because it can't understand the context of your code and what it is trying to do. But, for the time it takes to add the option, recompile and test the timing, it definitely would be worthwhile.
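
For example, a command line for auto-parallelizing across four threads might look like the following sketch (the exact requirements vary between gcc versions, and depending on yours, you also may need -fopenmp at link time, so check the documentation):

gcc -O2 -ftree-parallelize-loops=4 -o myprog myprog.c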

They Said It

Be as smart as you can, but remember that it is always better to be wise than to be smart.

—Alan Alda

Being an intellectual creates a lot of questions and no answers.

—Janis Joplin

Failure is simply the opportunity to begin again, this time more intelligently.

—Henry Ford

Genius is more often found in a cracked pot than in a whole one.

—E. B. White

It's not that I'm so smart, it's just that I stay with problems longer.

—Albert Einstein

Man is the most intelligent of the animals—and the most silly.

—Diogenes

The surest sign that intelligent life exists elsewhere in the universe is that it has never tried to contact us.

—Bill Watterson

Tech Tip

Bill Zimmerly

Issue #210, October 2011

By combining three useful command-line tools (less, watch and xdotool) with two xterm windows, you can create an automatically scrolling reader.

Say you have a good book in text-file form (“book.txt”) that you just downloaded from Project Gutenberg.

Open one xterm and do the usual thing you do when you want to read that book with less:

$ less book.txt

Look at the first few characters in the title line of that xterm's window. (In mine, it was bzimmerly@zt, which is my user ID and the name of the machine I was working on.)

Open another xterm, issue this command, and watch (pun intended) the magic:

$ watch -n 1 xdotool search --name bzimmerly@zt key ctrl+m

The watch command will (every second) issue a “Return” (Ctrl-m) keystroke to the window that has “bzimmerly@zt” as a title, and it will stop only when you interrupt it with Ctrl-c! I think this is neato daddyo! (What can I say? I'm a child of the '60s!)

LinuxJournal.com

Katherine Druckman

Issue #210, October 2011

Have you visited us at LinuxJournal.com lately? You might be missing out on some great information if you haven't. Our on-line publication's frequent, Web-exclusive posts will provide you with additional tips and tricks, reviews and news that you won't find here, so make sure to visit us regularly at LinuxJournal.com.

In case you missed them, here are a few of the most popular recent posts to get you started: