Tweaking Tux, Part 2

Marcel Gagné

Issue #0, linuxjournal.com

It seems like forever since we were last together, doesn't it? For the last couple of weeks, I've been on holiday. Part of that holiday took place in Chicago, at Chicon 2000, the World Science Fiction Convention. I was impressed by the number of Linux hats, shirts and buttons I saw floating around. Then, there were the people, including the one and only Eric S. Raymond, who said that the plans for world domination were going along right on schedule. So there you have it, my only bit of Linux journalism from Chicon. I had a great time there, saw a lot of friends, partied late and had some wonderful Chicago deep dish pizza. Take it from me, folks, the Chicago deep dish is different from what they serve back home and call Chicago deep dish pizza.

Now that I've got this time off thing out of my system, let's get back to some of the business at hand, namely tweaking old Tux for fun, performance and a better knowledge of the innards of your Linux system. In the last installment, I gave you some ideas for tuning network parameters in the /proc filesystem. For many /proc tweaks, it is simply a matter of writing the appropriate value in the appropriate file and great things happen. As a refresher, here is what we did to turn on IP forwarding, via /proc rather than at boot time or through other menus.

   echo "1" > /proc/sys/net/ipv4/ip_forward

To make sure that these parameters take, we would add this line to our rc.local boot script and call it done. This brings me to a couple of questions I received from readers while I was away. Somebody e-mailed to tell me that they could not find rc.local on their Debian system. Sorry about that. While I try to make my columns non-release-specific, I sometimes slip into the “I'm running this here system” trap. To get the scoop on rc.local, I need to give you the scoop on run levels and a handful of scripts that run each time your system comes up. What gets executed at bootup is partly defined by symbolic links located in an rc#.d directory. The “#” represents a number corresponding to the run level.

What is this run level? In your /etc/inittab file, you will find an entry that says something like this.

    id:3:initdefault:

Note the “3” in the second field of that line. This tells me that when the system comes up, it will, by default, switch to run level 3 (which is full multiuser mode with a command-line login). If your system says “5”, you are booting directly to the graphical desktop. What starts at each of these run levels will be found in an accompanying /etc/rc#.d directory, in my case, rc3.d. Yeah, it's true. I'm still a command line guy who starts his X desktop with the startx command.
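By the way, if you're not sure which run level you're sitting at right now, the runlevel command will tell you. It prints two fields: the previous run level (an “N” means there wasn't one) and the current one. On a system like mine, that would look something like this.

    # runlevel
    N 3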

Anyhow, back to this rc.local file. On a Red Hat (or Mandrake) system, you'll find it hiding under /etc/rc.d. I hear someone in the back yelling, “Wait a minute and stop right there! Didn't you just say that it would be in the rc3.d directory?” It is. More or less.

If you change directory to /etc/rc3.d, you'll see a number of script files starting with either an “S” or a “K”. Do an ls -l and you will notice they are all symbolic links pointing back to scripts in a directory somewhere else. On a SuSE system, the rc#.d directories are under /sbin/init.d, but you will still find those “S” and “K” files, and they point back to scripts in /sbin/init.d. In the case of my Red Hat system, most of them point back to the /etc/rc.d/init.d directory. The S99local link is the exception; it points one level up, to the rc.local script itself.

    lrwxrwxrwx   1 root  root  11 Jul 12 16:09 /etc/rc.d/rc5.d/S99local -> ../rc.local

On a Debian system, these scripts point back to /etc/init.d, which is where I would create my rc.local file. On my own system, it turns out that rc.local is executed by a call to S99local. On a Debian system, for instance, look for (or create) an S99local file under the appropriate run level directory. My use (or Red Hat's) of S99local is (to some degree) convention, but you could, if you wanted to, be somewhat more arbitrary. The first part of that name, the “S”, means “start” (a “K” means “kill”) and the 99 is simply a high enough number that it is likely the last thing your system executes on boot. The “local” part is just a name that means something to me. You might call it “rclocal” or “systemlocal” or “iceberg”. So, if I want this file started with run level 3 on that Debian system, I would create a symbolic link like this.

    ln -s /etc/init.d/rc.local /etc/rc3.d/S99local

Make sure (of course) that the script is executable. Now, let's get back to some of those tweaks.
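If there's no rc.local on your system at all, it doesn't take much to cook one up. Here's a bare-bones sketch of what such a file might look like, with the ip_forward tweak from earlier tossed in as an example (the /etc/init.d/rc.local path is the Debian-flavored assumption from above).

    #!/bin/sh
    # rc.local -- local tweaks, run at the end of the boot sequence
    # (a sketch; add your own /proc settings below)

    # Turn on IP forwarding, as in our earlier example
    echo "1" > /proc/sys/net/ipv4/ip_forward

A quick chmod 755 /etc/init.d/rc.local takes care of the executable part, and the ln -s above does the rest.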

What I gave you last time were all network tweaks. This time around, I want to show you a few file system tricks. In past lives (working with other UNIXes), the systems I administered ran complex databases, often with hundreds of users. I'm fond of the following tweaks because, on those systems, they were parameters that required a kernel rebuild if you found yourself starting to run low. You made your best guess, but invariably, it would be kernel rebuild time soon enough. With Linux, these parameters are simple /proc tweaks. If you are running a busy database system with a large number of users, this is one you might run into. The “file-max” parameter defines the maximum number of open files on your system at any given time. For most systems, the default of 4096 is plenty. For a busier system, you might want to push that limit up somewhat. As an example, let's double that number.

    echo "8192" > /proc/sys/fs/file-max

If you get errors stating that you are running out of file handles, it's definitely time to change that number, but don't wait for the users to start ringing. Without waiting for errors, you can take a look under the hood and see when this limit is approaching. (Preventative maintenance. What a concept.) If you do a cat on /proc/sys/fs/file-nr, you will get three numbers. The third will be your file-max. The first and second are the number of allocated file handles and the number of file handles actually in use. Why two numbers? When the Linux kernel allocates a file handle, it does not release it. If you do increase the file-max value, then you should increase inode-max as well. Considering that each open file requires an inode, and that stdin, stdout and network sockets eat up inodes of their own, inode-max needs to be somewhat higher than your file-max. Take your file-max value, triple it and write it back out to inode-max.

    echo "24576" > /proc/sys/fs/inode-max

Busy web server? News server? Here's another tweak for your files, and this one has nothing to do with /proc. One of the options for the mount command is “noatime”. In other words, do not (don't you even think about it) update the access time on visited files. Each time a file is read, the access time is updated, which can yield useful information about file usage (with the find command, for instance). Odds are, you may not need that information. In the case of a web server getting a few thousand hits a day (an hour?), this little change can make a difference. Historically, this option was a suggestion for directories on news servers. Today, we are usually talking web servers. This is an environment where small files are accessed over and over again (vs. a database environment, which traditionally has comparatively few, large files).

To mount a filesystem noatime, use the “-o” flag, like this. In this example, we'll use the pretend drive “hda5”.

     mount -o noatime /dev/hda5 /data1

If you wanted this to happen automatically, you could also edit your /etc/fstab file so that you have an entry similar to this one.

     /dev/hda5      /data1          ext2    defaults,noatime      1 2
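Incidentally, if the filesystem is already mounted and busy, you don't have to take it offline to try this out; a remount will apply the option on the fly.

     mount -o remount,noatime /data1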

Enough file tweaking for one day. Before we wrap up, I want to talk about the need for tweaking. Another way to look at it is this: how do you know that you might be hitting a kind of wall? One of the surest ways is simply to monitor your system's performance through the various tools your system already has. The most basic of these is a little program called uptime, which most of us use to drive our Windows-using friends crazy. “Ah, I see that you've rebooted twice today already. Let me run uptime and see what I get.”

   # uptime
    1:21pm  up 127 days,  6:02,  4 users,  load average: 0.31, 0.29, 0.26

“My, my, my. Would you look at that? 127 days, 6 hours and 2 minutes without a reboot.”

Before we get into too much trouble, let's see what else the program tells you. There are four users logged in. The load average is 0.31 over the last minute, 0.29 over the last 5 minutes and 0.26 over the last 15 minutes. Load average indicates roughly the number of processes in the CPU's run queue; that is, the number of processes active or waiting to execute. If it helps any, you might think of it as the number of patients in the waiting room to see the doctor. In this case, I had an average of one third of one process waiting to be dealt with. The higher the number for load average, the more likely your system is starting to suffer under an excessive load. As the saying goes, your mileage may vary, but I tend to think of anything under four as acceptable. Any higher and it starts feeling slow. I've seen systems running around 15 to 20 and let me tell you, it's ugly.
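And since this column is all about /proc, it's worth mentioning that uptime reads those load averages straight out of /proc/loadavg. A quick cat shows the same three numbers, followed by the number of running processes over the total and the PID of the most recently started process. Again, the output below is just an illustration.

    # cat /proc/loadavg
    0.31 0.29 0.26 1/83 11292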

If those numbers are high, the very next question is “Why?” Other questions follow. What is it that is holding things up? If I am running out of something, how do I know what that something is? And those are just the types of questions that I want to consider when next we meet here at the corner. Until then, give Tux a tweak. You both might enjoy it.