System administrators want to understand the applications that run on their
systems. You can't tune a machine unless you know what the machine is
doing! It's fairly easy to monitor a machine's physical resources: CPU
(mpstat, top), memory (vmstat), disk I/O (iotop, blktrace, blkiomon) and
network bandwidth (ip, nettop).
Logical resources are just as important—if not more important—yet
the tools to monitor them either don't exist or aren't
exactly "user-friendly". For example, the ps
command can report the RSS
(resident set size) for a process. But how much of that is shared library
and how much is application? Or executable code vs. data space? Those are
questions that must be answered if a system administrator wants to
calculate an application's memory footprint.
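For example, here's a quick way to see the RSS (and the virtual size, VSZ) that ps reports for your current shell; the column list shown is just one reasonable choice:

ps -o pid,rss,vsz,comm -p $$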
To answer these questions, and others, I'll describe how to extract information from the /proc filesystem. First, let's look at terminology relevant to Linux memory management. If you want an exhaustive look at memory management on Linux, consider Mel Gorman's seminal work Understanding the Linux Virtual Memory Manager. His book is an oldie but a goodie; the hardware he describes hasn't changed much over the intervening years, and the changes that have occurred have been minor. This means the concepts he describes, and much of the code used to implement those concepts, are still spot-on.
Before going into the nuts and bolts of answering those questions, you first need to understand the context in which they're answered. So let's start with a high-level overview.
Your computer system has some amount of physical RAM installed. RAM is needed to run all software, because the CPU will fetch instructions and data from RAM and nowhere else. When a system doesn't have enough RAM to satisfy all processes, some of the process memory is written to an external storage device and that RAM then can be freed for use by other processes. This is called either swapping, when the RAM being freed is anonymous memory (meaning that it isn't associated with file data, such as shared memory or a process's heap space), or paging (which applies to things like memory-mapped files).
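If you're curious about the swap situation on your own system, /proc/meminfo reports it directly (the field names below are standard; the values will of course vary):

grep -E '^(SwapTotal|SwapFree)' /proc/meminfo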
(By the way, a process is simply an application that's currently running. While the application is executing, it has a current directory, user and group credentials, a list of open files and network connections, and so on.)
Some types of memory don't need to be written out before they can be freed and reused. For example, the executable code of an application is stored in memory and protected as read-only. Since it can never be changed, when Linux wants to use that memory for something else, it just takes it! If the application ever needs that memory back again, Linux can reload it from the original application executable on disk. Also, since this memory is read-only, it can be used by multiple processes at the same time. And, this is where the confusion comes in regarding calculating how much memory a process is using—what if some of that memory is being shared with other processes? How do you account for it?
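You can see this sharing in action by counting how many running processes currently map the C library. A rough sketch (the library's exact filename varies by distribution, and an unprivileged user typically can read only their own processes' maps files):

grep -l libc /proc/[0-9]*/maps 2>/dev/null | wc -l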
Before getting to that, I need to define a few other terms. The first is pinned memory. Most memory is pageable, meaning that it can be swapped or paged out when the system is running low on RAM. But pinned memory is locked in place and can't be reused. This is obviously good for performance—the memory never can be taken away, so you never have to wait for it to be brought back in. The problem is that such memory can never be reused, even if the system is running critically low on RAM. Pinned memory reduces the system's flexibility when it comes to managing memory, and no one likes to be boxed into a corner.
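You can check how much memory a process has pinned by looking at the VmLck field in its status file; for a typical login shell, it will be zero:

grep VmLck /proc/$$/status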
I made reference above to read-only memory, memory that is shared, memory used for heap space, and so on. Below is some sample output that shows how memory is being used by my Bash shell (I want to emphasize that this output has been trimmed to fit into the allotted space, but all relevant fields are still represented. You can run the two commands you see on your own system and look at real data, if you wish. You'll see full pathnames instead of "..." as shown below, for example):
fedwards@local:~$ cd /proc/$$
fedwards@local:/proc/3867$ cat maps
00400000-004f4000 r-xp 00000000 08:01 260108 /bin/bash
006f3000-006f4000 r--p 000f3000 08:01 260108 /bin/bash
006f4000-006fd000 rw-p 000f4000 08:01 260108 /bin/bash
006fd000-00703000 rw-p 00000000 00:00 0
00f52000-01117000 rw-p 00000000 00:00 0 [heap]
f4715000-f4720000 r-xp 00000000 08:01 267196 /.../libnss_files-2.23.so
f4720000-f491f000 ---p 0000b000 08:01 267196 /.../libnss_files-2.23.so
f491f000-f4920000 r--p 0000a000 08:01 267196 /.../libnss_files-2.23.so
f4920000-f4921000 rw-p 0000b000 08:01 267196 /.../libnss_files-2.23.so
f4921000-f4927000 rw-p 00000000 00:00 0
f4f55000-f5914000 r--p 00000000 08:01 139223 /.../locale-archive
f6329000-f6330000 r--s 00000000 08:01 396945 /.../gconv-modules.cache
f6332000-f6333000 rw-p 00000000 00:00 0
fd827000-fd848000 rw-p 00000000 00:00 0 [stack]
fd891000-fd894000 r--p 00000000 00:00 0 [vvar]
fd894000-fd896000 r-xp 00000000 00:00 0 [vdso]
ff600000-ff601000 r-xp 00000000 00:00 0 [vsyscall]
fedwards@local:/proc/3867$
Each line of output represents one vm_area. A vm_area is a data structure
inside the Linux kernel that keeps track of how one region of virtual
memory is being used inside a process. The sample output has /bin/bash on
the first three lines, because Linux has created three ranges of virtual
memory that refer to the executable program. The first region has
permissions r-xp, because it is executable code (r = read, x = execute and
p = private; the dash means write permission is turned off). The second
region refers to read-only data within the application and has permissions
r--p (the two dashes represent write and execute permission). The third
region represents variables that are given initial values in the
application's source code, so it must be loaded from the executable, but
it can be changed at runtime (hence the permissions rw-p, which show only
execute turned off). These regions can be any size, but they are made up
of pages, which are 4K each on most Linux systems. The term page means the
smallest allocatable unit of virtual memory. (In technical documentation,
you'll see two other terms: frame and slot. Frames and slots are the same
size as pages, but frames refer to physical memory and slots refer to swap
space.)
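You can confirm the page size on your own machine with getconf; on x86 systems, it will almost always report 4096 (bytes):

getconf PAGESIZE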
You know from my previous discussion that read-only regions are shared
with other processes, so why does "p" show up in the permissions for the
first region? Shouldn't it be a shared region? You have a good eye to spot
that! Yes, it should. And in fact, it is shared. The reason it shows up as
"p" here is that there are actually 14 different permissions and room for
only four letters, so some condensing had to be done. The "p" means
private, because although the memory is currently marked read-only, the
application could change that permission and make it read-write, and if it
did make that change and then modified the memory, you would not want
other processes to see those changes! That would be similar to one process
changing directory, and every other process on the system changing at the
same time! Oops! So the letter "p" that marks the region as private really
means copy-on-write. All of the memory starts out being shared among all
processes using that region, but if any part of it is modified later, that
one tiny piece is copied into another part of RAM so that the change
applies only to the one process that attempted the write. In essence, it's
private, even though 99% of the time, the memory in that region will be
shared with other processes. Such copying happens on a page-by-page basis,
not across the entire vm_area. Now you can begin to see the difficulty in
calculating how much memory a process actually consumes.
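A preview of where that accounting lives: each line of maps gets a detailed breakdown in /proc/pid/smaps, including how many kilobytes of the region are currently shared and how many are private. To peek at the fields for the first region of your shell (the exact field list varies by kernel version):

head -n 20 /proc/$$/smaps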
But while I'm on this topic, there's a region in the list that has an "s" in the permission field. That region is a memory-mapped file, meaning that the data blocks on disk are mapped to the virtual memory addresses shown in the listing. Any references the process makes to those memory addresses are translated automatically into reads and writes of the corresponding data blocks on disk. The memory used by this region is actually shared by all processes that map the file into memory, so those processes pay no duplicated memory cost for file access.
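To pick out just the shared mappings in your own shell, filter the maps file on the last character of the permission field:

awk '$2 ~ /s$/' /proc/$$/maps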
Just because a region represents some given size of virtual memory does not necessarily mean that there are physical frames of RAM behind every virtual page. In fact, pages without backing frames are the common case. Imagine an application that allocates 100MB of memory. Should the operating system actually allocate 100MB right then? UNIX systems do not—they allocate a region of virtual memory like those above, but no physical RAM. As the process tries to access those virtual addresses, page faults are generated, and the operating system allocates the memory at that time. Deferring memory allocation until the last possible moment is one way Linux optimizes the use of memory, but it complicates the task of determining how much memory an application is using.
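You can see the gap between what has been allocated and what is actually resident by comparing two fields in a process's status file; VmSize (virtual) is routinely much larger than VmRSS (resident):

grep -E '^Vm(Size|RSS)' /proc/$$/status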
A process's address space is broken up into regions called vm_areas. These
vm_areas are unique to each process, but the frames of memory referred to
by the pages within the vm_area might be shared across processes. If the
memory is read-only (like executable code), all processes share the frame
equally. Any attempt to write to virtual pages that are read-only triggers
a page fault that is converted into a SIGSEGV, and the process is killed.
(You may have seen the message pop up on your terminal screen,
"Segmentation fault." That means the process was killed by SIGSEGV.)
Memory that is read/write also can be shared; shared memory segments are a
common example. If multiple processes can write to the frames of the
vm_area equally, some form of synchronization inside the application will
be necessary, or multiple processes could write at the same time, possibly
corrupting the contents of that shared memory. (Most applications use some
kind of mutex lock for this, but synchronization and locking are outside
the scope of this article.)
So, determining how much memory a process consumes is difficult. You could
add up the space allocated to the vm_areas, but that's virtual memory, not
physical; large portions of that space could be unused or swapped out.
This number is not a true representation of the amount of memory being
used by the process.
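Here's a sketch of that first approach anyway, summing the size of every region listed in maps (this assumes GNU awk, whose strtonum function handles the hexadecimal arithmetic):

# requires GNU awk (gawk) for strtonum
awk -F'[ -]' '{ ttl += strtonum("0x" $2) - strtonum("0x" $1) }
    END { printf "%d kB of virtual address space\n", ttl / 1024 }' /proc/$$/maps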
You could add up only the frames that are used by this process and not shared. (This information is available in /proc/pid/smaps.) You might call this the "USS" (Unique Set Size), as it defines how much memory will be freed when an application terminates (as a performance optimization, shared libraries typically stay in RAM even when no process is currently using them, in case they are needed again soon). But this isn't the true memory cost of a process either, as the process likely uses one or more shared libraries. For example, if an application is executed and it uses a shared library that isn't already in memory, that library must be loaded—surely some part of that cost should be charged to the new process, right?
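As a sketch, you can approximate the USS of your shell by summing the private (unshared) page counts reported in smaps:

awk '/^Private_(Clean|Dirty):/ { ttl += $2 } END { print ttl " kB" }' /proc/$$/smaps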
The ps command reports the "RSS" (Resident Set Size), which includes all
frames used by the process, regardless of whether they're shared.
Unfortunately, this number overstates memory use when all processes are
summed up—adding up the RSS of every process running on the system counts
the shared libraries multiple times, greatly inflating the apparent memory
requirement.
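You can watch that inflation happen by adding up the RSS of every process on the system; on most machines, the total will comfortably exceed the physical memory actually in use:

ps -eo rss= | awk '{ ttl += $1 } END { print ttl " kB" }'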
The /proc/pid/smaps file includes yet another memory category, PSS (Proportional Set Size). This is the amount of memory unique to one process (the USS), plus a proportional share of the memory it shares with other running processes. For example, let's assume the USS for a process is 2MB and it uses another 4MB of shared libraries, but those shared libraries are also used by three other processes. Since four processes are using the shared libraries, each should be charged for only 25% of the overall library size. That makes the PSS of the process 2MB + (4MB / 4) = 3MB. If you now add together the PSS values of all processes on the system, the shared library memory is accounted for exactly once, meaning the whole is equal to the sum of its parts.
It's not perfect—when one of those processes terminates, the memory returned to the system will be its USS, and because there's one less process using the shared libraries, the PSS of all the other processes will appear to increase! A naïve system administrator might wonder why the memory usage of the remaining processes has suddenly spiked, but in truth, it hasn't. In this example, each remaining process's share of the libraries goes from 4MB/4 to 4MB/3, so its PSS rises by about 0.33MB.
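To see the PSS of a single process (your shell, for instance), sum the Pss lines in its smaps file:

awk '/^Pss:/ { ttl += $2 } END { print ttl " kB" }' /proc/$$/smaps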
As the last step, I'm going to demonstrate a command that performs these calculations.
The one-line command shown below will accumulate all of the PSS values for all processes on the system:
awk '/^Pss:/ { ttl += $2 }; END { print ttl }' /proc/[0-9]*/smaps 2>/dev/null
Note that stderr is redirected to /dev/null. This is because the shell
replaces the wildcard string with a list of all filenames that match and
then executes the awk command. By the time awk is running, some of those
processes may already have terminated, which would cause awk to print an
error message about a non-existent file; redirecting stderr avoids that.
(Astute readers will note that this command line will never factor in the
memory consumed by the awk command itself!)
Many of the processes that the awk command will be reading are not
accessible to an unprivileged account, so system administrators should
consider using sudo to run the command. (Inaccessible processes produce
error messages that are then redirected to /dev/null, so without sudo the
command reports a total only for the processes that are accessible—in
other words, those owned by the current user.)
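Putting it all together, a privileged, system-wide total looks like this:

sudo awk '/^Pss:/ { ttl += $2 }; END { print ttl }' /proc/[0-9]*/smaps 2>/dev/null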
I've covered a lot of ground in this article, from terminology (pages,
frames, slots) and background information on how virtual memory is
organized (vm_areas), to details on how memory usage is reported to
userspace (the maps and smaps files under /proc). I've barely scratched
the surface of the type of information that the Linux kernel exposes to
userspace, but hopefully, this has piqued your interest enough that you'll
explore it further.
My favorite source for technical details is LWN.net when I'm looking for discussion and analysis, but I frequently go straight to the Linux source code when I'm looking for implementation details. See "ELC: How much memory are applications really using?" for the discussion around adding PSS to smaps, and see "Tracking actual memory utilization" for a discussion of memory that a process uses but that belongs to the kernel (something this article doesn't touch upon).