Writing Secure Programs

Cal Erickson

Issue #115, November 2003

Your new code may be secure, but what about the large project you inherited? Use a tool to find and prioritize possible security issues.

Most writing about security focuses on network security and physical security; not much is written about writing secure programs. A lot of what you need to write secure programs is common sense, but because of time constraints and design shortcuts, it is rarely applied. Any good programmer knows the concepts but usually does not have the time to implement them; there is a lot of pressure to produce a lot of code and get the project done.

In the early 1970s, the concept of structured programming was alive. Not only was the program structured, but the whole project had structure: there were technical specifications, design specifications, detail design specifications, design walk-throughs and code walk-throughs. This made projects bigger and longer, but when finished, the code was debugged easily and often worked with few changes. Some of these projects took many years to produce. However, there was little external influence from networks, the Web and time to market.

Today, much of that structured development process has disappeared. But security starts with the design of the program or application and depends on the coding standards established by the organization where the work is being done.

No code is ever 100% secure. But what can be done to make sure code is solid and secure? This article offers some ideas on what to consider and describes three tools that can help. Designing and implementing an embedded system requires extra care in coding, and with the assistance of the tools listed in the Resources section, a lot of coding errors can be caught. The ultimate judgment of whether code is secure, though, is left to the implementer and the implementer's understanding of what secure means.

Check for Errors

Most functions return some type of status, either directly or through errno, and checking these returns should be simple. In C++, the exception-handling capability is easy to use but can be complicated to set up; it has improved greatly since the C++ standard was finalized and should be used when practical. Previous practice had been to ignore errors, on the assumption that the data being passed was valid. This has been proven to be a bad assumption.
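
As a minimal sketch of this kind of checking, here is how a call such as fopen() can be handled, with errno explaining the failure; the helper name and file path are only illustrative:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>

    /* Open a file, checking the result instead of assuming the
     * call succeeded.  On failure, errno says why. */
    FILE *open_config(const char *path)
    {
        FILE *fp = fopen(path, "r");

        if (fp == NULL) {
            fprintf(stderr, "cannot open %s: %s\n",
                    path, strerror(errno));
            return NULL;    /* let the caller decide how to recover */
        }
        return fp;
    }

    int main(void)
    {
        FILE *fp = open_config("/etc/hosts");

        if (fp != NULL)
            fclose(fp);
        return 0;
    }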

Data buffer overflows have led to many security fixes in recent years. When writing for an embedded system, checking for error returns is important. Decisions need to be made about whether an error is benign and can be ignored. If the error is not benign, maybe it can be corrected. If it cannot be corrected, does the system perform a soft reset or a hard reset? In some cases a soft reset, causing the action in error to be restarted, is all that is required; this is the basis for some fault-tolerant systems. Depending on the type of device, a hard reset may not be a bad thing. Other times, some form of recovery is a must.
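
One way to make those decisions explicit is to classify each error and dispatch on the classification. The severity levels and reset hooks in this sketch are purely illustrative, not part of any real system:

    #include <stdio.h>

    /* Hypothetical reset hooks; a real embedded system would tie
     * these to watchdog or supervisor logic. */
    static void soft_reset(void) { puts("soft reset"); }
    static void hard_reset(void) { puts("hard reset"); }

    enum severity { BENIGN, RECOVERABLE, FATAL };

    /* Dispatch on error severity, as described in the text. */
    static void handle_error(enum severity s)
    {
        switch (s) {
        case BENIGN:
            break;                    /* safe to ignore */
        case RECOVERABLE:
            soft_reset();             /* restart the failed action */
            break;
        case FATAL:
            hard_reset();             /* restart the whole device */
            break;
        }
    }

    int main(void)
    {
        handle_error(RECOVERABLE);    /* example use */
        return 0;
    }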

Looking for Strings

Instead of using sprintf, strcpy and strcat, use bounded functions such as snprintf, strncpy and strncat. These functions make sure the buffer does not overflow and discard any excess. Do not use gets when reading data, as it allows overflows; use fgets instead. These may seem like simple changes, but they are easy to forget, and string handling is one of the most exploited areas of programs. Automated test programs check for these problems quite nicely, but the results can be misleading: some uses of a string function may be flagged as problems yet prove to be fine in the context where they are used. This is where the ability and knowledge of the implementer play an important role. The logs generated by the tools need to be scanned to determine which flagged code actually needs to be changed.
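
A small sketch of the bounded functions in practice, with one caveat worth noting: strncpy does not null-terminate the destination when it truncates, so the terminator must be added by hand:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[16];
        char line[64];
        const char *input = "a string longer than the buffer";

        /* strncpy discards the excess, but it does not
         * null-terminate on truncation, so terminate explicitly */
        strncpy(buf, input, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';

        /* snprintf is the bounded replacement for sprintf */
        snprintf(line, sizeof(line), "copied: %s", buf);
        puts(line);
        return 0;
    }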

Memory Leaks and Buffer Overflows

Memory leaks in and of themselves do not necessarily create security risks. However, they can be exploited if the memory is shared by several procedures and structures.

Buffer overflows are by far the most common security issue. If a buffer is allocated on the stack, it can be overflowed to wipe out or change the return address of a function. When the function returns, it jumps to the new address instead of the proper one. Some buffer attacks also can occur on the heap; these are more difficult to create, but they still can be done. Programs written in C are most vulnerable to these attacks, but any language that provides low-level memory access and pointer arithmetic can be problematic. Pointer arithmetic is one area that should have bounds checking.
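
A contrived illustration of the difference, not taken from any real exploit:

    #include <string.h>

    /* Vulnerable: strcpy writes past buf when input is longer
     * than 15 characters plus the terminator, clobbering whatever
     * follows on the stack -- including the saved return address. */
    void vulnerable(const char *input)
    {
        char buf[16];

        strcpy(buf, input);               /* no bounds check */
    }

    /* Safer: copy at most sizeof(buf) - 1 bytes and terminate. */
    void safer(const char *input)
    {
        char buf[16];

        strncpy(buf, input, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';
    }

    int main(void)
    {
        safer("this input is much longer than sixteen bytes");
        return 0;
    }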

The GNU C compiler has a bounds-checking extension available, which needs to be included when the compiler itself is built. It is enabled by an option that adds checking code to the program. During testing, the checks can be turned on and used; during deployment, they would not be present. The reason the checks should be turned off is that they print messages when bounds are breached. If the system in place is a workstation, the messages can be left on, but an embedded system typically has no console to display them.
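
Conceptually, the inserted checks behave like an explicit test on every access. The hand-written stand-in below sketches the idea; it is not the extension's actual mechanism or output:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hand-written stand-in for compiler-inserted bounds checking:
     * report and abort when an index is out of range. */
    static int checked_get(const int *arr, size_t n, size_t i)
    {
        if (i >= n) {
            fprintf(stderr, "bounds breach at index %lu\n",
                    (unsigned long)i);
            abort();
        }
        return arr[i];
    }

    int main(void)
    {
        int table[4] = { 1, 2, 3, 4 };

        printf("%d\n", checked_get(table, 4, 2));   /* fine */
        printf("%d\n", checked_get(table, 4, 9));   /* aborts */
        return 0;
    }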

An idea that might occur here is that all buffers should be statically allocated; then the problem goes away. In truth, the notion that a buffer is of fixed length can be exploited: the data being moved to the buffer still can be longer than the buffer, and when it is moved, the same overflow happens. To lower the risk, the data movement should copy only up to the maximum allowed for the buffer. Dynamic reallocation of strings permits programs to handle inputs of arbitrary size, but then the program could run out of memory. On an embedded system, such a mistake is fatal; on a workstation, the virtual memory system may start to thrash and create a performance bottleneck. In C++, the std::string class takes the dynamic growth approach, but if the class' data is exposed as a char * pointer, a buffer overflow still can happen. Other string libraries may not have these problems, but the implementer needs to be aware of the limitations.
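
One way to combine dynamic growth with a hard ceiling, sketched here with an illustrative limit and helper name of my own, is to reallocate only up to a fixed maximum and fail cleanly beyond it:

    #include <stdlib.h>

    #define MAX_INPUT (64 * 1024)    /* illustrative hard ceiling */

    /* Grow *buf to at least need bytes, but never beyond
     * MAX_INPUT, so oversized input cannot exhaust memory.
     * Returns 0 on success, -1 on rejection or failure. */
    int grow_buffer(char **buf, size_t *cap, size_t need)
    {
        char *tmp;

        if (need > MAX_INPUT)
            return -1;               /* reject oversized input */
        if (need <= *cap)
            return 0;                /* already big enough */

        tmp = realloc(*buf, need);
        if (tmp == NULL)
            return -1;               /* out of memory: fail cleanly */

        *buf = tmp;
        *cap = need;
        return 0;
    }

    int main(void)
    {
        char *buf = NULL;
        size_t cap = 0;

        if (grow_buffer(&buf, &cap, 256) == 0)
            free(buf);
        return 0;
    }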

Validate the Input

If a program receives data on which it must operate, there should be some type of validation that the data is correct, does not exceed the maximum size and is free of invalid values. For instance, if the data is limited to uppercase letters from A to Z, the function should reject anything else. It also should check that the length of the data is valid. Many years ago, everyone thought of data as 80 characters, the size of a punch card. Today, data can be any size; it can be text, binary or encrypted. It still has some type of limit, though. This should be checked, and data that fails the check should be rejected.
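
A sketch of such a check, with an arbitrary length limit (and assuming an ASCII character set):

    #include <stdio.h>
    #include <string.h>

    /* Accept only uppercase A-Z, with length between 1 and maxlen. */
    int valid_field(const char *s, size_t maxlen)
    {
        size_t i, len = strlen(s);

        if (len == 0 || len > maxlen)
            return 0;                /* reject bad lengths */
        for (i = 0; i < len; i++)
            if (s[i] < 'A' || s[i] > 'Z')
                return 0;            /* reject anything else */
        return 1;
    }

    int main(void)
    {
        printf("%d\n", valid_field("HELLO", 80));    /* 1: accepted */
        printf("%d\n", valid_field("hello", 80));    /* 0: rejected */
        return 0;
    }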

Not only should you check for the maximum size of a record or piece of data but, in some cases, check for a minimum size as well. Strings should be checked for legal values or legal patterns. If the data being checked contains binary data that needs to be kept that way, it may be better to use the common escape character to signal that the data is binary. If the data is numeric, range checking should be done. If the value must be a positive integer, reject anything less than zero. If there is a maximum value, check for that too. The file limits.h defines the maximum and minimum values for most integer types, so it is easy to check against system limits.
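
For numeric input, a hedged sketch using strtol() and the constants from limits.h; the helper name is my own:

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse a positive integer from text; return -1 on violation. */
    int parse_positive_int(const char *s)
    {
        char *end;
        long v;

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno == ERANGE || end == s || *end != '\0')
            return -1;          /* overflow, empty or trailing junk */
        if (v <= 0 || v > INT_MAX)
            return -1;          /* enforce the valid range */
        return (int)v;
    }

    int main(void)
    {
        printf("%d\n", parse_positive_int("42"));     /* 42 */
        printf("%d\n", parse_positive_int("1e9!"));   /* -1 */
        return 0;
    }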

Help Tools

The dilemma most developers face is that the code already exists, and there is little time or manpower to spend checking for potential security issues. After all, the code is not broken, so why fix it? This attitude prevails in a lot of organizations. Once the code has been found susceptible, however, fixing it becomes a high priority, as does assigning blame.

What can be done to find potential problems short of code inspection? I have learned of three tools that are capable of finding potential problems and flagging them in a report. These tools could be used on an embedded system, but most development is done in a cross-hosted environment. Do the heavy work on the host workstation, and leave the fine-tuning to the target. The information on where to get the tools is listed in the Resources section.

Flawfinder, RATS and ITS4 are three packages that scan a source tree and produce a report about potential problems. The report lists what is wrong, in which source module and at what line, and each finding is weighted by its degree of vulnerability. Listing 1 shows a snippet from a Flawfinder execution on the sample code. The severity level runs from 0 to 5, with 0 being very little risk and 5 being high risk.

Even though several messages are returned, the implementers can choose to fix or ignore the potential problems. Some developers might argue that these tools should change the code themselves, but it is much better to change code selectively than to make wholesale edits unaided. The Flawfinder program uses an internal database called a ruleset, a list of common security flaws: general issues that can have an impact on C/C++, plus a number of specific problematic runtime functions.

Conclusion

Writing secure code can be easy. Thinking about what is being written and how it can be exploited has to be part of the design criteria. Testing methods should be devised to check for various types of attacks or misuse. Fully automating these tests is a luxury that can go a long way toward getting a superior product to the consumer. The techniques and tools discussed here are only helpers; the development of secure programs still rests in the hands and minds of the developers.

Cal Erickson (cal_erickson@mvista.com) currently works for MontaVista Software as a senior Linux consultant. Prior to joining MontaVista, he was a senior support engineer at Mentor Graphics Embedded Software Division. Cal has been in the computing industry for more than 30 years, with experience at computer manufacturers and end-user development environments.