







Computer Security: Writing Secure Code. Lecture notes, Computer Science, Prof. David Wagner, University of California (CA) - UCLA, United States of America (USA). Topics: modularity, defensive programming, selecting a programming language, process.








This lecture discusses implementation techniques to avoid security holes when you write code. We will describe many good practices. Many of these have a strong overlap with software engineering and general software quality, but the demands of security place a heavier burden on programmers.
In security applications, we must eliminate all security-relevant bugs, no matter how unlikely they are to be triggered in normal execution, because we are facing an intelligent adversary who will gladly interact with our code in abnormal ways if there is any profit in doing so. Compare to software reliability, where we normally focus on the bugs that are most likely to happen; bugs that only come up under obscure conditions might be ignored if reliability is the goal, but they cannot be ignored when security is the goal. Dealing with malice is much harder than dealing with mischance.
In these notes, we’ll especially emphasize three fundamental techniques: (1) modularity and decomposition for security; (2) formal reasoning about code using invariants; (3) defensive programming. At the end, we also discuss programming language-specific issues and integrating security into the software lifecycle.
A well-designed system will be decomposed into modules, where modules interact with each other only through well-defined interfaces. Each module should perform a clear function; the essence is conceptual clarity of what it does (what functionality it provides), not how it does it (how it is implemented).
The granularity of modules is dependent on the system and language. A module typically has state and code. For instance, in an object-oriented language like Java, a module might consist of a class (or a few closely related classes). In C, a module might be in its own file and contain some clear external interface, along with many internal functions that are not externally visible or callable.
Module design is as much about interface design as anything else. The interface is the contract between caller and callee; hopefully, it should change less often than the implementation of the module itself. A caller should only need to understand the interface. Modules should interact only through the defined interface; for instance, you shouldn’t use global variables to communicate information from caller to callee. Think of a module as a blob; the interface is its surface area, and the implementation is its volume. Thoughtful design is often characterized by narrow and conceptually clean interfaces and modules with a low surface area to volume ratio.
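As a minimal sketch of these ideas in C, consider a hypothetical ID-generator module `idgen` (not from the notes). Its public interface is three functions over an opaque type. The helper `advance` is declared `static`, so code in other files cannot call it. In a real project the interface part would live in a header and the rest in its own `.c` file. They are shown together here so the block compiles on its own.

```c
#include <limits.h>
#include <stdlib.h>

/* ---- Interface: what a hypothetical idgen.h would export ---- */
typedef struct idgen idgen;              /* opaque: callers cannot see or modify fields */
idgen *idgen_new(unsigned start);        /* returns NULL if allocation fails */
int idgen_next(idgen *g, unsigned *out); /* 0 on success, -1 if no IDs remain */
void idgen_free(idgen *g);

/* ---- Implementation: what a hypothetical idgen.c would contain ---- */
struct idgen {
    unsigned next;  /* next identifier to hand out */
    int exhausted;  /* set once every value up to UINT_MAX has been issued */
};

/* Internal helper with internal linkage (static): invisible outside this file. */
static void advance(idgen *g) {
    if (g->next == UINT_MAX)
        g->exhausted = 1;  /* refuse to wrap around and reissue an old ID */
    else
        g->next++;
}

idgen *idgen_new(unsigned start) {
    idgen *g = malloc(sizeof *g);
    if (g != NULL) {
        g->next = start;
        g->exhausted = 0;
    }
    return g;
}

int idgen_next(idgen *g, unsigned *out) {
    if (g == NULL || out == NULL || g->exhausted)
        return -1;
    *out = g->next;
    advance(g);
    return 0;
}

void idgen_free(idgen *g) {
    free(g);
}
```

Callers only ever hold an `idgen *`. The module's one promise (no ID is issued twice) can therefore be checked by reading this file alone.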
When you decompose the system into modules, here are some suggestions that will improve security:
…behavior to deviate from what was expected by the programmer. Plan for failure: think in advance about what the consequences of a compromise of each module might be, and structure the system to reduce these consequences. For instance, a monolithic architecture that places all modules in a common address space is an unnecessary security risk, because if one module is compromised then all others can be penetrated as well. Some languages (e.g., Java) provide mechanisms for isolating modules from each other using type-safety; with legacy languages (like C), you may need to place each module in its own process to protect it.
Example: A network server that listens on a port below 1024 might be broken up into two pieces: a small start-up wrapper, and the application itself. Because binding to a port in the range 0–1023 requires root privileges, the wrapper could run as root, bind a socket to the desired port (obtaining a file descriptor), and then spawn the application and pass it that file descriptor. The application itself could then run as a non-root user, limiting the damage if the application is compromised. The wrapper can be written in only a few dozen lines of code, so we should be able to validate it quite thoroughly.
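Here is a minimal sketch of the wrapper's privileged steps. It assumes Linux/POSIX sockets and a wrapper that starts as root. The names `bind_listener` and `drop_privileges` are hypothetical, and the target uid/gid (an unprivileged account) would come from configuration. Order matters: the supplementary groups and group ID must be dropped while the process still has root, before `setuid` gives root up for good.

```c
#define _DEFAULT_SOURCE  /* for setgroups() on glibc */
#include <grp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a TCP listening socket to `port` on all IPv4 interfaces.
   Ports 0-1023 (other than 0, which asks for any free port) need root.
   Returns the descriptor, or -1 on error. */
int bind_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Permanently give up root: supplementary groups, then gid, then uid.
   Returns 0 on success (or if not running as root), -1 on any failure. */
int drop_privileges(uid_t uid, gid_t gid) {
    if (geteuid() != 0)
        return 0;                    /* nothing to drop */
    if (setgroups(0, NULL) < 0)      /* clear supplementary groups */
        return -1;
    if (setgid(gid) < 0)
        return -1;
    if (setuid(uid) < 0)
        return -1;
    if (uid != 0 && setuid(0) == 0)  /* regaining root must now fail */
        return -1;
    return 0;
}
```

After `drop_privileges` succeeds, the wrapper could `exec` the application. File descriptors stay open across `exec` unless marked close-on-exec, so the application inherits the already-bound socket without ever holding root. This is one way to realize the split described above, not the only one.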
Example: A web server might be structured as a composition of two modules. One module might be responsible for interacting with the network; it could handle incoming network connections and parse them to identify the requested URL. The second module might translate the URL into a filename and read it from the filesystem. Note that the first module can be run with no privileges at all (assuming it is started by a root wrapper that binds to port 80). The second module might be run as some special userid (e.g., www), and we might ensure that only documents intended to be publicly visible are readable by user www. This then leverages the file access controls provided by the operating system so that even if the second module is penetrated, the attacker cannot cause any harm to the rest of the system.
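The second module must map the URL to a filename defensively. Below is a minimal sketch under two assumptions not stated in the notes: the URL path has already been percent-decoded, and the document root is a fixed string such as `/var/www`. The function name `url_to_path` is hypothetical. It rejects any `..` path component so a request cannot name files outside the document root. It does not handle symbolic links inside the root; the operating-system file permissions for user www remain the backstop.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Map a URL path such as "/index.html" to a filename under docroot.
   Returns 0 on success, or -1 if the URL does not start with '/',
   contains a ".." component, or the result does not fit in out. */
int url_to_path(const char *docroot, const char *url, char *out, size_t outlen) {
    if (docroot == NULL || url == NULL || out == NULL || url[0] != '/')
        return -1;
    /* Walk the '/'-separated components and reject any that is exactly "..". */
    for (const char *p = url; *p != '\0'; ) {
        while (*p == '/')
            p++;                          /* skip separators */
        const char *start = p;
        while (*p != '\0' && *p != '/')
            p++;                          /* scan one component */
        if (p - start == 2 && start[0] == '.' && start[1] == '.')
            return -1;
    }
    int r = snprintf(out, outlen, "%s%s", docroot, url);
    if (r < 0 || (size_t)r >= outlen)
        return -1;                        /* truncated: refuse rather than guess */
    return 0;
}
```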
Often functions make certain assumptions about their arguments, and it is the caller’s responsibility to make sure those assumptions are valid. These are often called preconditions. A precondition for f() is an assertion (a logical proposition) that must hold at input to f(). The function f() is supposed to behave correctly and produce meaningful output as long as its preconditions are met. If any precondition is not met, all bets are off. Therefore, the caller must be sure to call f() in a way that will make these preconditions true. In short, a precondition imposes an obligation on the caller, and the callee may freely assume that the obligation has been met.
Here is a simple example of a function with a precondition:
    /* Requires: p != NULL */
    int deref(int *p) {
        return *p;
    }
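The precondition is the caller's obligation. A caller that cannot be sure its pointer is non-null therefore has to establish the precondition before the call. A minimal sketch; `read_or_default` is a hypothetical caller, not from the notes:

```c
#include <stddef.h>

/* Requires: p != NULL */
int deref(int *p) {
    return *p;
}

/* Hypothetical caller: the NULL test establishes deref()'s precondition
   on the only path that calls it. */
int read_or_default(int *maybe_null, int dflt) {
    if (maybe_null == NULL)
        return dflt;
    /* maybe_null != NULL */
    return deref(maybe_null);
}
```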
requirement is that each statement’s postcondition must match (or imply) the precondition of any statement that follows it. Thus, at every point between two statements, you write down an invariant that should be true any time execution reaches that point. The invariant is a postcondition for the preceding statement, and a precondition for the next statement.
It is pretty straightforward to tell whether a statement in isolation fits its pre- and post-conditions. For instance, a valid postcondition for the statement “v=0;” would be v = 0 (no matter what the precondition is). Or, if the precondition for the statement “v=v+1;” is v ≥ 5, then a valid postcondition would be v ≥ 6. As another example, if the precondition for the statement “v=v+1;” is w ≤ 100, then w ≤ 100 is also a valid postcondition (assuming v and w do not alias).
This leads to a very useful concept, that of loop invariants. A loop invariant is an assertion that is true at the entrance to the loop body, on any path through the code: it has to be true before every iteration of the loop. To verify that a condition really is a valid loop invariant for the loop, you treat the condition as both a precondition and a postcondition for the loop body.
Let’s try an example. Here is some code that computes the factorial function:
    /* Requires: n >= 1 */
    int fact(int n) {
        int i, t;
        i = 1;
        t = 1;
        while (i <= n) {
            t *= i;
            i++;
        }
        return t;
    }
The precondition matters: the input must be at least 1, and this implementation is not meant to be correct for other inputs. Suppose we want to prove that the value of fact() is always positive. We’ll annotate the code with invariants:
    /* Requires: n >= 1
       Ensures: retval >= 0 */
    int fact(int n) {
        int i, t;
        /* n>=1 */
        i = 1;
        /* n>=1 && i==1 */
        t = 1;
        /* n>=1 && i==1 && t==1 */
        while (i <= n) {
            /* 1<=i && i<=n && t>=1   <-- loop invariant */
            t *= i;
            /* 1<=i && i<=n && t>=1 */
            i++;
            /* 2<=i && i<=n+1 && t>=1 */
        }
        /* i>n && t>=1 */
        return t;
    }
How do we verify that the invariants are correct? This might look pretty complicated, but don’t get discouraged: it’s actually pretty easy if you just take the time to look at each step. Notice that the function’s precondition implies the invariant at the beginning of the function body. Also, the invariant at the end of the function body implies the function’s postcondition. Thus, if each statement matches the invariants immediately before and after it, everything will be ok. The only non-trivial reasoning is in the loop invariant. First, we must prove that at the entrance to the first iteration of the loop, the loop invariant will be true. This follows since the logical proposition n ≥ 1 ∧ i = 1 ∧ t = 1 implies 1 ≤ i ≤ n ∧ t ≥ 1 (e.g., if i = 1, then certainly i ≥ 1, and i ≤ n because n ≥ 1). Also, we must prove that if the loop invariant holds at the beginning of any iteration of the loop, then it will hold at the beginning of the next iteration, if there is another iteration. This is true, since the invariant at the end of the loop body (2 ≤ i ≤ n + 1 ∧ t ≥ 1), together with the loop condition (i ≤ n), which must hold for another iteration to begin, implies the invariant at the beginning of the loop body (1 ≤ i ≤ n ∧ t ≥ 1). It follows by induction on the number of iterations that the loop invariant is always true on entrance to the loop body. When the loop exits, the loop condition is false (i > n) and t ≥ 1 still holds, which gives the invariant after the loop. The conclusion is that fact() will always make the postcondition true, so long as the precondition is established by its caller.
To give you some more practice, we’ll show another example implementation of fact(), this time using recursion. Here goes:
    /* Requires: n >= 1 */
    int fact(int n) {
        int t;
        if (n == 1)
            return 1;
        t = fact(n-1);
        t *= n;
        return t;
    }
Do you see how to prove that this code always outputs a positive integer? Let’s do it:
    /* Requires: n >= 1
       Ensures: retval >= 0 */
    int fact(int n) {
        int t;
        if (n == 1)
            return 1;
        /* n>=2 */
        t = fact(n-1);
        /* t>=0 */
        t *= n;
        /* t>=0 */
        return t;
    }
Before the recursive call to fact(), we know that n ≥ 1 (by the precondition), that n ≠ 1 (since the if statement didn’t follow its then branch), and that n is an integer. It follows that n ≥ 2, or that n − 1 ≥ 1.
that all is well. The bad news is that, even with practice, reasoning about your code still does take time and energy—however, it seems to be worth it for code that needs to be highly secure.
While we have presented this in a fairly formal way, in practice good programmers often do the same kind of reasoning without bothering with the formal notation. Also, good programmers sometimes omit the obvious parts of the invariants and write down only the parts that seem most important. Often, we think first about the invariants that data structures and code ought to satisfy, and only then write the code.
This kind of reasoning can be made fully rigorous using the tools of mathematical logic. In fact, there has been a lot of research into tools that use automated theorem provers to try to mathematically prove the validity of a set of alleged pre- and post-conditions (or even to help infer such invariants). You could take a whole course on the topic, but for reasons of time, we won’t go any further in this course. Our goal was merely to show you enough to get started, and maybe stir you to investigate further on your own.
By the way, you may have noticed how useful it is to be able to “speak mathematics” fluently. Now you know one reason why we make you take Math 55 or CS 70 as part of your computer science education.
Defensive programming is like defensive driving: the idea is to avoid depending on anyone else around you, so that if anyone else does something unexpected, you won’t crash. Defensive programming is about surviving unexpected behavior by code, rather than by other drivers, but otherwise the principle is similar.
Software engineering normally focuses on functionality: if the code is given meaningful inputs, then it should produce useful and correct outputs. For security, we often care more about what happens when the program is given invalid, unexpected, or ridiculous inputs: the program had better not crash, cause undesirable side-effects, or produce dangerous outputs even when the inputs are nonsensical. Defensive programming involves applying this idea at every interface and every security perimeter, so that each module will remain robust even if all other modules that interact with it misbehave. The general strategy is to assume that an attacker is in control of the inputs to your module, and make sure that nothing terrible happens.
The simplest situation is where we are writing a module M that provides functionality to a single client. Then M should strive to provide useful responses as long as the client provides valid inputs. If the client provides an invalid input, then M is no longer under any obligation to provide useful output; however, M must still protect itself (and the rest of the system) from being subverted by malicious inputs.
A very simple example:
    char charAt(char *str, int index) {
        return str[index];
    }
This function is too fragile. First, charAt(NULL, any) will cause the program to crash. Second, charAt(s, i) can create a buffer overrun situation if i is out-of-bounds (too small or too large) for the string. Neither can be easily fixed without changing the function interface.
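One way to change the interface so the function can defend itself is sketched below. The name `char_at_checked` and the status-code convention are assumptions, not from the notes. The caller passes the buffer length. The index becomes an unsigned `size_t`, so it cannot be negative. The result comes back through an out-parameter, so failure is reported separately from any character value.

```c
#include <stddef.h>

/* Returns 0 and stores str[index] in *out if str is non-NULL and
   index < len; otherwise returns -1 and leaves *out untouched.
   len is the number of valid bytes in str, supplied by the caller. */
int char_at_checked(const char *str, size_t len, size_t index, char *out) {
    if (str == NULL || out == NULL || index >= len)
        return -1;
    *out = str[index];
    return 0;
}
```

This still trusts the caller to pass an accurate `len`. The interface cannot remove that dependency, but it does make the assumption explicit and checkable at the call site.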
Another made-up example:
    char *double(char *str) {
        size_t len = strlen(str);
        char *p = malloc(2*len+1);
        strcpy(p, str);
        strcpy(p+len, str);
        return p;
    }
This function has many flaws:
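Several flaws are visible in the code itself:

- `double` is a reserved word in C, so the function does not compile under that name.
- `strlen(NULL)` crashes.
- The result of `malloc` is never checked.
- `2*len+1` can wrap around when `len` is very large, yielding an undersized buffer.

A minimal defensive sketch that addresses those four problems follows. It is renamed to the hypothetical `str_double`, which is not from the notes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding two copies of str, or NULL if
   str is NULL, the size computation would overflow, or malloc fails.
   The caller must free() the result. */
char *str_double(const char *str) {
    if (str == NULL)
        return NULL;
    size_t len = strlen(str);
    if (len > (SIZE_MAX - 1) / 2)   /* 2*len + 1 would overflow size_t */
        return NULL;
    char *p = malloc(2 * len + 1);
    if (p == NULL)
        return NULL;
    memcpy(p, str, len);
    memcpy(p + len, str, len + 1);  /* second copy includes the NUL */
    return p;
}
```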
A slightly trickier example: Consider a Java sort routine, which accepts an array of objects that implement the interface Comparable and sorts them. This means that each such object has to implement the method compareTo(), and x.compareTo(y) must return a negative, zero, or positive integer, according to whether x is less than, equal to, or greater than y in their class’s natural ordering (e.g., strings might use lexicographic ordering). Implementing a defensive sort routine is actually fairly tricky, because a malicious client might supply objects whose compareTo() method behaves unexpectedly. For instance, calling x.compareTo(y) twice might yield two different results (if x or y are malicious or misbehaving). Or, we might have x.compareTo(y) == 1, y.compareTo(z) == 1, and z.compareTo(x) == 1, which is nonsensical. If we’re not careful, the sort routine could easily go into an infinite loop or worse.
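The same problem arises in C when a client supplies a comparison callback. In contrast, the C standard requires `qsort`'s comparator to be consistent and leaves the behavior undefined otherwise. Below is a minimal sketch of a hypothetical `defensive_sort` over an array of pointers. Insertion sort is used here because its termination and memory safety do not depend on the comparator. Each inner loop moves `j` strictly downward and stops at 0, so `cmp` is called at most n(n-1)/2 times and every write stays inside the array. A misbehaving comparator can only make the output order meaningless; the result is still a permutation of the input.

```c
#include <stddef.h>

typedef int (*cmp_fn)(const void *a, const void *b);

/* Sorts the n pointers in a[] in place using insertion sort. */
void defensive_sort(const void **a, size_t n, cmp_fn cmp) {
    for (size_t i = 1; i < n; i++) {
        const void *x = a[i];
        size_t j = i;
        /* Invariant: 0 <= j <= i < n, so a[j-1] and a[j] are in bounds. */
        while (j > 0 && cmp(a[j - 1], x) > 0) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = x;
    }
}

/* An honest comparator for ints. */
int int_cmp(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* A misbehaving comparator that claims every element is greater. */
int hostile_cmp(const void *a, const void *b) {
    (void)a;
    (void)b;
    return 1;
}
```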
Here is some general advice:
    char *username = getenv("USER");
    char *buf = malloc(strlen(username)+6);
    sprintf(buf, "mail %s", username);
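The snippet trusts its environment in three ways:

- `getenv` returns NULL if `USER` is unset.
- `malloc` can fail.
- Whoever controls the environment controls `username`, which matters if the command string is later handed to a shell (for example via `system()`).

A minimal defensive sketch follows. It assumes a hypothetical helper `make_mail_cmd` that takes the username as a parameter; the caller would pass `getenv("USER")`. It also assumes an allowlist of letters, digits, `.`, `_`, and `-` with a 32-character limit. Neither the allowlist nor the limit comes from the notes.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_USERNAME 32  /* assumed limit, for illustration only */

/* Returns 1 if name is non-empty, at most MAX_USERNAME characters,
   does not start with '-', and uses only [A-Za-z0-9._-]. */
static int valid_username(const char *name) {
    if (name == NULL || name[0] == '\0' || name[0] == '-')
        return 0;  /* a leading '-' could be parsed as an option by mail */
    for (size_t n = 0; name[n] != '\0'; n++) {
        unsigned char c = (unsigned char)name[n];
        if (n >= MAX_USERNAME)
            return 0;
        if (!(isalnum(c) || c == '.' || c == '_' || c == '-'))
            return 0;
    }
    return 1;
}

/* Returns a malloc'd "mail <username>" string, or NULL if the username
   is missing or not allowlisted, or allocation fails. Caller frees. */
char *make_mail_cmd(const char *username) {
    if (!valid_username(username))
        return NULL;
    size_t len = strlen(username) + sizeof "mail ";  /* "mail " + name + NUL */
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    snprintf(buf, len, "mail %s", username);
    return buf;
}
```

An allowlist is used rather than a list of forbidden shell metacharacters because it fails closed: any character nobody thought about is rejected.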
So far we have considered very simple cases where we only have to think about a single client. More generally, suppose we are writing a module M that provides some functionality to multiple clients, who each call M to benefit from its functionality. It is important for M to defend itself against malicious clients. It is also important for M to ensure that one malicious client cannot disrupt other clients. Thus, when M is performing some function on behalf of a client, there are two cases:
Of course, M might in turn invoke other utility modules, relying upon them, so that M is itself a client of those other modules. The same requirements will apply.
There is a special case where we do not have to worry about multiple clients. Suppose M computes a pure function, with no internal state and performing no I/O, so that its output depends deterministically on its input. In this case, we do not need to worry about one client disrupting another client or corrupting M’s state. Thus, functional programming can simplify the task of defensive programming.
How does defensive programming relate to the use of preconditions? Of course, whenever we want to make some assumption about the calling context, we can either express this as a precondition and leave it to the caller to ensure it is true, or we can explicitly check for ourselves that the condition holds (and abort if it
does not). How should we decide between these two strategies? Perhaps the most sensible approach is to use preconditions to express constraints that honest clients are expected to follow. So long as the client meets the documented preconditions (whether formal or informal), then the module is obligated to return correct and useful results to the client. If the client departs from the documented contract, then the module is no longer under any obligation to return useful results, but it still must protect itself and other clients. Thus, for interfaces exposed to clients, we might (a) use documented preconditions to express the intended contract and (b) use explicit checking for anything that could corrupt our internal state, cause us to crash, or disrupt other clients. For internal helper functions that can only be invoked by code in the same module, we might not worry about the threat of being invoked with malicious inputs, and we could freely choose between implicit checking (preconditions) and explicit checking.
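Here is a minimal sketch of that split, assuming a hypothetical fixed-capacity table module; the names and the capacity of 8 are illustrative. `table_get` is the client-facing entry point, so it enforces its documented contract with explicit checks. `get_unchecked` is a `static` helper that states its precondition and only asserts it, relying on its callers inside the module.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_CAP 8  /* illustrative capacity */

struct table {
    size_t len;            /* invariant: len <= TABLE_CAP */
    int items[TABLE_CAP];
};

/* Internal helper, callable only from this file (static).
   Requires: t != NULL && idx < t->len.
   Asserted in debug builds but not re-checked, because every caller
   in this module establishes it first. */
static int get_unchecked(const struct table *t, size_t idx) {
    assert(t != NULL && idx < t->len);
    return t->items[idx];
}

/* Client-facing API. Documented contract: idx must be a valid index.
   The contract is also enforced explicitly, since a malicious client
   could violate it: on any violation, return -1 instead of reading
   out of bounds. */
int table_get(const struct table *t, size_t idx, int *out) {
    if (t == NULL || out == NULL || t->len > TABLE_CAP || idx >= t->len)
        return -1;
    /* Here t != NULL && idx < t->len, so get_unchecked's precondition holds. */
    *out = get_unchecked(t, idx);
    return 0;
}

/* Client-facing API: append v, or return -1 if the table is full. */
int table_append(struct table *t, int v) {
    if (t == NULL || t->len >= TABLE_CAP)
        return -1;
    t->items[t->len++] = v;
    return 0;
}
```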
If you are lucky, you may have the opportunity to choose a programming language, libraries, operating system, or development environment. Here is some advice for how to make a choice that is best for security.
Of course, you will not always have the opportunity to choose the language on the basis of what is best for security. For instance, other considerations may dominate, or you may be forced to maintain legacy code. A corollary of the above comments is that if you are programming in a language that isn’t optimal for security, you need to be extra careful. If you don’t know the language extremely well, it would be good to try to learn it better, and to avoid the more obscure corners of the language and stick to the core that you know best. If you are forced to program in a language without automatic bounds-checking, extra caution is warranted. You may want to force yourself to stick to a rigid discipline where you insert a manual bounds-check anywhere any array or pointer operation is performed, or you may wish to write your code in a way so you can prove (using the formal reasoning methods outlined earlier) that out-of-bounds accesses are impossible.
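As an example of that discipline, here is a minimal sketch of a string copy that checks bounds explicitly before it writes anything. Unlike `strcpy`, it does not trust the caller to have sized the destination. The name `copy_checked` and its return convention are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Copies src (including its NUL) into dst only if it fits in dstsize bytes.
   Returns 0 on success. Otherwise returns -1 and, when dstsize > 0,
   leaves dst as an empty string. */
int copy_checked(char *dst, size_t dstsize, const char *src) {
    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;
    size_t n = strlen(src);
    if (n >= dstsize) {   /* explicit bounds check before any write */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);
    return 0;
}
```

Refusing to copy, rather than silently truncating, avoids a second class of bugs where a truncated string (say, a filename) means something different from the original.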
Here is some advice specific to C programming:
to make the same erroneous assumption when reviewing her own code as when she wrote it, while someone else may spot the error immediately. Knowing that someone else will review your code also helps keep you honest and motivates you to avoid dangerous shortcuts, because most people prefer not to be embarrassed in front of their peers.