Why is some software more secure than others?

How do you measure software security?

Here's my definition of what secure software is.

Intro

I get really tired of seeing these kinds of comments every time some widely used software has security holes:

While they may be partially true, I think they're also very misleading and disparage the hard work that some secure software authors have done.

Simplicity Is Security

The difference between secure and insecure software is really the coding techniques used by its authors. Authors of secure software do everything they can to prevent accidental mistakes from ever happening. Authors of insecure software just fix the accidental mistakes as they are found. There are very few secure software authors.

Auditing insecure software doesn't make it secure. Sendmail is a good example of this. It's been audited countless times by competent people. The simplest mistakes were caught easily a long time ago, but a few very hard-to-find vulnerabilities were found only recently.

How do secure software authors then avoid the kind of security holes that are difficult to find? By keeping the code simple. The code doesn't get secure by polluting it with tons of security checks. It gets secure by keeping the security checks in as few places as possible.

Complex things of course require complex code, which may be difficult to get right. If the code is security related - such as an implementation of some authentication or crypto protocol - it's understandable that in some situations your code doesn't work as the specifications require and causes a security hole. That is, however, no excuse to make the complex code unreadable or to introduce generic security holes such as buffer overflows.

Auditing secure software for generic security holes is easy. You can just quickly browse through most of the sources without having to stop and look at them carefully. Everything just looks clean, simple and correct. vsftpd is a good example of this.

Sure, it's still possible that secure software occasionally has security holes. It just happens a lot less often (if ever), and the problems are usually less critical. For example, none of the security holes in Postfix have led to arbitrary code execution or to reading other people's mail. Denial of Service attacks are nothing compared to those.

I think the goal should be to have only application-specific security holes. Generic security holes that could theoretically be found using some automation shouldn't exist.

Examples

Standard Buffer Overflows

The right fix: Use APIs that make it difficult if not impossible to overflow a buffer. Not just strings. All buffers. Writing to buffers directly should be the exception, not the norm. Make sure such exceptions are carefully audited, and preferably mark them so others can quickly find and audit them as well.

The wrong fix: Add tons of checks throughout the code to verify that it doesn't write past the buffer. You are going to miss some checks.

Bonus: Integer overflow and truncation problems are no longer a generic security issue if you write everything through secure buffer APIs.
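
As a rough illustration of what such a buffer API could look like, here is a minimal sketch in C. The names struct buf, buf_append() and buf_free() are made up for this example and are not taken from any particular program. The point is that the bounds check and the size_t overflow check live in one place, and callers never touch the memory directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. Callers only use buf_append(). */
struct buf {
    unsigned char *data;
    size_t used;   /* bytes holding valid data */
    size_t alloc;  /* bytes actually allocated */
};

/* Make room for `extra` more bytes.
   Returns 0 on success, -1 on size overflow or out of memory. */
static int buf_reserve(struct buf *b, size_t extra)
{
    if (extra > SIZE_MAX - b->used)
        return -1; /* used + extra would wrap around */

    size_t need = b->used + extra;
    if (need <= b->alloc)
        return 0;

    /* Grow by doubling, falling back to the exact size near SIZE_MAX. */
    size_t new_alloc = b->alloc != 0 ? b->alloc : 64;
    while (new_alloc < need) {
        if (new_alloc > SIZE_MAX / 2) {
            new_alloc = need;
            break;
        }
        new_alloc *= 2;
    }

    unsigned char *p = realloc(b->data, new_alloc);
    if (p == NULL)
        return -1;
    b->data = p;
    b->alloc = new_alloc;
    return 0;
}

/* The only place in this sketch that writes into the buffer memory. */
int buf_append(struct buf *b, const void *src, size_t len)
{
    if (buf_reserve(b, len) != 0)
        return -1;
    if (len > 0)
        memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}

void buf_free(struct buf *b)
{
    free(b->data);
    b->data = NULL;
    b->used = b->alloc = 0;
}
```

A caller starts from a zeroed struct buf and calls buf_append(&b, data, len). A bogus length such as SIZE_MAX is rejected inside the API instead of at every call site.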

Cross Site Scripting and SQL Injections

The right fix: Use APIs that make it difficult if not impossible to cause XSS or SQL injections. For example: sql_exec(sql, "SELECT * FROM table WHERE username = %s", username); where sql_exec() escapes username automatically.

The wrong fix: Add tons of calls to validate user input.
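
To make the idea concrete, here is a small sketch in C of how such an escaping layer might work. sql_format() is a hypothetical helper written for this example, not the sql_exec() mentioned above and not part of any real database library. It assumes a database that follows standard SQL quoting, where a single quote inside a string literal is written as two single quotes and backslash is not special. It understands only %s.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Append one character, always leaving room for the terminating NUL. */
static int put_char(char *out, size_t size, size_t *pos, char c)
{
    if (*pos + 1 >= size)
        return -1;
    out[(*pos)++] = c;
    return 0;
}

/* Hypothetical helper: expands only %s. Every %s argument is emitted
   as a quoted SQL string literal with each ' doubled (standard SQL
   quoting is assumed). Returns 0 on success, -1 if `out` is too small. */
int sql_format(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t pos = 0;
    int ret = 0;

    if (size == 0)
        return -1;

    va_start(args, fmt);
    for (; *fmt != '\0' && ret == 0; fmt++) {
        if (fmt[0] == '%' && fmt[1] == 's') {
            const char *arg = va_arg(args, const char *);

            ret = put_char(out, size, &pos, '\'');
            for (; *arg != '\0' && ret == 0; arg++) {
                if (*arg == '\'')
                    ret = put_char(out, size, &pos, '\''); /* double the quote */
                if (ret == 0)
                    ret = put_char(out, size, &pos, *arg);
            }
            if (ret == 0)
                ret = put_char(out, size, &pos, '\'');
            fmt++; /* skip the 's' */
        } else {
            ret = put_char(out, size, &pos, *fmt);
        }
    }
    va_end(args);

    out[pos] = '\0';
    return ret;
}
```

With this, sql_format(q, sizeof(q), "SELECT * FROM table WHERE username = %s", username) yields a query where username can only ever be a string literal, whatever it contains. Prepared statements with bound parameters, where the database library offers them, reach the same goal without any escaping at all.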

Input Validation

The correct way: Input validation is done primarily to make sure a user can't easily (or accidentally) consume too many resources (CPU, memory). It's done secondarily to provide meaningful error messages. Don't rely on it to prevent security holes.

The wrong way: Add tons of checks for all user input and afterwards rely on it being safe. You're going to miss some check some day. Or someone may later decide your checks are too restrictive and cause security holes by loosening them.

Recent OpenSSH Hole

Background: The fatal() function was called pretty much everywhere. It was used as if it were assert(). The problem was that fatal() also called a number of cleanup functions, which may have accessed some objects in an inconsistent state. For example, a buffer's size field was set larger than the memory that was really allocated for it.

Any software using the atexit() function may have a similar problem.

The right fix: Separate hard failures (e.g. out of memory) from soft failures. Handle soft failures by returning a failure from the function. Handle hard failures with a function that can be safely called anywhere inside the code. Try to avoid hard failures by validating user input early. Call the cleanup functions only from a few places where object states are known to be consistent.

The wrong fix: Fix the few obvious problem cases and hope for the best. This is the state of OpenSSH 3.7.1 (which was released right after 3.7 to fix a few missed cases). The next release apparently does it better.
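
The separation described in the right fix could look roughly like the following C sketch. This is not OpenSSH's code. die_hard(), xmalloc(), struct session and MAX_NAME_LEN are all invented for illustration. The hard-failure path uses the standard C99 _Exit(), which does not run atexit() handlers, so no cleanup function ever sees a half-updated object.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hard failure: report and terminate immediately. _Exit() does not
   call atexit() handlers, so this is safe to call from anywhere. */
static void die_hard(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    _Exit(111);
}

/* Hypothetical allocation wrapper: out of memory is a hard failure. */
static void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        die_hard("out of memory");
    return p;
}

#define MAX_NAME_LEN 64 /* arbitrary limit chosen for this example */

struct session {
    char *name;
};

/* Soft failure: bad input is reported to the caller with -1.
   The input is validated first, and the session is modified only
   after every check has passed, so it never becomes inconsistent. */
int session_set_name(struct session *s, const char *input)
{
    size_t len = strlen(input);
    if (len == 0 || len > MAX_NAME_LEN)
        return -1;

    char *copy = xmalloc(len + 1);
    memcpy(copy, input, len + 1);
    free(s->name);
    s->name = copy;
    return 0;
}

/* Cleanup is called explicitly by the owner of the session, at a
   point where its state is known to be consistent. */
void session_free(struct session *s)
{
    free(s->name);
    s->name = NULL;
}
```

The caller decides what a soft failure means (retry, error message, disconnect), while die_hard() stays trivial enough to be called at any point without worrying about object state.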