Thursday, May 15, 2008

Defensive Programming

Enough negativity after the last post, let me get on to a more interesting topic. When I was in college, I learned how important it was to code defensively. I'm sure many folk who studied computer science was also exposed to this way of thinking. In case you don't know what I mean, I am referring to the practice of checking values, typically parameters that are passed into methods, so that the method does not blow up if it encounters a null, for example. 

void renderShapes(Shape s) {
if (null != shape)
s.draw();
}

So this is a very trivial example, but when this effect is compounded, adding all the checks for validity, generates a considerable percentage of extra code, non of which is actually necessary, in fact, just the opposite, it adds complexity and makes programs harder to read and maintain. Its all about trust, the correct place to check for validity is at the entrypoint to your system, whether it be via computer-computer or human-computer interface. Doing this, results in data validity being checked only once, but its then valid all the time.

Consider an object's contract. If the contract contains a method that states give me a valid X, and I will produce you a valid Y, then anything less than a valid X and the caller is to blame.

I found this great description from years back -

3.13 Do not program "defensively"

A defensive program is one where the programmer does not "trust" the input data to the part of the system they are programming. In general one should not test input data to functions for correctness. Most of the code in the system should be written with the assumption that the input data to the function in question is correct. Only a small part of the code should actually perform any checking of the data. This is usually done when data "enters" the system for the first time, once data has been checked as it enters the system it should thereafter be assumed correct.

Example:

%% Args: Option is all|normal
get_server_usage_info(Option, AsciiPid) ->
Pid = list_to_pid(AsciiPid),
case Option of
all -> get_all_info(Pid);
normal -> get_normal_info(Pid)
end.


The function will crash if Option neither normal nor all, and it should do that. The caller is responsible for supplying correct input.

2 comments:

Brad Wiederholt said...

So, I agree with the complexity trust issue, but I'm having a bit of a think about "entrypoint" into "the system." Where does one draw the line around the circle of trust? What is "the system'?

For example, last week we had IE crash in one of our apps. The root cause turned out to be data that was stored in a table with a null value. If I can't trust data in the database, I guess you are saying I should consider that an entrypoint into my system, and I should check it when I pull it out?

I'd really, really like to assume that stored data is correct, but, and here's a key issue, I don't trust the system that stored the data in the first place. It would be nice to trust the system, but I can't.

Also, maybe when this circle of trust breaks down, you don't see errors immediately. With the IE crash, this bad data passed through about 4 different subsystems before it went into someone's black box and brought part of the system down. For all I know there was probably a time aspect to this as well; the bad data could have been generated and stored many, many years ago. It may be possible to trust current data but not older data that may have been subjected to some sort of "data entropy."

I dunno, I find this topic interesting, and I want to trust folks and their systems, but I don't most of the time.

Andrew said...

Brad, that is a good example. I guess it really comes down to system's relationship with the database in this instance. If the same system is responsible for inserting the bad data, then it should have never gotten in there in the first place. This assumes of course that there is a contract that states that the data inserted into the database must be of a certain value or range, anything outside of those, breaks the contract.

The other way to look at it is as a pure integration point. If another system inserted the null data, we could claim that we should check the value for validity and while I probably wouldn't argue the point, I would still ask, is it valid as part of the system's contract, that null be inserted? If not, the system that inserted the data has a bug and should be changed.

Of course, there are other reasons why it might not be practical to change things. Third party code might be responsible for the bad data and it absolutely cannot be changed in a timely fashion - in which case I would be tempted to put in place some kind of simple data cleansing to deal with it.

If null is valid data however, then your program needs to be changed to cope with it as a completely normal and expected value - I assume this is not the case.

Its still crucial to address the root cause of the problem though and not the symptoms.