Handling errors

Errors are something we have to face in the real world, so everything you write must have some way of handling them. Different languages have different mechanisms for handling errors, but the same concepts apply to them all.

Basic concepts

Coverage

You must make sure that you are able to respond to any error anywhere in the program. It can be very tempting to take a shortcut and leave the error handling out of a short piece of code where "nothing can possibly go wrong". Resist this temptation and make sure that you have made provision for errors wherever they may crop up, however unlikely they are.
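
Local handling in every routine is the main defence, but a last-resort handler can catch anything that slips through. The sketch below assumes Python, where uncaught exceptions are passed to sys.excepthook (and, for other threads, to threading.excepthook in Python 3.8 and later). The names last_resort and install_last_resort_handler are hypothetical, and the handler only reports the error; it does not try to recover.

```python
import sys
import threading


def last_resort(exc_type, exc_value, exc_tb):
    """Report any exception that no local handler caught."""
    message = f"Unexpected error ({exc_type.__name__}): {exc_value}"
    print(message, file=sys.stderr)  # simplest output available
    return message  # returned only so the behaviour is easy to check


def install_last_resort_handler():
    """Route uncaught exceptions in every thread to last_resort."""
    # Main thread: replaces Python's default traceback printout.
    sys.excepthook = last_resort
    # Other threads (Python 3.8+) use a separate hook that takes one argument.
    threading.excepthook = lambda args: last_resort(
        args.exc_type, args.exc_value, args.exc_traceback
    )
```

Call install_last_resort_handler() once at start-up. It complements, rather than replaces, handling errors where they occur.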

Recursion

The error handler cannot handle errors inside itself, so the first action of an error handler routine should always be to turn off or redirect the error handling. If this is overlooked, any error in the error handler code will call the error handler and run into the same error again, and again, and again until something in the internal structure of your program overflows.

If you have to turn error handling off completely inside the handler, then any error in the error handler leads to a messy crash. This is not good, but at least it will crash immediately without doing any damage elsewhere.

Robustness

At the risk of stating the obvious, the error handler has got to work first time, every time, so keep it as simple as possible and make it completely self-contained. If it uses your bespoke user interface object to display a message for the user, then the handler will lock up if the error happens to be in that very object. Use the simplest means of output available: typically a Windows message box.
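
As a rough Python illustration, assuming the program may run on Windows: the function below calls the Win32 MessageBoxW function through ctypes and falls back to standard error on other platforms. It uses nothing from the application itself, so it should still work if the application's own interface is what failed. The name show_error_simply is hypothetical.

```python
import sys


def show_error_simply(text, title="Application error"):
    """Display an error using the simplest output available.

    Deliberately avoids the application's own UI objects, which might be
    the very thing that failed.
    """
    if sys.platform == "win32":
        import ctypes
        # Native Windows message box; 0x10 is MB_ICONERROR with a single OK button.
        ctypes.windll.user32.MessageBoxW(None, text, title, 0x10)
        return "messagebox"
    # Elsewhere, plain standard error is about as simple as output gets.
    print(f"{title}: {text}", file=sys.stderr)
    return "stderr"
```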

Levels of response

Not all errors are equally serious. Your error handler should have at least four levels of response:

Silent

Log the error to disk, but do not interrupt processing or say anything to the user. You can use this level of response for diagnostic records that will give you advance warning of future problems, or as a trace mechanism to profile the typical usage of the system. If you are worried about the speed of a particular operation, then create a log entry at the start and end of the process. This will tell you how long it really takes for a typical user to process real data on an everyday workstation, and how often the situation occurs. All of this is very useful information when you are trying to decide which problem to fix first.
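
A sketch of start and end log entries in Python, using the standard logging module; the logger name "silent" and the timed helper are assumptions for illustration. Nothing is shown to the user: the entries go wherever logging has been configured to write.

```python
import logging
import time
from contextlib import contextmanager

# "Silent" log: goes wherever logging is configured (e.g. a file), never to the user.
silent_log = logging.getLogger("silent")
silent_log.setLevel(logging.INFO)


@contextmanager
def timed(operation):
    """Write log entries at the start and end of an operation, with its duration."""
    silent_log.info("start %s", operation)
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        silent_log.info("end %s after %.3f s", operation, elapsed)
```

If logging.basicConfig is called at start-up with a filename and a format that includes %(asctime)s, each run of a task wrapped in timed(...) leaves a time-stamped start line and end line in the file; counting those pairs also shows how often the task is used.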

Try again

Most of the errors in this category relate to the interface with the outside world. They might be caused by a problem with the hardware or by a mistake from the user. The cause might be a printer that has run out of paper, a drive with no disk, or a user asking for the impossible. The error handler should give the following information to the user:

  • What has happened
  • What the implications are
  • How they can recover

It should also give them the option of trying again or abandoning the operation.
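
A possible shape for the try-again loop, sketched in Python. It assumes that outside-world problems (a missing disk, a printer out of paper) surface as OSError, and ask_user is a hypothetical callback that shows the message and returns True for Try again or False for Abandon.

```python
def try_again_or_abandon(action, ask_user):
    """Run action(); if it fails, explain and let the user retry or abandon.

    ask_user is a hypothetical callback that displays the message and returns
    True for "Try again" or False for "Abandon".
    """
    while True:
        try:
            return action()
        except OSError as exc:  # assumed: outside-world problems surface as OSError
            message = (
                f"What has happened: {exc}\n"
                "What it means: the operation has not been completed.\n"
                "How to recover: put the problem right, then choose Try again."
            )
            if not ask_user(message):
                return None  # the user chose to abandon the operation
```

Returning None on abandon is a simplification; a real version would need a way to tell that apart from an action that legitimately returns None.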

Avoid

These too are typically caused by the failure of something outside the application, but this time it is something over which the user has no control. An example might be a problem with a communications link which means that data cannot be imported from another site. The error handler should tell the user what has happened and confirm that the rest of the application is still running as normal.

Abandon

Finally, there are the serious errors which mean that the application cannot continue to run safely. There are two sub-divisions here:

  • problems with the installation, such as the failure of a network drive
  • problems with the application itself, such as a Divide By Zero error

In either situation, the error handler should tell the user what has happened and then shut the application down as safely as possible.
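
One way to make the four levels explicit in code, sketched in Python. The mapping in classify is a hypothetical policy that follows the examples above only loosely: for instance, an OSError from a failed network drive would land in TRY_AGAIN here, although it belongs under Abandon, so a real policy would need finer tests.

```python
import enum


class Response(enum.Enum):
    SILENT = 1     # log only; the user sees nothing
    TRY_AGAIN = 2  # the user can put it right and retry
    AVOID = 3      # outside the user's control; the rest of the application carries on
    ABANDON = 4    # the application cannot continue safely


def classify(exc):
    """Choose a level of response for an unexpected exception (hypothetical policy).

    SILENT is never returned here: it is chosen deliberately by code that
    writes diagnostic or trace entries.
    """
    if isinstance(exc, ZeroDivisionError):
        return Response.ABANDON    # a fault in the application itself
    if isinstance(exc, ConnectionError):
        return Response.AVOID      # e.g. a communications link has failed
    if isinstance(exc, OSError):
        return Response.TRY_AGAIN  # e.g. no disk in the drive
    return Response.ABANDON        # anything unrecognised: assume the worst
```

ConnectionError is a subclass of OSError, so it has to be tested first.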

Messages to the user

The user rarely cares what has gone wrong and needs only two items of information:

  • What damage has been done?
  • Can I do anything to repair it?

The user should never see a raw error message from the programming language. Something like 'Invalid File Offset' will be utterly meaningless to most users. If the user has to report the error to an internal Help Desk, then display a message with an error number, something like:


Error 42
Please phone the Help Desk on extension 1234.


The number might be the raw error code from the language or you may want to draw up your own list of codes.
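
A sketch of turning raw exceptions into numbered messages, assuming Python and a hypothetical table of the application's own codes; the numbers, the Help Desk extension and user_message are illustrative only. The raw exception text is deliberately left out of what the user sees; it belongs in the error log.

```python
# Hypothetical table of the application's own error codes.
ERROR_CODES = {
    FileNotFoundError: 41,
    PermissionError: 42,
    ZeroDivisionError: 43,
}
UNKNOWN_ERROR = 99
HELP_DESK = "extension 1234"


def user_message(exc):
    """Turn a raw exception into a numbered message for the Help Desk."""
    # Walk the class hierarchy so a subclass falls back to its parent's code.
    code = next(
        (ERROR_CODES[cls] for cls in type(exc).__mro__ if cls in ERROR_CODES),
        UNKNOWN_ERROR,
    )
    return f"Error {code}\nPlease phone the Help Desk on {HELP_DESK}."
```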

Error log

An error log is invaluable if you are supporting an application from off site. The log should record the time, date, user, and workstation, so that you know who had the problem, and the name of the program file and the line number, so that you know where the problem occurred. Rather than having to listen to exaggerated tales of problems, you can read the log and see exactly what has been happening.
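
A Python sketch of one log entry containing the facts listed above, using only the standard library: datetime for the time and date, getpass for the user, socket for the workstation name and traceback for the program file and line number. The tab-separated layout and the name error_log_line are assumptions; the user lookup is wrapped so that the logger itself cannot fail.

```python
import datetime
import getpass
import socket
import traceback


def error_log_line(exc):
    """One tab-separated entry: when, who, which workstation, where in the code."""
    try:
        user = getpass.getuser()
    except Exception:  # the logger itself must never fail
        user = "unknown"
    frames = traceback.extract_tb(exc.__traceback__)
    # The last frame is the one where the error was actually raised.
    program, line = (frames[-1].filename, frames[-1].lineno) if frames else ("?", 0)
    return "\t".join([
        datetime.datetime.now().isoformat(timespec="seconds"),  # date and time
        user,
        socket.gethostname(),  # workstation
        program,
        str(line),
        type(exc).__name__,
        str(exc),
    ])
```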

Consider having two types of error log: a simple log that merely records the facts listed above, and a verbose log that records far more detail, such as the amount of free disk and memory space, the version and sub-version of the operating system, and the chain of programs and operator actions that led to the error. The verbose log will make the application run more slowly and will generate a large log file, but sometimes it will be the only way of investigating a problem.
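
A sketch of the extra detail a verbose log might gather, in Python. The standard library covers free disk space (shutil.disk_usage), the operating system version (platform.platform) and the chain of calls (traceback.format_stack); free memory has no portable standard-library call, so it is left out here, though a third-party package such as psutil could supply it. The recent_actions buffer is hypothetical: the application would have to append each operator action to it.

```python
import collections
import platform
import shutil
import traceback

# Hypothetical: the application appends a short note here for each operator action.
recent_actions = collections.deque(maxlen=20)


def verbose_details(path="."):
    """Extra context for the verbose log: slower to gather and bulkier to store."""
    disk = shutil.disk_usage(path)
    return {
        "free_disk_bytes": disk.free,
        "os_version": platform.platform(),  # OS name, version and release details
        "call_chain": "".join(traceback.format_stack()),  # how the code got here
        "recent_actions": list(recent_actions),
    }
```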

The error log can be recorded as a simple text file, but if you store it in a table then you will be able to analyse it and extract information such as the most common types of error and how often they occur.
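
A minimal sketch of a table-based log using Python's built-in sqlite3 module; the table layout and function names are assumptions. The query ranks error types by how often they occur.

```python
import sqlite3


def create_error_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS error_log ("
        "logged_at TEXT, user_name TEXT, workstation TEXT,"
        " program TEXT, line INTEGER, error TEXT)"
    )


def log_error(conn, row):
    """Insert one entry: (logged_at, user_name, workstation, program, line, error)."""
    conn.execute("INSERT INTO error_log VALUES (?, ?, ?, ?, ?, ?)", row)
    conn.commit()


def most_common_errors(conn, limit=5):
    """Error types ranked by how often they occur, most frequent first."""
    return conn.execute(
        "SELECT error, COUNT(*) AS n FROM error_log"
        " GROUP BY error ORDER BY n DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Open the connection with sqlite3.connect and a file path for a persistent log. Grouping by substr(logged_at, 1, 10) instead of error would give the number of errors per day.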