<msgno.h>

Managing Error Codes and Associated Messages Across Separate C Libraries

The Source

Dealing with error conditions in C can be a cumbersome process. The standard C library provides minimal support for error handling, and what it does provide has several deficiencies. When libraries are involved, extracting useful information is often neglected altogether, which is at the heart of users' dismay over error messages like:

Error 3
What is Error 3? This message is perfectly useless!

The msgno package is a refinement of the popular com_err(3), errno(3), perror(3), and similar functionality. It performs two independently useful tasks. One task is to dynamically generate error codes that are globally unique, which allows one set of message handling functions to be used regardless of where an error code came from. The other task is handled by a variety of non-intrusive output macros that dispatch informative formatted messages. If the msgno macros and conventions are used throughout a project, output resembling a stack trace (though not a real one), like the example below, can be generated even from deep within libraries. Messages collected within library code are not dispatched unless one of the top-level output macros is called:

[miallen@nano c]$ ./sql moo
sock.c:21:sock_open: Failed to bind interface: host=moo,port=900
  db.c:45:db_open: Cannot establish network connection with database: host=moo
  sql.c:36:main: Failed to open database: name=moo
That said, in the steady state a high degree of message handling is not necessary. A properly functioning program should probably not be actively spewing debugging information. However, most libraries and programs are not static works. A good program deserves to be rewritten or refactored a few times. Being able to enable diagnostic information is extremely useful for maintenance, for helping troubled users, and particularly for debugging.

Two Problems with msgno

There are two problems with this package that should be explained before going any further. One problem with msgno is that error code macros are not constants. They are defined by referencing a member in a struct.

#define DB_CONN_FAILED  db_codes[1].msgno
This means that error code macros for use with msgno cannot be used with switch statements:
switch (dberrno) {
   case DB_CONN_FAILED: /* COMPILER ERROR
                         * case label does not reduce to an integer constant
                         */
   ...
This is the price paid for easy, generalized cross-library error codes. It is possible that a way to designate a message list at compile time, and thus to make error codes constants, will be developed. So far, however, the burden of having to use if, else if, else statements instead has not been compelling enough.
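The if, else if, else alternative can be sketched as follows. This is only an illustration: the demo_ names are hypothetical stand-ins, and the numbers are set by hand here, whereas with msgno they would be filled in at runtime.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a message list and its code macros.
 * The numbers are assigned by hand so the sketch compiles on its own. */
struct demo_entry {
    unsigned int msgno;
    const char *msg;
};

static struct demo_entry demo_db_codes[] = {
    { 101, "Connection failed" },
    { 102, "Authentication rejected" },
    { 0, NULL }
};

/* Not integer constants: each macro reads a struct member at runtime. */
#define DEMO_DB_CONN_FAILED   demo_db_codes[0].msgno
#define DEMO_DB_AUTH_REJECTED demo_db_codes[1].msgno

/* Returns 1 if the caller should retry, 0 if it should give up,
 * and -1 for a code this function does not recognize. */
int
demo_should_retry(unsigned int dberrno)
{
    if (dberrno == DEMO_DB_CONN_FAILED) {
        return 1;   /* transient, worth another attempt */
    } else if (dberrno == DEMO_DB_AUTH_REJECTED) {
        return 0;   /* retrying will not help */
    } else {
        return -1;
    }
}
```

The comparison chain behaves like the switch would have, at the cost of evaluating each comparison in order.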

The other problem with msgno is that some of the output macros use the gcc variadic macro extension. C99 also supports variadic macros, but I do not know how many mainstream compilers support them yet. In a few years, this should not be an issue.
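For readers unfamiliar with the two forms, here is a minimal sketch of a C99 variadic macro that records the file, line, and function of the call site. The demo_ names are hypothetical and this is not how msgno itself is implemented; the gcc form appears only in a comment for comparison.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char demo_last_msg[256];

/* Formats "file:line:function: message" into demo_last_msg.
 * Returns 0 on success, -1 if the location prefix did not fit. */
static int
demo_msg(const char *file, int line, const char *func, const char *fmt, ...)
{
    va_list ap;
    int n;

    n = snprintf(demo_last_msg, sizeof demo_last_msg, "%s:%d:%s: ",
                 file, line, func);
    if (n < 0 || (size_t)n >= sizeof demo_last_msg)
        return -1;
    va_start(ap, fmt);
    vsnprintf(demo_last_msg + n, sizeof demo_last_msg - (size_t)n, fmt, ap);
    va_end(ap);
    return 0;
}

/* gcc extension form, for comparison:
 *   #define DEMO_MSG(fmt, args...) \
 *       demo_msg(__FILE__, __LINE__, __func__, fmt, ## args)
 * C99 form: the format string travels inside __VA_ARGS__ so that a
 * call with no extra arguments is still valid. */
#define DEMO_MSG(...) demo_msg(__FILE__, __LINE__, __func__, __VA_ARGS__)

int
demo_report(const char *host, int port)
{
    return DEMO_MSG("Failed to bind interface: host=%s,port=%d", host, port);
}
```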

Existing Support for Error Handling in C

The errno package uses a global shared integer to communicate an error code to a caller. The possible errno codes have a limited number of meanings predefined by the local C implementation (the standard C library provides for only EDOM and ERANGE). Using the errno mechanism (really more of a convention) is simply a matter of setting it to zero before a function call and testing to see if it is something other than zero after the call completes.
#include <errno.h>
#include <math.h>
...
   errno = 0;
   y = sqrt(x);
   if (errno)
      fprintf(stderr, "sqrt failed: %e\n", x);
In practice there are a few more permutations of this. A function may communicate to the caller that the operation failed with a return value such as NULL or -1 at which point a message may be printed like above or the strerror and perror functions might be used to extract something more informative:
#include <string.h>
#include <stdio.h>
...
   fd = fopen(argv[1], "r");
   if (fd == NULL)
      perror(argv[1]);
This last example might print:
[miallen@nano c]$ ./etest yoyo.txt
yoyo.txt: No such file or directory
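When the message should be kept rather than printed, strerror can be used to build it into a buffer. A minimal sketch, in which demo_check_readable is a hypothetical helper name:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Tries to open path for reading. On failure, writes
 * "<path>: <system message>" into buf and returns -1.
 * On success, closes the file again and returns 0. */
int
demo_check_readable(const char *path, char *buf, size_t bufsiz)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        int saved = errno;   /* save before another call can change it */

        snprintf(buf, bufsiz, "%s: %s", path, strerror(saved));
        return -1;
    }
    fclose(fp);
    return 0;
}
```

The exact wording of the system message depends on the C library and locale.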

The Deficiencies with errno, strerror and perror

These are measly offerings from the standard C library. They may be adequate for small programs but there are a few serious issues.

The perror and strerror functions are useless for shared libraries. It would be considered poor practice to write a library that unexpectedly wrote error messages to stderr or similar. Instead, the library should communicate an error to the caller just as the function that failed did for the library. This allows error codes to propagate up to a higher-level part of the program that knows what to do about the error condition. In this situation it is common to see a shared global variable like errno, specific to that library and its users, used to communicate an error code indicating the reason for the error. This technique is quite common; however, it fails to communicate useful context. Instead you get a message like Error 3, which is all but useless.
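The convention described above might look like the following sketch. The library name demolib and all of its identifiers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical library "demolib": error codes and a shared
 * error variable in the style of errno. */
enum {
    DEMOLIB_OK = 0,
    DEMOLIB_EINVAL,
    DEMOLIB_ERANGE
};

int demolib_errno;

/* Library code: parses a decimal port number. Failure is reported
 * through the return value and demolib_errno; nothing is printed. */
int
demolib_parse_port(const char *s)
{
    long port = 0;

    if (s == NULL || *s == '\0') {
        demolib_errno = DEMOLIB_EINVAL;
        return -1;
    }
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') {
            demolib_errno = DEMOLIB_EINVAL;
            return -1;
        }
        port = port * 10 + (*s - '0');
        if (port > 65535) {
            demolib_errno = DEMOLIB_ERANGE;
            return -1;
        }
    }
    demolib_errno = DEMOLIB_OK;
    return (int)port;
}
```

A caller that only has demolib_errno to work with can report little more than the number itself, which is exactly the Error 3 problem.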

The error messages associated with errno or other shared globals used by libraries are not extensible. There is no facility to add error codes with associated messages in a way that works across libraries. Certainly a library can define some simple constants with an array of associated messages; however, the library user must know how to extract these messages for each library (e.g. strerror, ldap_err2string, OpenSSL's ERR_reason_error_string, pam_strerror, dlerror, regerror, ... the list is long).
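A library-private table of this kind is easy to write, which is part of why there are so many of them. A sketch with hypothetical names:

```c
#include <stddef.h>

/* Hypothetical library-private codes and their messages. */
enum { DEMO_EOK, DEMO_ECONN, DEMO_EAUTH, DEMO_NCODES };

static const char *const demo_messages[DEMO_NCODES] = {
    "Success",
    "Connection failed",
    "Authentication rejected"
};

/* The library's own lookup function. Every library that works this
 * way needs one, and callers must know which one to call. */
const char *
demo_strerror(int code)
{
    if (code < 0 || code >= DEMO_NCODES)
        return "Unknown error";
    return demo_messages[code];
}
```

The codes are only meaningful to demo_strerror; passing one to another library's lookup function would yield an unrelated message.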

Error codes and associated messages alone do not provide enough information. A text message that is simply mapped to an error code does not provide any more context than the error code itself. It is much more useful to get specific data such as the errant hostname or the SQL statement that failed.
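Attaching context amounts to formatting the specific data alongside the fixed message. A minimal sketch, where demo_format_bind_error is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Builds "<message>: host=<host>,port=<port>" in buf.
 * Returns 0 on success, -1 if the text did not fit. */
int
demo_format_bind_error(char *buf, size_t bufsiz, const char *msg,
                       const char *host, int port)
{
    int n = snprintf(buf, bufsiz, "%s: host=%s,port=%d", msg, host, port);

    return (n < 0 || (size_t)n >= bufsiz) ? -1 : 0;
}
```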

Error Handling with the com_err Package

The com_err package provides both error code handling across libraries and the means to deliver associated text messages to a user-defined hook (stderr by default). The mechanism works by defining an error table which is then preprocessed by another program into C source for inclusion into your program or library. It hashes a four-letter id to generate unique error codes. The com_err function is then used to dispatch an error code's message along with a context-specific message:
#include <et/com_err.h>
...
   if (retval && retval != KRB5_SENDAUTH_REJECTED) {
      com_err(argv[0], retval, "while using sendauth");
This example would generate a message such as:
[miallen@nano c]$ ./krbtest
krbtest: Authentication failed while using sendauth
The package is widely available and is used by Kerberos and the BSDs. If it provides the functionality you seek, I highly recommend it. On the surface msgno appears to be very similar to com_err. Indeed they are very similar in purpose. However, the implementation is very different, which is reflected in its usage in subtle ways. One not-so-subtle way is that msgno error codes are registered at runtime; it is not necessary to preprocess an error table definition file with the compile_et program. There are quite a few other differences that will be discussed in the next section.

com_err was written in 1987 by Ken Raeburn and MIT's Student Information Processing Board (SIPB).
http://www.mit.edu/afs/sipb/project/discuss/dist/source/et/

Managing Error Codes and Associated Messages Across Separate C Libraries with the msgno Package

The msgno package is very similar in purpose to com_err in that both provide unique message numbers (traditionally error codes) and their associated messages across C libraries and modules, as well as the means to deliver those messages to a stream or user-defined hook. The message macros are simple and non-intrusive. Message code management and the output macros are independently useful but most effective when used together. If errno values are used in place of the msgno parameter, the messages returned by strerror will be generated.

A "Hello, World" Example

This example is so trivial that it defeats the purpose of using msgno, but it quickly illustrates how to use the package. A more elaborate example with multiple modules of code is explored later.

/* hello.c - a trivial example that uses msgno
 */

#include <stdlib.h>
#include <stdio.h>
#include "hello.h" /* includes <msgno.h> and defines the HELLO_* error code macros */

struct msgno_entry hello_codes[3] = {
    { 1, "That world is too busy at the moment" },
    { 0, "That planet has no atmosphere" },
    { 0, NULL }
};

int hello_errno;

int
say_hello(const char *planet)
{
	msgno_add_codes(hello_codes);

	if (!(*planet & 1)) {
		hello_errno = HELLO_WORLD_BUSY;
		return -1;
	}
	if (*planet != 'e') {
		hello_errno = HELLO_NO_ATMOSPHERE;
		return -1;
	}
	return printf("hello, %s\n", planet);
}

In the C file of the module or library, initialize an array of msgno_entry structures to define a list of message numbers (error codes) and their associated text messages. This msgno_entry structure is just:
struct msgno_entry {
    unsigned int msgno; /* message number */
    const char *msg;    /* message */
};
The last message must be NULL and the list must contain at least one entry (not including the NULL message). The 1 for the first error code instructs msgno to generate error codes starting with 1 rather than 0.

Error code macros such as HELLO_WORLD_BUSY are defined in the header file (see hello.h directly below). Before they can be used, the msgno_add_codes function must be called with the message list. In practice there are usually only one or a few places within a library or module, such as an initialization routine, where this needs to be performed. Redundant calls to the msgno_add_codes function will be ignored.
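To make the idea of runtime registration concrete, here is one way such a registry could work. This is a sketch under stated assumptions, not msgno's actual implementation: the demo_ names, the fixed table size, and the scheme of packing a list slot into the upper bits of each number are all hypothetical.

```c
#include <stddef.h>

struct demo_entry {
    unsigned int msgno;
    const char *msg;
};

#define DEMO_MAX_LISTS 16

static struct demo_entry *demo_lists[DEMO_MAX_LISTS];
static unsigned int demo_nlists;

/* Assigns a number to every entry of a list terminated by a NULL
 * message. The upper bits carry the list's slot and the lower 16 bits
 * the position within the list, so no two lists share a number.
 * Returns 0 on success or if the list is already registered, and -1
 * if the list is empty or the table is full. */
int
demo_add_codes(struct demo_entry *list)
{
    unsigned int i, next;

    if (list == NULL || list[0].msg == NULL)
        return -1;
    for (i = 0; i < demo_nlists; i++) {
        if (demo_lists[i] == list)
            return 0;               /* redundant call, ignored */
    }
    if (demo_nlists == DEMO_MAX_LISTS)
        return -1;
    demo_lists[demo_nlists++] = list;

    next = list[0].msgno;           /* starting offset chosen by the list */
    for (i = 0; list[i].msg != NULL; i++)
        list[i].msgno = (demo_nlists << 16) | next++;
    return 0;
}

/* Maps a number back to its message, or NULL if it is unknown. */
const char *
demo_msg(unsigned int msgno)
{
    unsigned int slot = msgno >> 16, i;

    if (slot == 0 || slot > demo_nlists)
        return NULL;
    for (i = 0; demo_lists[slot - 1][i].msg != NULL; i++) {
        if (demo_lists[slot - 1][i].msgno == msgno)
            return demo_lists[slot - 1][i].msg;
    }
    return NULL;
}
```

Because the numbers are written into the list when it is registered, macros that read list[n].msgno pick them up automatically, which is also why such macros cannot be compile-time constants.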


/* hello.h - a trivial example that uses msgno
 */

#ifndef HELLO_H
#define HELLO_H

#define MSGNO

#include <msgno.h>

extern struct msgno_entry hello_codes[];

#define HELLO_WORLD_BUSY    hello_codes[0].msgno
#define HELLO_NO_ATMOSPHERE hello_codes[1].msgno

extern int hello_errno;

int say_hello(const char *planet);

#endif /* HELLO_H */

Before compiling a program that uses msgno, the macro MSGNO must be defined, or all output macros will be ignored and the resulting code will not generate any output. This ensures that libraries and modules ship with the default behavior that users unfamiliar with msgno would expect, which is not to inadvertently generate output to stderr or elsewhere. In this example the MSGNO macro is hard-coded before the msgno.h file is included to permanently enable the msgno output macros. Alternatively, one could just as easily use a compiler flag, for example:
[miallen@nano c]$ gcc -Wall -DMSGNO hello.c run_hello.c -o run_hello -lmsgno
To define the error code macros, simply reference the msgno member in each msgno_entry structure. Notice the hello_codes list is referenced by the code in these macros so an extern declaration is required.
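The effect of a switch like MSGNO can be sketched with an ordinary conditional macro definition. The DEMO_ names are hypothetical, and an in-memory buffer stands in for stderr so the result can be inspected.

```c
#include <stdio.h>
#include <string.h>

/* Remove this line (and pass -DDEMO_MSGS to the compiler only when
 * wanted) to compile every DEMO_MSG call down to nothing. */
#define DEMO_MSGS

static char demo_out[256];   /* stands in for stderr */

#ifdef DEMO_MSGS
#define DEMO_MSG(text) \
    snprintf(demo_out, sizeof demo_out, "%s:%d: %s", \
             __FILE__, __LINE__, (text))
#else
#define DEMO_MSG(text) ((void)0)
#endif

/* Returns -1 and records a message if no planet name was given. */
int
demo_check_args(int argc)
{
    if (argc < 2) {
        DEMO_MSG("Must provide the name of a planet");
        return -1;
    }
    return 0;
}
```

With the switch undefined, the call sites remain in the source but cost nothing at runtime.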

/* run_hello.c - run the hello.c example
 */

#include <stdlib.h>
#include "hello.h"

int
main(int argc, char *argv[])
{
	if (argc < 2) {
		MSG("Must provide the name of a planet");
		return EXIT_FAILURE;
	}
	if (say_hello(argv[1]) == -1) {
		MNF(hello_errno, ": planet=%s", argv[1]);
	}
	return EXIT_SUCCESS;
}

In the upper-level, non-library code of a program, generate messages with the msgno output macros. These call the user-definable msgno_hdlr function, which is set to msgno_hdlr_stderr by default.

Use the MSG macro to generate a simple text message. Notice this macro does not reference any error codes here so we can safely use it even though msgno_add_codes has not yet been called.

The MNF macro takes a message number and generates the associated message followed by an additional context string.

There is also an MNO macro that takes only a message number and generates the associated message.

The output of this "Hello, World" example might look like:

[miallen@nano c]$ ./run_hello pluto
run_hello.c:15:main: That world is too busy at the moment: planet=pluto
[miallen@nano c]$ ./run_hello mars
run_hello.c:15:main: That planet has no atmosphere: planet=mars
[miallen@nano c]$ ./run_hello earth
hello, earth

Output Macros

The msgno output macros dispatch a message to the msgno message handler. The msgno_hdlr_stderr handler is used by default, but the handler can be set to any function matching the msgno_hdlr prototype. There are three main macros used in the top-level glue of a program. Their descriptions follow.

The Main Output Macros
MSG(fmt, args...) Generates a simple formatted message.
MNO(msgno) Generates the message associated with the message number msgno.
MNF(msgno, fmt, args...) Generates the message associated with the message number msgno followed by a formatted context message.
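A replaceable handler of the kind these macros dispatch to can be sketched with a function pointer. The msgno_hdlr prototype is not shown in this document, so the handler type below (one finished message string in, an int out) and all demo_ names are assumptions made for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: receives one finished message. */
typedef int (*demo_hdlr_fn)(const char *msg);

/* Default handler: write the message to stderr. */
static int
demo_hdlr_stderr(const char *msg)
{
    return fprintf(stderr, "%s\n", msg);
}

/* The current handler; callers may point this at their own function. */
demo_hdlr_fn demo_hdlr = demo_hdlr_stderr;

/* Example replacement: keep the most recent message in memory, as a
 * GUI or logging layer might before displaying it. */
static char demo_captured[256];

static int
demo_hdlr_capture(const char *msg)
{
    return snprintf(demo_captured, sizeof demo_captured, "%s", msg);
}

/* Everything that emits a message goes through the pointer. */
int
demo_emit(const char *msg)
{
    return demo_hdlr(msg);
}
```

Assigning demo_hdlr = demo_hdlr_capture before the first call redirects every later message without touching the code that emits them.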

The MSG, MNO, and MNF macros dispatch their messages as soon as they are invoked. Naturally, they should not be used within shared libraries, because libraries generally do not generate output other than that associated with their function. Instead, the msgno package provides separate macros for use within libraries that do not dispatch messages to the msgno_hdlr until one of the main output macros is called. Output is therefore not generated unless triggered by the library user.

These macros build a deferred sequence of messages. There are two sets of macros that mirror the main three: one set initiates a primary message, whereas the other appends additional messages to a sequence. Use the macros that start with P when initiating an error condition and the macros that begin with A when you know a deferred sequence is being constructed. These macros are what generate the stack-trace-like output:

[miallen@nano c]$ ./sql moo
sock.c:21:sock_open: Failed to bind interface: host=moo,port=900
  db.c:45:db_open: Cannot establish network connection with database: host=moo
  sql.c:36:main: Failed to open database: name=moo

There may be situations where it is not obvious whether a deferred sequence is already being constructed. In that case it is safer to simply call the P macros and initiate a new message. This process is not analogous to exception handling; the macros only expand to trivial sprintf statements that append strings into a shared global buffer.

The Primary Output Macros for Deferred Sequences in Libraries
PMSG(fmt, args...) Sets the primary message in a deferred sequence to a simple formatted message.
PMNO(msgno) Sets the primary message in a deferred sequence to the message associated with the message number msgno.
PMNF(msgno, fmt, args...) Sets the primary message in a deferred sequence to the message associated with the message number msgno followed by a formatted context message.

The Appending Macros for Deferred Sequences in Libraries
AMSG(fmt, args...) Appends a simple formatted message to a deferred sequence of messages.
AMNO(msgno) Appends the message associated with the message number msgno to a deferred sequence of messages.
AMNF(msgno, fmt, args...) Appends the message associated with the message number msgno, followed by a formatted context message, to a deferred sequence of messages.
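The deferred sequence can be pictured as formatted text appended to a shared buffer: the P form resets the buffer before writing, the A form appends, and the top-level form appends and then hands the whole buffer over. The following is a rough sketch of that idea with hypothetical DEMO_ names and plain text messages only; it is not the msgno source.

```c
#include <stdio.h>
#include <string.h>

static char demo_buf[1024];   /* shared buffer holding the sequence */
static size_t demo_len;

/* Appends one "file:line:function: text" line to the buffer.
 * Lines after the first are indented by two spaces. */
static void
demo_append(const char *file, int line, const char *func, const char *text)
{
    int n = snprintf(demo_buf + demo_len, sizeof demo_buf - demo_len,
                     "%s%s:%d:%s: %s\n", demo_len ? "  " : "",
                     file, line, func, text);

    if (n > 0 && (size_t)n < sizeof demo_buf - demo_len)
        demo_len += (size_t)n;
    else
        demo_buf[demo_len] = '\0';   /* did not fit: drop this line */
}

/* P form: start a new sequence. A form: add to the current one. */
#define DEMO_PMSG(text) \
    (demo_len = 0, demo_append(__FILE__, __LINE__, __func__, (text)))
#define DEMO_AMSG(text) \
    demo_append(__FILE__, __LINE__, __func__, (text))

/* Top-level form: add a final line, write the sequence to out, reset. */
#define DEMO_MSG(out, text) \
    (DEMO_AMSG(text), fputs(demo_buf, (out)), demo_len = 0)

/* Lowest level: the error originates here. */
static int
demo_sock_open(void)
{
    DEMO_PMSG("Failed to bind interface");
    return -1;
}

/* Middle level: adds its own view of the failure and passes it on. */
static int
demo_db_open(void)
{
    if (demo_sock_open() == -1) {
        DEMO_AMSG("Cannot establish network connection with database");
        return -1;
    }
    return 0;
}
```

A main function would then call something like DEMO_MSG(stderr, "Failed to open database") when demo_db_open returns -1, and only at that point does any output appear.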

To be continued ...

Sun Dec 2 03:04:17 EST 2001