Handling Backend Communications Errors

Architectural Discussion December 2001 Proposed/Reviewed, Linas Vepstas, Dave Peticolas

Updated and adapted for general QOF usage, May 2005 Neil Williams <linux@codehelp.co.uk>

Problem:

What to do if a serious error occurs in a backend while QOF is being used? For example, what happens if the connection to a SQL server is lost, because the SQL server has died, and/or because there is a network problem (unplugged ethernet cable, etc.) With the QSF backend, what happens if the write operation fails? (disk full, permission failure, etc.)

The "Generic Handler, Report it to the User" idea:

Go ahead and close the connection / clean up, but then return to QOF in some nice way, use qof_session_get_error to report the error to any GUI using program-specific handlers and then allow the user to initiaite a new session (or maybe try to do it automatically): and do all this without deleting any data.

I like this for several reasons:

its generic, it can handle any backend error anywhere in the code. You don't have to second-guess based on whether some recent query may or might not have completed.
I believe that reconnect will be quicker, because you won't need reload piles of accounts and transactions.
If the user can't reconnect, then they can always save to a file. This can be a double bonus if done right: e.g. user works on laptop, saves to file, takes laptop to airport, works off-line, and then syncs her changes back up when she goes on-line again.

Discussion:

Should the backend try reconnecting first, or just go ahead and return an error condition immediately? If the latter, then the current backend error-handling can just stay as it is and the gui codes need to add checks in several places, right?

The problem with automatic reconnect from within the backend is that you don't know quite where to restart... or rather, you have trouble getting to the right place to restart.

You can't just re-login, and reissue the commit. You really need to rewind to the begining of the subroutine. How can you do this?

Alternative 1) wrap the routine and retry three times.

Alternative 2) throw an error, let some much higher layer catch it.

Well, approach 1) seems reasonable... until you think about what happens if three retries doesn't cut it: then you have to throw an error anyway, and hope the higher layer deals with it. So even if you implement 1), you *still* have to implement 2) anyway.

So my attitude is to skip doing 1 for now (maybe we can add it later) and just make sure that when we "throw" the error, it really does behave like a throw should behave, and short-cuts its way up to where its caught. The catcher should probably be a few strategic places in the GUI, like wherever a QofQuery() is issued, and wherever an object is edited.

What's the point of doing 2 cleanly? Because I suspect that most network / filesystem errors won't be automatically recoverable. Most likely, either someone tripped over an ethernet cable, or the server crashed or the disc is full and you gotta call the sysadmin on the phone or clear out some files, etc. The goal is not to crash the client when the backend reports an error, but rather let the user continue to work.

How to Report Errors to the GUI

How would the engine->GUI error reporting happen? A direct callback? Or having the GUI always check for session errors?

We should use the session error mechanism for reporting these errors. Note that the API allows a simple 'try-throw-catch' style error handling in C. Because we don't/can't unwind the stack as a true 'throw' would, we need to make sure that when we "throw" the error, it emulates this as best it can: it short-cuts its way up and out of the engine, to where its caught in the GUI.

If there are a *lot* of places where these calls are issued, simplify things by implementing your own callback mechanism.

Generated on Thu Jan 31 22:50:27 2008 for QOF by

1.5.4