Why QOF?

Summary:

Desktop application development is not particularly easy, and desktop application development for multi-user, client-server apps is so hard that almost no one attempts it. In fact, this is the primary reason for the explosion of web applications: if you need to develop a multi-user, SQL-backed application, you can develop it orders of magnitude more quickly by using Apache and PHP (or Java) than you can by using Gnome or KDE. This is great for the web, but terrible for anyone wanting to develop distributed, multi-user desktop applications. There are many reasons why this is so, and there are many ways in which one can attempt to change this situation. One can boil the ocean with this type of discussion; I won't attempt to do that here. Instead, let's try to justify an approach like QOF in a broader Free Software framework.

 

Data sharing

A number of applications become much more interesting to their users when they have access to large quantities of data, or are hooked into databases on the net in some way, or are multi-user or collaborative in some way. In fact, any large-data application is going to be multi-user, since having that much data is valuable and leads to a tremendous pressure to share. There's a huge number of apps that fit this profile: everything from shopping catalogs, where the store 'shares' the product data with the customer, to Bugzilla, where the bug status is 'shared' between developers and users, to p2p networks for sharing mp3/ogg files. Today, certain types of 'basic' sharing are still incredibly hard: for example, syncing the address book on my cell phone with the address book on my home computer with the address book on my work computer(s). Never mind sharing parts of my address book with friends. (See Pilot-QOF for an example of how QOF enables users to select parts of a Palm address book (amongst other databases) for export and sharing.)

Data queries

What does one need to build that kind of application? Well, one needs at a minimum a query mechanism, so that the user can query, sort, report on and manipulate the data. Even good single-user apps need this ability: personal photo and music organizers will have this kind of interface, as will address books, etc. Blogging tools could benefit from a generalized query mechanism. P2P already has query built in, although it's of a rather poor sort. Indeed, these types of find & query considerations are driving the Gnome Storage project (with whom we'd like to collaborate). They also show up as ideas in Microsoft's next-generation Longhorn operating system (patent encumbered; however, this project, QOF and the DWI project are prior art).

QOF also has support for INSERT SQL commands to populate the data set (a QofBook).
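To make the "populate, then query, sort and report" cycle concrete, here is a minimal sketch in Python. It uses Python's standard sqlite3 module purely as a stand-in; it is not QOF's API, and the contact table and its columns are invented for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the data set.
# The "contact" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, city TEXT, age INTEGER)")

# Populate the data set with INSERT statements.
conn.executemany(
    "INSERT INTO contact (name, city, age) VALUES (?, ?, ?)",
    [("Alice", "Austin", 34), ("Bob", "Boston", 27), ("Carol", "Austin", 41)],
)

# Query and sort: everyone in one city, oldest first.
rows = conn.execute(
    "SELECT name, age FROM contact WHERE city = ? ORDER BY age DESC",
    ("Austin",),
).fetchall()

# Report: how many contacts live in each city.
report = conn.execute(
    "SELECT city, COUNT(*) FROM contact GROUP BY city ORDER BY city"
).fetchall()

for name, age in rows:
    print(name, age)
for city, count in report:
    print(city, count)
```

The point of the sketch is only that a handful of declarative statements cover populating, finding, sorting and reporting; the application never walks the data by hand.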


Sharing Technologies, Querying and the Future of Programming

Shared or global filesystem

One traditional solution to the sharing problem has been the idea of a shared or even global filesystem. This was the original vision behind NFS and Microsoft Windows Workgroups (SMB/Samba file shares). Global filesystem attempts include AFS (the Andrew File System, from the early 90's), DFS (from the late 90's) and Coda (a free/open solution). The local solutions, NFS and Samba, don't and can't scale to a global level. The global filesystems haven't really worked out either, with one grand exception: the web itself (i.e. HTTP) is in many ways filesystem-like. But the success of the web is not due to its filesystem-like aspects, but rather to the existence of a common document viewer: the web browser. Shared file systems, though better than nothing, have one major pitfall: they still don't make writing multi-user apps easy. Two users updating the same file at the same time is a recipe for disaster. The two apps that are updating the same file don't know about each other, and there is no native or built-in mechanism that would let them find out.
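The "recipe for disaster" can be shown in a few lines. The following Python sketch is only an illustration under stated assumptions: SharedFile is a hypothetical in-memory stand-in for a file on a shared filesystem, and VersionedFile shows one well-known remedy (a version check on write) that plain file sharing does not give you for free. Neither class comes from QOF or from any real filesystem API.

```python
class SharedFile:
    """Hypothetical stand-in for a file on a shared filesystem: last writer wins."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text

    def write(self, text):
        self.text = text


# Two applications read the same file, edit their private copies, and save.
shared = SharedFile("milk\n")
copy_a = shared.read()
copy_b = shared.read()
shared.write(copy_a + "eggs\n")   # application A saves
shared.write(copy_b + "bread\n")  # application B saves, silently erasing A's edit
lost_update_result = shared.read()


class StaleWrite(Exception):
    """Raised when a write is based on an out-of-date copy."""


class VersionedFile:
    """Same idea, but every write must name the version it was based on."""

    def __init__(self, text):
        self.text = text
        self.version = 0

    def read(self):
        return self.text, self.version

    def write(self, text, based_on):
        if based_on != self.version:
            raise StaleWrite("file changed since it was read")
        self.text = text
        self.version += 1


versioned = VersionedFile("milk\n")
text_a, ver_a = versioned.read()
text_b, ver_b = versioned.read()
versioned.write(text_a + "eggs\n", based_on=ver_a)
try:
    versioned.write(text_b + "bread\n", based_on=ver_b)
    conflict_detected = False
except StaleWrite:
    conflict_detected = True  # B now knows it must re-read and merge

print(repr(lost_update_result), conflict_detected)
```

In the first half, A's edit vanishes without either application noticing. In the second half the conflict is at least detected, but that detection is extra machinery the application has to bring along itself.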

Physical movement of objects

Another older solution to the sharing problem is the idea behind RPC, CORBA, SOM/DSOM, COM/DCOM and, more recently, SOAP and the Microsoft .NET framework. Here, the focus of sharing is actually on the physical movement of objects (OOP objects) from one system to another. Each of these technologies in some way enables the programmer to touch or work with objects on some other, remote system. However, as most programmers eventually seem to figure out, it's bitchingly hard to use CORBA (or its cousins), and there seems to be relatively little payoff for the effort. Great: now one has distributed, networked objects, but so what? One still doesn't have persistence (which is the strength of file systems), one doesn't have searchability (which is what databases offer) and one doesn't have much in the way of security (witness Microsoft's Virus Apocalypse).

P2P file sharing

The latest file-sharing technology is 'P2P', the peer-to-peer networks. Unfortunately, MP3 trading has given them a bad name. The other problem is that P2P doesn't really provide security (would you use P2P to share your financial information with your bank? How about medical records? Didn't think so). P2P networks are also terrible at sharing 'objects' in the sense of OOP objects; they're useless for that. They have no built-in versioning for the files they do share: you can't do SCCS/CVS/Subversion/BitKeeper on top of P2P, even though that would be useful. Curiously, though, P2P networks do have the one truly important feature that makes them popular: P2P query. You can issue a query to find the one thing that you really wanted, and P2P will run that query in a distributed, decentralized way, across a panoply of servers. How else can you find that rare MP3 that you wanted? Unfortunately, none of the P2P query protocols are as rich or as powerful as SQL.

As P2P illustrates, what's old is new: querying is central to sharing and to multi-user applications. In fact, most true multi-user apps are (and have been, from the start, for many decades) based on database technology, if not SQL proper. This is because SQL can scale to extremely large datasets without losing the ability for a user to focus in on one or two records. SQL came with a kind of browser: the SQL command line. Though pale in comparison to the web browser, it did allow you to look where you wanted, to find the needle in the haystack. SQL is scalable and it can be browsed: these are the same strengths that made the web possible.

Thus it is natural to think of querying and databases when thinking of the future of multi-user applications. What hasn't yet been done or fully explored is the possibility of a global database: a database that is accessible from everywhere in the world (much as HTTP is), but is decentralized, searchable and managed by the users who use it, much as P2P is. There are many people, ideas and technology trends that are pushing towards a global database, but it hasn't happened yet. The convergence between computers, phones and TV's is pushing in this direction, but the games have only begun.

Object oriented SQL

SQL, however, has its drawbacks. Most programmers don't know SQL, and a large portion of them blanch and run in the opposite direction when they hear those three letters. Actually, writing applications that mix SQL into the program logic is not that hard ... at first. In fact, it's quite easy. The problem arises when one has to port to a different database and discovers that SQL is not terribly well standardized. The next problem comes when one wants to add or change tables. Programmers typically discover that the SQL code they are maintaining is not modular and is sometimes very poorly structured. Object oriented programming and SQL weren't really made for each other, although object oriented databases do help.

In summary, the future of desktop programming lies with a broadly powerful query engine, coupled to persistent OOP-style objects that are distributed, decentralized, secure, protectable and versionable. This is the true 'convergence': not that of TV's, phones and computers, but the convergence of the best features of P2P, SQL, file systems, version control and the web, which, to the programmer, looks like nothing more than 'persistent objects', and is no harder to 'use' than garbage collection is for a Java programmer. This is the grand vision of what desktop programming should be.


Application Architecture

The QOF architecture, as it currently stands, is taken from the vantage point of Free Software development and explicitly ignores commercial/proprietary technologies. It is also somewhat focused on the specific case study of the applications that currently benefit most from QOF: GnuCash, GnoTime and Pilot-QOF. But this is the good news: QOF has evolved through trial by fire, in the forge of real-world application development, rather than through the over-arching, abstract principles discussed above.

Object/View/Controller

The starting point for the architecture is a modified "Object/View/Controller" paradigm. The GUI acts as a controller, controlling and manipulating a set of objects in local system memory. The objects in system memory are a cache, or mirror, or local copy of data in a (remote) database. Two technical problems immediately present themselves. The first is how to connect the GUI to the objects in an easy-to-develop, easy-to-maintain style. The second is how to keep the local copy of the objects in sync with the database in an equally easy style and, in particular, how to keep them in sync with the data in other (remote) objects that other users might be manipulating. One possible answer, and the one used today by GnuCash, is to design the GUI with the Glade GUI designer and hook it up to some C objects (it could have been C++, or even another language). The objects are in turn either loaded from a file or fetched from an SQL database.

I (would) like to have my apps be natively multi-user. This means dealing with all of the locking and caching problems that multi-user presents. Multi-user also means that one needs to have a centralized data storage location: a database and, by presumption, an SQL database. Thus, if one wants to have C objects, one needs to shim them onto SQL somehow. Today, for me, this is a labor-intensive, manual process. In the commercial software world, there are tools and companies that have systems that make this a lot easier and even automatic; but there aren't any in the Free Software world.

After writing the shims between the first half-dozen objects and their equivalent SQL tables, it becomes painfully apparent that the process can be mostly automated, as the shims consist mostly of common code. Unfortunately, the automation of this process is easier said than done. At first, it seems that all one needs is to specify a mapping or 'dictionary' from an SQL table field to an object setter/getter. In practice, it's more subtle than that. Nonetheless, one of the goals of QOF is to provide a fairly generic dictionary or mapping mechanism. To the best of my knowledge, in the Free Software/Open Source world, there is only one system that maps databases to objects and back: DWI/DUI. However, that system is still rather primitive. At least one goal is to have QOF integrate with and expand on that function.
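The naive "dictionary" idea, a table from SQL column name to getter/setter pair, can be sketched as follows. This is a Python illustration only, not QOF's or DWI's actual mechanism: the Account class, the ACCOUNT_MAP table and the load_all/store helpers are all hypothetical names, and Python's sqlite3 module stands in for the SQL backend.

```python
import sqlite3


class Account:
    """Hypothetical application object with explicit getters and setters."""

    def __init__(self):
        self._name = ""
        self._balance_cents = 0

    def get_name(self):
        return self._name

    def set_name(self, value):
        self._name = value

    def get_balance_cents(self):
        return self._balance_cents

    def set_balance_cents(self, value):
        self._balance_cents = value


# The 'dictionary': SQL column name -> (getter, setter).
ACCOUNT_MAP = {
    "name": (Account.get_name, Account.set_name),
    "balance_cents": (Account.get_balance_cents, Account.set_balance_cents),
}


def store(conn, table, obj, mapping):
    """Write one object as a row, using only the mapping (no per-class SQL)."""
    # Table and column names come from trusted code, never from user input.
    columns = list(mapping)
    placeholders = ", ".join("?" for _ in columns)
    values = [mapping[col][0](obj) for col in columns]
    conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )


def load_all(conn, table, cls, mapping):
    """Rebuild objects from rows by calling the mapped setters."""
    columns = list(mapping)
    objects = []
    for row in conn.execute(f"SELECT {', '.join(columns)} FROM {table}"):
        obj = cls()
        for col, value in zip(columns, row):
            mapping[col][1](obj, value)
        objects.append(obj)
    return objects


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance_cents INTEGER)")

checking = Account()
checking.set_name("Checking")
checking.set_balance_cents(12500)
store(conn, "account", checking, ACCOUNT_MAP)

loaded = load_all(conn, "account", Account, ACCOUNT_MAP)
print(loaded[0].get_name(), loaded[0].get_balance_cents())
```

Even this toy version hints at where the subtlety comes from: it says nothing about references between objects, about updating a row that already exists, or about noticing that another user changed it in the meantime.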


Core Assumptions

The QOF project, and the comments above, make some core assumptions that are questionable and need to be addressed. These are discussed below.

... Unfinished ... Draft ...



Linas Vepstas <linas@linas.org> July 2003, April, June 2004

Updated: Neil Williams <linux@codehelp.co.uk> May 2005.


The copyright licensing notice below applies to this text.

Copyright © 2003,2004 Linas Vepstas

Copyright © 2005 Neil Williams

Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of this license is included in the file copying.txt