One of the biggest complaints about software and perhaps even more specifically FLOSS software is the lack of documentation. I'm not talking about those
small little dialogs we throw in our code that rarely see the light of day. I'm talking about stuff you throw in user docs or specifications and so forth
that an analyst or end-user reads.
The main reasons for this lack of documentation is that while everyone seems to want documentation, few really want to pay for it, and it's time consuming to keep
documentation in synch with what the software actually does. Having documentation that is very inconsistent with the software is worse than not having any documentation at all.
A good way to keep documentation up to date is to fold it into the process of developing and testing the software such that changes in software expose errors in the documentation and vice-versa
or that the act of changing the software corrects the documentation.
Part of the way we try to do that on the PostGIS project is to require each function patch to include documentation. We also try to make our documentation executable
by auto-generating PostgreSQL help and
some test suites from the documentation. For example the process of installing documentation-generated function and type descriptions
in PostgreSQL catches inconsistencies between
the implementation and what we actually publish about how the functions work. Even the image generator that builds most of the images in the manual is built with PostGIS as a core of it
so that when our build-bot is rebuilding the documentation it is exercising some PostGIS output functions.
I can't tell you how many times I've flagged changes in the current PostGIS 2.0 release
simply by trying to install the docbook generated PostgreSQL comment descriptions and PostgreSQL complains that the function signature no longer exists that the documentations says should be there.
So then I go back and correct
the documentation or yell at the programmer if the documentation makes more sense than what they coded. On the other side,
its easy to catch what we neglected to document simply by scanning the functions in PostgreSQL and seeing which ones don't have descriptions.
Okay as far as databases goes, many a DB Programmer/Analyst has waltzed into a project only to be befuddled about the meanings of all these tables used by the application.
If you are like us, you don't even want to look at any database documentation that is not part of the definition of the database because you know 99% of the time it's so obsolete
or was just a pipe dream of someone working in a vacuum
that its more useless than not having any
documentation at all. It is nice to have nicely formatted documentation you can read separate from the code,
but even sweeter if you can query the documentation just as easily as you can query the data.
A good way of having up to date documentation is to weave it in as part of the process of defining the structure. For databases
this means using foreign keys, primary keys, and using the commenting features that most relational databases offer these days. For example the databases we commonly work with,
PostgreSQL, SQL Server, MS Access, and even MySQL all allow you to provide descriptions for tables, table columns, and sometimes other objects such as functions and stored procs right in the database. PostgreSQL even allows you to provide descriptions of columns in
views though that's a bit messier to do. Sadly there isn't a consistent way of pulling these descriptions out of the database that will work for all of these. Each has differently defined meta tables it stores these descriptions in.
For thise article, we'll demonstrate how to pull this information from PostgreSQL.
PostgreSQL does offer many useful switches in psql for querying this data, but we'll focus our attention on pulling this data via SQL. It's much easier to incorporate this information
in auto-generated documentation with SQL because you can have more control what you include and how to format it.