The State of PostGIS, Joys of Testing, and PLR the Prequel

I've always enjoyed dismantling things. Deconstruction was a good way of analyzing how things were built: I cataloged all the ways I could take them apart or destroy them. I experimented with mechanical systems, electrical circuitry, chemicals, and biological systems, sometimes coming close to bodily harm. In later years I decided to play it safe and just stick with programming and computer simulation as a convenient channel for my destructive pursuits. Now, on to the point of this article.

In later articles, I'll start to demonstrate the use of PL/R, the procedural language for PostgreSQL that allows you to write database functions in the statistical language and environment R. To make these examples more useful, I'll be analyzing data generated from tests I've been working on for stress-testing the upcoming PostGIS 2.0. PostGIS 2.0 is a major release, and probably the most exciting one for us. Paul Ramsey recently gave a summary talk, The State of PostGIS, at FOSS4G Japan (http://www.ustream.tv/recorded/10667125), covering the past, present, and future of PostGIS; it provides a brief glimpse of what's in store in 2.0.
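
Just to give a taste before those later articles, here is a minimal PL/R sketch. It assumes PL/R is already installed in your database; r_median is a name I made up for illustration, but median() is a standard R function, and arg1 is PL/R's convention for referring to the first unnamed argument.

-- A minimal PL/R sketch: wrap R's median() in a SQL-callable function.
-- Assumes the plr language is already installed in this database.
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS
$$
  median(arg1)  -- arg1 is PL/R's name for the first unnamed argument
$$ LANGUAGE plr;

SELECT r_median(ARRAY[1.5, 2.5, 10.0]);  -- returns 2.5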

Of course, the attendees at PgDay EU will no doubt hear an earful of it from fellow PostGIS core development team members Mark Cave-Ayland and Olivier Courtin in their talk, PostGIS 1.5 and beyond: a technical perspective. You won't want to miss the talk Discover PostGIS: GIS for PostgreSQL by Vincent Picavet, or any of Mark's tutorials, Intro to PostGIS and Advanced PostGIS. If that were not enough, there are even more PostGIS talks at PgDay Europe: PostGIS, OpenStreetMap, OpenLayers by Hartmut Holzgraefe, and PostGIS - das Wo? in der Datenbank (PostGIS - the where? in the database) by Stefan Keller.

We are in the middle of gutting much of PostGIS's internal code structure and changing the on-disk format. In addition to that significant refactoring, PostGIS 2.0 introduces raster support and true 3D support.
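
As a small taste of what true 3D support means, here is a sketch using one of the new 3D functions. This is just an illustrative example assuming a PostGIS 2.0 build; the 2.0 function list is still settling.

-- ST_3DDistance (new in 2.0) measures distance in three dimensions,
-- where plain ST_Distance ignores the Z coordinate.
SELECT ST_3DDistance(
    ST_GeomFromEWKT('POINT(0 0 0)'),
    ST_GeomFromEWKT('POINT(1 1 1)')
);  -- about 1.732 (square root of 3); ST_Distance would return about 1.414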

I don't like writing unit tests. Unit tests are useful, but too tedious to develop to provide satisfactory code coverage. In PostGIS we have unit tests built with CUnit, which many PostGIS development team members take a great deal of pride and joy in building. We also have various PostgreSQL-specific tests, many of which were built from past bug ticket reports. Such things bore me to tears. I have a different dream. I dream of a world that allows for a more automated way of determining incorrectness: a testing engine that can build itself and find its own way, a testing machine that doesn't need as much spoon-feeding as unit tests do.

My general philosophy of life is to always try to kill at least two birds with every stone you throw. I write a lot of the official PostGIS documentation, but documentation would be a bit of a waste if it existed merely for reading by humans. Around the time we were developing PostGIS 1.4, I wrote an XSL file that takes our PostGIS DocBook documentation and converts it into a merciless battery of SQL tests: it takes every function we have documented and a reference example of every kind of geometry we support, and cross joins them to create a huge number of spatial SQL statements. A damn lot of valid SQL statements that are otherwise nonsensical. Basically, it puts the rules of PostGIS and PostgreSQL grammar together with a population of arguments to formulate valid gibberish.

I call this a machine-gun test because its main purpose is to wreak carnage on PostGIS and find gaping holes in its armor. It provided much better code coverage than our unit tests, but was childish in its questions, asking things like: what is the length of a point? What happens if you try to use ST_MakeLine with a polygon and a linestring? Childish is not a bad thing, since a lot of PostGIS users ask a lot of childish questions, so our system needs to handle them without crumbling. It was pretty good at detecting crashable areas, particularly when major code changes were happening, as we are doing in PostGIS 2.0 and as we did in PostGIS 1.4.
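
The real generator is an XSL transform over the DocBook sources, but the core cross-join idea can be sketched in plain SQL. The function and geometry lists below are tiny hypothetical stand-ins for the full documented sets.

-- Sketch of the generator's core idea: cross join every documented
-- function with a reference example of every geometry type, emitting
-- one test statement per combination.
WITH funcs(name) AS (
    VALUES ('ST_Length'), ('ST_Area'), ('ST_IsValid')
), geoms(wkt) AS (
    VALUES ('POINT(1 2)'),
           ('LINESTRING(0 0, 1 1)'),
           ('POLYGON((0 0, 1 0, 1 1, 0 0))')
)
SELECT 'SELECT ' || f.name || '(ST_GeomFromText('
       || quote_literal(g.wkt) || '));' AS test_sql
FROM funcs f CROSS JOIN geoms g;

Each emitted statement then gets executed against the build under test; nonsensical-but-valid questions like the length of a point are exactly the point.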

It also ended up doing a little more than I had planned for it, like raising flags when we had documented functions that had been accidentally or intentionally removed from the code base, functions that worked with certain new geometry types by accident rather than deliberate design, or even places where our documentation didn't jibe with our codebase in terms of the arguments it claimed a function should take. That extra value often required some manual inspection of the test results.

You could use it for regression testing as well, by diffing the log outputs from two different versions of PostGIS run on the same server. Diffing two 300 MB test result files and visually comparing them was not the most productive way to spend time, though it was interesting, since you could spot-check changes in behavior across various GEOS versions and the like.

In PostGIS 2.0 I started working on enhancements that log each result's success, failure, and timing to a table for easier analysis (a rough sketch of that table appears at the end of this article). I also created an XSL for raster support that takes all the documented raster functions and creates a battery of raster tests, a la the raster grammar gibberish generator :). Now I'll start analyzing these results with R, and along the way demonstrate how to use PL/R in PostgreSQL. If you do attend PgDay Europe and are interested in learning more about PL/R, make sure to attend Joe Conway's Advanced analytics with PL/R.
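
For the curious, here is the promised sketch of the logging table. This is a hypothetical schema meant only to show the shape of the data the upcoming PL/R articles will analyze; the real definition may differ.

-- Hypothetical sketch of the results table: one row per generated test.
CREATE TABLE test_results (
    id          serial PRIMARY KEY,
    test_sql    text,      -- the generated statement that was run
    success     boolean,   -- completed without error?
    err_message text,      -- error text when success is false
    run_time_ms numeric,   -- execution time in milliseconds
    postgis_ver text       -- PostGIS version under test
);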