Lessons learned Packaging PostGIS Extensions: Part 1

In prior articles we talked about the new PostgreSQL 9.1 extension model and upcoming PostGIS 2.0 extensions which we have experimental builds of so far. In this article and other's to follow, we shall provide a documentary of our venture into this new extensions world. We'll discuss some of the obstacles we had with building extensions, lessons learned, and foolishness exposed, with the hope that others can learn from our experience.

First off, the upcoming PostGIS 2.0 extensions will be packaged as at least two extensions -- postgis which will contain both PostGIS geometry/geography types, functions, meta views and tables as well as raster type and associated functions and tables. Topology support, while a part of upcoming PostGIS 2.0, will be packaged as a separate extension called postgis_topology. The main reason for breaking topology out as a separate extension is that it is always stored in a schema called topology and is not relocatable to another schema. The way the current extension model works, all the parts of your extension should live in the same schema. Later we plan to package tiger geocoder as an extension, but this one probably makes more sense to live on http://pgxn.org/ since it is only of interest to United States users, , is purely plpgsql with dependency on PostGIS, and we had beefed it up as part of a consulting contract for a company running PostGIS 1.5. It's the only piece documented in PostGIS 2.0 that works on 1.5 as well (aside from the tiger toplogy loader which has dependency on toplogy), although it has always lived as an extra in the PostGIS code base.

We'll probably package postgis_legacy_functions as an extension too for those people who badly need those 500 alias functions I chucked.

We mentioned in our prior article that we ran into some issues with how our extension worked -- e.g. topology referencing the postgis extension. Most of these turned out just to be ignorance on my part as to how the different pieces fit together and I'll elaborate on these.

Much of what will be described here is also documented in Packaging Related Objects into an Extension.

In the future I'm hoping we'll also see plr and pgrouting packaged as extensions which are common favorites of PostGIS users.

What is an extension good for?

First off, why do you even want to use the extension model?

All these benefits take some effort on the extension packager to make possible.

How does PostgreSQL know what versions of a package exist and which is the primary?

It knows by file name convension and the control file. For example, for postgis_topology, we have these files in the share/extension folder named:

postgis_topology.control
postgis_topology--2.0.0a1.sql
postgis_topology--2.0.0a1--2.0.0.sql
postgis_topology--unpackaged--2.0.0a1.sql

PostgreSQL looks at this and now knows there exists a version 2.0.0a1 and a file that will migrate a 2.0.0 to 2.0.0a1. This is because I mislabeled the first version of the extension 2.0.0 though 2.0.0 is not released yet. This allows the packager to make possible both downgrade and upgrade paths. the sql files (except for the unpackaged), are the same scripts you would run in prior versions of PostgreSQL, except they are not allowed to contain BEGIN/COMMIT; clauses. So in short making a PostGIS extension that can only install new versions was relatively painless once you learned the nuances of json, control and make install script conventions of the extension model.

The control file is a text file that dictates what other extensions are required, what is the default version to install if none is specified, if an extension can be relocated to a different schema and if not, what schema should it be installed in. Below is an example. NOTE: my version has a a1 at the end since its an alpha release.

# postgis topology extension
comment = 'postgis topology spatial types and functions'
default_version = '2.0.0a1'
relocatable = false
schema = topology
requires = postgis

The install from unpackaged is a special CREATE EXTENSION command that doesn't install new functionality, but bundles functions, tables etc. you already have in your database as an extension. You would need this feature for example if you installed PostGIS 2.0.0 not as an extension and want it to become an extension. In this scenario you have 900+ PostGIS functions floating about unaware of the fact that they are part of the same family. This command will christen them as members of The PostGIS family. In the case of PostGIS Topology family, we are talking about 70 someodd functions, types, and meta tables. Still a number too large to count on your fingers and toes.

Quickie Lessons Learned

My new friend Sed

Sed and I have known each other for a while, but we have been in the past, casual interlopers. I dismissed him as a tool of those underground UNIX folks too in love with their cryptic commands to see the light of day and he dismissed me as a misguided window's user unequipped to appreciate the marvels of stream manipulation.

Then one day I had a problem that it seemed Sed might have the best answer to and a 1 line answer at that. I love one line answers which is why I love SQL and spatial SQL.

Problem: How do I convert Sandro's topology.sql install script file which has commands like:

CREATE TABLE topology.topology (
	id SERIAL NOT NULL PRIMARY KEY,
	name VARCHAR NOT NULL UNIQUE,
	SRID INTEGER NOT NULL,
	precision FLOAT8 NOT NULL,
	hasz BOOLEAN NOT NULL DEFAULT false
);
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
  RETURNS SETOF topology.GetFaceEdges_ReturnType
AS
$$
stuff here
$$
$$
LANGUAGE 'plpgsql' VOLATILE;
to: something that looks like this:
ALTER EXTENSION postgis_topology ADD TABLE topology.topology;
ALTER EXTENSION postgis_topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer);

And Sed said, "I can do that with one line of code." More on that later