Lessons learned Packaging PostGIS Extensions: Part 1

In prior articles we talked about the new PostgreSQL 9.1 extension model and upcoming PostGIS 2.0 extensions which we have experimental builds of so far. In this article and other's to follow, we shall provide a documentary of our venture into this new extensions world. We'll discuss some of the obstacles we had with building extensions, lessons learned, and foolishness exposed, with the hope that others can learn from our experience.

First off, the upcoming PostGIS 2.0 extensions will be packaged as at least two extensions -- postgis which will contain both PostGIS geometry/geography types, functions, meta views and tables as well as raster type and associated functions and tables. Topology support, while a part of upcoming PostGIS 2.0, will be packaged as a separate extension called postgis_topology. The main reason for breaking topology out as a separate extension is that it is always stored in a schema called topology and is not relocatable to another schema. The way the current extension model works, all the parts of your extension should live in the same schema. Later we plan to package tiger geocoder as an extension, but this one probably makes more sense to live on http://pgxn.org/ since it is only of interest to United States users, , is purely plpgsql with dependency on PostGIS, and we had beefed it up as part of a consulting contract for a company running PostGIS 1.5. It's the only piece documented in PostGIS 2.0 that works on 1.5 as well (aside from the tiger toplogy loader which has dependency on toplogy), although it has always lived as an extra in the PostGIS code base.

We'll probably package postgis_legacy_functions as an extension too for those people who badly need those 500 alias functions I chucked.

We mentioned in our prior article that we ran into some issues with how our extension worked -- e.g. topology referencing the postgis extension. Most of these turned out just to be ignorance on my part as to how the different pieces fit together and I'll elaborate on these.

Much of what will be described here is also documented in Packaging Related Objects into an Extension.

In the future I'm hoping we'll also see plr and pgrouting packaged as extensions which are common favorites of PostGIS users.

What is an extension good for?

First off, why do you even want to use the extension model?

One line statement to install a set of functions in a database that are aware of the fact that they exist as a unit.
CREATE EXTENSION postgis_topology;
One line statement to bundle loose functions as a contiguous unit.
CREATE EXTENSION postgis_topology FROM unpackaged;
Ability to upgrade to micro or minor versions of a set of functions again with a one liner that looks like this:
ALTER EXTENSION postgis_topology UPDATE TO '2.0.0a1';
Ability to drop a package of functions, tables, types etc. with a single statement
DROP EXTENSION postgis_topology;

Or if you really really want to drop everything dependent on an extension, including YOUR DATA
DROP EXTENSION postgis_topology CASCADE;
A declarative way of stating that a set of functions has a dependency with another extension and alerting the user to that during installation.

For example -- this is the message I would get if I tried to run: CREATE EXTENSION postgis_topology;
without postgis extension first being installed
```
ERROR:  required extension "postgis" is not installed
```
A quick way for user's to know what the hell your module does without trying to guess from the file name.
SELECT * FROM pg_available_extensions;
And best yet, so you never have to answer this question again though you may have to answer newer questions: PostGIS backup/restore because extensions don't get backed up aside from the CREATE EXTENSION statement or whatever is marked as user editable data by teh extension packager.
Better yet, pgAdmin III has an interface for it so you don't need to remember these commands except the unpackaged one, which sadly pgAdmin seems to be lacking in its arsenal.

All these benefits take some effort on the extension packager to make possible.

How does PostgreSQL know what versions of a package exist and which is the primary?

It knows by file name convension and the control file. For example, for postgis_topology, we have these files in the share/extension folder named:

postgis_topology.control
postgis_topology--2.0.0a1.sql
postgis_topology--2.0.0a1--2.0.0.sql
postgis_topology--unpackaged--2.0.0a1.sql

PostgreSQL looks at this and now knows there exists a version 2.0.0a1 and a file that will migrate a 2.0.0 to 2.0.0a1. This is because I mislabeled the first version of the extension 2.0.0 though 2.0.0 is not released yet. This allows the packager to make possible both downgrade and upgrade paths. the sql files (except for the unpackaged), are the same scripts you would run in prior versions of PostgreSQL, except they are not allowed to contain BEGIN/COMMIT; clauses. So in short making a PostGIS extension that can only install new versions was relatively painless once you learned the nuances of json, control and make install script conventions of the extension model.

The control file is a text file that dictates what other extensions are required, what is the default version to install if none is specified, if an extension can be relocated to a different schema and if not, what schema should it be installed in. Below is an example. NOTE: my version has a a1 at the end since its an alpha release.

# postgis topology extension
comment = 'postgis topology spatial types and functions'
default_version = '2.0.0a1'
relocatable = false
schema = topology
requires = postgis

The install from unpackaged is a special CREATE EXTENSION command that doesn't install new functionality, but bundles functions, tables etc. you already have in your database as an extension. You would need this feature for example if you installed PostGIS 2.0.0 not as an extension and want it to become an extension. In this scenario you have 900+ PostGIS functions floating about unaware of the fact that they are part of the same family. This command will christen them as members of The PostGIS family. In the case of PostGIS Topology family, we are talking about 70 someodd functions, types, and meta tables. Still a number too large to count on your fingers and toes.

Quickie Lessons Learned

Allowing your extension to be uninstalled requires no extra coding. Yeh I can finally one day get rid of that uninstall postgis script I despise with a passion. It just works and won't allow you to drop if you have dependencies unless you unleash a DROP CASCADE.
If your extension is not relocateable such as the case with topology, DO NOT INCLUDE A CREATE SCHEMA in your install script. The extension model creates the schema for you and get's mad when you try to recreate it again in your script.
DO NOT PUT ALTER TABLE commands in your install script. It's fine for an upgrade script, but not an install script. strk was being cute and did this in topology so that he could distinguish between the old 1.* topology and the new 2.0. I got a cryptic error with this. Why it had no issue with the ALTER DOMAIN DROP CONSTRAINT ... is a bit of a mystery to me.
If you are going to have an extension that is not relocateable and it has dependency on another extension, YOU ABSOLUTELY HAVE TO LIST THESE DEPENDENCIES IN YOUR control file. If you don't your extension will not install. The reason for this is when you mark an extension as non-reloacateable, PostgreSQL sets the schema to the non-relocateable extension schema completely ignoring the database's search_path. If you add dependencies to the control, it will also add to the search_path the schemas of the dependency extensions.
Creating an unpackaged script when you have 70 functions to worry about looks daunting on the outset, but there is hope.
The CREATE EXTENSION postgis_topology FROM unpackaged is all made possible by a script in the share/extension folder that has the name something like postgis_topology--unpackaged--2.0.0a1.sql

However it looks very different from an install script in that the commands in it are of the form: ALTER EXTENSION postgis_topology ADD FUNCTION ...;
repeat the recipe for every object that your package contains
When you are talking about 70 functions that's scary, and when 900+ with developers adding new ones daily, that is a nightmare unless you have a plan.

My new friend Sed

Sed and I have known each other for a while, but we have been in the past, casual interlopers. I dismissed him as a tool of those underground UNIX folks too in love with their cryptic commands to see the light of day and he dismissed me as a misguided window's user unequipped to appreciate the marvels of stream manipulation.

Then one day I had a problem that it seemed Sed might have the best answer to and a 1 line answer at that. I love one line answers which is why I love SQL and spatial SQL.

Problem: How do I convert Sandro's topology.sql install script file which has commands like:


CREATE TABLE topology.topology (
	id SERIAL NOT NULL PRIMARY KEY,
	name VARCHAR NOT NULL UNIQUE,
	SRID INTEGER NOT NULL,
	precision FLOAT8 NOT NULL,
	hasz BOOLEAN NOT NULL DEFAULT false
);
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
  RETURNS SETOF topology.GetFaceEdges_ReturnType
AS
$$
stuff here
$$
$$
LANGUAGE 'plpgsql' VOLATILE;

to: something that looks like this:

ALTER EXTENSION postgis_topology ADD TABLE topology.topology;
ALTER EXTENSION postgis_topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer);

And Sed said, "I can do that with one line of code." More on that later