One of the great lessons learned in building PostGIS extensions is my rediscovery of SED. SED turned out to be
mighty useful in this regard and I'll explain a bit in this article. Unfortunately there is still a lot I need to learn about it
to take full advantage of it and most of my use can be summed up as monkey see, monkey scratch head, monkey do. In addition I came across what I shall refer to as Pain points with using
the PostgreSQL Extension model. Part of which has a lot to do with the non-granular management of changes in PostGIS,
the day to day major flux of changes happening in PostGIS 2.0 space, and my attempt at trying to creat upgrade freeze points amidst these changes.
When PostGIS 2.0 finally arrives, the freeze points will be better defined and not change from day to day. So some of these issues
may not be that big of a deal.
The Joy of SED: Building unpackaged extension install scripts
In the Packaging PostGIS Extensions Part 1, I mentioned a problem I had with converting a topology install script to an
Extension unpackaged script. The idea of the extension unpackaged script is to package already installed loose functions, tables, types, operators,etc
into a package. This would come about if you installed a contrib module the old fashioned way and then wanted
to mark it as an extension so you could experience all the joys the extension model provides.
Problem: So the topology install script looked something like this:
CREATE TABLE topology.topology (
id SERIAL NOT NULL PRIMARY KEY,
name VARCHAR NOT NULL UNIQUE,
SRID INTEGER NOT NULL,
precision FLOAT8 NOT NULL,
hasz BOOLEAN NOT NULL DEFAULT false
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
RETURNS SETOF topology.GetFaceEdges_ReturnType
LANGUAGE 'plpgsql' VOLATILE;
and I needed to change it to something that looks like this:
ALTER EXTENSION postgis_topology ADD TABLE topology.topology;
ALTER EXTENSION postgis_topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer);
This little sed script worked well:
sed -e '/^CREATE \(OR REPLACE\|TRIGGER\|TYPE\|TABLE\|VIEW\)/!d;' \
-e 's/OR REPLACE//g' \
-e 's/CREATE\(.*\)/ALTER EXTENSION $(EXTENSION) ADD\1;/' \
-e 's/DEFAULT [a-zA-Z]\+//g' \
-e 's/\(BEFORE\|AS\)\(.*\)/;/' \
-e 's/(;/;/' \
-e 's/\\(;/;/' \
-e 's/;;/;/g' $< > $@
, though needs some slight modifications to work more seamlessly. To get
it to work I had to make sure all the function arguments were on a single line in topology.sql. I've been struggling
to get it to work with functions where the arguments are broken up in multiple lines. I've learned enough to know it's possible, but
just haven't come up with the right sequence of sed to make it happen. This was a minor set back since most functions in PostGIS geometry, geography, raster, and topology
have function args defined on a single line in there respective scripts so only about 10 or so need changing. It's just bothersome that you can create a valid install script
that can't be converted to an unpackaged script. Any people with suggestions I'm all ears.
To summarize what this script does.
- The first sed -e uses the !d command to delete all lines that don't have a CREATE followed by REPLACE, TRIGGER,TYPE, TABLE, VIEW.
This basically just leaves the header lines of each object like:
CREATE OR REPLACE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer) and chucks the bodies.
- The second just wipes out any OR REPLACE that is left so we are left with:
CREATE FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
- The third sed line replaces the word CREATE with
ALTER EXTENSION topology where $(EXTENSION) is a variable set to topology
and then adds back in the FUNCTION and args. So my new line becomes:
ALTER EXTENSION topology ADD FUNCTION topology.ST_GetFaceEdges(toponame varchar, face_id integer)
- The fourth sed line strips out all the DEFAULT values. In this particular set I have no default args but I do have some in other functions.
- The fifth line gets rid of BEFORE or AS and anything that comes after and replaces with a semicolon. This I needed because sometimes the AS is put on the first line of the object body and sometimes its now.
- The last sed command strips any duplicate semicolons
The Joy of SED: Building extension upgrade scripts
I developed a similar answer for creating an upgrade script and this one actually doesn't care about multilines etc.
This worked great for topology, but for combining raster / postgis combo package I ran into some set backs. These set backs
have more to do with limitations of extensions model than anything to do with SED. My script looked like this:
sed -e 's/BEGIN;//g' -e 's/COMMIT;//g' \
-e '/^\(CREATE\|ALTER\) \(CAST\|TYPE\|TABLE\|SCHEMA\|DOMAIN\|TRIGGER\).*;/d' \
-e '/^\(CREATE\|ALTER\|DROP\) \(CAST\|TYPE\|TABLE\|SCHEMA\|DOMAIN\|TRIGGER\)/,/\;/d' \
-e 's/^DROP \(AGGREGATE\|FUNCTION\) [\(IF EXISTS\)]*\(.*\);/ALTER EXTENSION $(EXTENSION) DROP \1 \2;DROP \1 IF EXISTS \2 ;/' \
$<< >> $@
Basically the intent was to create a micro upgrade script. The first line strips BEGIN and COMMIT
clauses since the CREATE and ALTER Extension commands put in their own begin and commit.
Since PostGIS doesn't introduce any new complex types
in theory for micro upgrades, I could avoid including CASTS, TABLES, SCHEMA, DOMAIN etc. creations.
That is what the
second line of the sed part does -- strip out all undroppable object types from the update that is all defined on a single
line. The third line drops alls such things defined on multiple lines by deleting all phrases that start with CREATE ... up to the ;.
It wouldn't work for functions since they could have ; in the body, but for CASTS types etc it works fine.
The final sed command looks for AGGREGATE or FUNCTION drops and then adds a line before them to drop the AGGREGATE/FUNCTION from
the EXTENSION. This is where my issue starts.
In the PostGIS 2.0 wind of changes, people are renaming function argument names every so often, changing AGGREGATES internals,
or dropping functions replacing them with equivalents that take default args.
Such changed require a DROP of the Aggregate or Function first if it exists. IF EXISTS works great since you don't have to worry
about the case of your script throwing an error if it doesn't exist. However, the
syntax does not support the DROP IF EXISTS, so it has to exist. For my older installs, the functions I am trying to drop may not exist.
If a object is part of an EXTENSION, it has to be dropped from the extension before you can drop it. This forces me to
use a DROP CASCADE clause, which I hate using because it could destroy user defined objects that aren't part of the extension. In those cases
I would prefer the update to fail rather than risking destroying user generated data. Keep in mind that in standard
release time, these deeper kind of changes wouldn't be allowed so might be a non-issue.
Tracked: Nov 03, 18:38