In this article we shall attempt to perform the same feats that Simon did, but with PostgreSQL instead of
Oracle XMLDB. Note while we are demonstrating this with a GPX file, the same XPath approach can be used to process any XML file.
PostgreSQL since 8.3 has had ANSI SQL 2003 XML functionality built in. Before 8.3, you could use the xml2 contrib module to achieve the same effect in a not so standards compliant sort of way.
In this example we shall demonstrate the built in functionality in 8.3 and above. The key function we will use is the xpath function. XPath is a language used to query XML data and PostgreSQL supports the XPath
1.0 version. The following is a quick primer on XPath that seems useful http://www.zvon.org/xxl/XPathTutorial/General/examples.html.
You can also refer to the PostgreSQL XML section of the documentation.
Getting the data
We will use the same sample data Simon used, except sadly we had to change it further because the schema wasn't defined in such a way that PostgreSQL liked or rather I was too stupid to construct the XPath statement in such a fashion that would satisfy the
PostgreSQL XML thingy. PostgreSQL seems
to require that the schema have a name in addition to a location or at least that is what we concluded. The GPX example Simon had a location but no name. The docs have an example of the form. Where the second argument is an array of 2 dimensional
arrays with the first item being the schema name and second the URI for the schema. Also note that the xpath function always returns an array even if there is only one element.
Suffice it to say, our version is like Simon's revised version except we also stripped off the namespace references since the PostgreSQL XML parser seemed unhappy that the name space defined was never referenced in format xsi:.... or something of that sort
. We will be using the xpath version that takes no schema references.
Our revision of Simon's revision can be downloaded from here
The change made is very subtle. Simon had this as the first part
Simon used Oracle's get LOB fileopen to get the xml file into the db which he calls from an Oracle stored function. When you think about the closest
parallel in PostgreSQL, I would say its the lo_* functions that allow import export of files into the db, though that only allows you to import and export files and not read the file.
There is also the pg_read_file which does what we want, but can only read files from the PostgreSQL init cluster. Of course their are other ways. You could use
perl or python or some other language such as PLPerlU that has system file access.
For now we'll just create a folder called gpxdir in the PostgreSQL cluster. You can determine the location of your cluster by running as super user
SELECT name, setting
Now we'll create a function to mirror Simon's getClobDocument, except instead of calling it getClobDocument, we'll call it getXMLDocument because it will return an XML object instead of a CLOB object.
Please note -- our getXMLDocument function is marked as SECURITY DEFINER because
only super users can use the pg_read_file, so to allow regular users access to this, we have this run in the postgres context and then can give rights to this function to those users we want to who may not have super user rights.
--createthefunctiontoloadxmldocCREATEORREPLACEFUNCTIONgetXMLDocument(p_filename charactervarying)RETURNS xml AS
---wesettheendreadtosomebignumber--becausewearetoolazytograbthelength--anditwillcutofattheEOFanywaySELECTCAST(pg_read_file(E'gpxdir/'|| $1 ,0, 100000000)As xml);
LANGUAGE 'sql' VOLATILE SECURITY DEFINER;
ALTERFUNCTIONgetxmldocument(charactervarying) OWNER TO postgres;
Now to use this function we simply do:
Copy the gpxtestrevised.gpx file into the gpxdir and call the below
Next we'll create a table similar to what Simon has called gpx and stuff our xml in there
--insertxmldocINSERTINTOgpx(object_name, object_value)VALUES('gpxtestrevised.gpx', getXMLDocument('gpxtestrevised.gpx'));
Unfortunately PostgreSQL even in 8.4 is not as rich as Oracle's offering for XMLDB and doesn't have all
that fancy validation schema stuff, though if you try to pull an obviously malformed XML document with getXMLDocument it will
tell you you are missing tags and so forth. So we are skipping that section of Simon's and moving straight to the fun part.
--GetthemetadatanameSELECT(xpath('/gpx/metadata/name/text()', g.object_value))As metadataname
FROM GPX As g;
Manila to Mt. Pinatubo
(xpath('/gpx/metadata/desc/text()', g.object_value))as Description,
(xpath('/gpx/metadata/copyright/year/text()', g.object_value))as Copyright_Year,
(xpath('/gpx/metadata/copyright/license/text()', g.object_value))as Copyright_License,
(xpath('/gpx/metadata/link/@href', g.object_value))as Hyperlink,
(xpath('/gpx/metadata/link/text/text()', g.object_value))as Hyperlink_Text ,
(xpath('/gpx/metadata/link/time/text()', g.object_value))as Document_DateTime ,
(xpath('/gpx/metadata/link/keywords/text()', g.object_value))as keywords ,
(xpath('/gpx/metadata/bounds/@minlon', g.object_value))as MinLong,
(xpath('/gpx/metadata/bounds/@minlat', g.object_value))as MinLat,
(xpath('/gpx/metadata/bounds/@maxlon', g.object_value))as MaxLong,
(xpath('/gpx/metadata/bounds/@maxlat', g.object_value))as MaxLat
FROM GPX AS g;
--thesamestuffSimongotname| description | copyright_year |
copyright_license | hyperlink | hyperlink_text |
document_datetime | keywords | minlong | minlat | maxlong | maxlat
Manila to Mt. Pinatubo | This is test data for gpx2shp. |2004| http
://gpx2shp.sourceforge.jp | http://gpx2shp.sourceforge.jp | Toshihiro Hiraoka |||-180.0|-90.0|179.9|90.0(1 row)
And now for the finale -- we shall pull the way points just as Simon did
--Lets extract way points (Simon's is a bit shorter)
-- (the offset here is an ugly hack to force Postgres to use our xml value instead of recopying it as
-- suggested by a commenter to our previous post
--note we were using order by before but OFFSET though still ugly seems cleaner --
--With the offset hack -- this finishes in 895ms. Without offset hack it takes about 3182 ms (~3 seconds)SELECTCAST((xpath('/wpt/name/text()', wayp.pt))Asvarchar(20))AsName,
CAST(CAST((xpath('/wpt/@lon', wayp.pt))Asvarchar)Asnumeric)As longitude,
CAST(CAST((xpath('/wpt/@lat', wayp.pt))Asvarchar)Asnumeric)As latitude,
CAST(CAST((xpath('/wpt/ele/text()', wayp.pt))Asvarchar)Asnumeric)As Elevation
FROM GPX As g WHEREobject_name='gpxtestrevised.gpx')As it
FROM GPX WHEREobject_name='gpxtestrevised.gpx')As g OFFSET 0)As wayp;
Thanks for an interesting article. I've found that when the xml file has a unnamed default name space you can define a named one in the third parameter (e.g. 'aaa') of the xpath function and use it in the first parameter (e.g. '/aaa:gpx'). e.g.