Postgres OnLine Journal: July 2011 / August 2011
An in-depth Exploration of the PostgreSQL Open Source Database
 

Table Of Contents

From the Editors
PostgreSQL Q & A
Using PostgreSQL Extensions
Application Development

From the Editors

 

Free and Open Source Software for Geospatial - FOSS4G 2011- PostGIS galore



The Free and Open Source Software for Geospatial (FOSS4G) is schedule for September 12-16, 2011 in Denver, CO. PostGIS is going to be making a big showing at this event. Paul Ramsey's popular Introduction to PostGIS workshop is already sold out. Check out the schedule of other PostGIS related talks FOSS4G 2011 PostGIS related talks.

We'll be presenting on Friday PostGIS 2.0, the new stuff and showcasing some of the new features in upcoming PostGIS 2.0. In fact Friday seems to be a day jam packed with PostGIS talks back to back in the Windows room. We probably won't even have to leave the room to get our fill of PostGIS.

I'm particularly looking forward to Steven Singer's PostGIS replication talk and Jim Mlodgenski's Scaling PostGIS Queries with Stado since these are becoming critical areas as we take on larger and more complex work.


PostgreSQL Q & A

 

Querying table, view, column and function descriptions Beginner



One of the biggest complaints about software and perhaps even more specifically FLOSS software is the lack of documentation. I'm not talking about those small little dialogs we throw in our code that rarely see the light of day. I'm talking about stuff you throw in user docs or specifications and so forth that an analyst or end-user reads.

The main reasons for this lack of documentation is that while everyone seems to want documentation, few really want to pay for it, and it's time consuming to keep documentation in synch with what the software actually does. Having documentation that is very inconsistent with the software is worse than not having any documentation at all. A good way to keep documentation up to date is to fold it into the process of developing and testing the software such that changes in software expose errors in the documentation and vice-versa or that the act of changing the software corrects the documentation.

Part of the way we try to do that on the PostGIS project is to require each function patch to include documentation. We also try to make our documentation executable by auto-generating PostgreSQL help and some test suites from the documentation. For example the process of installing documentation-generated function and type descriptions in PostgreSQL catches inconsistencies between the implementation and what we actually publish about how the functions work. Even the image generator that builds most of the images in the manual is built with PostGIS as a core of it so that when our build-bot is rebuilding the documentation it is exercising some PostGIS output functions.

I can't tell you how many times I've flagged changes in the current PostGIS 2.0 release simply by trying to install the docbook generated PostgreSQL comment descriptions and PostgreSQL complains that the function signature no longer exists that the documentations says should be there. So then I go back and correct the documentation or yell at the programmer if the documentation makes more sense than what they coded. On the other side, its easy to catch what we neglected to document simply by scanning the functions in PostgreSQL and seeing which ones don't have descriptions.

Okay as far as databases goes, many a DB Programmer/Analyst has waltzed into a project only to be befuddled about the meanings of all these tables used by the application. If you are like us, you don't even want to look at any database documentation that is not part of the definition of the database because you know 99% of the time it's so obsolete or was just a pipe dream of someone working in a vacuum that its more useless than not having any documentation at all. It is nice to have nicely formatted documentation you can read separate from the code, but even sweeter if you can query the documentation just as easily as you can query the data.

A good way of having up to date documentation is to weave it in as part of the process of defining the structure. For databases this means using foreign keys, primary keys, and using the commenting features that most relational databases offer these days. For example the databases we commonly work with, PostgreSQL, SQL Server, MS Access, and even MySQL all allow you to provide descriptions for tables, table columns, and sometimes other objects such as functions and stored procs right in the database. PostgreSQL even allows you to provide descriptions of columns in views though that's a bit messier to do. Sadly there isn't a consistent way of pulling these descriptions out of the database that will work for all of these. Each has differently defined meta tables it stores these descriptions in. For thise article, we'll demonstrate how to pull this information from PostgreSQL.

PostgreSQL does offer many useful switches in psql for querying this data, but we'll focus our attention on pulling this data via SQL. It's much easier to incorporate this information in auto-generated documentation with SQL because you can have more control what you include and how to format it.

Question: how do you get the description for all the tables and views in your database?

Answer:

The below will just list tables with descriptions. If you want all tables listed, you would just take out the d.description > '' criteria and you can even further filter to just get descriptions for tables in a particular schema.


SELECT c.relname As tname, CASE WHEN c.relkind = 'v' THEN 'view' ELSE 'table' END As type, 
    pg_get_userbyid(c.relowner) AS towner, t.spcname AS tspace, 
    n.nspname AS sname,  d.description
   FROM pg_class As c
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = 0)
   WHERE c.relkind IN('r', 'v') AND d.description > ''
   ORDER BY n.nspname, c.relname ;

The above will output somethings like this:

tname | type |  towner  | tspace |   sname    |            description

-------+------+----------+--------+------------+-----------------------------------
 edge  | view | postgres |        | tiger_topo | Contains edge topology primitives

Question: How do you get the description for all the table/view columns in a table?

For this example we just do it for one specific table.

SELECT a.attname As column_name,  d.description
   FROM pg_class As c
    INNER JOIN pg_attribute As a ON c.oid = a.attrelid
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = a.attnum)
   WHERE  c.relkind IN('r', 'v') AND  n.nspname = 'tiger_topo' AND c.relname = 'edge'
   ORDER BY n.nspname, c.relname, a.attname ;

Which outputs something like this:

   column_name   |                  description
-----------------+----------------------------------------------------------------------------
 edge_id         | Unique identifier of the edge
 end_node        | Unique identifier of the node at the end of the edge
 geom            | The geometry of the edge
 left_face       | Unique identifier of the face on the left side of the edge when looking in the direction from START_NODE to END_NODE
 next_left_edge  | Unique identifier of the next edge of the face on the left (when looking in the direction from START_NODE to END_NODE)
                   , moving counterclockwise around the face boundary
 next_right_edge | Unique identifier of the next edge of the face on the right (when looking in the direction from START_NODE to END_NODE)
                  , moving counterclockwise around the face boundary
 right_face      | Unique identifier of the face on the right side of the edge when looking in the direction from START_NODE to END_NODE
 start_node      | Unique identifier of the node at the start of the edge

Question: How do you get the descriptions for select functions?

This example pulls all the functions in the PostGIS 2.0 topology schema that have descriptions and have to do with creation.

SELECT p.proname AS funcname,  d.description
 FROM pg_proc p
   INNER JOIN pg_namespace n ON n.oid = p.pronamespace
   LEFT JOIN pg_description As d ON (d.objoid = p.oid )
     WHERE n.nspname = 'topology' and d.description ILIKE '%creat%'
   ORDER BY n.nspname, p.proname ;

Output looks like this:

     funcname     |                      description                                                                                                           
------------------+--------------------------------------------------------------------------------------------------------------------------------
 createtopogeom   | args: toponame, tg_type, layer_id, tg_objs - Creates a new topo geometry object from topo element array - 
                    tg_type: 1:[multi]point, 2:[multi]line, 3:[multi]poly, 4:collection
 createtopology   | args: topology_schema_name, srid, tolerance, hasz - Creates a new topology schema and 
                    registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name, srid, tolerance - Creates a new topology schema 
                       and registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name, srid - Creates a new topology schema and registers this new schema in the topology.topology table.
 createtopology   | args: topology_schema_name - Creates a new topology schema and registers this new schema in the topology.topology table.
 st_addisonode    | args: atopology, aface, apoint - Adds an isolated node to a face in a topology and returns the nodeid of the new node. 
                    If face is null, the node is still created.
 st_inittopogeo   | args: topology_schema_name - Creates a new topology schema and registers this new schema 
                    in the topology.topology table and details summary of process.
 st_modedgesplit  | args: atopology, anedge, apoint - Split an edge by creating a new node along an existing edge, 
                    modifying the original edge and adding a new edge.
 st_newedgessplit | args: atopology, anedge, apoint - Split an edge by creating a new node along an existing edge, 
                    deleting the original edge 
                    and replacing it with two new edges. Returns the id of the new node created that joins the new edges.

Using PostgreSQL Extensions

 

PostgreSQL 9.1 Exploring Extensions Intermediate



One of the new features I'm excited about in upcoming PostgreSQL 9.1 are extensions. It is also my hope that for PostGIS 2.0, we'll be able to package PostGIS 2.0 as an extension. Reinspired in my mission by David Wheeler's recent post and video on Building and Distributing Extensions without C, I decided to take some time to investigate how all the extension pieces fit together.

The three things I like most about extensions are:

  • It has a companion sql CREATE EXTENSION and catalog of what's installed and available right from the db, which makes installing/uninstalling relatively painless
  • Installed functions don't get backed up with data, which is really a bit of a nightmare for PostGIS folks and relief be much welcome as you can tell in Paul's PostGIS backup and restore which gets a bit more of an adventure in PostGIS 2.0 now that we have raster and topology and many more fun fun GEOS functions.
  • The ease with which you can uninstall,migrate your extension to another schema, and upgrade (in theory). There will be caveats here of course with changes that require on disk format and involve data.

Of course the ease is all in the thoughtfulness of the packaging. To get some ideas of how we would go about packaging PostGIS 2.0 as an extension (it could very well be 3 extensions if we decide to package the core postgis, raster, and topology (and even tiger geocoder) as separate extensions), I thought I would take a look at how others have packaged theirs, and how one goes about registering one of these packages to make it available in CREATE EXTENSION.

Figuring out the extensions you have available ready to install

First I decided to start by doing a little snooping, by applying some lessons from our previous article Querying table, view, column and function descriptions I wrote this query to figure out what useful functions are available to learn about extensions.

SELECT p.proname AS funcname,  d.description
 FROM pg_proc p
   INNER JOIN pg_namespace n ON n.oid = p.pronamespace
   LEFT JOIN pg_description As d ON (d.objoid = p.oid )
     WHERE n.nspname = 'pg_catalog' 
        AND (d.description ILIKE '%extension%' or p.proname ILIKE '%extension%')
   ORDER BY n.nspname, p.proname ;

-- which output this --
            funcname             |                         description
---------------------------------+-----------------------------------------------------
 pg_available_extension_versions | list available extension versions
 pg_available_extensions         | list available extensions
 pg_extension_config_dump        | flag an extension's table contents to be emitted by pg_dump
 pg_extension_update_paths       | list an extension's version update paths
SELECT c.relname As tname,c.relkind As type,  d.description
   FROM pg_class As c
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = 0)
    WHERE n.nspname = 'pg_catalog' 
        AND (d.description ILIKE '%extension%' 
                OR c.relname ILIKE '%extension%')
   ORDER BY n.nspname, c.relname ;


-- which output this --
              tname              | type | description
---------------------------------+------+-------------
 pg_available_extension_versions | v    |
 pg_available_extensions         | v    |
 pg_extension                    | r    |
 pg_extension_name_index         | i    |
 pg_extension_oid_index          | i    |

The most useful of the bunch, as you can probably guess is the pg_available_extensions which outputs something like this on my PostgreSQL 9.1beta3 install:

        name        | default_version | installed_version |                               comment                                
--------------------+-----------------+-------------------+----------------------------------------------------------------------
 adminpack          | 1.0             |                   | administrative functions for PostgreSQL
 autoinc            | 1.0             |                   | functions for autoincrementing fields
 btree_gin          | 1.0             |                   | support for indexing common datatypes in GIN
 btree_gist         | 1.0             |                   | support for indexing common datatypes in GiST
 chkpass            | 1.0             |                   | data type for auto-encrypted passwords
 citext             | 1.0             |                   | data type for case-insensitive character strings
 cube               | 1.0             |                   | data type for multidimensional cubes
 dblink             | 1.0             |                   | connect to other PostgreSQL databases from within a database
 dict_int           | 1.0             |                   | text search dictionary template for integers
 dict_xsyn          | 1.0             |                   | text search dictionary template for extended synonym processing
 earthdistance      | 1.0             |                   | calculate great-circle distances on the surface of the Earth
 file_fdw           | 1.0             |                   | foreign-data wrapper for flat file access
 fuzzystrmatch      | 1.0             | 1.0               | determine similarities and distance between strings
 hstore             | 1.0             |                   | data type for storing sets of (key, value) pairs
 insert_username    | 1.0             |                   | functions for tracking who changed a table
 :
 :

Now the next question on my mind, is how the heck does an extension get themselves on this list. What I will describe is a somewhat Microsoft Windows focused approach, but much of it is the same as what you would do in a Unix-based system, since I'm compiling under MingW.

Downloading the packages

While my out of the box PostgreSQL had many extensions, not all are present. Most of the ones listed were contribs of yesteryear. Many of the newer PostgreSQL extensions can be found at http://www.pgxn.org/. To get a feel for how this whole new Extension world works, I thought I'd take a stab at compiling a couple and installing them. I was able to get many of the ones I tried to compile under MingW and run in my 32-bit PostgreSQL 9.1 EnterpriseDb VC++ binaries. The ones I experimented with do not come packaged with the EnterpriseDB packaged Windows binaries and were relatively painless since they didn't have extra dependencies beyond what is required by PostgreSQL. Some others I wanted try like the twitter FDW had extra dependencies I didn't have the energy to gather and the explain analyze one I tried but it failed (though it did create the package, it failed with a plpgsql error when I went to install it with CREATE EXTENSION). Below are the ones I was successful with. It should be noted that while these are packaged as extensions, many will compile under lower versions of PostgreSQL like 8.4/9.0 and can be used like any other contrib in lower versions. You just don't get the syntactic/manageability goodness of the new EXTENSION model.

  • temporal This provides a new data type called period which is composed of two timestamps that define a range. it also includes functions such as period_intersect() that will return another period that defines the intersection of two periods. Similar in concept to PostGIS intersection function.
  • kmeans A kmeans window function implemented in C.
  • first_last_agg Similar to ourwho's first and last functions, we described but implemented in C.

Compiling them under MingW and making available in PostgreSQL

What I am describing here are my compile steps for MingW, but the steps should be essentially the same on Unix/Linux platforms, the last. Most of the extensions follow the same path, you need postgresql-develop available and path to your pg_config should be in your path (pg_config is generally located in the bin folder of your PostgreSQL install). Since we use mingW to compile PostGIS for various versions of PostgreSQL (8.2-9.1), as described Compiling PostGIS 2.0 under MingW-32 we change our paths accordingly to compile for a different versions of PostgreSQL.

So for example, to compile kmeans, we extracted the folder into C:\projects\postgresql\extensions

export PG_VER=91b3
export PGPORT=5441
export PGUSER=postgres 
export PATH="/c/projects/pg/pg${PG_VER}/bin/:$PATH"

cd /c/projects/postgresql/extensions
cd kmeans-1.1.0
make
make install

The make install will install it in your PostgreSQL install and if you were actually using your PostgreSQL compile environment, you'd be done and have the extensions available and listed in pg_available_extensions. However, for production use, we normally use the VC++ compiled PostgreSQL distributed by EnterpriseDb. Locally these seem to work fine under there.

Using MingW compiled extensions under Windows PostgreSQL VC++ builds

There are two sets of things you need to copy which is where extensions install their stuff. /share/extension (under MingW, this is share/postgresql/extension and the relevant extension dlls in the /lib/ folder in MingW this is lib/postgresql.

Copy these into the lib and share/extension folder of your regular PostgreSQL install and run this query to confirm they are available.

SELECT * FROM pg_available_extensions 
WHERE name IN('first_last_agg', 'temporal','kmeans' )
ORDER BY name;  

voila!
      name      | default_version | installed_version |                  comment                  
----------------+-----------------+-------------------+-------------------------------------------
 first_last_agg | 0.1.1           | 0.1.1             | first() and last() aggregate functions
 kmeans         | 1.1.0           |                   | k-means classification by window function
 temporal       | 0.7.1           | 0.7.1             | temporal data type and functions

Installing the extensions into database

Okay I cheated and started installing them already. To install the extensions, you use PgAdminIII add Extension feature or:

To install in a named schema

CREATE EXTENSION kmeans SCHEMA contrib;

To install in default schema

CREATE EXTENSION kmeans;

If there exist multiple versions of an extension, you'll want to use the optional VERSION clause: CREATE EXTENSION kmeans SCHEMA contrib VERSION 1.1.0;

For those windows users, who are interested in trying out these particular extensions and aren't set up for compiling -- you can download them in postgres91_32bit_extensions.zip. I'm still setting up my 64-bit desktop and hope to have 64-bit equivalents of these running in my 64-bit PostgreSQL 9.0/9.1 installs.


Application Development

 

PostGIS as a graphical engine



A while back in New Additions and Promotions in PostGIS Development Team, I mentioned that the new addition to our team Bborie Park was working on image output functions for raster support, among other things. His last addition was ST_AsRaster which allows a PostGIS geometry to cross the line to the raster world, all in the database. This new addition almost completes the basic cycle of making PostGIS not only a spatial analytical tool, but also a rendering engine.

To test out these new functions, I whipped up a quick ASP.NET/JQuery app as described in Minimalist Web-based ASP.NET PostGIS 2.0 Spatial Geometry/Raster Viewer, and Bborie followed up with the PHP version which you can download from http://www.postgis.us/downloads/postgis_webviewer_php.zip.

There is still much room for improvement, e.g. intersection of 2 rasters, faster response, etc, but I can see all the lights flickering and the connections coming together like a self-orchestrating organism. From chaos comes order.