Monday, September 09. 2013
PostgreSQL 9.3 came out today and we are excited to start using the new features. PostGIS 2.1.0 came out about 2 weeks ago and pgRouting just cut the RC 3 release. For windows PostGIS users who are impatient to try the new suite, binaries can be found on the Unreleased versions of PostGIS.net windows page.
We are planning an official release sometime probably next week on StackBuilder. We are waiting for release of pgRouting 2.0 before we do which should be out next week. This new 2.1 release will be dubbed the PostGIS 2.1 Bundle since it will have more than just PostGIS. It will include postgis extensions (postgis which includes geometry,raster, geography) , postgis_topology, postgis_tiger_geocoder), address_standardizer extension (a companion to tiger geocoder), and pgRouting 2.0.
For those people running PostGIS 2.0 and PostgreSQL 9.0+, especially (raster and geography) users, I highly recommend you jump to PostGIS 2.1. PostGIS 2.1 is a soft upgrade from 2.0. For raster there are enormous speed improvements and new functions. The ones we are most excited about in raster are the much much much faster ST_Clip and ST_Union functions (which now does multi-band in addition to being faster). These two functions are highly important since they are generally the first step in many raster workflows. Geography has speed improvements for point in poly and a ST_Segmentize function done on the spheroid (important for long range). Geometry has a couple of new functions. The Enhanced 3D functionality provided by SFCGAL is brand new and probably won't be distributed by many package maintainers until PostGIS 2.2 where it will garner a few more features and stability improvements.
Continue reading "PostGIS 2.1 windows bundle"
Saturday, August 17. 2013
The 2.1.0 release of PostGIS is now available.
The PostGIS development team is proud to release PostGIS 2.1.0. As befits a minor release, the focus is on speed improvements, more features, and bug fixes.
If you are currently using PostGIS 2.0+, you can go the soft upgrade path:
ALTER EXTENSION postgis UPDATE TO "2.1.0";If you are running 2.1.0 of beta or an unreleased version (and the above step fails with error “extension postgis has no update path from version x.x.x to version 2.1.0”) , you need to first copy the file in share\extensions\postgis—2.0.3—2.1.0.sql and change the 2.0.3 to the x.x.x noted in the error you are running. Then follow the above upgrade step again.
Users of 1.5 and below will need to go the hard-upgrade path documented in manual: http://postgis.net/docs/manual-2.1/postgis_installation.html#hard_upgrade
Best served with a bottle of GEOS 3.4.1 and PostgreSQL 9.3beta2 (planned release September).
Continue reading "PostGIS 2.1.0 released"
Sunday, March 17. 2013
Recommended Books: PostGIS in Action 2nd Edition
We have all the setup executables for PostgreSQL 8.4-9.2 32-bit and PostgreSQL 9.1-9.2 x64-bit for PostGIS 2.0.3. We'll have the 9.0x64-bit ones up shortly but do have the binaries.
We are planning to release these to stackbuilder sometime next week, but would like some feedback first with any issues people run into.
One person already reported a serious bug in our original packaging which we fixed. If you already installed 2.0.3 the x64-bit version or any of the 9.2 versions (both x32 and x64) that was not marked as -2.zip, -2.exe, you'll want to reinstall. The 8.4-9.1 32-bits were not affected so just have a -1.
Packages can be downloaded from:
General details on what is included:
As always if you are installing for PostgreSQL x64 make sure to pick a zip or installer with 64 in the name. The unmarked or marked with 32 are just for PostgreSQL 32-bit.
Friday, June 08. 2012
I recently had the need to figure out which ranges of my keys were contiguously numbered. The related exercise is finding gaps in data as well.
Reasons might be because you need to determine what data did not get copied or what records got deleted. There are lots of ways of accomplishing this, but this is the
first that came to mind. This approach uses window aggregates
Continue reading "Finding contiguous primary keys"
Monday, January 16. 2012
If I could name a number one feature I love most about PostgreSQL, it's the table inheritance feature which we described in How to Inherit and Uninherit. A lot of people use it for table partitioning using CONSTRAINT EXCLUSION. Aside from that, in combination with PostgreSQL schema search_path (customizable by user and/or database) it makes for a very flexible abstraction tool. For example, for many of our web apps that service many departments where each department/client wants to keep a high level of autonomy, we have a schema set aside for each that inherits from a master template schema. Each department site uses a different set of accounts with the primary schema being that of the department/client so that they are hitting their own tables.
Inheritance allows us to keep data separate,do roll-up reports if we need to, use the same application front-end, and yet allows us the ability to add new columns in just one place (the master template schema). It is more flexible than other approaches because for example we may have a city organization that need to share tables, like for example a system loaded list of funding source shared across the agency. We can set aside these shared tables in a separate schema visible to all or have some have their own copy they can change if they don't want to use the shared one.
Every once in a while, we find ourselves needing to query the whole hierarchy and needing to know which table the results of the query are coming from. To help solve that issue, we employ the use of the system column tableoid which all user tables have. The tableoid is the the object id of a table. PostgreSQL has many system columns that you have to explicitly select and can't be accessed with a SELECT * with the tableoid being one of them. These are: tableoid, cmax,cmin, xmin,xmax,ctid which are all described in System Columns. The PostgreSQL docs on inheritance have examples of using it, but we thought it worthwile to repeat the exercise since it's not that common knowledge and is unique enough feature of PostgreSQL that others coming from other relational databases, may miss the treat. I've often demonstrated it to non-PostgreSQL users who use for example SQL Server or MySQL, and they literally fall out of their chair when I show the feature to them and its endless possibilities.
Continue reading "Table Inheritance and the tableoid"
Sunday, January 08. 2012
Recommended Books: PostgreSQL 9.0 Reference Manual - Volume 1A: The SQL Language Volume 1B: The SQL Language
PostgreSQL has this interesting placeholder called anyelement which it has had for a long time and its complement anyarray. They are used when you want to define a function that can handle many types arguments or can output many types of outputs. They are particularly useful for defining aggregates, which we demonstrated in Who's on First and Who's on Last and several other aggregate articles.
Anyelement / anyarray can be used just as conveniently in other functions. The main gotcha is that when you pass in the first anyelement/anyarray all subsequent anyelement / anyarray must match the same data type as the first anyelement / anyarray.
Continue reading "The wonders of Any Element"
Wednesday, December 28. 2011
For those who aren't familiar with hstore, it's a key/value storage type that is packaged as an extension or contrib in PostgreSQL 8.2+. In PostgreSQL 9.0 it got a little extra loving in several ways one of which was the introduction of the hstore(record) casting function that converts a record to an hstore. In this article, I'll demonstrate how you can use this new casting function to do very sleek mail merges right in the database. The only caveat is that it seems to only correctly name the keys if it is fed a real table or view. Derived queries such as aggregates etc get keys named f1, f2, etc.
If you are on PostgreSQL 9.1 or above installing -- hstore is just a
Continue reading "Mail Merging using Hstore"
Friday, November 11. 2011
Recommended Books: PostgreSQL 9.0 Reference Manual - Volume 1A: The SQL Language
One of my favorite tools and I think that of many folks working with GIS and other kinds of Multimedia is the GDAL/OGR suite. Though I started using it to conquer GIS ETL activities, I found myself using it for problems that are inherently not GIS at all. I talked about the GDAL OGR2OGR component a while ago in GDAL OGR2OGR for Data Loading and this time I'll talk tangentially about its raster capabilities. It is a fantastic tool for converting between various raster formats and applying various raster operations. In PostGIS world the new 2.0 raster functionality puts an SQL wrapper around much of its power. I'm not going to talk about that though except as a fleeting comment to explore later (we've got cool 2 band Map Algebra in PostGIS 2.0 to flaunt its stuff). So what does this have to do with XPathing XML data with PostgreSQL? Well that's what I'm going to talk about what to do with machine generated data that comes at you in XML format.
A lot of machine generated data is hitting us in an XML like form. I talked about GPX data and navigating that in Which by the way GDAL/OGR can load and export easily into/out of a PostGIS enabled database.
GDAL exposes another kind of machine generated data in XML format which turns out to be very useful for all kinds of things. This is Exchangeable image file format (EXIF) data. There are all kinds of random text information embedded in pictures and this varies depending on what camera is taking it. Newer cameras like the ones you have built into your iphone or android embed location based information into them sometimes like where you were standing when you took the picture. Most cameras these days embed the time the picture was taken. This information is important because if you are taking electronic notes while you are snapping your pictures, it provides an easy way to match up your notes with the picture about the object. So what does this EXIF info look like when you point GDAL at it? We'll see.
Continue reading "XPathing XML data with PostgreSQL"
Tuesday, November 01. 2011
Ever have the need to create a holding table say spreadsheet data with say 100 columns. You need to create a table to hold this stuff. Or perhaps you were feeling in a sadist mood and wanted to abuse your PostgreSQL database to see how many columns you can create in a table of a specific data type. Here is a quick script to do it:
Both variants will return output that looks like this:
CREATE TABLE data_import(field1 varchar(255),field2 varchar(255),field3 varchar(255),field4 varchar(255) ,field5 varchar(255),field6 varchar(255),field7 varchar(255) ,field8 varchar(255),field9 varchar(255),field10 varchar(255));
Now if you want it to also execute because you are running it as part of an sql script, you could wrap it in an anonymous function.
Wednesday, October 05. 2011
In most release notices, it's the big shiny sexy features that get all the glamor, but in reality on day to day use
it's the small usability enhancements that make the most difference. I'm reminded about this now that I'm working
on upgrade scripts and extensions for PostGIS. There are a couple of new features that make application upgrades easier that I
regret not having in older versions of PostgreSQL we support and additional ones I had in other databases that I find lacking in PostgreSQL. PostgreSQL 8.2 for example brought us
In 9.1 we got two new DDL commands not much talked about that I am very excited about.
I know it sounds like I'm complaining. That's because I am. Honestly though, I think the first step to caring about something is really taking notice of its flaws and wanting to change them. The strength of an open source project is the ease with which it allows its developers and users to have a great impact on its direction. This is something I do think PostgreSQL excels much much better than most open source projects. I find a ton of flaws in PostGIS I'd like to change and have and I am greatful that PostGIS, like PostgreSQL is not resistant to change if the community wants it. If you are going to take notice of flaws in other products without admitting to your own or admitting that some things are easier in other products and learning from them, then you are a hypocrite or living in a closet. Now getting back to my complaining. Things I miss in PostgreSQL that I had in others which I'm sure I'm not alone.
Monday, September 26. 2011
Recommended Books: PostgreSQL 9 Admin Cookbook
UPDATE Turns out there is a simpler way of getting rid of roles that have explicit permissions to objects as Tom Lane pointed out in the comments.
Will drop the permissions to objects a user has rights to even if they don't own the objects. Of course this needs to be applied with caution since it will drop tables
and other things you don't want necessarily dropped. So it is best to first run a:
One of the things that is still tricky in PostgreSQL is permission management. Even though 9.0 brought us default privileges and the like, these permissions aren't retroactive so still a pain to deal with if you already have objects defined in your database.
One of the annoyances we come across with is deleting roles. Lets say you have a role and it has explicit permissions to an object. PostgreSQL won't allow you to delete this role if it owns objects or has explicit permissions to objects. In order to delete it seems you have to go in and clear out all those permissions. To help with that -- we wrote a quickie script that will generate a script to revoke all permissions on objects for a specific role. It looks like this:
Continue reading "Bulk Revoke of Permissions for Specific Group/User role"
Monday, June 27. 2011
I'm one of those old-fashioned folks that debugs with print lines and raise notices. They're nice. They always work, you can put clock time stops in there and don't require any fancy configuration. At a certain point you do have to pull out a real debugger to see what is going on. This often happens when your one-liners are no longer good enough and now you have to write 20 liners of plpgsql code.
Such is the case with geocoding and the PostGIS tiger geocoder specifically. Lots of interest has revived on that with people submitting bug reports and we've got paying clients in need of a fairly easy and speedy drop-in geocoder that can be molded to handle such things as road way locations, badly mis-spelled real estate data, or just simply to get rid of their dependency on Google, Yahoo, MapQuest, ESRI and other online or pricey geocoding tools. So I thought I'd take this opportunity to supplement our old-fashioned debugging with plpgsqldebugger goodness. In this article, we'll show you how to configure the plpgsql debugger integrated in PgAdmin and run with it.
Continue reading "Using PgAdmin PLPgSQL Debugger"
Thursday, June 16. 2011
Recommended Books: PostGIS in Action
PostGIS 2.0.0 has inched a lot closer to completion. This past week, Paul enabled his gserialization work which changed the on disk-format of PostGIS and in return I think we'll have a much better platform to grow on. With this change we now have the 3D index and bounding box bindings in place. Say hello to the &&& operator which is like &&, but is 3D aware and comes with its own companion 3D spatial indexes. This will allow you to do true 3D bounding box searches with any of the new 2.5/3D geometries we have in place for PostGIS 2.0.0. We are still noodling out the semantics of boxes. Read Paul's call for action on the The Box Plan?, if you are interested. PostgreSQL 8.4 is the lowest supported version for PostGIS 2.0.0. It took a bit of squabbling between PSC members to make that decision, but I put my foot down and I think in the end was for the best to allow us to use new features, less platforms to test, and get rid of some unnecessary code.
PostGIS Windows 32-bit Experimental builds fresh off the presses
With all these changes, if you are running an earlier alpha release of PostGIS 2.0.0, you'll need to do a dump restore since the on disk format is now changed.
If you are on windows and want to give some of this all a test drive, you can download one of our PostGIS 2.0.0 Windows experimental builds. We still only have 32-bit builds. We have builds for PostgreSQL 8.4, PostgreSQL 9.0, and PostgreSQL 9.1 beta 2. The problems we faced in PostgreSQL 9.1 beta 1 were resolved in beta 2 so that most regress tests past except some minor ones involving stupid things like difference in line number marking of errors. Complement your PostgreSQL 9.1 beta 2 meal with a yummy large helping of PostGIS 2.0.0 goodness.
Continue reading "State of PostGIS and PostGIS Windows binaries for 9.1 9.0 and 8.4"
Sunday, May 22. 2011
Recommended Books: PostgreSQL 9.0 Volume 1 SQL Reference PostgreSQL 9 High Performance PostGIS in Action
Question: How do you deal with bad stat counts?
You have a large table and the default planner stats collector underestimates distinct counts of a critical query column thus resulting in much less than optimal query plans. How can you manually set this?
PostgreSQL 9.0 introduced ability to set two settings on table columns: n_distinct and n_distinct_inherited which are described a bit in ALTER TABLE help.
The n_distinct is the estimated number of distinct values for that column with -1 or any negative number representing a percentage of estimated table count instead of a true count.
n_distinct_inherited is a setting useful for parent tables that denotes the estimated distinct count sum of all a parent's child tables.
By tweaking these settings when they are less than optimal, you can influence the query planner to produce better plans. Why this is necessary is mostly for large tables where the stat collector will not query the whole table to determine stats. The stats collector generally queries at most 10-30% of a table.
Determine If you need to set counts
It's always nice to have the stat collector do all these things for you especially if you have a table that is constantly updated and distinct counts can fluctuate a lot. For static tables you may just want to set them manually. So how do you know whether you should bother or not. Well you can check the current values the stats collector has with this query:
You would then compare with your actuals
Will give you the current count.
Setting n_distinct and n_distinct_inheritedYou may want to bump this up or down when you set the value. Next to set the column distinct count stats you would do something like below replacing 50 with the count you computed:
Tuesday, May 10. 2011
Recommended Books: PostGIS in Action PostgreSQL 9.0 Manual - Volume 1A: SQL Reference PostgreSQL 9.0 Reference - Volume 1B: SQL Command Reference
PostgreSQL 9 High Performance / Admin Cookbook combo
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()Answer:
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction, you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record. One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get a sense of what a problem address looks like. So I wrote this query:
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start ------------------+------------+-------+------- +-------------------------------------------+--------------+------------------ 48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032 15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
(Page 1 of 4, totaling 49 entries) » next page
Syndicate This Blog
Show tagged entries