Sunday, September 18. 2011
We attended FOSS4G this year in Denver, Colorado. Friday was a PostGIS bonanza with 5 PostGIS talks back to back including ours. The crowd was huge. All the PostGIS talks as I recall were so packed that there were not enough seats to accommodate everyone. A more comprehensive detail of the events is described on OpenGeo FOSS4G Day #5
We admit to overstuffing our slides with SQL and ran short on time at the end. Leo complained and vowed to do a better job next time. We really weren't expecting such a large crowd. Admittedly I'm all for the after conference experience which is much longer than the conference which is why I tend to make slides that are very dense. WARNING: The following slides feature SQL doing unconventional things suitable only for mature audiences. Viewer discretion is adviced.. You can check out our slides here PostGIS 2.0 the new stuff.
Continue reading "FOSS4G 2011 PostGIS the new stuf"
Sunday, September 04. 2011
We've been working on converting some of our SQL Server apps to PostgreSQL. In this article we'll describe some things to watch out for and provide a function we wrote to automate some of the conversion.
Although both databases are fairly ANSI-SQL compliant, there are still differences with their CREATE TABLE statements, data types, and how they handle other things that makes porting applications not so trivial.
Continue reading "SQL Server to PostgreSQL: Converting table structure"
Friday, August 26. 2011
A while back in New Additions and Promotions in PostGIS Development Team, I mentioned that the new addition to our team Bborie Park was working on image output functions for raster support, among other things. His last addition was ST_AsRaster which allows a PostGIS geometry to cross the line to the raster world, all in the database. This new addition almost completes the basic cycle of making PostGIS not only a spatial analytical tool, but also a rendering engine.
To test out these new functions, I whipped up a quick ASP.NET/JQuery app as described in Minimalist Web-based ASP.NET PostGIS 2.0 Spatial Geometry/Raster Viewer, and Bborie followed up with the PHP version which you can download from http://www.postgis.us/downloads/postgis_webviewer_php.zip.
There is still much room for improvement, e.g. intersection of 2 rasters, faster response, etc, but I can see all the lights flickering and the connections coming together like a self-orchestrating organism. From chaos comes order.
Wednesday, August 24. 2011
The Free and Open Source Software for Geospatial (FOSS4G) is schedule for September 12-16, 2011 in Denver, CO. PostGIS is going to be making a big showing at this event. Paul Ramsey's popular Introduction to PostGIS workshop is already sold out. Check out the schedule of other PostGIS related talks FOSS4G 2011 PostGIS related talks.
We'll be presenting on Friday PostGIS 2.0, the new stuff and showcasing some of the new features in upcoming PostGIS 2.0. In fact Friday seems to be a day jam packed with PostGIS talks back to back in the Windows room. We probably won't even have to leave the room to get our fill of PostGIS.
I'm particularly looking forward to Steven Singer's PostGIS replication talk and Jim Mlodgenski's Scaling PostGIS Queries with Stado since these are becoming critical areas as we take on larger and more complex work.
Sunday, August 14. 2011
One of the new features I'm excited about in upcoming PostgreSQL 9.1 are extensions. It is also my hope that for PostGIS 2.0, we'll be able to package PostGIS 2.0 as an extension. Reinspired in my mission by David Wheeler's recent post and video on Building and Distributing Extensions without C, I decided to take some time to investigate how all the extension pieces fit together.
The three things I like most about extensions are:
Of course the ease is all in the thoughtfulness of the packaging. To get some ideas of how we would go about packaging PostGIS 2.0 as an extension (it could very well be 3 extensions if we decide to package the core postgis, raster, and topology (and even tiger geocoder) as separate extensions), I thought I would take a look at how others have packaged theirs, and how one goes about registering one of these packages to make it available in CREATE EXTENSION.
Figuring out the extensions you have available ready to install
First I decided to start by doing a little snooping, by applying some lessons from our previous article Querying table, view, column and function descriptions I wrote this query to figure out what useful functions are available to learn about extensions.
Continue reading "PostgreSQL 9.1 Exploring Extensions"
Sunday, July 31. 2011
One of the biggest complaints about software and perhaps even more specifically FLOSS software is the lack of documentation. I'm not talking about those small little dialogs we throw in our code that rarely see the light of day. I'm talking about stuff you throw in user docs or specifications and so forth that an analyst or end-user reads.
The main reasons for this lack of documentation is that while everyone seems to want documentation, few really want to pay for it, and it's time consuming to keep documentation in synch with what the software actually does. Having documentation that is very inconsistent with the software is worse than not having any documentation at all. A good way to keep documentation up to date is to fold it into the process of developing and testing the software such that changes in software expose errors in the documentation and vice-versa or that the act of changing the software corrects the documentation.
Part of the way we try to do that on the PostGIS project is to require each function patch to include documentation. We also try to make our documentation executable by auto-generating PostgreSQL help and some test suites from the documentation. For example the process of installing documentation-generated function and type descriptions in PostgreSQL catches inconsistencies between the implementation and what we actually publish about how the functions work. Even the image generator that builds most of the images in the manual is built with PostGIS as a core of it so that when our build-bot is rebuilding the documentation it is exercising some PostGIS output functions.
I can't tell you how many times I've flagged changes in the current PostGIS 2.0 release simply by trying to install the docbook generated PostgreSQL comment descriptions and PostgreSQL complains that the function signature no longer exists that the documentations says should be there. So then I go back and correct the documentation or yell at the programmer if the documentation makes more sense than what they coded. On the other side, its easy to catch what we neglected to document simply by scanning the functions in PostgreSQL and seeing which ones don't have descriptions.
Okay as far as databases goes, many a DB Programmer/Analyst has waltzed into a project only to be befuddled about the meanings of all these tables used by the application. If you are like us, you don't even want to look at any database documentation that is not part of the definition of the database because you know 99% of the time it's so obsolete or was just a pipe dream of someone working in a vacuum that its more useless than not having any documentation at all. It is nice to have nicely formatted documentation you can read separate from the code, but even sweeter if you can query the documentation just as easily as you can query the data.
A good way of having up to date documentation is to weave it in as part of the process of defining the structure. For databases this means using foreign keys, primary keys, and using the commenting features that most relational databases offer these days. For example the databases we commonly work with, PostgreSQL, SQL Server, MS Access, and even MySQL all allow you to provide descriptions for tables, table columns, and sometimes other objects such as functions and stored procs right in the database. PostgreSQL even allows you to provide descriptions of columns in views though that's a bit messier to do. Sadly there isn't a consistent way of pulling these descriptions out of the database that will work for all of these. Each has differently defined meta tables it stores these descriptions in. For thise article, we'll demonstrate how to pull this information from PostgreSQL.
PostgreSQL does offer many useful switches in psql for querying this data, but we'll focus our attention on pulling this data via SQL. It's much easier to incorporate this information in auto-generated documentation with SQL because you can have more control what you include and how to format it.
Continue reading "Querying table, view, column and function descriptions"
Monday, June 27. 2011
I'm one of those old-fashioned folks that debugs with print lines and raise notices. They're nice. They always work, you can put clock time stops in there and don't require any fancy configuration. At a certain point you do have to pull out a real debugger to see what is going on. This often happens when your one-liners are no longer good enough and now you have to write 20 liners of plpgsql code.
Such is the case with geocoding and the PostGIS tiger geocoder specifically. Lots of interest has revived on that with people submitting bug reports and we've got paying clients in need of a fairly easy and speedy drop-in geocoder that can be molded to handle such things as road way locations, badly mis-spelled real estate data, or just simply to get rid of their dependency on Google, Yahoo, MapQuest, ESRI and other online or pricey geocoding tools. So I thought I'd take this opportunity to supplement our old-fashioned debugging with plpgsqldebugger goodness. In this article, we'll show you how to configure the plpgsql debugger integrated in PgAdmin and run with it.
Continue reading "Using PgAdmin PLPgSQL Debugger"
Thursday, June 16. 2011
PostGIS 2.0.0 has inched a lot closer to completion. This past week, Paul enabled his gserialization work which changed the on disk-format of PostGIS and in return I think we'll have a much better platform to grow on. With this change we now have the 3D index and bounding box bindings in place. Say hello to the &&& operator which is like &&, but is 3D aware and comes with its own companion 3D spatial indexes. This will allow you to do true 3D bounding box searches with any of the new 2.5/3D geometries we have in place for PostGIS 2.0.0. We are still noodling out the semantics of boxes. Read Paul's call for action on the The Box Plan?, if you are interested. PostgreSQL 8.4 is the lowest supported version for PostGIS 2.0.0. It took a bit of squabbling between PSC members to make that decision, but I put my foot down and I think in the end was for the best to allow us to use new features, less platforms to test, and get rid of some unnecessary code.
PostGIS Windows 32-bit Experimental builds fresh off the presses
With all these changes, if you are running an earlier alpha release of PostGIS 2.0.0, you'll need to do a dump restore since the on disk format is now changed.
If you are on windows and want to give some of this all a test drive, you can download one of our PostGIS 2.0.0 Windows experimental builds. We still only have 32-bit builds. We have builds for PostgreSQL 8.4, PostgreSQL 9.0, and PostgreSQL 9.1 beta 2. The problems we faced in PostgreSQL 9.1 beta 1 were resolved in beta 2 so that most regress tests past except some minor ones involving stupid things like difference in line number marking of errors. Complement your PostgreSQL 9.1 beta 2 meal with a yummy large helping of PostGIS 2.0.0 goodness.
Continue reading "State of PostGIS and PostGIS Windows binaries for 9.1 9.0 and 8.4"
Monday, June 06. 2011
Recommended Books: PostgreSQL 9.0 Volume 1 SQL Reference PostgreSQL 9 High Performance PostGIS in Action
There once existed programmers who were asked to explain this snippet of code:
I find this episode interesting because while the Lisp programmer I feel is more right, the Smalltalk programmer has managed to follow the rest of the crowd and still stick
to her core principle. This brings us to what does this have to do with trigrams
in PostgreSQL 9.1. Well just like
There are 2 main enhancements happening with trigrams in PostgreSQL 9.1 both of which depesz has already touched on in FASTER LIKE/ILIKE and KNNGIST. This means you can have an even faster trigram search than you ever have had before and you can do it in such a fashion that doesn't require any PostgreSQL trigram specific syntactical expressions. So while PostgreSQL 9.1 might be understanding LIKE much like all the other databases you work with, if you have a trigram index in place, it will just be doing it a little faster and sometimes a lot faster using the more clever PostgreSQL 9.1 planner. This is one example of how you can use applications designed for many databases and still be able to utilize advanced features in your database of choice. In this article we'll demonstrate.
For this example we'll use a table of 490,000 someodd records consisting of Massachusetts street segments and their names excerpted from TIGER 2010 data. You can download the trimmed data set from here if you want to play along.
Continue reading "PostgreSQL 9.1 Trigrams teaching LIKE and ILIKE new tricks"
Friday, June 03. 2011
PostgreSQL 8.4 introduced the ability to create user-defined variadic functions. These are basically functions that take as input an undefined number of arguments where the argument that is an undefined number are all of the same type and are the last input arguments. Depesz went over it two years ago in Waiting for 8.4 variadic functions, so we are a bit late to the party. In a nutshell -- variadic functions are syntactic sugar for functions that would otherwise take arrays. In this article we'll provide some more demonstrations of them to supplement Depesz article.
I was reminded that I had never explored this feature, when recently documenting one of the new PostGIS 2.0 Raster functions - ST_Reclass which employs this feature. I think ST_Reclass is a superb function and one of my favorite raster functions thus far that I hope to put to good use soon. Our new PostGIS family member,Bborie Park, is running thru our PostGIS Raster milestones much faster than I had dreamed. He's already implemented a good chunk of stuff we discussed in Chapter 13 - PostGIS Raster and had stated you probably won't see in PostGIS 2.0. He's going a bit faster than I can catalog them, so the documentation is already embarrassingly behind the fantastic functionality that is already present in PostGIS 2.0.
Continue reading "Variadic Functions in PostgreSQL"
Since others have shared their PostgreSQL Conference notes on Planet PostgreSQL: Josh Berkus, Blake Crosby, Selena Deckelman, Dmitri Fontaine, Bruce Momjian, Gabrielle Roth, Andreas Scherbaum, and Greg Smith, I thought I'd do my civic duty and add Paul Ramsey's notes to the mix. His are on his corporate OpenGeo blog which is carried by Planet Geospatial and Planet OSGeo but not by Planet PostgreSQL. One thing I admire about Paul is how easily he lets himself be changed by his environment. Sometimes you have to be a little careful what you say to him since he sometimes takes your comments a little too much to heart and changes a little bit more than you had intended. Anyrate here are his notes: PgCon Notes #1, PgCon Notes #2, PgCon Notes #3.
Sunday, May 22. 2011
Recommended Books: PostgreSQL 9.0 Volume 1 SQL Reference PostgreSQL 9 High Performance PostGIS in Action
Question: How do you deal with bad stat counts?
You have a large table and the default planner stats collector underestimates distinct counts of a critical query column thus resulting in much less than optimal query plans. How can you manually set this?
PostgreSQL 9.0 introduced ability to set two settings on table columns: n_distinct and n_distinct_inherited which are described a bit in ALTER TABLE help.
The n_distinct is the estimated number of distinct values for that column with -1 or any negative number representing a percentage of estimated table count instead of a true count.
n_distinct_inherited is a setting useful for parent tables that denotes the estimated distinct count sum of all a parent's child tables.
By tweaking these settings when they are less than optimal, you can influence the query planner to produce better plans. Why this is necessary is mostly for large tables where the stat collector will not query the whole table to determine stats. The stats collector generally queries at most 10-30% of a table.
Determine If you need to set counts
It's always nice to have the stat collector do all these things for you especially if you have a table that is constantly updated and distinct counts can fluctuate a lot. For static tables you may just want to set them manually. So how do you know whether you should bother or not. Well you can check the current values the stats collector has with this query:
You would then compare with your actuals
Will give you the current count.
Setting n_distinct and n_distinct_inheritedYou may want to bump this up or down when you set the value. Next to set the column distinct count stats you would do something like below replacing 50 with the count you computed:
Monday, May 16. 2011
This past week our PostGIS Project Steering Committee has gotten a wee bit bigger with the addition of Sandro Santilli and Chris Hodgson. So now we are 5 people strong. Though we have drastically different opinions on things, I think we all have the best interest of PostGIS users in mind such that the difference creates a healthy compromise in perspectives.
Our PostGIS development team has gotten a new addition as well. We have Bborie Park hailing from UC Davis Center for Vectorborne Diseases helping out on the raster front. You might have seen him on the PostgreSQL news groups asking questions. He is currently working on raster image export functions, so that you can do things like ST_AsPNG(rast,...) right from the database. In addition he is also working on raster statistics functions like histograms, mean, minmax, stddev, reclass functions.
Bborie, if you get some of this in for 2.0, I promise to help document these new functions and to write an ASP.NET and PHP application/tutorial that flaunts some of them.
Bborie just committed all these functions to PostGIS code base. I'm starting to add them to the documentation and start testing them now. We'll release a windows experimental build with these in them in the next couple of days. Yee Pee!
Continue reading "New Additions and Promotions in PostGIS Development Team"
Tuesday, May 10. 2011
Recommended Books: PostGIS in Action PostgreSQL 9.0 Manual - Volume 1A: SQL Reference PostgreSQL 9.0 Reference - Volume 1B: SQL Command Reference
PostgreSQL 9 High Performance / Admin Cookbook combo
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()Answer:
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction, you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record. One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get a sense of what a problem address looks like. So I wrote this query:
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start ------------------+------------+-------+------- +-------------------------------------------+--------------+------------------ 48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032 15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Friday, May 06. 2011
Recommended Books: PostgreSQL 9.0 Manual - Volume 1A: SQL Reference PostgreSQL 9.0 Reference - Volume 1B: SQL Command Reference
PostgreSQL 9 High Performance / Admin Cookbook combo PostGIS in Action
We've started to play with PostgreSQL 9.1beta and the PgAdmin III 1.14.0 Beta 1. We'll briefly go over the cool gems found in PgAdmin III beta 1. Most of the new features are for navigating the upcoming PostgreSQL 9.1. Well first obstacle we ran into was we can't get our favorite extension, PostGIS, to compile against PostgreSQL 9.1beta though it did with the alphas, so you won't be seeing any windows experimental builds until we resolve this issue. Details of ticket here? PostGIS 2.0 won't compile for PostgreSQL 9.1 beta1
Despite that minor set back, we decided to push on and navigate the new features by using PgAdmin III 1.14.0 as our Tour Guide. Below is a list of new features you can experience via PgAdmin III 1.14.0 Beta 1. I'm sure there are more we missed, but these are the ones that were most flashing.
Continue reading "Navigating PostgreSQL 9.1 beta 1 with PgAdmin III 1.14.0 Beta 1"
Syndicate This Blog
Show tagged entries