Thursday, February 14. 2013
Continue reading "Saying Happy Valentine in PostGIS"
A while back I mentioned to Bborie (aka dustymugs) and Sandro (aka strk): We're missing ability to label our images with text.
Picture this: What if someone has got a road or a parcel of land and they want to stamp
the parcel id or the road name on it and they don't want to have to depend on some piece of mapping software. Many report writers and databases can talk to each other directly
such as we demonstrated in Rendering PostGIS raster graphics with LibreOffice
and we've got raster functionality in the database already. Can we cut out the middleman?
Some ideas came to mind. One we could embed a font lib into PostGIS thus having the ultimate bloatware minus the built-in coffee maker.
strk screeched. He's still recovering from my constant demands of having him upgrade his PostgreSQL version. Okay fair enough.
Bborie suggested why don't you import your letters as rasters and then vectorize them. So that's exactly what we are going to demonstrate in this article
and in doing so we will combine some of the new features coming in PostGIS 2.1 we've been talking about in Waiting for PostGIS 2.1 series.
Warning this article will have a hint of Rube Goldbergishness
If you are easily offended by seeing stupid tricks done with SQL, stop reading now.
We are going to take a natural vector product and rasterize it just so
we can vectorize it again so we can then rasterize it again. Don't think about it too much. It may trigger activity in parts of your brain you didn't know were there
thus resulting in stabbing pains similar to what you experience by quickly guplping down a handful of Wasabi peas.
So here are the steps for creating your own font set you can
overlay on your geometries and rasters.
See if you can spot the use of window functions and CTEs in these examples.
Wednesday, December 28. 2011
Continue reading "Mail Merging using Hstore"
For those who aren't familiar with hstore, it's a key/value
storage type that is packaged as an extension or contrib in PostgreSQL 8.2+. In PostgreSQL 9.0 it got a little extra loving in several ways one of which was the introduction
of the hstore(record) casting function that converts a record to an hstore. In this article, I'll demonstrate how you can use this new casting function to do very sleek mail merges right in the database. The only caveat is that it seems to only correctly name the keys if it is fed a real table or view. Derived queries such as aggregates etc get keys named f1, f2, etc.
If you are on PostgreSQL 9.1 or above installing -- hstore is just a
CREATE EXTENSION hstore; sql command away. If you are on a lower version of PostgreSQL,
you can usually find the hstore.sql in share/contribs.
Sunday, August 14. 2011
Continue reading "PostgreSQL 9.1 Exploring Extensions"
One of the new features I'm excited about in upcoming PostgreSQL 9.1 are extensions. It is also my hope
that for PostGIS 2.0, we'll be able to package PostGIS 2.0 as an extension.
Reinspired in my mission by David Wheeler's recent post and video on Building and Distributing Extensions without C, I decided to take some time to investigate how all the extension pieces fit together.
The three things I like most about extensions
- It has a companion sql CREATE EXTENSION and catalog of what's installed and available right from the db, which makes installing/uninstalling relatively painless
- Installed functions don't get backed up with data, which is really a bit of a nightmare for PostGIS folks and relief be much welcome as you can tell in Paul's PostGIS backup and restore
which gets a bit more of an adventure in PostGIS 2.0 now that we have raster and topology and many more fun fun GEOS functions.
- The ease with which you can uninstall,migrate your extension to another schema, and upgrade (in theory). There will be caveats here of course
with changes that require on disk format and involve data.
Of course the ease is all in the thoughtfulness of the packaging. To get some ideas of how we would go about packaging
PostGIS 2.0 as an extension (it could very well be 3 extensions if we decide to package the core postgis, raster, and topology (and even tiger geocoder) as separate extensions), I thought I would take a look at how others have packaged theirs, and how one goes about registering one of these packages
to make it available in CREATE EXTENSION.
Figuring out the extensions you have available ready to install
First I decided to start by doing a little snooping, by applying some lessons from our previous article Querying table, view, column and function descriptions
I wrote this query to figure out what useful functions are available to learn about extensions.
SELECT p.proname AS funcname, d.description
FROM pg_proc p
INNER JOIN pg_namespace n ON n.oid = p.pronamespace
LEFT JOIN pg_description As d ON (d.objoid = p.oid )
WHERE n.nspname = 'pg_catalog'
AND (d.description ILIKE '%extension%' or p.proname ILIKE '%extension%')
ORDER BY n.nspname, p.proname ;
-- which output this --
funcname | description
pg_available_extension_versions | list available extension versions
pg_available_extensions | list available extensions
pg_extension_config_dump | flag an extension's table contents to be emitted by pg_dump
pg_extension_update_paths | list an extension's version update paths
Tuesday, May 10. 2011
What is the difference between CURRENT_TIMESTAMP and clock_timestamp()
CURRENT_TIMESTAMP is an ANSI-SQL Standard variable you will find in many relational databases including PostgreSQL, SQL Server, Firebird, IBM DB2 and MySQL to name a few
that records the start of the transaction. The important thing to keep in mind about it is there is only one entry per transaction so if you have a long running transaction,
you won't be seeing it changing as you go along.
clock_timestamp() is a PostgreSQL function that always returns the current clock's timestamp. I don't think I'm alone in using it for doing simple benchmarking and other things
where for example I need to record the timings of each part of a function within the function using pedestrian RAISE NOTICE debug print statements.
There is another cool way I like using it, and that is for a batch of records each with an expensive function call, benchmarking how long it takes to process each record.
One of the things I'm working on is improving the speed of the tiger_geocoder packaged in PostGIS 2.0. The first root of attack seemed to me would be the normalize_address function
which I was noticing was taking anywhere from 10% to 50% of my time in the geocode process. That's a ton of time if you are trying to batch geocode a ton of records. The thing is
the function is very particular to how badly formed the address is so a whole batch could be held up by one bad apple and since the batch doesn't return until all are processed, it makes
the whole thing seem to take a while.
So rather than looping thru each, I thought it would be cool if I could run the batch, but for each record have it tell me how long it took to process relative to the rest so I could get
a sense of what a problem address looks like. So I wrote this query:
the_time - COALESCE(lag(the_time) OVER(ORDER BY the_time), CURRENT_TIMESTAMP) As process_time,
the_time - CURRENT_TIMESTAMP As diff_from_start
FROM (SELECT address_1, city, state, zip,
pprint_addy(normalize_address(coalesce(address_1,'') || ', ' || coalesce(city || ' ','') || state || ' ' || zip)) As pp_addr,
clock_timestamp() As the_time
FROM testgeocode LIMIT 1000) As foo )
WHERE process_time > '00:00:00.016'::interval;
Which returned an output something like this:
address_1 | city | state | zip | pp_addr | the_time | process_time | diff_from_start
48 MAIN ST .. | S.. | MA | 021.. | 48 MAIN .. | 2011-05-10 03:24:43.078-04 | 00:00:00.032 | 00:00:00.032
15 ... | | MA | 018... | 15 GREN... | 2011-05-10 03:24:50.796-04 | 00:00:00.031 | 00:00:07.75
Wednesday, March 30. 2011
I am happy to report, that the final proof of the PostGIS in Action E-Book got released today
and the printed version is scheduled for release Aprill 11th, 2011 and should be available on Amazon and other locations around then. The other e-Reader formats will come after that.
You can buy from here or download the two free chapters, if you haven't already.
Each hard-copy purchase comes with a free E-Book version. There is a coupon in the back of the book when you get it to get the E-Book versions.
Yes, I know it's been a really really long time.
On the bright side, we produced twice as much content as we had set out to do and that was with keeping things as concise as we
could get away with, still managing to cover more than we set out to cover, and stripping out as many unnecessary words as we could muster.
So 520 pages and almost 2 years later, this is where we are.
A good chunk of the additional bulk of the book was the appendices which are about 150 pages
total and focus strictly on PostgreSQL and SQL. After many comments from early reviewers, we thought it unfair not to have a good chunk of PostgreSQL
and just general relational database content to familiarize programmers and GIS folks with the RDBMS that PostGIS lives in. Most GIS folk unfortunately
have the hardest time with getting up to speed with SQL and just standard RDBMS management.
Two free chapters and accompanying code for all chapters
The two free chapters we selectively picked because we thought they would be most beneficial to newcomers and people new to relational databases.
So the free chapters are:
- Chapter 1: What is a spatial database? Which provides a fast paced history of PostGIS, PostgreSQL, Spatial Databases and moves into
an even faster journey into converting flat file restaurant locations to spatial point geometries, loading in an ESRI shapefile of roads. Then shows you how to write standard
spatial queries and render the results.
- Appendix C: SQL Primer -- goes through querying information_schemas, the common points of writing SELECT, INSERT, UPDATE, DELETE SQL statements and the finer points of using aggregate functions, Windowing constructs and common table expressions as well
as a brief overview of how PostgreSQL stacks up with other relational databases (SQL Server, Oracle, IBM DB2, MySQL, Firebird) in SQL features.
- All the chapter code and accompanying data. It's a bit hefty at 57 MB.
So even if you don't buy our book, we hope you find the free chapters useful.
You can get a more detailed listing of all the chapters from the PostGIS in Action book site.
We'd like to thank all those who supported us through this long and unpredictable journey. Hopefully we'll have several more, though hopefully
a bit less nerve-racking than this first one.
Friday, December 24. 2010
Continue reading "String Aggregation in PostgreSQL, SQL Server, and MySQL"
Question: You have a table of people and a table that specifies the activities each person is involved
in. You want to return a result that has one record per person and a column that has a listing of activities for each person
separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by person's name.
This is a question we are always asked and since we mentor on various flavors of databases,
we need to be able to switch gears and provide an answer that works on the client's database. Most
often the additional requirement is that you can't install new functions in the database. This means that
for PostgreSQL/SQL Server that both support defining custom aggregates, that is out as an option.
Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in
most is to push the problem off to the client front end and throw up your hands and proclaim -- "This ain't something that should be
done in the database and is a reporting problem." That is in fact what many database purists do, and all I can say to them is wake up and smell the coffee before you are out of a job.
We feel that data
transformation is an important function of a database, and if your database is incapable of massaging the data into a format
your various client apps can easily digest, WELL THAT's A PROBLEM.
We shall now document this answer rather than trying to answer for the nteenth time. For starter's
PostgreSQL has a lot of answers to this question, probably more so than any other, though some are easier to execute than others
and many depend on the version of PostgreSQL you are using. SQL Server has 2 classes of answers neither of which is terribly appealing,
but we'll go over the ones that don't require you to be able to install .NET stored functions in your database since we said that is often a requirement.
MySQL has a fairly
simple, elegant and very portable way that it has had for a really long time.
Monday, October 05. 2009
Continue reading "Allocating People into Groups with SQL the Sequel"
In our prior story about allocating people with the power of window aggregation, we saw our valiant hero and heroine trying
to sort people into elevators
to ensure that each elevator ride was not over capacity. All was good in the world until someone named Frank came along and spoiled the party.
Frank rightfully pointed out that our algorithm was flawed because should Charlie double his weight, then we could have one elevator ride over capacity.
We have a plan.
Monday, September 28. 2009
Continue reading "Allocating People into Groups with Window aggregation"
This is along the lines of more stupid window function fun and how many ways can we abuse this technology in PostgreSQL. Well actually we were using this approach to allocate geographic areas such that each area has approximately the same population
of things. So you can imagine densely populated areas would have smaller regions and more of them and less dense areas will have larger regions but fewer of them (kind of like US Census tracts).
So you have to think about ways of allocating your regions so you don't have a multipolygon where one part is in one part of the world and the other in another etc. Using window aggregation is one approach in conjunction with spatial sorting algorithms.
The non-spatial equivalent of this problem is how do you shove people in an elevator and ensure you don't exceed the capacity of the elevator for each ride. Below is a somewhat naive way of doing this.
The idea being you keep on summing the weights until you reach capacity and then start a new grouping.
Wednesday, July 01. 2009
Continue reading "Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2"
PostgreSQL 8.4 has ANSI SQL:2003 window functions support. These are often classified under the umbrella terms of basic Analytical or Online Application Processing (OLAP) functions.
They are used most commonly for producing cumulative sums, moving averages and generally rolling calculations that need to look at a subset of the overall dataset (a window frame of data) often relative to a particular row.
For users who use SQL window constructs extensively, this may have been one reason in the past to not to give PostgreSQL a second look. While you may not
consider PostgreSQL as a replacement for existing projects because of the cost of migration, recoding and testing, this added new feature is definitely a selling point
for new project consideration.
If you rely heavily on windowing functions, the things you probably want to know most about the new PostgreSQL 8.4 offering are:
- What SQL window functionality is supported?
- How does PostgreSQL 8.4 offering compare to that of the database you are currently using?
- Is the subset of functionality you use supported?
To make this an easier exercise we have curled thru the documents of the other database vendors to distill what the SQL Windowing functionality they provide in their core product.
If you find any mistakes or ambiguities in the below please don't hesitate to let us know and we will gladly amend.
For those who are not sure what this is and what all the big fuss is about, please read our rich commentary on the topic of window functions.
Wednesday, May 27. 2009
Continue reading "Running totals and sums using PostgreSQL 8.4 Windowing functions"
One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the manual Windowing Functions is that you can use windows with most aggregate functions (built-in or custom defined) as well as define your own specific windowing functions. In a later article, we'll demonstrate creating custom windowing functions.
In our PGCon 2009 PostGIS presentation one of the last slides demonstrates using lead() and lag() windowing functions
to show a family's income level in the same record with the income levels of the next door neighbors in the fictitious town we created. This is not terribly useful unless you live
in a somewhat dysfunctional neighborhood where everyone is concerned about how rich their neighbors are compared to themselves. Our town was very dysfunctional but mostly geographically dysfunctional. We will have much more useful use cases of this as applied to GIS in our upcoming PostGIS in Action book.
Hitoshi Harada and David Fetter did a presentation of this in PGCon 2009 which sadly we missed since we were giving our own presentation.
Check out the PGCon2009 PostgreSQL 8.4 Windowing Functions Video. Also check out the slides at Introducing Windowing Functions.
Those who have used SQL Server 2005+, Oracle or IBM DBII are probably familar or have run into examples of Windowing functions in those products. Windowing in PostgreSQL 8.4 works more or less the same way. In a prior article, we demonstrated how to return running totals and sums using rudimentary SQL.
To precelebrate the eminent arrival of PostgreSQL 8.4 and the current PostgreSQL 8.4 beta 2 release, we shall demonstrate the same exercise using the new ANSI SQL:2003 Windowing functionality built
into the upcoming PostgreSQL 8.4.
Sunday, April 05. 2009
PostgresQL 8.4 beta will be out any day and 8.4 official release will hopefully not be too far behind. As we wait patiently for the official release,
Robert Treat has summarized nicely all the new features you can expect in 8.4.
PostgreSQL 8.4 is what I like to call an earth-shattering release because it has so many big ticket items in there, but also some long-needed usability features in it.
While we all know about the Windowing functions and CTEs and Recursive CTEs, there are a couple of usability features that we always get beat up on, which I am glad to see will be in 8.4.
- Ability to add new columns to a view with CREATE OR REPLACE without having to drop the view and all the view dependents
- Case insensitivity module
- Improved Vacuum performance
- Common Table Expressions and Recursive Common Table Expressions (CTE), windowing functions - Hubert has an example of this in Waiting for 8.4 - window functions
Now the other niceties and usuability features which are nice but not quite as top of our list as the aforementioned.
Note this far from an exhaustive list, but Robert Treat's 8.4 slide presentaton is pretty exhaustive:
- Variadic functions -- these are functions that have default values defined so can be called with varying arguments. To achieve this before you would have had to create
a separate function that calls the first and passes in the default arg. NOte this can be done with any pl langauge and in fact we demonstrated its use in PL/Python PL/Python and default parameters.
- All plpgsql language and other non-sql/non-c proc languages that return sets to be called in the SELECT clause. To get around this problem before, you'd create your sophisticated
set returning function in plpgsql or python or whatever and then wrap it in an SQL function. No need for that hack anymore. Again we demonstrated this feature in PL/Python Pl/Python for loops and returning sets
- pg_terminate_backend -- this kills a backend PostgreSQL process instead of just cancelling the query running on it as pg_cancel_backend did
- Column level priviledges - Hubert has a good example of this in Waiting for 8.4 - column level privileges .
- Faster Restore -- now Restore can use parallel threads
- RETURN QUERY EXECUTE support in plpgsql
- LIMIT clause can take a subquery -- SELECT a.field1, a.field2 FROM a LIMIT (SELECT COUNT(*)/10 FROM a)
- Make As alias in column SELECT optional as the ANSI SQL Standard allows. So you can now do - SELECT a field1, b field2 .... This is not something we would suggest since we find it makes
code hard to read, but does make code that used this regrettable syntax more portably converted to PostgreSQL. It would be nice if this were a flag though in the config that can be turned on since
I find it to be bad practice and encouraging bad habits.
- Numerours changes to EXPLAIN to show columns used, maintenance improvements such as dead-locking reporting