Postgres OnLine Journal

Thursday, June 18. 2009

Restore of functional indexes gotcha

This has been bugging me for a long time and I finally complained about it and Tom Lane kindly gave a reason for the problem and that its by design and not a bug.

So I thought I would post the situation here without getting into too many embarassing specifics in case others have suffered from a similar fate and can learn from this.

The situation:

You create a function lets call it myniftyfunc() in the public schema.
Then you create another function that depends on myniftyfunc(), lets call it mysuperniftyfunc() also in public schema.
Then because your function is such a super nifty function, you decide to create a functional index with that super function on your table that sits in mysuperdata schema - mysuperdata.mysupertable

Your super nifty function is doing its thing; your table is happy; the planner is spitting out your queries lightning fast using the super nifty index on your super table; The world is good.

One day you decide to restore your nifty database backup and to your chagrin, your nifty index is not there. The planner is no longer happily spitting out your queries lighting fast and everything has come to a painful crawl. Your super nifty index is gone. What happened to super nifty functional index?

I have to admit that I'm the type of person that assumes the public schema is always there and always in search_path and that my assumption is a flawed one. After all the public schema is there by default on new databases for convenience, but one can change it not to be in the search_path and in fact pg_dump does just that. So if everything you have is kept in public schema -- you don't run into this particular misfortune. If however you have your functions in public and your tables in different schemas, during restore -- the search path is changed to the schema being restored and your super functional indexes based on super functions that depend on other super functions fail because public is no longer in the search_path.

Continue reading "Restore of functional indexes gotcha"

Posted by Leo Hsu and Regina Obe in intermediate, q&a at 23:04 | Comments (8) | Trackbacks (0)

Tuesday, June 09. 2009

Planner Statistics

Printer Friendly

You'll often hear the term planner statistics thrown around by database geeks. Did you update your statistics. This lingo isn't even limited to PostgreSQL, but is part and parcel to how most decent databases work. For example in PostgreSQL you do a vacuum analyze to update your planner statistics in addition to cleaning up dead space. In SQL Server you do an UPDATE STATISTICS. In MySQL you do an ANALYZE TABLE or a more invasive OPTIMIZE TABLE.

Normally all this "update your stats so your planner can be happy" is usually unnecessary unless you just did a bulk load or a bulk delete or you are noticing your queries are suddenly slowing down. These stat things are generally updated behind the scenes by most databases on an as needed basis.

What makes SQL really interesting and a bit different from procedural languages is that it is declarative (like functional and logical programming languages) and relies on the database planner to come up with strategies for navigating the data. Its strategy is not fixed as it is in procedural languages. A big part of this strategy is decided on by the query planner which looks at distributions of data. Given different WHERE conditions for similar queries, it could come up with vastly different strategies if one value has a significantly higher distribution in a table than another. This is also the mystery of why it sometimes refuses to use an index on a field because it has decided a table scan is more efficient and also why some people consider HINTS evil because they pollute the imperative nature of the language.

Continue reading "Planner Statistics"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, basics, beginner, mysql, sql server at 09:10 | Comments (0) | Trackbacks (0)

Wednesday, May 27. 2009

Running totals and sums using PostgreSQL 8.4 Windowing functions

Printer Friendly

Recommended Books: SQL Cookbook SQL Pocket Guide SQL The Complete Reference

One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the manual Windowing Functions is that you can use windows with most aggregate functions (built-in or custom defined) as well as define your own specific windowing functions. In a later article, we'll demonstrate creating custom windowing functions.

In our PGCon 2009 PostGIS presentation one of the last slides demonstrates using lead() and lag() windowing functions to show a family's income level in the same record with the income levels of the next door neighbors in the fictitious town we created. This is not terribly useful unless you live in a somewhat dysfunctional neighborhood where everyone is concerned about how rich their neighbors are compared to themselves. Our town was very dysfunctional but mostly geographically dysfunctional. We will have much more useful use cases of this as applied to GIS in our upcoming PostGIS in Action book.

Hitoshi Harada and David Fetter did a presentation of this in PGCon 2009 which sadly we missed since we were giving our own presentation. Check out the PGCon2009 PostgreSQL 8.4 Windowing Functions Video. Also check out the slides at Introducing Windowing Functions.

Those who have used SQL Server 2005+, Oracle or IBM DBII are probably familar or have run into examples of Windowing functions in those products. Windowing in PostgreSQL 8.4 works more or less the same way. In a prior article, we demonstrated how to return running totals and sums using rudimentary SQL. To precelebrate the eminent arrival of PostgreSQL 8.4 and the current PostgreSQL 8.4 beta 2 release, we shall demonstrate the same exercise using the new ANSI SQL:2003 Windowing functionality built into the upcoming PostgreSQL 8.4.

Continue reading "Running totals and sums using PostgreSQL 8.4 Windowing functions"

Posted by Leo Hsu and Regina Obe in basics, db2, intermediate, oracle, sql server, window functions at 17:01 | Comments (0) | Trackback (1)

Thursday, May 21. 2009

Creating instance of custom type

Printer Friendly

Someone asked me this recently and not playing with custom types much, I'm not sure this is the best solution. Anyrate here it goes. Note this works for PostgreSQL 8.2 and above. Note sure about 8.1 etc. I assume it probably does.

Let us say you created a custom type something like this:

CREATE TYPE my_type1 AS
   (name varchar(150),
    rotation_x double precision,
    rotation_y double precision,
    x_pos integer,
    y_pos integer
    );

Continue reading "Creating instance of custom type"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, intermediate, postgis, postgresql versions, q&a at 04:54 | Comments (4) | Trackbacks (0)

Friday, May 08. 2009

PostGIS 1.3.6 is out and new upcoming PostGIS book

Printer Friendly

This month is jam packed with a lot of PostGIS news.

PostGIS 1.3.6 is out

PostGIS 1.3.6 has been released. It is mostly a bug fix relase and is the first PostGIS that can be compiled under PostgreSQL 8.4 beta. Details can be found at PostGIS 1.3.6 release notes. We don't have Windows binaries ready yet, but expect to see that in the next week or so.

We are writing a PostGIS Book

Leo and I have been writing a PostGIS in Action book for the past couple of months, and now that it is finally listed on the Manning website, we can talk about it. We are working on our chapter 4 right now. If you are interested in learning PostGIS, check it out. The first chapter is free and with the Manning Early Access Program (MEAP), you can purchase the book now and have great influence on the direction of the book.

The book starts off hopefully with a gentle introduction to OpenGIS Consortium (OGC) spatial databases and concepts in general and PostgreSQL/PostGIS in particular. As we move further into the book, we cover more advanced ground. We plan to cover some of the new PostgreSQL 8.4 features in conjunction with PostGIS, writing stored functions to solve spatial problems and some of the other new exciting stuff and ancillary tools for PostGIS such as PgRouting, Tiger Geocoder, and WKT Raster.

Given all that ground, I suspect our estimate of 325 pages, may be a little low when all is said and done. It is funny that when we started out, we thought to ourselves -- "How can anyone fill up 325 pages." Turns out very easily especially once you start throwing in diagrams and pictures to demonstrate a point. Diagrams are kind of important to have when describing GIS and geometry concepts. So far its been fun and has forced us to sit down and walk thru all the things we took for granted and thought we understood but didn't. You realize just how little you understand when you try to explain something to someone else who really doesn't understand. So perhaps the process of explaining is the greatest of all learning experiences.

Continue reading "PostGIS 1.3.6 is out and new upcoming PostGIS book"

Posted by Leo Hsu and Regina Obe in 8.4, editor note, gis, postgis at 00:46 | Comments (0) | Trackback (1)

Wednesday, April 29. 2009

Loading and Processing GPX XML files using PostgreSQL

Printer Friendly

Recommended Books: XSLT XSLT 2.0 and XPath 2.0 Programmer's Reference

Simon Greener, wrote an article on how to load GPX xml files into Oracle XMLDB. That got me thinking that I haven't really explored all the XML features that PostgreSQL has to offer and to some extent I've been reticent about XML processed in any database for that matter.

In this article we shall attempt to perform the same feats that Simon did, but with PostgreSQL instead of Oracle XMLDB. Note while we are demonstrating this with a GPX file, the same XPath approach can be used to process any XML file.

Continue reading "Loading and Processing GPX XML files using PostgreSQL"

Posted by Leo Hsu and Regina Obe in 8.3, 8.4, application development, intermediate, oracle, postgresql versions at 00:28 | Comments (4) | Trackbacks (0)

Monday, April 27. 2009

Who needs sports when you have the database industry

Printer Friendly

The database industry is getting way too action packed for my blood. First we had Sun buying MySQL, then Oracle buying Sun (thus inheriting MySQL and Java) (recall they already owned InnoDb (the MySQL main transactional engine) and they also by the way own BerkelyDB which is the database engine underlying Subversion repository and they also own all those CRMS and ERPs (Peoplesoft and Seibel), and now we have IBM integrating EnterpriseDb into their DB so that it can look like Oracle Db. What next? Perhaps Microsoft will join the party to integrate EnterpriseDb into their SQL Server offering so SQL Server can look like Oracle and better yet a SQL Server for Linux/Unix to complete the circle.

Continue reading "Who needs sports when you have the database industry"

Posted by Leo Hsu and Regina Obe in editor note at 13:56 | Comment (1) | Trackbacks (0)

Sunday, April 26. 2009

Where is my data and other stuff

Printer Friendly

Different Linux distros have their preferred place of where stuff goes and of course the default location on windows is completely different from that too. So there isn't really one default location where you can find PostgreSQL data cluster. Of course user's can pick their locations as well. So what is a casual DBA supposed to do?

The pg_settings table

PostgreSQL has a convenient system ~~table~~ view called pg_settings that stores a lot of information. It stores the location of the data cluster, the pg_hbafile and other conf files. In additon to that you can interogate it to find out information you will find in the postgresql.conf file. Why sift thru that postgresql.conf file (assuming you can already query your postgresql server) when you can find the answers you are looking for with an SQL query?

Continue reading "Where is my data and other stuff"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, beginner, q&a at 02:45 | Comments (5) | Trackbacks (0)

Saturday, April 18. 2009

How to force PostgreSQL to use a pre-calculated value

Printer Friendly

This question is one that has come up a number of times in PostGIS newsgroups worded in many different ways. The situation is that if you use a function a number of times not changing the arguments that go into the function, PostgreSQL still insists on recalculating the value even when the function is marked IMMUTABLE. I have tested this on 8.2 and 8.3 with similarly awful results.

This issue is not so much a problem if function calculations are fast, but spatial function calculations relative to most other functions you will use are pretty slow especially when dealing with large geometries. As a result your query could end up twice as slow. Even setting the costs of these functions to relatively high does not help the situation.

To demonstrate here is a non-PostGIS version of the issue that everyone should be able to run and demonstrates its not a PostGIS only issue.

Continue reading "How to force PostgreSQL to use a pre-calculated value"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, gis, intermediate, postgis, q&a at 22:33 | Comments (8) | Trackbacks (2)

Sunday, April 05. 2009

What is new in PostgreSQL 8.4

Printer Friendly

PostgresQL 8.4 beta will be out any day and 8.4 official release will hopefully not be too far behind. As we wait patiently for the official release, Robert Treat has summarized nicely all the new features you can expect in 8.4. PostgreSQL 8.4 is what I like to call an earth-shattering release because it has so many big ticket items in there, but also some long-needed usability features in it.

While we all know about the Windowing functions and CTEs and Recursive CTEs, there are a couple of usability features that we always get beat up on, which I am glad to see will be in 8.4. these are

Ability to add new columns to a view with CREATE OR REPLACE without having to drop the view and all the view dependents
Case insensitivity module
Improved Vacuum performance
Common Table Expressions and Recursive Common Table Expressions (CTE), windowing functions - Hubert has an example of this in Waiting for 8.4 - window functions

Now the other niceties and usuability features which are nice but not quite as top of our list as the aforementioned. Note this far from an exhaustive list, but Robert Treat's 8.4 slide presentaton is pretty exhaustive:

Variadic functions -- these are functions that have default values defined so can be called with varying arguments. To achieve this before you would have had to create a separate function that calls the first and passes in the default arg. NOte this can be done with any pl langauge and in fact we demonstrated its use in PL/Python PL/Python and default parameters.
All plpgsql language and other non-sql/non-c proc languages that return sets to be called in the SELECT clause. To get around this problem before, you'd create your sophisticated set returning function in plpgsql or python or whatever and then wrap it in an SQL function. No need for that hack anymore. Again we demonstrated this feature in PL/Python Pl/Python for loops and returning sets
pg_terminate_backend -- this kills a backend PostgreSQL process instead of just cancelling the query running on it as pg_cancel_backend did
Column level priviledges - Hubert has a good example of this in Waiting for 8.4 - column level privileges .
Faster Restore -- now Restore can use parallel threads
RETURN QUERY EXECUTE support in plpgsql
LIMIT clause can take a subquery -- SELECT a.field1, a.field2 FROM a LIMIT (SELECT COUNT(*)/10 FROM a)
Make As alias in column SELECT optional as the ANSI SQL Standard allows. So you can now do - SELECT a field1, b field2 .... This is not something we would suggest since we find it makes code hard to read, but does make code that used this regrettable syntax more portably converted to PostgreSQL. It would be nice if this were a flag though in the config that can be turned on since I find it to be bad practice and encouraging bad habits.
Numerours changes to EXPLAIN to show columns used, maintenance improvements such as dead-locking reporting

Posted by Leo Hsu and Regina Obe in 8.4, new in postgresql, postgresql versions, window functions at 01:43 | Comments (2) | Trackbacks (0)

Wednesday, April 01. 2009

Many changes in PostgreSQL

Printer Friendly

Today was a very eventful day for PostgreSQL. We'll cover these changes in a bit.

Massive Forking of PostgreSQL project

First in PostgreSQL Announcements - David Fetter announces massive forking of the PostgreSQL project in several factions. We now have the following -- so take your pick:

Shizzle: High-performance and Feature-Free
MaryMary: Compiled with libhaltingproblem
Narcona: Painless installation and setup
OurThing: Lots of sources, based in Sicily
XPostgres: Everybody who's ever worked on Postgres code, back to UC Berkeley and Illustra.
Moon/PostgreSQL: Corporate support, as long as it lasts.

I feel this may be good for the community because it is hard to satisfy all these factions in one project. Now perhaps the newsgroups will be a bit calmer.

PgAdmin has come to an end -- make way for OpenPgAdmin

Dave Page announced today that the PgAdmin team received an offer they couldn't refuse from a very big software company yet to be announced. So they are closing PgAdmin and you will soon be able to purchase the services and support contracts from this new company.

Not to worry, Devrim Gunduz, has forked PgAdmin to form OpenPgAdmin. You can check out the site here http://openpgadmin.info.

I must say as much as we are saddened to see the PgAdmin group leave for more fun escapades, We are happy we finally have an administration tool that has the word Open in its name. If it starts with Open, its got to be open. Now all we need is an OpenPost PostgreSQL fork to go with it. OpenPost I think will be easier to pronounce than PostgreSQL and Postgres and it has Open in its name.

So to add to the list of PostgreSQL project forks, I would like to see another project fork

OpenPost: Its free, open source, fast, feature-rich, easy to use and best of all you can pronounce it and its open.

Posted by Leo Hsu and Regina Obe in editor note, joke at 08:26 | Comments (2) | Trackbacks (0)

Saturday, March 28. 2009

Determining size of database, schema, tables, and geometry

Printer Friendly

Even though others have blogged about this in the past and its well-documented in the docs, its a frequently enough asked question, that we thought we'd post it here again with a couple of additional twists.

How to determine the size of a database on disk

SELECT pg_size_pretty(pg_database_size('somedatabase')) As fulldbsize;

How to determine the size of a database table on disk

NOTE: There are two functions in PostgreSQL - pg_relation_size and pg_total_relation_size. The pg_relation_size just measures the size of the actual table where as the pg_total_relation_size includes both the table and all its toasted tables and indexes.

SELECT pg_size_pretty(pg_total_relation_size('someschema.sometable')) As fulltblsize,
pg_size_pretty(pg_relation_size('someschema.sometable')) As justthetblsize;

How to determine the size of a database schema

When you are doing a lot of data loading for later massaging to dump into your production tables, you will often generate junk tables in the process and worse yet other people may generate junk tables period. As a general practice, we like to create a schema just for junk. The reason for that is that it is extremely easy to exclude schemas from being backed up and to load individual schemas. So with this approach we create or move all tables we are not quite sure if they are needed or we only need temporarily into a schema called junk, scratch, possiblejunk or whatever you want to call the schema and exclude it from our backups.

Something of the form:

CREATE SCHEMA scratch;
ALTER TABLE someschema.joeyplaying SET SCHEMA scratch;

After saving the world from junk, you would like to know how much space you have saved your backups from backing up. So you would employ a query something of the form:


SELECT pg_size_pretty(pg_database_size('mycurrentdb')) As fullprod,
pg_size_pretty(CAST(pg_database_size('mycurrentdb') - (SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name)  )
FROM information_schema.tables WHERE table_schema = 'scratch') As bigint)) As  tobebackedup_size,
pg_size_pretty(CAST((SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables
	WHERE table_schema = 'scratch') As bigint) )  As junk_size;

Size of Geometries in PostGIS tables

PostGIS has a companion function for measuring geometry size which is useful when you want to get a sense of how much space your geometries are taking up on disk for a set of records.


SELECT ST_Mem_Size(ST_GeomFromText('LINESTRING(220268 150415,220227 150505,220227 150406)'));

SELECT pg_size_pretty(CAST(SUM(ST_Mem_Size(the_geom)) As bigint) ) as totgeomsum
FROM sometable WHERE state = 'MA';

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, beginner, gis, postgis, postgresql versions, q&a at 12:19 | Comments (9) | Trackback (1)

Thursday, March 26. 2009

In the News

Printer Friendly

Happy belated Ada Byron Lovelace day

First this is the first year that Ada Byron Lovelace day is celebrated, and somehow I managed to miss it. It was celebrated March 24th, 2009 so I guess I'm a couple of days behind the times. For those who are unfamiliar with who Ada Byron Lovelace is. She was the daughter of the poet Lord Byron and considered to be the first computer programmer in the world. The fact that she was a woman I consider a side benefit. So I guess this means March 24th is also computer programmer honor day too.

Continue reading "In the News"

Posted by Leo Hsu and Regina Obe at 17:39 | Comments (0) | Trackbacks (0)

Monday, March 16. 2009

PostGIS Raster and More

Printer Friendly

Recommended Books: PostGIS In Action Applied Spatial Data Analysis with R

The OSGEO Toronto Sprint

The OSGEO C-Camp Toronto Sprint was fun, although Leo and I couldn't stay for the whole event. I've never seen people close bugs so quickly. Paul Ramsey and Mark Cave-Ayland were on a marathon run in the PostGIS ring. Olivier Courtin was also following not too far behind with SVG bug fixes and so forth. We also discussed the possiblity of having ST_GeomFromGML, ST_GeomFromGeoJSON, ST_GeomFromKml and so forth and what that would entail. It was great to meet Pierre Racine of WKT Raster fame in person and chat with Mateusz and Sandro Santilli via IRC. Frank Warmerdam, the GDAL god came to our table to provide his big two cents about how WKT Raster meta data should be stored, dealing with large RASTERS and other things I didn't understand.

Mark Leslie in Australia did his part too, though he wasn't present -- he would come into IRC when others had fallen asleep. Such is the way with timezones. He has been working on beefing up the curved support in PostGIS. The FOSS 4G 2009 conference will be in Sydney, Australia.

It was nice to be able to put a face to these people I've talked via newsgroups. It was also strange since most of the clients and many of the people we work with we have never met, so the idea of meeting in person has become a very foreign concept for us.

Continue reading "PostGIS Raster and More"

Posted by Leo Hsu and Regina Obe in editor note at 19:26 | Comments (0) | Trackbacks (0)

Friday, March 13. 2009

PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots

Printer Friendly

Recommended Books: Quick Python Book Beginning Python Python Pocket Reference

In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point. At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..

To make it up to Simon, we shall demonstrate 2 features we left out in our prior article.

Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)

If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.

Continue reading "PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, advanced, pl programming, plpython, postgresql versions at 08:28 | Comments (3) | Trackbacks (0)

Thursday, June 18. 2009

Tuesday, June 09. 2009

Wednesday, May 27. 2009

Thursday, May 21. 2009

Friday, May 08. 2009

PostGIS 1.3.6 is out

We are writing a PostGIS Book

Wednesday, April 29. 2009

Monday, April 27. 2009

Sunday, April 26. 2009

The pg_settings table

Saturday, April 18. 2009

Sunday, April 05. 2009

Wednesday, April 01. 2009

Massive Forking of PostgreSQL project

PgAdmin has come to an end -- make way for OpenPgAdmin

Saturday, March 28. 2009

How to determine the size of a database on disk

How to determine the size of a database table on disk

How to determine the size of a database schema

Size of Geometries in PostGIS tables

Thursday, March 26. 2009

Happy belated Ada Byron Lovelace day

Monday, March 16. 2009

The OSGEO Toronto Sprint

Friday, March 13. 2009

Quicksearch

Calendar

Categories

Archives

Subscribe

Blog Administration