Thursday, June 18. 2009
Printer Friendly
This has been bugging me for a long time and I finally complained about it and Tom Lane kindly gave a reason for the problem and that its by design and not a bug.
So I thought I would post the situation here without getting into too many embarassing specifics in case others have suffered from a similar fate and can learn from this.
The situation:
- You create a function lets call it myniftyfunc() in the public schema.
- Then you create another function that depends on myniftyfunc(), lets call it mysuperniftyfunc() also in public schema.
- Then because your function is such a super nifty function, you decide to create a functional index with that super function on your table that sits in mysuperdata schema - mysuperdata.mysupertable
Your super nifty function is doing its thing; your table is happy; the planner is spitting out your queries lightning fast using the super nifty index on your super table;
The world is good.
One day you decide to restore your nifty database backup and to your chagrin, your nifty index is not there. The planner is no longer happily spitting out your queries lighting fast and everything has come to a painful crawl.
Your super nifty index is gone. What happened to super nifty functional index?
I have to admit that I'm the type of person that assumes the public schema is always there and always in search_path and that my assumption is a flawed one. After all the public schema is there by default on new databases for convenience,
but one can change it not to be in the search_path and in fact pg_dump does just that. So if everything you have is kept in public schema -- you don't run into this particular misfortune. If however you have your functions in
public and your tables in different schemas, during restore -- the search path is changed to the schema being restored and your super functional indexes based on super functions that depend on other super functions fail because public is no longer in the search_path.
Continue reading "Restore of functional indexes gotcha"
Tuesday, June 09. 2009
Printer Friendly
You'll often hear the term planner statistics thrown around by database geeks. Did you update your statistics. This lingo isn't even limited
to PostgreSQL, but is part and parcel to how most decent databases work. For example in PostgreSQL you do a vacuum analyze to update your planner statistics in addition
to cleaning up dead space. In SQL Server you do an UPDATE STATISTICS . In MySQL you do an
ANALYZE TABLE or a more invasive OPTIMIZE TABLE .
Normally all this "update your stats so your planner can be happy" is usually unnecessary unless
you just did a bulk load or a bulk delete or you are noticing your queries are suddenly slowing down. These stat things are generally updated behind the scenes by most databases
on an as needed basis.
What makes SQL really interesting and a bit different from procedural languages is that it is declarative (like functional and logical programming languages) and relies on the database planner to come up with strategies for navigating the data. Its strategy is not fixed as it is in procedural languages.
A big part of this strategy is decided on by the query planner which looks at distributions of data. Given different WHERE conditions for similar queries, it could come up with vastly different strategies if one value has a significantly
higher distribution in a table than another. This is also the mystery of why it sometimes refuses to use an index on a field because it has decided a table scan is more efficient and also why some people consider HINTS evil because they pollute the imperative nature of the language.
Continue reading "Planner Statistics"
Wednesday, May 27. 2009
Printer Friendly
One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the manual Windowing Functions is that you can use windows with most aggregate functions (built-in or custom defined) as well as define your own specific windowing functions. In a later article, we'll demonstrate creating custom windowing functions.
In our PGCon 2009 PostGIS presentation one of the last slides demonstrates using lead() and lag() windowing functions
to show a family's income level in the same record with the income levels of the next door neighbors in the fictitious town we created. This is not terribly useful unless you live
in a somewhat dysfunctional neighborhood where everyone is concerned about how rich their neighbors are compared to themselves. Our town was very dysfunctional but mostly geographically dysfunctional. We will have much more useful use cases of this as applied to GIS in our upcoming PostGIS in Action book.
Hitoshi Harada and David Fetter did a presentation of this in PGCon 2009 which sadly we missed since we were giving our own presentation.
Check out the PGCon2009 PostgreSQL 8.4 Windowing Functions Video. Also check out the slides at Introducing Windowing Functions.
Those who have used SQL Server 2005+, Oracle or IBM DBII are probably familar or have run into examples of Windowing functions in those products. Windowing in PostgreSQL 8.4 works more or less the same way. In a prior article, we demonstrated how to return running totals and sums using rudimentary SQL.
To precelebrate the eminent arrival of PostgreSQL 8.4 and the current PostgreSQL 8.4 beta 2 release, we shall demonstrate the same exercise using the new ANSI SQL:2003 Windowing functionality built
into the upcoming PostgreSQL 8.4.
Continue reading "Running totals and sums using PostgreSQL 8.4 Windowing functions"
Thursday, May 21. 2009
Printer Friendly
Someone asked me this recently and not playing with custom types much, I'm not sure this is
the best solution. Anyrate here it goes. Note this works for PostgreSQL 8.2 and above. Note sure about 8.1 etc.
I assume it probably does.
Let us say you created a custom type something like this:
CREATE TYPE my_type1 AS
(name varchar(150),
rotation_x double precision,
rotation_y double precision,
x_pos integer,
y_pos integer
);
Continue reading "Creating instance of custom type"
Friday, May 08. 2009
Printer Friendly
This month is jam packed with a lot of PostGIS news.
PostGIS 1.3.6 is out
PostGIS 1.3.6 has been released. It is mostly a bug fix relase and is the first PostGIS that can be compiled under PostgreSQL 8.4 beta. Details can be found at
PostGIS 1.3.6 release notes. We don't have Windows binaries ready yet, but expect to see that in the next week or so.
We are writing a PostGIS Book
Leo and I have been writing a PostGIS in Action book for the past couple of months, and now that it is finally listed on the Manning website, we can talk about it.
We are working on our chapter 4 right now. If you are interested in learning PostGIS, check it out. The first chapter is free and with the Manning Early Access Program (MEAP), you can purchase the book now and have great influence on
the direction of the book.
The book starts off hopefully with a gentle introduction to OpenGIS Consortium (OGC) spatial databases and concepts in general and PostgreSQL/PostGIS in particular. As we move further into the book, we cover more advanced ground.
We plan to cover some of the new PostgreSQL 8.4 features in conjunction with PostGIS, writing stored functions to solve spatial problems and some of the other new exciting stuff and ancillary tools for PostGIS such as PgRouting, Tiger Geocoder,
and WKT Raster.
Given all that ground, I suspect our estimate of 325 pages, may be a little low when all is said and done. It is funny that when we started out, we thought to ourselves -- "How can anyone fill up 325 pages." Turns out very easily especially
once you start throwing in diagrams and pictures to demonstrate a point. Diagrams are kind of important to have when describing GIS and geometry concepts. So far its been fun and has forced us to sit down and walk thru all the things we took for granted and thought we understood but didn't. You realize just how little
you understand when you try to explain something to someone else who really doesn't understand. So perhaps the process of explaining is the greatest of all learning experiences.
Continue reading "PostGIS 1.3.6 is out and new upcoming PostGIS book"
Wednesday, April 29. 2009
Printer Friendly
Simon Greener, wrote an article on how to load GPX xml files into Oracle XMLDB. That
got me thinking that I haven't really explored all the XML features that PostgreSQL has to offer
and to some extent I've been reticent about XML processed in any database for that matter.
In this article we shall attempt to perform the same feats that Simon did, but with PostgreSQL instead of
Oracle XMLDB. Note while we are demonstrating this with a GPX file, the same XPath approach can be used to process any XML file.
Continue reading "Loading and Processing GPX XML files using PostgreSQL"
Monday, April 27. 2009
Printer Friendly
The database industry is getting way too action packed for my blood.
First we had Sun buying MySQL, then Oracle buying Sun (thus inheriting MySQL and Java)
(recall they already owned
InnoDb (the MySQL main transactional engine) and they also by the way own BerkelyDB
which is the database engine underlying Subversion repository
and they also own all those CRMS and ERPs (Peoplesoft and Seibel),
and now we have IBM integrating EnterpriseDb into their DB so that it can look like Oracle Db.
What next? Perhaps Microsoft will join the party
to integrate EnterpriseDb into their SQL Server offering so SQL Server can look like Oracle and better yet a SQL Server for Linux/Unix to complete the circle.
Continue reading "Who needs sports when you have the database industry"
Sunday, April 26. 2009
Printer Friendly
Different Linux distros have their preferred place of where stuff goes and of course the default location on windows is completely different from that too. So there isn't really one default location where you can find PostgreSQL data cluster. Of course user's can pick their
locations as well. So what is a casual DBA supposed to do?
The pg_settings table
PostgreSQL has a convenient system table view called pg_settings that stores a lot of information. It stores the location of the data cluster, the pg_hbafile and other conf files.
In additon to that you can interogate it to find out information you will find in the postgresql.conf file. Why sift thru that postgresql.conf file (assuming you can already query your postgresql server) when you can find the answers you are looking
for with an SQL query?
Continue reading "Where is my data and other stuff"
Saturday, April 18. 2009
Printer Friendly
This question is one that has come up a number of times in PostGIS newsgroups worded in many different ways. The situation is that if you use a function a number of times
not changing the arguments that go into the function, PostgreSQL still insists on recalculating the value even when the function is marked IMMUTABLE. I have tested this on
8.2 and 8.3 with similarly awful results.
This issue is not so much a problem if function calculations are fast, but spatial function calculations relative to most other functions you will use are pretty
slow especially when dealing with large geometries. As a result your query could end up twice as slow. Even setting the costs of these functions to relatively high does not help the situation.
To demonstrate here is a non-PostGIS version of the issue that everyone should be able to run and demonstrates its not a PostGIS only issue.
Continue reading "How to force PostgreSQL to use a pre-calculated value"
Sunday, April 05. 2009
Printer Friendly
PostgresQL 8.4 beta will be out any day and 8.4 official release will hopefully not be too far behind. As we wait patiently for the official release,
Robert Treat has summarized nicely all the new features you can expect in 8.4.
PostgreSQL 8.4 is what I like to call an earth-shattering release because it has so many big ticket items in there, but also some long-needed usability features in it.
While we all know about the Windowing functions and CTEs and Recursive CTEs, there are a couple of usability features that we always get beat up on, which I am glad to see will be in 8.4.
these are
- Ability to add new columns to a view with CREATE OR REPLACE without having to drop the view and all the view dependents
- Case insensitivity module
- Improved Vacuum performance
- Common Table Expressions and Recursive Common Table Expressions (CTE), windowing functions - Hubert has an example of this in Waiting for 8.4 - window functions
Now the other niceties and usuability features which are nice but not quite as top of our list as the aforementioned.
Note this far from an exhaustive list, but Robert Treat's 8.4 slide presentaton is pretty exhaustive:
- Variadic functions -- these are functions that have default values defined so can be called with varying arguments. To achieve this before you would have had to create
a separate function that calls the first and passes in the default arg. NOte this can be done with any pl langauge and in fact we demonstrated its use in PL/Python PL/Python and default parameters.
- All plpgsql language and other non-sql/non-c proc languages that return sets to be called in the SELECT clause. To get around this problem before, you'd create your sophisticated
set returning function in plpgsql or python or whatever and then wrap it in an SQL function. No need for that hack anymore. Again we demonstrated this feature in PL/Python Pl/Python for loops and returning sets
- pg_terminate_backend -- this kills a backend PostgreSQL process instead of just cancelling the query running on it as pg_cancel_backend did
- Column level priviledges - Hubert has a good example of this in Waiting for 8.4 - column level privileges .
- Faster Restore -- now Restore can use parallel threads
- RETURN QUERY EXECUTE support in plpgsql
- LIMIT clause can take a subquery -- SELECT a.field1, a.field2 FROM a LIMIT (SELECT COUNT(*)/10 FROM a)
- Make As alias in column SELECT optional as the ANSI SQL Standard allows. So you can now do - SELECT a field1, b field2 .... This is not something we would suggest since we find it makes
code hard to read, but does make code that used this regrettable syntax more portably converted to PostgreSQL. It would be nice if this were a flag though in the config that can be turned on since
I find it to be bad practice and encouraging bad habits.
- Numerours changes to EXPLAIN to show columns used, maintenance improvements such as dead-locking reporting
Wednesday, April 01. 2009
Printer Friendly
Today was a very eventful day for PostgreSQL. We'll cover these changes in a bit.
Massive Forking of PostgreSQL project
First in PostgreSQL Announcements - David Fetter announces massive forking of the PostgreSQL project
in several factions. We now have the following -- so take your pick:
- Shizzle: High-performance and Feature-Free
- MaryMary: Compiled with libhaltingproblem
- Narcona: Painless installation and setup
- OurThing: Lots of sources, based in Sicily
- XPostgres: Everybody who's ever worked on Postgres code, back to UC Berkeley and Illustra.
- Moon/PostgreSQL: Corporate support, as long as it lasts.
I feel this may be good for the community
because it is hard to satisfy all these factions in one project. Now perhaps the newsgroups will be a bit calmer.
PgAdmin has come to an end -- make way for OpenPgAdmin
Dave Page announced today that the PgAdmin team received an offer they couldn't refuse from a very
big software company yet to be announced. So they are closing PgAdmin and you will soon be able to purchase
the services and support contracts from this new company.
Not to worry, Devrim Gunduz, has forked PgAdmin to form OpenPgAdmin. You can
check out the site here http://openpgadmin.info.
I must say as much as we are saddened to see the PgAdmin group leave for more fun escapades, We are happy we finally have an administration tool that has the word Open in its name.
If it starts with Open, its got to be open. Now all we need is an OpenPost PostgreSQL fork to go with it. OpenPost I think
will be easier to pronounce than PostgreSQL and Postgres and it has Open in its name.
So to add to the list of PostgreSQL project forks, I would like to see another project fork
OpenPost: Its free, open source, fast, feature-rich, easy to use and best of all you can pronounce it and its open.
Saturday, March 28. 2009
Printer Friendly
Even though others have blogged about this in the past and its well-documented in the docs, its a frequently enough asked question, that we thought we'd post it here again
with a couple of additional twists.
How to determine the size of a database on disk
SELECT pg_size_pretty(pg_database_size('somedatabase')) As fulldbsize;
How to determine the size of a database table on disk
NOTE: There are two functions in PostgreSQL - pg_relation_size and pg_total_relation_size. The pg_relation_size just measures the size of the actual table where
as the pg_total_relation_size includes both the table and all its toasted tables and indexes.
SELECT pg_size_pretty(pg_total_relation_size('someschema.sometable')) As fulltblsize,
pg_size_pretty(pg_relation_size('someschema.sometable')) As justthetblsize;
How to determine the size of a database schema
When you are doing a lot of data loading for later massaging to dump into your production tables, you will often generate junk tables in the process and worse yet other
people may generate junk tables period. As a general practice, we like to create a schema just for junk. The reason for that is that it is extremely easy to
exclude schemas from being backed up and to load individual schemas. So with this approach we create or move all tables we are not quite sure if they are needed or we only need temporarily
into a schema
called junk, scratch, possiblejunk or whatever you want to call the schema and exclude it from our backups.
Something of the form:
CREATE SCHEMA scratch;
ALTER TABLE someschema.joeyplaying SET SCHEMA scratch;
After saving the world from junk, you would like to know how much space you have saved your backups from backing up. So you would employ a query something of the form:
SELECT pg_size_pretty(pg_database_size('mycurrentdb')) As fullprod,
pg_size_pretty(CAST(pg_database_size('mycurrentdb') - (SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables WHERE table_schema = 'scratch') As bigint)) As tobebackedup_size,
pg_size_pretty(CAST((SELECT SUM(pg_total_relation_size(table_schema || '.' || table_name) )
FROM information_schema.tables
WHERE table_schema = 'scratch') As bigint) ) As junk_size;
Size of Geometries in PostGIS tables
PostGIS has a companion function for measuring geometry size which is useful when you want to get a sense of how much space your geometries are taking up on disk
for a set of records.
SELECT ST_Mem_Size(ST_GeomFromText('LINESTRING(220268 150415,220227 150505,220227 150406)'));
SELECT pg_size_pretty(CAST(SUM(ST_Mem_Size(the_geom)) As bigint) ) as totgeomsum
FROM sometable WHERE state = 'MA';
Thursday, March 26. 2009
Printer Friendly
Happy belated Ada Byron Lovelace day
First this is the first year that Ada Byron Lovelace day is celebrated, and somehow I managed to miss it. It was celebrated March 24th, 2009
so I guess I'm a couple of days behind the times. For those who are unfamiliar with who Ada Byron Lovelace is. She was the daughter of the poet Lord Byron
and considered to be the first computer programmer in the world. The fact that she was a woman I consider a side benefit. So I guess this means March 24th is also
computer programmer honor day too.
Continue reading "In the News"
Monday, March 16. 2009
Printer Friendly
The OSGEO Toronto Sprint
The OSGEO C-Camp Toronto Sprint was fun, although Leo and I couldn't stay for the whole event. I've never seen people close bugs so quickly.
Paul Ramsey
and Mark Cave-Ayland were on a marathon run in the PostGIS ring. Olivier Courtin was also following not too far behind with SVG bug fixes and so forth. We also
discussed the possiblity of having ST_GeomFromGML, ST_GeomFromGeoJSON, ST_GeomFromKml and so forth and what that would entail.
It was great to meet Pierre Racine of WKT Raster fame in person and chat with Mateusz and Sandro Santilli via IRC. Frank Warmerdam, the GDAL god came to our table to provide his big two cents
about how WKT Raster meta data should be stored, dealing with large RASTERS and other things I didn't understand.
Mark Leslie in Australia did his part too, though he wasn't present -- he would come into IRC when others had fallen asleep. Such is the way with timezones. He has been working on beefing up the
curved support in PostGIS. The FOSS 4G 2009 conference will be in Sydney, Australia.
It was nice to be able to put a face to these people I've talked via newsgroups. It was also strange since most of the clients and many of the people we work
with we have never met, so the idea of meeting in person has become a very foreign concept for us.
Continue reading "PostGIS Raster and More"
Friday, March 13. 2009
Printer Friendly
In our PLPython Part 4: PLPython meets aggregates we demonstrated the power of PostgreSQL in combination
with PLPython to create a sequence of ascii plots by using the power of aggregation. Our friend Simon Greener over at Spatial Db Advisor told me the example
was clever but ascii plots was so 70ish and ugly, why didn't I do SVG plots? He felt people being predominantly visual would be caught up in the ugliness of Ascii plots and miss the point.
At least Bob found them neat and can appreciate the elegance in simplicity, thanks Bob..
To make it up
to Simon, we shall demonstrate 2 features we left out in our prior article.
- Pretty SVG graphs -- so I don't have to hear more about SVG from Simon and how I'm caught in the 70s.
- Ability to plot a schedule -- schedule item, start date, end date using a multi-column aggregate function. Here is a good use for multi-column aggregates or at least I think so. (Yeh for Multi-column aggregates)
If you are into spatial processing in the database or how many ways you can play with XML particularly Oracle Locator, Oracle Spatial, PostGIS and SQL Server 2008, we highly recommend Simon's articles.
Continue reading "PLPython Part 5: PLPython meets PostgreSQL Multi-column aggregates and SVG plots"
|