Thursday, July 29. 2010
Printer Friendly
One of the new features of PostgreSQL 9.0 is the ability to specify the format of an explain plan.
In prior versions your only choice was text (and graphic explain with tools like PgAdmin III and other GUIS), but in 9.0 on, you have the additional options of
Javascript Object Notation (JSON) which some people have a thing or two to say about them, YAML Ain't Markup Language (YAML) or eXtended Markup Language (XML). The new explain options are itemized in PostgreSQL 9.0 EXPLAIN.
The main benefit of the JSON, XML, YAML formats is that they are easier
to machine parse than the default text version. This will allow for creative renderings of planner trees with minimal coding.
In Part 1 of this series, we'll demonstrate how to output the plans in these various formats and what they look like.
In later parts of this series -- we'll demonstrate how to use Javascript, XSL and other scripting/markup languages
to transform these into works of art you can hang on your wall.
-- START POSTGIS IN ACTION ASIDE --
We just submitted the third major revision of Chapter 3 Data Modeling
of our upcoming PostGIS in Action book.
The second major revision we never submitted and threw it out because it wasn't worldly enough and was too involved. We may
use it later on for an example.
Chapter 3 should be up on Manning Early Access Program (MEAP) soon. If you haven't bought the book yet Buy now.
You don't want to miss out on a major masterpiece in the making. Okay we exaggerate a bit.
-- END POSTGIS IN ACTION ASIDE --
Continue reading "Explain Plans PostgreSQL 9.0 Text, JSON, XML, YAML - Part 1: You Choose"
Monday, July 05. 2010
Printer Friendly
Gathering from the number of hits we got from our What's new in PostgreSQL 9.0,
and the large number of slashdot responses we got
as well as the fair number of reddit responses,
I guess a lot of people are really excited about the upcoming PostgreSQL 9.0 or at least
have a lot of opinions about what is still missing in it.
For this discussion, we would like to point out one of the companion adminstration tools that
will be packaged in with PostgreSQL 9.0 (and currently packaged in beta 2). This is PgAdmin III, which
we will affectionately refer to as the Administrative tool for mere mortals. It is the first administrative
tool that most users new to PostgreSQL use and gives them a user-friendly interface to the
power behind PostgreSQL. I would say if it were not for this tool and its web cousin PhpPgAdmin, many
a scared newbie user would be running away at the vast unencumbered freedom that PostgreSQL/psql and sibling commandline tools offer.
Continue reading "What is new in PgAdmin III 1.12.0"
Tuesday, June 22. 2010
Printer Friendly
I know a lot has been said about this beautiful value we affectionately call NULL, which is neither here nor there and that manages to catch many of us
off guard with its casual neither here nor thereness. Database analysts who are really just back seat mathematicians in disguise like to philosophize
about the unknown and pat themselves on the back when they feel they have mastered the unknown better than any one else. Of course database spatial analysts, the worst kind of back seat mathematicians,
like to talk not only about NULL but about EMPTY and compare notes with their brethren and write dissertations
about what to do about something that is neither here nor there
but is more known than the unknown, but not quite as known as the empty string.
Okay getting to the point, one of our clients asked us about a peculiar problem they had with a query, and the strange results they were getting. We admit this still manages to catch us off guard every once in a while.
Continue reading "NOT IN NULL Uniqueness trickery"
Wednesday, June 02. 2010
Printer Friendly
One of the coolest features of PostgreSQL is the ability to write functions using plain old
SQL. This feature it has had for a long time. Even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural mumbo jumbo like
loops and what not. This is cool for two reasons:
- Plain old SQL is the simplest to write and most anyone can write one and is just what the doctor ordered in many cases. PostgreSQL even allows you to write
aggregate functions with plain old SQL. Try to write an aggregate function in SQL Server
you've got to pull out your Visual Studio this and that and do some compiling and loading and you better know C# or VB.NET. Try in MySQL and you better learn C.
Do the same in PostgreSQL (you have a large choice of languages including SQL) and the code is simple to write. Nevermind
with MySQL and SQL Server, you aren't even allowed to do those type of things on a shared server or a server where the IT department is paranoid. The closest
with this much ease would be Oracle, which is unnecessarily verbose.
- Most importantly -- since it is just SQL, for simple user-defined functions, a PostgreSQL sql function can often be in-lined into the overall query plan since
it only uses what is legal in plain old SQL.
This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use.
So instead of writing geom1 && geom2 AND Intersects(geom1,geom2) -- a user can write
ST_Intersects(geom1,geom2) . The short-hand is even more striking when you think of the ST_DWithin function.
With an inlined function, the planner has visibility into the function and breaks apart the
spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2)
and has great flexibility in reordering the clauses in the plan.
Continue reading "STRICT on SQL Function Breaks In-lining Gotcha"
Friday, April 23. 2010
Printer Friendly
Fixed width data is probably the most annoying data to import because you need some mechanism to break the columns at the column boundaries. A lot of people bring this kind of data
into a tool such as OpenOffice, Excel or MS Access, massage it into a delimeted format and then pull it in with PostgreSQL copy command or some other means. There is another way and one that doesn't require
anything else aside from what gets packaged with PostgreSQL. We will demonstrate this way.
Its quite simple. Pull each record in as a single column and then spit it into the columns you want with plain old SQL. We'll demonstrate this by importing Census data places fixed width file.
Although this technique we have is focused on PostgreSQL, its pretty easy to do the same steps in any other relational database.
Both David Fetter and Dimitri Fontaine have demonstrated other approaches of doing this as well
so check theirs out.
UPDATE
Continue reading "Import fixed width data into PostgreSQL with just PSQL"
Thursday, March 04. 2010
Printer Friendly
This is a rebuttal to depesz's charx, varcharx, varchar, and text
and David Fetter's varchar(n) considered harmful.
I respect both depesz and David and in fact enjoy reading their blogs. We just have deferring opinions on the topic.
For starters, I am pretty tired of the following sentiments from some PostgreSQL people:
- 99% of the people who choose varchar(x) over text in PostgreSQL in most cases
are just ignorant folk and don't realize that text is just as fast if not faster than varchar in PostgreSQL.
- stuff your most despised database here compatibility is not high on my priority list.
- It is unfortunate you have to work with the crappy tools you work with that can't see the beauty in PostgreSQL text implementation.
Just get something better that treats PostgreSQL as the superior creature it is.
Continue reading "In Defense of varchar(x)"
Thursday, January 21. 2010
Printer Friendly
Every once in a while, especially if you have a fairly large database, you may find the need to do select backups of certain tables.
Your criteria might be based on name or how relatively recently data has changed in the table.
Below are some of the tricks we use. Some use our favorite hack of scripting command line scripts with SQL.
Continue reading "Making backups of select tables"
Friday, January 01. 2010
Printer Friendly
UPDATE: Since this article was written, Yum now comes packaged with PostGIS 1.5. Please refer to PostGIS Yum Install for PostgreSQL 9.0 if you want to go with a faster more idiot proof install process
A couple of people have asked this, so thought we would address the common issues people run into
with compiling PostGIS.
The first question, some will ask is Why don't you just take the packaged PostGIS that comes with the PostgreSQL
Yum repository?
There are two reasons for not installing PostGIS under using Yum and our general reasons for not.
- We like running a newer GEOS than what is generally packaged with Yum because certain PostGIS functionality
is not enabled with older versions of GEOS. For example GEOS 3.1 introduced a couple of new functions like ST_Covers and
ST_CoveredBy, ST_ContainedProperly
the Cascaded Union and prepared geometry speed enhancements require GEOS 3.1+. GEOS 3.2 brought more robust support for dealing with topological
exceptions, new buffering features like one-sided buffer (which the upcoming PostGIS 1.5 exposes if and only if you are running
GEOS 3.2+), faster buffering, and several memory leak cleanups. We highly recommend using GEOS 3.2.0. It is just a more robust product than prior GEOS versions.
- We generally like to run newer versions of PostGIS than what Yum provides and also run multiple versions of PostGIS
on the same server. This is a bit tricky to do with Yum since it only exposes one and generally an older one than what we
care for. PostGIS 1.5 will be especially great since it will have the geodetic support and cool functions like ST_ClosestPoint
and other major goodies.
NOTE: PostGIS 1.3, PostGIS 1.4, PostGIS 1.5 can coexist on the same PostgreSQL daemon service as long as they are installed in different databases. They will all share the same Proj and GEOS. So installing PostGIS 1.5 will not break your
PostGIS 1.4 or 1.3 installs. The newer GEOS 3.2.0 C-API is backward compatible with older GEOS C-API and the C-API is what PostGIS uses.
Continue reading "Compiling PostGIS 1.5 and installing after Yum PostgreSQL Install"
Saturday, November 28. 2009
Printer Friendly
In this exercise, we'll go thru our steps for upgrading a Redhat Enterprise Linux 64-bit PostgreSQL
box from PostgreSQL 8.3 to PostgreSQL 8.4. If you don't have any kind of PostgreSQL installed on your box,
you can skip the Upgrade step.
UPDATE - since Devrim's move from Command prompt - he has started a new yum repository. You may want to use
this one instead since it seems more up to date than the other. http://yum.pgrpms.org/
UPDATEWe have instructions for installing PostgreSQL 9.0 via yum.
Updgrading from PostgreSQL 8.* to PostgreSQL 8.4
If you are starting from scratch -- just skip this section.
If you are upgrading from 8.4.0 to 8.4.1 you can get away with a simple
yum update postgresql and skip the rest of this article.
If you are upgrading from PostgreSQL 8.3 to 8.4, in theory you can use PgMigrator, but in practice,
particularly with a Yum install, you are bound to run into obstacles. If you are running an older version, you must dump and restore.
Continue reading "An almost idiot's guide to Install and Upgrade to PostgreSQL 8.4 with Yum"
Monday, October 26. 2009
Printer Friendly
This is an unfortunate predicament that many people find themselves in and does cause a bit of frustration. You bring in some tables into your PostgreSQL
database using some column name preserving application, and the casings are all preserved from the source data store. So now you have to quote all the fields
everytime you need to use them. In these cases, we usually rename the columns to be all lower case using a script. There are two approaches we have seen/can think of for doing this
one to run a script that generates the appropriate alter table statements and the other is to update the pg_attribute system catalog table directly.
Continue reading "Lowercasing table and column names"
Sunday, August 16. 2009
Printer Friendly
A very long time ago, we wrote an article on how to use PostgreSQL to show the fully qualified name of an item in an inventory tree.
Basically we were modeling a paper products tree. The original article can be found here Using PostgreSQL User-Defined Functions to solve the Tree Problem and was based on PostgreSQL 7.4 technology.
We'll repeat the text here for completeness and demonstrate the PostgreSQL 8.4 that solves this and more efficiently.
Continue reading "Using Recursive Common table expressions to represent Tree structures"
Saturday, August 15. 2009
Printer Friendly
Comparison of PostgreSQL 8.4, Microsoft SQL Server 2008, MySQL 5.1
In our May 2008 issue of Postgres OnLine Journal, we cross compared Microsoft SQL Server 2005, MySQL 5, and PostgreSQL 8.3.
Some people mentioned well since 8.4 has now come out, shouldn't we go back and update the reference. We deliberated and decided not to.
To be fair all 3 products have released new versions, so it would seem unfair to compare a newer PostgreSQL against older versions of MS SQL Server and MySQL.
We have therefore decided to repeat our exercise and include parts people felt we should have covered, as well as comparing the latest and greatest stable release of each product.
People ask us time and time again what's the difference why should you care which database you use. We will
try to be very fair in our comparison. We will show equally how PostgreSQL sucks compared to the others. These are the items we
most care about or think others most care about. There are numerous other differences if you get deep into the trenches of each.
Continue reading "Cross Compare of PostgreSQL 8.4, SQL Server 2008, MySQL 5.1"
Thursday, July 16. 2009
Printer Friendly
Common table expressions are perhaps our favorite feature in PostgreSQL 8.4 even more so than windowing functions. Strangely enough I find myself using them more in SQL Server too now that PostgreSQL supports it.
CTEs are not only nice syntactic sugar, but they also produce better more efficient queries. To our knowledge only Firebird (see note below), PostgreSQL,SQL Server, and IBM DB2 support this, though I heard somewhere
that Oracle does too or is planning too UPDATE: As noted below Oracle as of version 9 supports non-recursive CTEs. For recursion you need to use the Oracle proprietary corresponding by syntax.
As far as CTEs go, the syntax between PostgreSQL, SQL Server 2005/2008, IBM DB2 and Firebird
is pretty much the same when not using recursive queries. When using recursive queries, PostgreSQL and Firebird use WITH RECURSIVE to denote a recursive CTE where as SQL Server and IBM DB2 its just WITH.
All 4 databases allow you to have multiple table expressions within one WITH clause anda RECURSIVE CTE expression can have both recursive and non-recursive CTEs. This makes writing complex queries especially where you have the same expressions used multiple times in the query,
a lot easier to debug and also more performant.
In our article on How to force PostgreSQL to use a pre-calculated value
we talked about techniques for forcing PostgreSQL to cache a highly costly function. For PostgreSQL 8.3 and below, the winning solution was using OFFSET which is not terribly cross platform and has the disadvantage of
materializing the subselect. David Fetter had suggested
for 8.4, why not try CTEs. Yes CTEs not only are syntactically nice, more portable, but they help you write more efficient queries. To demonstrate, we shall repeat the same exercise we did in that
article, but using CTEs instead.
Continue reading "PostgresQL 8.4: Common Table Expressions (CTE), performance improvement, precalculated functions revisited"
Monday, July 13. 2009
Printer Friendly
One of the very handy features introduced in PostgreSQL 8.4 is the new aggregate function called array_agg which is a companion function to the unnest function we discussed earlier. This
takes a set of elements similar to what COUNT, SUM etc do and builds an array out of them. This approach is faster than the old used array_append , array_accum since it does not rebuild the array on each iteration.
Sadly it does not appear to be completely swappable with array_append as there does not seem to be a mechanism to use it to build your own custom aggregate functions that need to maintain the set of objects flowing thru the aggregate without venturing into C land. This we tried to do
in our median example but were unsuccessful.
In PostGIS 1.4 Paul borrowed some of this array_agg logic to make the
PostGIS spatial aggregates much much faster with large numbers of geometries. So collecting polygons or making a line out of say 30,000 geometries which normally would have taken 2 minutes or more (just accumulating), got reduced to under 10 seconds in many cases.
That did require C code even when installed against PostgreSQL 8.4. Though in PostGIS you reap the benefits as far as geometries go even
if you are running lower than 8.4.
We had originally thought array_agg was a PostgreSQL only creation, but it turns out that array_agg is a function defined in the ANSI SQL:2008 specs and for one appears to exist in IBM DB2 as well. I don't think
Oracle or any other database supports it as of yet.
As we had demonstrated in the other article, we shall demonstrate the olden days and what array_agg brings to the table to make your life easier.
Continue reading "PostgreSQL 8.4 Faster array building with array_agg"
Thursday, July 09. 2009
Printer Friendly
In this issue we shall be celebrating the arrival of PostgreSQL 8.4 by showcasing the new treats
that PostgreSQL 8.4 has to offer. Although 8.4 has some nice big ticket items like Windowing Functions which we
briefly covered numerous times and Common Table Expressions, it also has some small ticket items. These small ticket items
while small, are perhaps more useful than even the big ticket ones because they are more commonly used constructs.
In this article we shall introduce the new unnest() function which makes converting an array to a table like structure
not only easier, but much more efficient. We will also be covering the new enhancements to our favorite function the
generate_series().
Continue reading "PostgreSQL 8.4 unnest and generate_series"
|