Saturday, August 15. 2009
Printer Friendly
Comparison of PostgreSQL 8.4, Microsoft SQL Server 2008, MySQL 5.1
In our May 2008 issue of Postgres OnLine Journal, we cross compared Microsoft SQL Server 2005, MySQL 5, and PostgreSQL 8.3.
Some people mentioned well since 8.4 has now come out, shouldn't we go back and update the reference. We deliberated and decided not to.
To be fair all 3 products have released new versions, so it would seem unfair to compare a newer PostgreSQL against older versions of MS SQL Server and MySQL.
We have therefore decided to repeat our exercise and include parts people felt we should have covered, as well as comparing the latest and greatest stable release of each product.
People ask us time and time again what's the difference why should you care which database you use. We will
try to be very fair in our comparison. We will show equally how PostgreSQL sucks compared to the others. These are the items we
most care about or think others most care about. There are numerous other differences if you get deep into the trenches of each.
Continue reading "Cross Compare of PostgreSQL 8.4, SQL Server 2008, MySQL 5.1"
Thursday, July 16. 2009
Printer Friendly
Common table expressions are perhaps our favorite feature in PostgreSQL 8.4 even more so than windowing functions. Strangely enough I find myself using them more in SQL Server too now that PostgreSQL supports it.
CTEs are not only nice syntactic sugar, but they also produce better more efficient queries. To our knowledge only Firebird (see note below), PostgreSQL,SQL Server, and IBM DB2 support this, though I heard somewhere
that Oracle does too or is planning too UPDATE: As noted below Oracle as of version 9 supports non-recursive CTEs. For recursion you need to use the Oracle proprietary corresponding by syntax.
As far as CTEs go, the syntax between PostgreSQL, SQL Server 2005/2008, IBM DB2 and Firebird
is pretty much the same when not using recursive queries. When using recursive queries, PostgreSQL and Firebird use WITH RECURSIVE to denote a recursive CTE where as SQL Server and IBM DB2 its just WITH.
All 4 databases allow you to have multiple table expressions within one WITH clause anda RECURSIVE CTE expression can have both recursive and non-recursive CTEs. This makes writing complex queries especially where you have the same expressions used multiple times in the query,
a lot easier to debug and also more performant.
In our article on How to force PostgreSQL to use a pre-calculated value
we talked about techniques for forcing PostgreSQL to cache a highly costly function. For PostgreSQL 8.3 and below, the winning solution was using OFFSET which is not terribly cross platform and has the disadvantage of
materializing the subselect. David Fetter had suggested
for 8.4, why not try CTEs. Yes CTEs not only are syntactically nice, more portable, but they help you write more efficient queries. To demonstrate, we shall repeat the same exercise we did in that
article, but using CTEs instead.
Continue reading "PostgresQL 8.4: Common Table Expressions (CTE), performance improvement, precalculated functions revisited"
Monday, July 13. 2009
Printer Friendly
One of the very handy features introduced in PostgreSQL 8.4 is the new aggregate function called array_agg which is a companion function to the unnest function we discussed earlier. This
takes a set of elements similar to what COUNT, SUM etc do and builds an array out of them. This approach is faster than the old used array_append , array_accum since it does not rebuild the array on each iteration.
Sadly it does not appear to be completely swappable with array_append as there does not seem to be a mechanism to use it to build your own custom aggregate functions that need to maintain the set of objects flowing thru the aggregate without venturing into C land. This we tried to do
in our median example but were unsuccessful.
In PostGIS 1.4 Paul borrowed some of this array_agg logic to make the
PostGIS spatial aggregates much much faster with large numbers of geometries. So collecting polygons or making a line out of say 30,000 geometries which normally would have taken 2 minutes or more (just accumulating), got reduced to under 10 seconds in many cases.
That did require C code even when installed against PostgreSQL 8.4. Though in PostGIS you reap the benefits as far as geometries go even
if you are running lower than 8.4.
We had originally thought array_agg was a PostgreSQL only creation, but it turns out that array_agg is a function defined in the ANSI SQL:2008 specs and for one appears to exist in IBM DB2 as well. I don't think
Oracle or any other database supports it as of yet.
As we had demonstrated in the other article, we shall demonstrate the olden days and what array_agg brings to the table to make your life easier.
Continue reading "PostgreSQL 8.4 Faster array building with array_agg"
Thursday, July 09. 2009
Printer Friendly
In this issue we shall be celebrating the arrival of PostgreSQL 8.4 by showcasing the new treats
that PostgreSQL 8.4 has to offer. Although 8.4 has some nice big ticket items like Windowing Functions which we
briefly covered numerous times and Common Table Expressions, it also has some small ticket items. These small ticket items
while small, are perhaps more useful than even the big ticket ones because they are more commonly used constructs.
In this article we shall introduce the new unnest() function which makes converting an array to a table like structure
not only easier, but much more efficient. We will also be covering the new enhancements to our favorite function the
generate_series().
Continue reading "PostgreSQL 8.4 unnest and generate_series"
Wednesday, July 01. 2009
Printer Friendly
PostgreSQL 8.4 has ANSI SQL:2003 window functions support. These are often classified under the umbrella terms of basic Analytical or Online Application Processing (OLAP) functions.
They are used most commonly for producing cumulative sums, moving averages and generally rolling calculations that need to look at a subset of the overall dataset (a window frame of data) often relative to a particular row.
For users who use SQL window constructs extensively, this may have been one reason in the past to not to give PostgreSQL a second look. While you may not
consider PostgreSQL as a replacement for existing projects because of the cost of migration, recoding and testing, this added new feature is definitely a selling point
for new project consideration.
If you rely heavily on windowing functions, the things you probably want to know most about the new PostgreSQL 8.4 offering are:
- What SQL window functionality is supported?
- How does PostgreSQL 8.4 offering compare to that of the database you are currently using?
- Is the subset of functionality you use supported?
To make this an easier exercise we have curled thru the documents of the other database vendors to distill what the SQL Windowing functionality they provide in their core product.
If you find any mistakes or ambiguities in the below please don't hesitate to let us know and we will gladly amend.
For those who are not sure what this is and what all the big fuss is about, please read our rich commentary on the topic of window functions.
Continue reading "Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2"
Tuesday, June 30. 2009
Printer Friendly
We have covered this briefly before, but its an important enough concept to cover again in more detail.
Problem: You are running out of disk space on the drive you keep PostgreSQL data on
Solution:
Create a new tablespace on a separate drive and move existing tables to it, or create a new tablespace and use for future tables.
What is a tablespace and how to create a tablespace
A tablespace in PostgreSQL is similar to a tablespace in Oracle and a filegroup in SQL Server. It segments a piece of physical disk space
for use by the PostgreSQL process for holding data. Below are steps to creating a new tablespace. Tablespaces have existed since PostgreSQL 8.0.
More about tablespaces in PostgreSQL is outlined in
the manual PostgreSQL 8.3 tablespaces
While it is possible to create a table index on a different tablespace from the table, we won't be covering that.
Continue reading "Managing disk space using table spaces"
Tuesday, June 09. 2009
Printer Friendly
You'll often hear the term planner statistics thrown around by database geeks. Did you update your statistics. This lingo isn't even limited
to PostgreSQL, but is part and parcel to how most decent databases work. For example in PostgreSQL you do a vacuum analyze to update your planner statistics in addition
to cleaning up dead space. In SQL Server you do an UPDATE STATISTICS . In MySQL you do an
ANALYZE TABLE or a more invasive OPTIMIZE TABLE .
Normally all this "update your stats so your planner can be happy" is usually unnecessary unless
you just did a bulk load or a bulk delete or you are noticing your queries are suddenly slowing down. These stat things are generally updated behind the scenes by most databases
on an as needed basis.
What makes SQL really interesting and a bit different from procedural languages is that it is declarative (like functional and logical programming languages) and relies on the database planner to come up with strategies for navigating the data. Its strategy is not fixed as it is in procedural languages.
A big part of this strategy is decided on by the query planner which looks at distributions of data. Given different WHERE conditions for similar queries, it could come up with vastly different strategies if one value has a significantly
higher distribution in a table than another. This is also the mystery of why it sometimes refuses to use an index on a field because it has decided a table scan is more efficient and also why some people consider HINTS evil because they pollute the imperative nature of the language.
Continue reading "Planner Statistics"
Wednesday, May 27. 2009
Printer Friendly
One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the manual Windowing Functions is that you can use windows with most aggregate functions (built-in or custom defined) as well as define your own specific windowing functions. In a later article, we'll demonstrate creating custom windowing functions.
In our PGCon 2009 PostGIS presentation one of the last slides demonstrates using lead() and lag() windowing functions
to show a family's income level in the same record with the income levels of the next door neighbors in the fictitious town we created. This is not terribly useful unless you live
in a somewhat dysfunctional neighborhood where everyone is concerned about how rich their neighbors are compared to themselves. Our town was very dysfunctional but mostly geographically dysfunctional. We will have much more useful use cases of this as applied to GIS in our upcoming PostGIS in Action book.
Hitoshi Harada and David Fetter did a presentation of this in PGCon 2009 which sadly we missed since we were giving our own presentation.
Check out the PGCon2009 PostgreSQL 8.4 Windowing Functions Video. Also check out the slides at Introducing Windowing Functions.
Those who have used SQL Server 2005+, Oracle or IBM DBII are probably familar or have run into examples of Windowing functions in those products. Windowing in PostgreSQL 8.4 works more or less the same way. In a prior article, we demonstrated how to return running totals and sums using rudimentary SQL.
To precelebrate the eminent arrival of PostgreSQL 8.4 and the current PostgreSQL 8.4 beta 2 release, we shall demonstrate the same exercise using the new ANSI SQL:2003 Windowing functionality built
into the upcoming PostgreSQL 8.4.
Continue reading "Running totals and sums using PostgreSQL 8.4 Windowing functions"
Thursday, March 05. 2009
Printer Friendly
PostgreSQL 8.2 and above has this pretty neat feature of allowing you to define aggregate functions
that take more than one column as an input. First we'll start off with a rather pointless but easy to relate to example and then we'll follow up with something
a bit more interesting.
For more examples of creating aggregates in PostgreSQL, check out our other articles:
Continue reading "How to create multi-column aggregates"
Tuesday, January 06. 2009
Printer Friendly
Both Josh Berkus and Hubert made blog entries about our last excerpt. In general I will start off by saying that we are more or less in agreement on what is good SQL coding format.
Here are the things I think we can all agree on
- SQL Keywords should be upper cased
- Field names should be prefixed with their tables especially when you have a multi-join statement involved
- Use JOIN syntax instead of stuffing everything in the WHERE though we are highly against just saying JOIN. We prefer INNER JOIN
The major points of contention I think are
- Should you use aliases over tables and if you use aliases should you keep them short or more descriptive. Josh thinks table names should be used where possible and when aliases are used they should be longer than a few characters
and Hubert agrees with us that short aliases are fine and in fact desirable. I think we all agree aliases should be meaningful when used, but our idea of what it means to be meaningful is a little different.
- In use of JOIN syntax -- we prefer using INNER JOIN instead of using just JOIN and in fact find it quite irritating that PostgreSQL rewrites our INNERs as JOIN. I suspect Hubert and Josh and many other PostgreSQL folk are on
the other side of the fence on this. The reason we feel strongly about this is there are so many kinds of JOINS - INNER JOIN, RIGHT JOIN, LEFT JOIN, CROSS JOIN, FULL JOIN, and the horrid NATURAL JOIN (that should be shot and put out of its misery). To just say JOIN to us is just confusing.
- While you can write LEFT OUTER JOIN, the OUTER is kind of pointless because no one goes around writing LEFT INNER JOINS
- Use well supported standards where possible. This means CURRENT_TIMESTAMP instead of now(). now() is not in all databases, but most relational databases support CURRENT_TIMESTAMP
Continue reading "SQL Coding Standards To Each His Own Part II"
Wednesday, December 31. 2008
Printer Friendly
I was reading Josh Berkus last blog post and was intrigued by his last post
Writing Maintainable Queries Part I.
He is right that lots has been said about coding standards in other languages and even right out holy wars have been launched on the
subject, but as far as SQL goes, not quite enough has been said on the subject for us to have a great war to end all wars.
I was also happy to see that we agreed with all his points except his first one. Yes I felt dissed, and thought hmm
if someone as important as Josh thinks our aliases should be very descriptive and we should use the table name rather than
the alias where possible, surely there must be something wrong with me for not believing in this fundamental philosophy.
In the rest of this excerpt I shall make fun of Josh and also set forth some of our own SQL Coding guidelines. Hopefully
Josh won't take too much offense at this small jibe.
Continue reading "SQL Coding Standards To Each His Own"
Wednesday, November 26. 2008
Printer Friendly
We had the pleasure of doing a fresh install of PostgreSQL 8.3.5 on RedHat EL4 box and when using the Yum repository, we noticed a couple of changes from last time we did this.
This could have been an oversight in our documentation before.
Changes to Yum Install for 8.3.5?
In our April 2008 issue we had An Almost Idiot's Guide to PostgreSQL YUM
and that article still seems to be surprisingly popular.
In the first step we had:
yum install postgresql
and that as I recall installed the postgresql server in addition to some client libraries.
For 8.3.5 fresh install it seems they are separated and to get the postgresql server you need to do:
yum install postgresql
yum install postgresql-server
Continue reading "Yum addendum for 8.3.5 and PgAgent"
Monday, November 17. 2008
Printer Friendly
Sometimes when you are testing or setting up a server or just porting things to another server, you just want to install the same set of users as you had before without restoring any databases.
In PostgreSQL, the users (Login Roles) and group roles are stored at the server level and only the permissions to objects are stored at the database level.
Question: How do you restore just the users and roles without having to do a full pg_dumpall of your server?
Continue reading "Backing up Login Roles aka Users and Group Roles"
Wednesday, September 24. 2008
Printer Friendly
We've been playing around with the snapshot builds of PgAdmin III 1.9 and would like to summarize some
of the new nice features added. PgAdmin III 1.9 has not been released yet, but has a couple of neat features brewing.
For those interested in experimenting with the snapshot builds and src tarballs, you can download them from http://www.pgadmin.org/snapshots/
Continue reading "PgAdmin III 1.9 First Glance"
Sunday, September 07. 2008
Printer Friendly
One of the nice things about the PostgreSQL command-line restore tool is the ease with which you can restore
select objects from a backup. We tend to use schemas for logical groupings which are partitioned by context, time, geography etc.
Often times when we are testing things, we just want to restore one schema or set of tables from our backup because restoring a 100 gigabyte database
takes a lot of space, takes more time and is unnecessary for our needs. In order to be able to accomplish such a feat, you need to
create tar or compressed (PG custom format) backups. We usually maintain PG custom backups of each of our databases.
Continue reading "How to restore select tables, select objects, and schemas from Pg Backup"
|