Wednesday, June 02. 2010
Printer Friendly
One of the coolest features of PostgreSQL is the ability to write functions using plain old
SQL. This feature it has had for a long time. Even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural mumbo jumbo like
loops and what not. This is cool for two reasons:
- Plain old SQL is the simplest to write and most anyone can write one and is just what the doctor ordered in many cases. PostgreSQL even allows you to write
aggregate functions with plain old SQL. Try to write an aggregate function in SQL Server
you've got to pull out your Visual Studio this and that and do some compiling and loading and you better know C# or VB.NET. Try in MySQL and you better learn C.
Do the same in PostgreSQL (you have a large choice of languages including SQL) and the code is simple to write. Nevermind
with MySQL and SQL Server, you aren't even allowed to do those type of things on a shared server or a server where the IT department is paranoid. The closest
with this much ease would be Oracle, which is unnecessarily verbose.
- Most importantly -- since it is just SQL, for simple user-defined functions, a PostgreSQL sql function can often be in-lined into the overall query plan since
it only uses what is legal in plain old SQL.
This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use.
So instead of writing geom1 && geom2 AND Intersects(geom1,geom2) -- a user can write
ST_Intersects(geom1,geom2) . The short-hand is even more striking when you think of the ST_DWithin function.
With an inlined function, the planner has visibility into the function and breaks apart the
spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2)
and has great flexibility in reordering the clauses in the plan.
Continue reading "STRICT on SQL Function Breaks In-lining Gotcha"
Saturday, May 29. 2010
Printer Friendly
PostGIS, SQL Server 2008 R2, Oracle 11G R2
We just completed our compare of the spatial functionality of PostgreSQL 8.4/PostGIS 1.5, SQL Server 2008 R2, Oracle 11G R2 (both its built-in Locator and Spatial add-on).
Most of the compare is focused on what can be gleaned from the manual of each product.
In summary, all products have changed a bit since their prior versions. The core changes:
- PostGIS 1.5 has geodetic support now in the form of geography as well as some beefed up functions and additional distance functions like ST_ClosestPoint, ST_MaxDistance, ST_ShortestLine/LongestLine
- SQL Server 2008 R2 basic spatial support hasn't changed much when compared to SQL Server 2008, but there is a lot more integration going on integrating Spatial into reporting services, Share Point and just integration
in general with SQL Server 2008 R2 and the Office 2010 stack.
- Oracle 11G R2 - has finally offered an uninstall script for Locator folks who do not care to break the law by accidentally using functions only licensed in Oracle spatial,
but innocently exposed in Oracle Locator. If all that were not great enough, you are now allowed to legally do a centroid if you are using Oracle Locator. Doing unions, intersections, and differences is still a legal no no for Oracle Locator folks.
Oracle now provides Affine transform functions, which have long been provided by PostGIS and have been available via the MPL licensed CLR Spatial package of SQL Server 2008.
I still haven't figured out where this R2 convention started. I thought it was just a Microsoft thing, but I see Oracle follows the same convention as well.
Continue reading "PostGIS, SQL Server, Oracle spatial compares and other news"
Monday, May 17. 2010
Printer Friendly
For those people coming from Oracle, SQL Server and MySQL or other databases that have soundex functionality,
you may be puzzled, or even frustrated when you try to do
something like
WHERE soundex('Wushington') = soundex('Washington')
in PostgreSQL and get a function does not exist error.
Well it does so happen that there is a soundex function in PostgreSQL, and yes it is
also called soundex, but is offered as a contrib module and not installed by default. It also has other fuzzy string matching functions in addition to soundex.
One of my favorites, the levenshenstein distance function is included as well. In this article
we'll be covering the contrib module packaged as fuzzystrmatch.sql. Details of the module can be found in FuzzyStrMatch.
The contrib module has been around for sometime, but has changed slightly from PostgreSQL version to PostgreSQL version. We are covering the 8.4 version in this article.
For those unfamiliar with soundex, its a basic approach developed by the US Census in the 1930s as a way of sorting
names by pronounciation. Read Census and Soundex for more gory history details.
Given that it is an approach designed primarily for the English alphabet, it sort of makes sense why its not built-in to PostgreSQL,
which has more of a diverse international concern. For example if you used it to compare two words in Japanese or Chinese,
don't think it would fair too well in any of the database platforms that support this function.
The original soundex algorithm has been improved over the years. Though its still the most common used today, newer variants
exist called MetaPhone developed in the 1990s and Double Metaphone (DMetaPhone) developed in 2000 that support additional
consonants in other languages such as Slavic, Celtic, Italian, Spanish etc.
These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be
the most popularly used at least for U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built-in but not the metaphone variants.
So in a sense soundex is a more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and
are also more processor intensive to compute than soundex. We'll demonstrate some differences between them in this article.
To enable soundex and the other fuzzy string matching functions included, just run the
share/contrib/fuzzystrmatch.sql located in your PostgreSQL install folder. This library is an important piece of arsenal for geocoding and genealogy tracking particularly
the U.S. streets and surnames data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.
For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
Continue reading "Where is soundex and other warm and fuzzy string things"
Thursday, April 01. 2010
Printer Friendly
Today Microsoft unveiled their top secret project code named CatchMe.
This is their new flagship database for Linux and Unix based on predominantly the
PostgreSQL 9.0 code base, but with an emulation layer that makes it behave like SQL Server
2008 R2. Unlike the Windows SQL Server 2008 R2 product, this version is completely free and
open source under the Microsoft Public License (Ms-PL). Downloads for the RCs of these will be available soon. Please stay tuned.
Reporter Dat A. Base managed to get an exclusive interview with the head of the
project, Quasi Modo. The transcript follows:
Continue reading "CatchMe - Microsoft SQL Server for Unix and Linux"
Thursday, March 04. 2010
Printer Friendly
This is a rebuttal to depesz's charx, varcharx, varchar, and text
and David Fetter's varchar(n) considered harmful.
I respect both depesz and David and in fact enjoy reading their blogs. We just have deferring opinions on the topic.
For starters, I am pretty tired of the following sentiments from some PostgreSQL people:
- 99% of the people who choose varchar(x) over text in PostgreSQL in most cases
are just ignorant folk and don't realize that text is just as fast if not faster than varchar in PostgreSQL.
- stuff your most despised database here compatibility is not high on my priority list.
- It is unfortunate you have to work with the crappy tools you work with that can't see the beauty in PostgreSQL text implementation.
Just get something better that treats PostgreSQL as the superior creature it is.
Continue reading "In Defense of varchar(x)"
Friday, November 06. 2009
Printer Friendly
The upcoming version of PostGIS - PostGIS 1.5 will be an exciting one. It has native geodetic support in the form of the new
geography type, similar in concept to SQL Server's geography support. For windows users, we have experimental binary builds hot off the presses for PostgreSQL 8.3 and 8.4
Continue reading "PostGIS does Geography"
Monday, September 07. 2009
Printer Friendly
One of the most common questions people ask is Which tools work with PostgreSQL. In a sense the measure of a database's
maturity/popularity are the number of vendors willing to produce management and development tools for it. Luckily there are a lot of vendors producing tools for PostgreSQL and the list is growing.
One set of tools people are interested in are Database administration, ER diagramming, Query tools, and quickie application generators (RAD).
For this issue of our product showcase, we will not talk about one product, but several that fit in the aforementioned category.
All the listed products work with PostgreSQL and can be used for database administration and/or architecting or provide some sort of
light reporting/rapid application building suite. By light reporting/application building, we mean
a tool with a simple wizard that a novice can use to build somewhat functional applications in minutes or days. This rules out all-purpose development
things like raw PHP, .NET, Visual Studio, database drivers etc. Things we consider in this realm are things like OpenOffice Base and
MS Access. Most of these tools are either free or have 30-day try before you buy options.
You can't really say one tool is absolutely better than another since each has its own strengths and caters to slightly different audiences and also
you may like the way one tool does one important thing really well, though it may be mediocre in other respects. We also left out a lot of products we are not familiar with and may have gotten
some things wrong.
If we left out your favorite product and you feel it meets these criteria, or you feel we made any errors, please let us know, and we'll add or correct it.
We will be including Free open source as well as proprietary products in this mix. If we left out what you consider an
important criteria, please let us know and we'll try to squeeze it in somewhere.
Continue reading "Database Administration, Reporting, and Light application development"
Thursday, July 16. 2009
Printer Friendly
Common table expressions are perhaps our favorite feature in PostgreSQL 8.4 even more so than windowing functions. Strangely enough I find myself using them more in SQL Server too now that PostgreSQL supports it.
CTEs are not only nice syntactic sugar, but they also produce better more efficient queries. To our knowledge only Firebird (see note below), PostgreSQL,SQL Server, and IBM DB2 support this, though I heard somewhere
that Oracle does too or is planning too UPDATE: As noted below Oracle as of version 9 supports non-recursive CTEs. For recursion you need to use the Oracle proprietary corresponding by syntax.
As far as CTEs go, the syntax between PostgreSQL, SQL Server 2005/2008, IBM DB2 and Firebird
is pretty much the same when not using recursive queries. When using recursive queries, PostgreSQL and Firebird use WITH RECURSIVE to denote a recursive CTE where as SQL Server and IBM DB2 its just WITH.
All 4 databases allow you to have multiple table expressions within one WITH clause anda RECURSIVE CTE expression can have both recursive and non-recursive CTEs. This makes writing complex queries especially where you have the same expressions used multiple times in the query,
a lot easier to debug and also more performant.
In our article on How to force PostgreSQL to use a pre-calculated value
we talked about techniques for forcing PostgreSQL to cache a highly costly function. For PostgreSQL 8.3 and below, the winning solution was using OFFSET which is not terribly cross platform and has the disadvantage of
materializing the subselect. David Fetter had suggested
for 8.4, why not try CTEs. Yes CTEs not only are syntactically nice, more portable, but they help you write more efficient queries. To demonstrate, we shall repeat the same exercise we did in that
article, but using CTEs instead.
Continue reading "PostgresQL 8.4: Common Table Expressions (CTE), performance improvement, precalculated functions revisited"
Wednesday, July 01. 2009
Printer Friendly
PostgreSQL 8.4 has ANSI SQL:2003 window functions support. These are often classified under the umbrella terms of basic Analytical or Online Application Processing (OLAP) functions.
They are used most commonly for producing cumulative sums, moving averages and generally rolling calculations that need to look at a subset of the overall dataset (a window frame of data) often relative to a particular row.
For users who use SQL window constructs extensively, this may have been one reason in the past to not to give PostgreSQL a second look. While you may not
consider PostgreSQL as a replacement for existing projects because of the cost of migration, recoding and testing, this added new feature is definitely a selling point
for new project consideration.
If you rely heavily on windowing functions, the things you probably want to know most about the new PostgreSQL 8.4 offering are:
- What SQL window functionality is supported?
- How does PostgreSQL 8.4 offering compare to that of the database you are currently using?
- Is the subset of functionality you use supported?
To make this an easier exercise we have curled thru the documents of the other database vendors to distill what the SQL Windowing functionality they provide in their core product.
If you find any mistakes or ambiguities in the below please don't hesitate to let us know and we will gladly amend.
For those who are not sure what this is and what all the big fuss is about, please read our rich commentary on the topic of window functions.
Continue reading "Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2"
Wednesday, May 27. 2009
Printer Friendly
One thing that is pretty neat about windowing functions in PostgreSQL 8.4 aside from built-in windowing functions (row_number(), rank(), lead(), lag(), dense_rank(), percent_rank(), cume_dist(), first_value, last_value, nth_value) as documented in the manual Windowing Functions is that you can use windows with most aggregate functions (built-in or custom defined) as well as define your own specific windowing functions. In a later article, we'll demonstrate creating custom windowing functions.
In our PGCon 2009 PostGIS presentation one of the last slides demonstrates using lead() and lag() windowing functions
to show a family's income level in the same record with the income levels of the next door neighbors in the fictitious town we created. This is not terribly useful unless you live
in a somewhat dysfunctional neighborhood where everyone is concerned about how rich their neighbors are compared to themselves. Our town was very dysfunctional but mostly geographically dysfunctional. We will have much more useful use cases of this as applied to GIS in our upcoming PostGIS in Action book.
Hitoshi Harada and David Fetter did a presentation of this in PGCon 2009 which sadly we missed since we were giving our own presentation.
Check out the PGCon2009 PostgreSQL 8.4 Windowing Functions Video. Also check out the slides at Introducing Windowing Functions.
Those who have used SQL Server 2005+, Oracle or IBM DBII are probably familar or have run into examples of Windowing functions in those products. Windowing in PostgreSQL 8.4 works more or less the same way. In a prior article, we demonstrated how to return running totals and sums using rudimentary SQL.
To precelebrate the eminent arrival of PostgreSQL 8.4 and the current PostgreSQL 8.4 beta 2 release, we shall demonstrate the same exercise using the new ANSI SQL:2003 Windowing functionality built
into the upcoming PostgreSQL 8.4.
Continue reading "Running totals and sums using PostgreSQL 8.4 Windowing functions"
Wednesday, April 29. 2009
Printer Friendly
Simon Greener, wrote an article on how to load GPX xml files into Oracle XMLDB. That
got me thinking that I haven't really explored all the XML features that PostgreSQL has to offer
and to some extent I've been reticent about XML processed in any database for that matter.
In this article we shall attempt to perform the same feats that Simon did, but with PostgreSQL instead of
Oracle XMLDB. Note while we are demonstrating this with a GPX file, the same XPath approach can be used to process any XML file.
Continue reading "Loading and Processing GPX XML files using PostgreSQL"
Thursday, February 19. 2009
Printer Friendly
This article is a bit of a companion to our article on Setting up PostgreSQL as a Linked Server in Microsoft SQL Server 64-bit
In this article we shall demonstrate using Microsoft SQL Server 2005/2008 OPENQUERY AND OPENROWSET to add, delete and update data in PostgreSQL.
First we must start by saying there are a number of ways of copying data between databases. While OPENROWSET is not necessarily the fasted,
in certain cases such as when you are wrapping this in a stored procedure, it is one of the most convenient ways of doing this.
Why on earth would you want to copy data back and forth between 2 servers and 2 disparate DBMS systems for that matter?
We all would like to think we are an island and live in a world with one DBMS system, but we don't. There are many reasons for having multiple DBMS providers in
an organization. Some are better for some things than others, some are more integrated in an environment -- (for example in a windows shop the SQL Server drivers are already loaded on all
windows machines, but PostgreSQL provides the advantage of being able to run on more platforms such a FreeBSD/Unix/Linux box and with cheaper cost and more options for PL programming so is often better for a front-facing DMZ accessible database),
and there are numerous other reasons that are too hard to itemize. The other question of why triggering from SQL Server instead of PostgreSQL is because
its just a little easier from Microsoft SQL Server. The OPENROWSET and OPENQUERY logic that SQL Server provides is just simply better and easier to use than the dblink provided for PostgreSQL. Anyrate with that said lets move on with the show.
Although this example is focused on using PostgreSQL with Microsoft SQL Server, the same technique applies when
copying retrieving updating data from other databases such as MySQL or Oracle or DB II.
Continue reading "Using Microsoft SQL Server to Update PostgreSQL Data"
Thursday, January 22. 2009
Printer Friendly
We have mentioned time and time again, one of the great selling points of PostgreSQL is that it has so many languages to choose from
for writing database stored functions and the code you write in those stored functions is almost exactly the same as what you would write when
writing in that language's environment. The reason for that is that PostgreSQL applies a thin layer around the environment the language lives in, so your code is really
running in that environment. The down-side of this approach is you must have that environment installed on the server. This is a bit different
from the Microsoft SQL Server model where code you write in VB.NET, C#, IronPython etc. gets translated into Common Runtime Logic (CLR) so your code is not
really running in the environment it would normally breathe in and if you have dependencies you have to enable them in the SQL Server GAC which is different
from the Server's .NET GAC.
In this section we shall introduce PL/Python - which is a PL language handler for Python that allows you to write PostgreSQL stored functions in Python. First of all I should start off
by saying that we are not proficient Python programmer's so if anyone sees anything wrong with what we say feel free to embarass us.
We are also taking this opportunity to test-drive PostgreSQL 8.4 on both Linux (OpenSUSE) and Windows,
using the EnterpriseDB PostgreSQL 8.4 beta
that Dave Page recently announced on his blog. This install is great if you are running Windows, MacOSX or Linux Desktop, but
sadly does not have PostGIS as part of the stack builder option.
For pure Linux Server CentOS/Redhat EL/Fedora no desktop installs or if you just feel more comfortable at the command-line,
PostgreSQL Yum repository generously maintained by Devrim is the one to go for.
We haven't tested this one out, but I presume the steps are pretty much what we outlined in Using PostgreSQL Yum repository.
Continue reading "Quick Intro to PLPython"
Sunday, September 07. 2008
Printer Friendly
As we mentioned in a previous article, RECURSIVE queries, often referred to in SQL ANSI specs and by DB2 and SQL Server as
Common Table Expressions (CTE) will make it into the 8.4 release and can already be found in the dev source. Technically CTE is a
superset and RECURSIVE queries are a subclass of CTE. Looks like basic windowing functionality will make it in 8.4 as well.
A summary of where your favorite patches are at can be found at the September 2008 PostgreSQL 8.4 commit-fest summary page http://wiki.postgresql.org/wiki/CommitFest:2008-09.
Continue reading "CTEs and Windowing Functions in 8.4"
Printer Friendly
One thing I'm really looking forward to have in the upcoming PostgreSQL 8.4 is the introduction
of the WITH RECURSIVE feature that IBM DB2 and SQL Server 2005 already have. Oracle has it too but in a non-standard CONNECT BY so is much less portable.
This is a feature that is perhaps more important to
us for the kind of work we do than the much complained about lack of windowing functions.
I was recently taking a snoop at IBM DB2 newsletter. Why I read magazines and newsletters on databases I don't even use I guess is to see what I'm missing out on
and to sound remotely educated on the topic when I run into one of those people. I also have a general fascination with magazines.
In it their latest newsletter they had examples of doing Fibonacci and Graphs with Common Table Expressions (CTEs).
Robert Mala's Fibonacci CTE
Robert Mala's Graph CTE
Compare the above to David Fetter's Fibonacci Memoizing
example he posted in our comments way back when.
I'd be interested in seeing what solutions David and others come out with using new features of 8.4. We can see a before 8.4 and after 8.4 recipe.
As a slightly off-topic side note - of all the Database magazines I have read - Oracle Magazine is the absolute worst. SQL Server Magazine and IBM DB2 are pretty decent.
The real problem is that Oracle's magazine is not even a database magazine.
Its a mishmash of every Oracle offering known to man squashed into a compendium that can satisfy no one. You would think that Oracle as big as their database is
would have a magazine dedicated to just that.
Perhaps there is another magazine besides Oracle Magazine, but haven't found it so I would be interested to know if I missed something.
|