Thursday, October 07. 2010
Printer Friendly
Universal Unique Identifiers are 16-byte / 32-hexadecimal digit (with 4 -s for separation)
identifiers standardized by the Open Software Foundation.
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share
data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap.
There are various open standards
for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything.
Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
Continue reading "Universal Unique Identifiers PostgreSQL SQL Server Compare"
Tuesday, September 28. 2010
Printer Friendly
The PostGIS development team is proud to announce the availability of PostGIS 1.5.2. Further details are on the postgis.org website
PostGIS 1.5.2 release.
Leo and I are still working on the windows builds. As stated before the 32-bit builds will be out first. We are still preparing our 64-bit test environment on our new 64-bit laptop.
For those who are on 64-bit windows. Sami has some PostGIS 64-bit binaries for PostgreSQL 9.0 windows available on his blog. Though he just has the binaries available so you will need to use the other files from the 32-bit install.
To answer Sami's question, since he has asked it more than once:
I really can't understand why PostGIS developers still want to compile the whole stuff using msys/mingw and that kind of stuff. We have Visual C++ (yes, the compiler is available for free), everything compiles with it and you don't have to whine about how hard it is to compile stuff for Windows because it's not GNU.
There are 3 reasons:
- Believe it or not -- the PostGIS development crew is relatively small and most work on Unix or MacOSX which do compile under GNU. Each builds
there own regression tests. We need
to be able to test consistently on all platforms which means we need a devlopment environment that all regression tests will work on without too much fuss. As much of a pain as we whine about with mSys -- its the most like what everyone else uses and mimicks the environment most consistently.
- Supporting VC means supporting yet another set of make and configure files and yuck project files. I don't even think express can deal with solution files. etc. GEOS does it and it was a pain for them. I know because a while back I would point out all the issues I was having compiling under VC++ (not to mention I don't need VC++) -- cause I'm a webdeveloper -- so don't have it normally installed). It took Mat some time to revise packaged scripts to even get PostGIS to compile under VS. There are people that bicker, but no one steps up to the plate wanting to support VS/VC++.
- Leo and I are predominantly web developers and database professionals; frankly in my ideal world everything would be interpreted or Just in time compiled (JIT) by the server. MingW / VS slash anything that needs compiling is just a big pain however I look at it and they are of equal pain to me. I got out of desktop development so I wouldn't need to deal with compiling stuff.
Okay we have whined enough. There are talks in the PostGIS and GEOS group of switching to CMake and to have a process that builds said make / project files so that we can more easily support GNU and VS without hopefully not adding too much extra work on anyone's plate. We will see how that goes. Will we compile the 64-bit version under Msys64 or VS -- we would like to do both and compare the 2. :)
Printer Friendly
UPDATE: We have changed the licensing on the cheat sheet to Creative Commons per request
To celebrate the arrival of the long awaited PostgreSQL 9.0, we have prepared a multi-page PostgreSQL 9.0 cheat sheet that covers prior PostgreSQL constructs plus new 9.0 features.
PDF version of this cheat sheet is available at PostgreSQL 9.0 Cheat sheet in PDF 8/12 by 11",
PostgreSQL 9.0 Cheat sheet in PDF A4
and the PostgreSQL 9.0 Cheat sheet in HTML.
We took some advice from several readers and this time broke the cheatsheet into multiple pages. Hopefully you won't need magnifying glasses to read this one. We also switched to landscape and put all the examples at the end.
If you find any errors or anything major we left out, please let us know and we'll amend.
Monday, September 20. 2010
Printer Friendly
Over the past two weeks, the PostGIS development team has been working hard to get out PostGIS 1.5.2 in time for the PostgreSQL 9.0 release. This release contains fixes allowing PostGIS to compile against 9.0. Due to an unfortunate turn of events, we missed the cut by a couple of days and are currently
experiencing technical difficulties with the postgis.org website. These should be resolved soon and barring no further difficulties, we should have the final PostGIS 1.5.2 ready late this week.
On the plus side, we do have a PostGIS 1.5.2 rc1 available for download from our PostGIS Wiki Release Candidate Downloads section. Please feel free to test these
out so that we have a smooth release.
Paul's related post is here
Here are the details of what is fixed:
- This is a bug fix release, addressing issues that have been
filed since the 1.5.1 release.
- Bug Fixes
- Loader: fix handling of empty (0-verticed) geometries in shapefiles.
(Sandro Santilli)
- #536, Geography ST_Intersects, ST_Covers, ST_CoveredBy and
Geometry ST_Equals not using spatial index (Regina Obe, Nicklas Aven)
- #573, Improvement to ST_Contains geography
- Loader: Add support for command-q shutdown in Mac GTK build (Paul Ramsey)
- #393, Loader: Add temporary patch for large DBF files
(Maxime Guillaud, Paul Ramsey)
- #507, Fix wrong OGC URN in GeoJSON and GML output (Olivier Courtin)
- spatial_ref_sys.sql Add datum conversion for projection SRID 3021
(Paul Ramsey)
- Geography - remove crash for case when all geographies are out of
the estimate (Paul Ramsey)
- #469, Fix for array_aggregation error (Greg Stark, Paul Ramsey)
- #532, Temporary geography tables showing up in other user sessions
(Paul Ramsey)
- #562, ST_Dwithin errors for large geographies (Paul Ramsey)
- #513, shape loading GUI tries to make spatial index when loading DBF only
mode (Paul Ramsey)
- #527, shape loading GUI should always append log messages
(Mark Cave-Ayland)
- #504 shp2pgsql should rename xmin/xmax fields (Sandro Santilli)
- #458 postgis_comments being installed in contrib instead of
version folder (Mark Cave-Ayland)
- #474 Analyzing a table with geography column crashes server
(Paul Ramsey)
- #581 LWGEOM-expand produces inconsistent results
(Mark Cave-Ayland)
- #471 DocBook dtd errors (Olivier Courtin)
- Fix further build issues against PostgreSQL 9.0
(Mark Cave-Ayland)
- #572 Password whitespace for Shape File to PostGIS
Import not supported (Mark Cave-Ayland)
- #603 shp2pgsql: "-w" produces invalid WKT for MULTI* objects.
(Mark Cave-Ayland)
- Enhancement
- #513 Add dbf filter to shp2pgsql-gui and allow uploading dbf only
(Paul Ramsey)
We should have windows binaries available a short time after release. Unfortunately we do not have the 64-bit windows build ready yet, so you still have to use the 32-bit version of PostgreSQL 9.0 if you need PostGIS on windows.
Thursday, August 26. 2010
Printer Friendly
In part 1 of this series on PostgreSQL 9.0 planner outputs, we demonstrated how to render explain plans in YAML, JSON, and XML using the new explain features in PostgreSQL 9.0. In this second part,
we'll demonstrate how to build a user interface that allows you input a JSON formatted explain plan and have it render into a printable, navigateable display using JQuery, javascript and a little bit of HTML coding.
In part 3 we'll do something similar using XML and XSLT programming.
For those who aren't familiar with JQuery, it is an MIT licensed javascript library that is fairly light weight and allows you to inspect and change html elements with fairly intuitive syntax, has some nice ajax methods and tools for converting xml/json to native objects that can be manipulated.
You can check it out at JQUERY.
We are not experts in JQuery, but from what we have used of it, we really like it and the fact the base package is MIT licensed, fairly light weight and lots of plugins available for it are real pluses.
The most difficult thing I think most people find about reading explain plans is that they are upside down; it starts with a conclusion and backtracks how to arrive at it. Humans by nature think about planning steps from start to finish.
In order to make an explain plan understandable to mere mortals, we generally display them upside down or having the child-nodes shown left most. We shall follow that approach.
Continue reading "Explain Plans PostgreSQL 9.0 - Part 2: JSON and JQuery Plan Viewer"
Sunday, August 22. 2010
Printer Friendly
PostgreSQL offers several options for displaying and querying tree like structures.
In Using Recursive Common Table Expressions (CTE) to represent tree structures
we demonstrated how to use common table expressions to display a tree like structure. Common Table Expressions required PostgreSQL 8.4 and above but was fairly ANSI standards compliant. In addition to that
approach you have the option of using recursive functions. There is yet another common approach for this which is specific to PostgreSQL. This is using the ltree contrib datatype
that has been supported for sometime in PostgreSQL. For one of our recent projects, we chose ltree over the other approaches because the performance is much better when you need to do ad-hoc queries over the tree since it can take advantage of btree and gist indexes
and also has built-in tree query expressions that make ad-hoc queries simpler to do; similar in concept to the tsearch query syntax for querying text.
In this article we'll demonstrate how to use ltree and along the way also show the PostgreSQL 9.0 new features conditional triggers and ordered aggregates.
Continue reading "Using LTree to Represent and Query Hierarchy and Tree Structures"
Thursday, August 12. 2010
Printer Friendly
This is a question that comes up quite often by windows users, so thought we would share how we normally do it. The question is can you run a PostgreSQL server on your windows desktop/server box without having to install anything?
The answer is yes and quite easily. Why would you need to do this. There are a couple of cases -- one you are developing a single user app that you want users to be able to run from anywhere without having to install it first.
The other common reason is, you aren't allowed to install anything on a user's pc and you also want to package along a database you already have created.
For our purposes, many of our developers develop on portable WAMP like things, and for some of our applications, they need to work in both MySQL and PostgreSQL, so we need an easy way during development to swap one out for the other.
Continue reading "Starting PostgreSQL in windows without install"
Thursday, July 29. 2010
Printer Friendly
One of the new features of PostgreSQL 9.0 is the ability to specify the format of an explain plan.
In prior versions your only choice was text (and graphic explain with tools like PgAdmin III and other GUIS), but in 9.0 on, you have the additional options of
Javascript Object Notation (JSON) which some people have a thing or two to say about them, YAML Ain't Markup Language (YAML) or eXtended Markup Language (XML). The new explain options are itemized in PostgreSQL 9.0 EXPLAIN.
The main benefit of the JSON, XML, YAML formats is that they are easier
to machine parse than the default text version. This will allow for creative renderings of planner trees with minimal coding.
In Part 1 of this series, we'll demonstrate how to output the plans in these various formats and what they look like.
In later parts of this series -- we'll demonstrate how to use Javascript, XSL and other scripting/markup languages
to transform these into works of art you can hang on your wall.
-- START POSTGIS IN ACTION ASIDE --
We just submitted the third major revision of Chapter 3 Data Modeling
of our upcoming PostGIS in Action book.
The second major revision we never submitted and threw it out because it wasn't worldly enough and was too involved. We may
use it later on for an example.
Chapter 3 should be up on Manning Early Access Program (MEAP) soon. If you haven't bought the book yet Buy now.
You don't want to miss out on a major masterpiece in the making. Okay we exaggerate a bit.
-- END POSTGIS IN ACTION ASIDE --
Continue reading "Explain Plans PostgreSQL 9.0 Text, JSON, XML, YAML - Part 1: You Choose"
Friday, July 23. 2010
Printer Friendly
When it comes to naming things in databases and languages, there are various common standards. For many languages the
camel family of namings is very popular. For unix based databases
usually UPPER or lower _ is the choice and for databases such as SQL Server and MySQL which allow you to name your columns with mixed casing
but couldn't care less what case you express them in selects, you get a mish mush of styles depending on what camp the database user originated from.
So to summarize the key styles and the family of people
- camelCase : lastName - employed by SmallTalk, Java, Flex, C++ and various C derivative languages.
- Pascal Case: (a variant of Camel Case) -- LastName which is employed by C#, VB.NET, Pascal (and Delphi), and SQL Server (and some MySQL windows converts). Also often used for class names by languages that use standard camelCase for function names.
- lower case _ last_name : often found in C, a favorite among PostgreSQL database users. (some MySQL)
- upper case _ LAST_NAME : a favorite among Oracle Users (some MySQL Oracle defectors)
Being at the cross roads of all the above, we often have to deal with the various above as well as having internal schizophrenic strife and external fights.
The internal turmoil is the worst and is worse than an ambidextrous person trying to figure out which hand to use in battle. For these exercises, we'll demonstrate one way how to convert between the various conventions. These
are the first thoughts that came to our mind, so may not be the most elegant.
Continue reading "Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case"
Wednesday, July 21. 2010
Printer Friendly
In an earlier article Where is Soundex and other Fuzzy string things we covered the PostgreSQL contrib module fuzzstrmatch which contains the very popular function
soundex that is found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and
dmetaphone functions included in fuzzstrmatch, but rarely found in other relational databases.
As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeves. This time we will cover
the contrib module pg_trgm which was introduced in PostgreSQL 8.3. pgtrgm uses a concept called trigrams for doing string comparisons. The pg_trgm module has several functions and gist/gin operators.
Like other contrib modules, you just need to run the /share/contrib/pg_trgm.sql file packaged in your PostgreSQL install to enable it in your database.
For this set of exercises, we'll use trigrams to compare words using the same set of data we tested
with soundex and metaphones. For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
The most useful are the similarity function and the
% operator. The % operator allows for using a GIST/GIN index and the similarity function allows for narrowing your filter similar to what
levenshtein did for us in fuzzstrmatch.
Continue reading "Fuzzy string matching with Trigram and Trigraphs"
Monday, July 05. 2010
Printer Friendly
Gathering from the number of hits we got from our What's new in PostgreSQL 9.0,
and the large number of slashdot responses we got
as well as the fair number of reddit responses,
I guess a lot of people are really excited about the upcoming PostgreSQL 9.0 or at least
have a lot of opinions about what is still missing in it.
For this discussion, we would like to point out one of the companion adminstration tools that
will be packaged in with PostgreSQL 9.0 (and currently packaged in beta 2). This is PgAdmin III, which
we will affectionately refer to as the Administrative tool for mere mortals. It is the first administrative
tool that most users new to PostgreSQL use and gives them a user-friendly interface to the
power behind PostgreSQL. I would say if it were not for this tool and its web cousin PhpPgAdmin, many
a scared newbie user would be running away at the vast unencumbered freedom that PostgreSQL/psql and sibling commandline tools offer.
Continue reading "What is new in PgAdmin III 1.12.0"
Wednesday, June 16. 2010
Printer Friendly
PostgreSQL has various levels of encryption to choose from. In this article we'll go over the basics built-in and the more advanced provided by the contrib module pgcrypto. When encrypting data, as a general rule the harder you make it to
keep people out of your data, the easier it is for you to lock yourself out of your data. Not only does encryption make it difficult to read data, it
also takes more resources to query and decrypt. With those rules of thumb, its important to pick your encryption strategies based on the sensitivity of your data.
There are two basic kinds of encryption, one way and two way. In one way you don't ever care about decrypting the data into readable form, but you just want to verify the user knows what the underlying secret text is. This is normally used for passwords. In two way encryption, you want the ability to encrypt data as well as allow authorized users to decrypt it into a meaningful form. Data such as credit cards and SSNs would fall in this category.
One way encryption
Normally when people want one way encryption and just want a basic simple level of encryption, they use the md5 function which is built into PostgreSQL by default. The md5 function is equivalent to using the PASSWORD function in MySQL. If you want anything beyond that, you'll want to install
the pgcrypto contrib module.
pgcrypto comes packaged with most PostgreSQL installs including windows, and can be installed into a database by running the script in share/contrib/pgcrypto.sql of your PostgreSQL install. For PostgreSQL 8.4+, this adds 34 someodd functions to your list of options. For maintainability we like to install it in a
separate schema say crypto, and add this schema to our database search path.
For one way encryption, the crypt function packaged in pgcrypto provides an added level of security above the md5 way. The reason is that with md5, you can tell who has the same password because there is no salt so all people with the same password will have the same encoded md5 string.
With crypt, they will be different. To demonstrate lets create a table with two users who have happened to have chosen the same password.
Continue reading "Encrypting data with pgcrypto"
Tuesday, June 08. 2010
Printer Friendly
PostgreSQL 9.0 beta 2 just got released this week. We may see another beta before 9.0 is finally released, but it looks like PostgreSQL 9.0 will be here probably sometime this month.
Robert Treat has a great slide presentation showcasing all the new features. The slide share for those on Robert Treat's slide share page.
We'll list the key ones with our favorites at the top:
Our favorites
- The window function functionality has been enhanced to support ROWS PRECEDING and FOLLOWING. Recall we discussed this in Running totals and sums using PostgreSQL 8.4
a hack for getting around the lack of ROWS x PRECEDING and FOLLOWING. No more need for that. This changes our comparison we did Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2.
Now the syntax is inching even closer to Oracle's window functionality, far superior to SQL Server 2005/2008, and about on par with IBM DB2. We'll do updated compare late this month or early next month.
Depesz has an example of this in Waiting for 9.0 – extended frames for window functions
- Ordered Aggregates. This is extremely useful for spatial aggregates and ARRAY_AGG, STRING_AGG, and medians where you care about the order of the aggregation. Will have to give it a try.
For example if you are building a linestring using ST_MakeLine, a hack you normally do would be to order your dataset a certain way and then run ST_MakeLine. This will allow you to do
ST_MakeLine(pt_geom ORDER BY track_time) or ARRAY_AGG(student ORDER BY score) This is very very cool. Depesz has some examples of ordered aggregates.
- Join removal -- this is a feature that will remove joins from the execution plans where they are not needed. For example where you have a left join that doesn't appear in a where or as a column in select. This is important
for people like us that rely on views to allow less skilled users to be able to write meaningful queries without knowing too much about joins or creating ad-hoc query tools that allow users to pick from multiple tables. Check out Robert Haas why join removal is cool for more use cases.
- GRANT/REVOKE ON ALL object IN SCHEMA and ALTER DEFAULT PRIVILEGES. This is just a much simpler user-friendly way of applying permissions. I can't tell you how many times we get beat up by MySQL users who find the PostgreSQL security management tricky and tedious to get right.
Of course you can count on Depesz to have an example of this too Waiting for 9.0 - GRANT ALL
Continue reading "What is new in PostgreSQL 9.0"
Wednesday, June 02. 2010
Printer Friendly
One of the coolest features of PostgreSQL is the ability to write functions using plain old
SQL. This feature it has had for a long time. Even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural mumbo jumbo like
loops and what not. This is cool for two reasons:
- Plain old SQL is the simplest to write and most anyone can write one and is just what the doctor ordered in many cases. PostgreSQL even allows you to write
aggregate functions with plain old SQL. Try to write an aggregate function in SQL Server
you've got to pull out your Visual Studio this and that and do some compiling and loading and you better know C# or VB.NET. Try in MySQL and you better learn C.
Do the same in PostgreSQL (you have a large choice of languages including SQL) and the code is simple to write. Nevermind
with MySQL and SQL Server, you aren't even allowed to do those type of things on a shared server or a server where the IT department is paranoid. The closest
with this much ease would be Oracle, which is unnecessarily verbose.
- Most importantly -- since it is just SQL, for simple user-defined functions, a PostgreSQL sql function can often be in-lined into the overall query plan since
it only uses what is legal in plain old SQL.
This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use.
So instead of writing geom1 && geom2 AND Intersects(geom1,geom2) -- a user can write
ST_Intersects(geom1,geom2) . The short-hand is even more striking when you think of the ST_DWithin function.
With an inlined function, the planner has visibility into the function and breaks apart the
spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2)
and has great flexibility in reordering the clauses in the plan.
Continue reading "STRICT on SQL Function Breaks In-lining Gotcha"
Monday, May 17. 2010
Printer Friendly
For those people coming from Oracle, SQL Server and MySQL or other databases that have soundex functionality,
you may be puzzled, or even frustrated when you try to do
something like
WHERE soundex('Wushington') = soundex('Washington')
in PostgreSQL and get a function does not exist error.
Well it does so happen that there is a soundex function in PostgreSQL, and yes it is
also called soundex, but is offered as a contrib module and not installed by default. It also has other fuzzy string matching functions in addition to soundex.
One of my favorites, the levenshenstein distance function is included as well. In this article
we'll be covering the contrib module packaged as fuzzystrmatch.sql. Details of the module can be found in FuzzyStrMatch.
The contrib module has been around for sometime, but has changed slightly from PostgreSQL version to PostgreSQL version. We are covering the 8.4 version in this article.
For those unfamiliar with soundex, its a basic approach developed by the US Census in the 1930s as a way of sorting
names by pronounciation. Read Census and Soundex for more gory history details.
Given that it is an approach designed primarily for the English alphabet, it sort of makes sense why its not built-in to PostgreSQL,
which has more of a diverse international concern. For example if you used it to compare two words in Japanese or Chinese,
don't think it would fair too well in any of the database platforms that support this function.
The original soundex algorithm has been improved over the years. Though its still the most common used today, newer variants
exist called MetaPhone developed in the 1990s and Double Metaphone (DMetaPhone) developed in 2000 that support additional
consonants in other languages such as Slavic, Celtic, Italian, Spanish etc.
These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be
the most popularly used at least for U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built-in but not the metaphone variants.
So in a sense soundex is a more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and
are also more processor intensive to compute than soundex. We'll demonstrate some differences between them in this article.
To enable soundex and the other fuzzy string matching functions included, just run the
share/contrib/fuzzystrmatch.sql located in your PostgreSQL install folder. This library is an important piece of arsenal for geocoding and genealogy tracking particularly
the U.S. streets and surnames data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.
For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
Continue reading "Where is soundex and other warm and fuzzy string things"
|