Postgres OnLine Journal

Monday, June 28. 2010

Importing data into PostgreSQL using Open Office Base 3.2

A while ago we demonstrated how to use Open Office Base to connect to a PostgreSQL server using both the native PostgreSQL SDBC driver and the PostgreSQL JDBC driver.

The routine for doing the same in Open Office Base 3.2 is pretty much the same as it was in the 2.3 incarnation. In this excerpt, we'll demonstrate how to import data into PostgreSQL using Open Office Base, as we had promised to do in Database Administration, Reporting, and Light Application Development, and point out some stumbling blocks to watch out for.

Use Case

Command line lovers are probably scratching their heads, wondering why you would want to do this. After all, stumbling your way through a command line and typing stuff is much more fun, and you can automate it after you are done. For our needs, we get stupid Excel or some other kind of tab-delimited data from somebody, and we just want to cut and paste that data into our database. These files are usually small (under 5000 records) and the column names are never consistent. We don't want to fiddle with writing code to do these one-off exercises.

For other people, who are used to using GUIs or training people afraid of command lines, the use cases are painfully obvious, so we won't bore you.

Importing Data with Open Office Base Using copy and paste

Open Office has this fantastic feature called Copy and Paste (no kidding), and we will demonstrate in a bit why its copy and paste is better than Microsoft Access's Copy and Paste, particularly when you want to paste into some database other than a Microsoft one. It is worthy of a medal, if I dare say.

Continue reading "Importing data into PostgreSQL using Open Office Base 3.2"

Posted by Leo Hsu and Regina Obe in beginner, ms access, oobase, oracle, product showcase at 03:42 | Comments (6) | Trackbacks (0)

Tuesday, June 22. 2010

NOT IN NULL Uniqueness trickery

I know a lot has been said about this beautiful value we affectionately call NULL, which is neither here nor there and that manages to catch many of us off guard with its casual neither here nor thereness. Database analysts who are really just back seat mathematicians in disguise like to philosophize about the unknown and pat themselves on the back when they feel they have mastered the unknown better than any one else. Of course database spatial analysts, the worst kind of back seat mathematicians, like to talk not only about NULL but about EMPTY and compare notes with their brethren and write dissertations about what to do about something that is neither here nor there but is more known than the unknown, but not quite as known as the empty string.

Okay getting to the point, one of our clients asked us about a peculiar problem they had with a query, and the strange results they were getting. We admit this still manages to catch us off guard every once in a while.

Continue reading "NOT IN NULL Uniqueness trickery"

Posted by Leo Hsu and Regina Obe in basics, beginner, postgresql versions, sql server at 04:19 | Comments (2)

Wednesday, June 16. 2010

Encrypting data with pgcrypto

PostgreSQL has various levels of encryption to choose from. In this article we'll go over the basics that are built in and the more advanced options provided by the contrib module pgcrypto. When encrypting data, as a general rule, the harder you make it for people to get into your data, the easier it is for you to lock yourself out of it. Not only does encryption make data difficult to read, it also takes more resources to query and decrypt. With those rules of thumb in mind, it's important to pick your encryption strategies based on the sensitivity of your data.

There are two basic kinds of encryption, one way and two way. In one way you don't ever care about decrypting the data into readable form, but you just want to verify the user knows what the underlying secret text is. This is normally used for passwords. In two way encryption, you want the ability to encrypt data as well as allow authorized users to decrypt it into a meaningful form. Data such as credit cards and SSNs would fall in this category.

One way encryption

Normally when people want one way encryption and just want a basic simple level of encryption, they use the md5 function, which is built into PostgreSQL by default. The md5 function fills roughly the same role as the PASSWORD function in MySQL. If you want anything beyond that, you'll want to install the pgcrypto contrib module.

pgcrypto comes packaged with most PostgreSQL installs, including Windows, and can be installed into a database by running the script share/contrib/pgcrypto.sql in your PostgreSQL install. For PostgreSQL 8.4+, this adds 34-some-odd functions to your list of options. For maintainability we like to install it in a separate schema, say crypto, and add this schema to our database search path.

For one way encryption, the crypt function packaged in pgcrypto provides an added level of security above the md5 way. The reason is that with md5, you can tell who has the same password: there is no salt, so all people with the same password will have the same encoded md5 string. With crypt, they will be different. To demonstrate, let's create a table with two users who happen to have chosen the same password.
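
A minimal sketch of that comparison, assuming pgcrypto is installed and its functions are reachable through the search path. The table testusers, its columns, and the user names are hypothetical names picked for illustration.

```sql
-- Hypothetical table: two users who happen to pick the same password
CREATE TABLE testusers (
    username     varchar(100) PRIMARY KEY,
    md5_password text,   -- unsalted md5 hash
    cryptpwd     text    -- salted hash produced by pgcrypto's crypt()
);

INSERT INTO testusers (username, md5_password, cryptpwd)
VALUES ('robby', md5('test'), crypt('test', gen_salt('md5'))),
       ('artoo', md5('test'), crypt('test', gen_salt('md5')));

-- The md5 values are identical; the crypt values differ because
-- gen_salt() produces a fresh random salt on every call
SELECT username, md5_password, cryptpwd FROM testusers;

-- To check a login attempt, run crypt() again with the stored hash as the salt
SELECT username
FROM testusers
WHERE username = 'robby'
  AND cryptpwd = crypt('test', cryptpwd);
```

If pgcrypto lives in its own schema such as crypto and that schema is not on the search path, the calls would need to be schema-qualified, for example crypto.crypt(...).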

Continue reading "Encrypting data with pgcrypto"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, 9.0, contrib spotlight, pgcrypto, postgresql versions at 06:26 | Comments (12) | Trackbacks (0)

Tuesday, June 08. 2010

What is new in PostgreSQL 9.0

PostgreSQL 9.0 beta 2 just got released this week. We may see another beta before 9.0 is finally released, but it looks like PostgreSQL 9.0 will be here probably sometime this month. Robert Treat has a great slide presentation showcasing all the new features. The slides are available on Robert Treat's slide share page.

We'll list the key ones with our favorites at the top:

Our favorites

The window function functionality has been enhanced to support ROWS PRECEDING and FOLLOWING. Recall that in Running totals and sums using PostgreSQL 8.4 we discussed a hack for getting around the lack of ROWS x PRECEDING and FOLLOWING. No more need for that. This changes the comparison we did in Window Functions Comparison Between PostgreSQL 8.4, SQL Server 2008, Oracle, IBM DB2. Now the syntax is inching even closer to Oracle's window functionality, is far superior to SQL Server 2005/2008, and is about on par with IBM DB2. We'll do an updated comparison late this month or early next month. Depesz has an example of this in Waiting for 9.0 – extended frames for window functions.
Ordered Aggregates. This is extremely useful for spatial aggregates, ARRAY_AGG, STRING_AGG, and medians, where you care about the order of the aggregation. Will have to give it a try. For example, if you are building a linestring using ST_MakeLine, the hack you would normally use is to order your dataset a certain way and then run ST_MakeLine. This will allow you to do ST_MakeLine(pt_geom ORDER BY track_time) or ARRAY_AGG(student ORDER BY score). This is very very cool. Depesz has some examples of ordered aggregates.
Join removal -- this is a feature that will remove joins from the execution plan where they are not needed. For example, where you have a left-joined table that isn't referenced in the WHERE clause or in the SELECT list. This is important for people like us who rely on views to allow less skilled users to write meaningful queries without knowing too much about joins, or who create ad-hoc query tools that allow users to pick from multiple tables. Check out Robert Haas's why join removal is cool for more use cases.
GRANT/REVOKE ON ALL object IN SCHEMA and ALTER DEFAULT PRIVILEGES. This is just a much simpler, user-friendly way of applying permissions. I can't tell you how many times we get beat up by MySQL users who find the PostgreSQL security management tricky and tedious to get right. Of course you can count on Depesz to have an example of this too: Waiting for 9.0 - GRANT ALL
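
To make three of these concrete, here is a short sketch written against the 9.0 syntax. The scores table and the report_reader role are hypothetical and exist only for illustration.

```sql
-- Hypothetical table for illustration
CREATE TABLE scores (
    student   text,
    test_date date,
    score     integer
);

-- Window frame: total of the current row and the two rows before it
SELECT student, test_date, score,
       SUM(score) OVER (PARTITION BY student
                        ORDER BY test_date
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_total
FROM scores;

-- Ordered aggregates: the ORDER BY lives inside the aggregate call
SELECT ARRAY_AGG(student ORDER BY score DESC)     AS ranked_students,
       STRING_AGG(student, ', ' ORDER BY student) AS student_list
FROM scores;

-- Grant on every existing table in a schema, then set the default
-- for tables created later (report_reader is a hypothetical role)
CREATE ROLE report_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO report_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO report_reader;
```

Note that ALTER DEFAULT PRIVILEGES as written only affects objects later created by the role that runs it, so it complements the GRANT ... ON ALL TABLES statement instead of replacing it.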

Continue reading "What is new in PostgreSQL 9.0"

Posted by Leo Hsu and Regina Obe in 9.0, new in postgresql, postgis, postgresql versions at 22:11 | Comments (13) | Trackback (1)

Wednesday, June 02. 2010

STRICT on SQL Function Breaks In-lining Gotcha

One of the coolest features of PostgreSQL is the ability to write functions using plain old SQL. It has had this feature for a long time, even before PostgreSQL 8.2. No other database to our knowledge has this feature. By SQL we mean sans procedural mumbo jumbo like loops and what not. This is cool for two reasons:

Plain old SQL is the simplest to write, most anyone can write it, and it is just what the doctor ordered in many cases. PostgreSQL even allows you to write aggregate functions with plain old SQL. Try to write an aggregate function in SQL Server and you've got to pull out your Visual Studio this and that, do some compiling and loading, and you'd better know C# or VB.NET. Try in MySQL and you'd better learn C. Do the same in PostgreSQL (you have a large choice of languages including SQL) and the code is simple to write. Never mind that with MySQL and SQL Server, you aren't even allowed to do those types of things on a shared server or a server where the IT department is paranoid. The closest with this much ease would be Oracle, which is unnecessarily verbose.
Most importantly -- since it is just SQL, for simple user-defined functions, a PostgreSQL sql function can often be in-lined into the overall query plan since it only uses what is legal in plain old SQL.

This inlining feature is part of the secret sauce that makes PostGIS fast and easy to use. So instead of writing geom1 && geom2 AND Intersects(geom1,geom2) -- a user can write ST_Intersects(geom1,geom2) . The short-hand is even more striking when you think of the ST_DWithin function.

With an inlined function, the planner has visibility into the function and breaks apart the spatial index short-circuit test && from the more exhaustive absolute test Intersects(geom1,geom2) and has great flexibility in reordering the clauses in the plan.
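
As a rough illustration of what the title is getting at, here is a sketch with two otherwise identical SQL functions, one of them marked STRICT. The function and table names are hypothetical, and whether a given function is actually in-lined depends on the PostgreSQL version and the function body, so treat the plans as something to inspect and not as a guaranteed outcome.

```sql
-- Hypothetical table to run the functions against
CREATE TABLE people (first_name text, last_name text);

-- A plain SQL function with a single SELECT body, not marked STRICT
CREATE FUNCTION full_name(first_name text, last_name text) RETURNS text AS
$$ SELECT COALESCE($1, '') || ' ' || COALESCE($2, '') $$
LANGUAGE sql IMMUTABLE;

-- Same body, but marked STRICT
CREATE FUNCTION full_name_strict(first_name text, last_name text) RETURNS text AS
$$ SELECT COALESCE($1, '') || ' ' || COALESCE($2, '') $$
LANGUAGE sql IMMUTABLE STRICT;

-- Compare the Output lines of the two plans: an in-lined function shows up
-- as its underlying expression, a non-inlined one as a function call
EXPLAIN VERBOSE SELECT full_name(first_name, last_name) FROM people;
EXPLAIN VERBOSE SELECT full_name_strict(first_name, last_name) FROM people;
```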

Continue reading "STRICT on SQL Function Breaks In-lining Gotcha"

Posted by Leo Hsu and Regina Obe in 8.3, 8.4, 9.0, basics, intermediate, mysql, oracle, postgis, postgresql versions, sql functions, sql server at 05:06 | Comments (3) | Trackback (1)

Saturday, May 29. 2010

PostGIS, SQL Server, Oracle spatial compares and other news

PostGIS, SQL Server 2008 R2, Oracle 11G R2

We just completed our compare of the spatial functionality of PostgreSQL 8.4/PostGIS 1.5, SQL Server 2008 R2, Oracle 11G R2 (both its built-in Locator and Spatial add-on). Most of the compare is focused on what can be gleaned from the manual of each product.

In summary, all products have changed a bit since their prior versions. The core changes:

PostGIS 1.5 has geodetic support now in the form of geography as well as some beefed up functions and additional distance functions like ST_ClosestPoint, ST_MaxDistance, ST_ShortestLine/LongestLine
SQL Server 2008 R2 basic spatial support hasn't changed much compared to SQL Server 2008, but there is a lot more integration of spatial into Reporting Services, SharePoint, and the SQL Server 2008 R2 and Office 2010 stack in general.
Oracle 11G R2 - has finally offered an uninstall script for Locator folks who do not care to break the law by accidentally using functions only licensed in Oracle spatial, but innocently exposed in Oracle Locator. If all that were not great enough, you are now allowed to legally do a centroid if you are using Oracle Locator. Doing unions, intersections, and differences is still a legal no no for Oracle Locator folks. Oracle now provides Affine transform functions, which have long been provided by PostGIS and have been available via the MPL licensed CLR Spatial package of SQL Server 2008.

I still haven't figured out where this R2 convention started. I thought it was just a Microsoft thing, but I see Oracle follows the same convention as well.

Continue reading "PostGIS, SQL Server, Oracle spatial compares and other news"

Posted by Leo Hsu and Regina Obe in 8.4, editor note, oracle, postgis, sql server at 20:56 | Comments (3) | Trackbacks (0)

Monday, May 17. 2010

Where is soundex and other warm and fuzzy string things

For those people coming from Oracle, SQL Server and MySQL or other databases that have soundex functionality, you may be puzzled, or even frustrated when you try to do something like
WHERE soundex('Wushington') = soundex('Washington')
in PostgreSQL and get a function does not exist error.

Well it does so happen that there is a soundex function in PostgreSQL, and yes it is also called soundex, but it is offered as a contrib module and not installed by default. The module also has other fuzzy string matching functions in addition to soundex. One of my favorites, the levenshtein distance function, is included as well. In this article we'll be covering the contrib module packaged as fuzzystrmatch.sql. Details of the module can be found in FuzzyStrMatch. The contrib module has been around for some time, but has changed slightly from PostgreSQL version to PostgreSQL version. We are covering the 8.4 version in this article.

For those unfamiliar with soundex, it's a basic approach adopted by the US Census in the 1930s as a way of sorting names by pronunciation. Read Census and Soundex for more gory history details.

Given that it is an approach designed primarily for the English alphabet, it sort of makes sense why it's not built into PostgreSQL, which has more of a diverse international concern. For example, if you used it to compare two words in Japanese or Chinese, we don't think it would fare too well in any of the database platforms that support this function.

The original soundex algorithm has been improved over the years. Though it's still the most commonly used today, newer variants exist: Metaphone, developed in the 1990s, and Double Metaphone (DMetaphone), developed in 2000, which support additional consonants in other languages such as Slavic, Celtic, Italian, Spanish etc. These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be the most popularly used, at least in the U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built in but not the metaphone variants. So in a sense soundex is a more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and are also more processor intensive to compute than soundex. We'll demonstrate some differences between them in this article.

To enable soundex and the other fuzzy string matching functions included, just run the share/contrib/fuzzystrmatch.sql script located in your PostgreSQL install folder. This library is an important piece of arsenal for geocoding and genealogy tracking, particularly for the U.S. streets and surnames data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.
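
Once the module is loaded, a quick sanity check might look like the sketch below. It only uses functions that ship in fuzzystrmatch; the sample words are the ones mentioned above.

```sql
-- Assumes fuzzystrmatch has been installed in the current database
SELECT soundex('Wushington') = soundex('Washington') AS sounds_alike,   -- both encode to W252
       soundex('Minor') = soundex('Miner')           AS same_surname,   -- both encode to M560
       levenshtein('Wushington', 'Washington')       AS edit_distance,  -- 1: a single substitution
       metaphone('Washington', 6)                    AS meta,           -- second argument caps the output length
       dmetaphone('Washington')                      AS dmeta;
```

The soundex comparisons return true even though the spellings differ, while levenshtein gives a count of edits, so the two answer slightly different questions: sounds alike versus spelled alike.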

For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.

Continue reading "Where is soundex and other warm and fuzzy string things"

Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, 9.0, beginner, contrib spotlight, fuzzystrmatch, mysql, oracle, postgresql versions, sql server at 16:53 | Comments (2) | Trackbacks (3)

Output parameters, custom data type gotchas

Pierre Racine has been diligently working on PostGIS WKT Raster development. He was recently creating an sql function that uses output parameters. That was all nice and well, except he couldn't figure out how to output the output parameters as columns.

The function looked something like this:


CREATE FUNCTION somefunction(rast raster, OUT field1 integer, OUT field2 sometype, etc.) AS
	$$ blah blah blah $$
LANGUAGE 'sql';
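
The excerpt stops before the answer, so the following is only a sketch of one common way to get OUT parameters back as separate columns, not necessarily the approach the full article settles on. The minmax function is a hypothetical stand-in for the raster function above.

```sql
-- Hypothetical function with two OUT parameters
CREATE FUNCTION minmax(a integer, b integer, OUT lo integer, OUT hi integer) AS
$$ SELECT LEAST($1, $2), GREATEST($1, $2) $$
LANGUAGE sql;

-- Called in the SELECT list, the result comes back as one composite column: (1,3)
SELECT minmax(3, 1);

-- Called in FROM, or expanded with (...).*, each OUT parameter becomes its own column
SELECT * FROM minmax(3, 1);
SELECT (minmax(3, 1)).*;
```

One thing to keep in mind with the (...).* form is that the function call may be evaluated once per output column, which matters for expensive functions.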

Continue reading "Output parameters, custom data type gotchas"

Posted by Leo Hsu and Regina Obe in 8.4, pl programming, postgis, sql functions at 16:22 | Comments (2) | Trackbacks (0)

Wednesday, May 12. 2010

Windows PostGIS 1.5.2 SVN and WKT Raster available for Windows PostgreSQL 9.0 beta 1

We have just packaged up PostGIS binaries for Windows PostgreSQL 9.0 beta 1. These are binaries for the current PostGIS 1.5 stable branch and WKT Raster support.

You can download these from the PostGIS Windows Experimental Builds section.

When PostGIS 1.5.2 is officially released, we'll be adding PostGIS 1.5.2 for PostgreSQL 9.0 to the Stack Builder section along with the 8.3 and 8.4 versions.

Posted by Leo Hsu and Regina Obe in 9.0, contrib spotlight, gis, postgis, postgresql versions at 16:24 | Comments (0) | Trackbacks (0)

Friday, April 23. 2010

Import fixed width data into PostgreSQL with just PSQL

Fixed width data is probably the most annoying data to import because you need some mechanism to break the columns at the column boundaries. A lot of people bring this kind of data into a tool such as OpenOffice, Excel or MS Access, massage it into a delimited format and then pull it in with the PostgreSQL copy command or some other means. There is another way, and one that doesn't require anything else aside from what gets packaged with PostgreSQL. We will demonstrate this way.

It's quite simple. Pull each record in as a single column and then split it into the columns you want with plain old SQL. We'll demonstrate this by importing the Census places fixed width data file.

Although this technique is focused on PostgreSQL, it's pretty easy to do the same steps in any other relational database.
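
The two steps can be sketched roughly as follows. The table names, file name, and column positions are all hypothetical placeholders; the real Census places layout has its own widths, which you would take from the file's documentation.

```sql
-- Hypothetical staging table: one text column holding each raw line
CREATE TABLE places_staging (data text);

-- In psql, load the file client-side (the file name is a placeholder).
-- This relies on the file containing no tab or backslash characters,
-- since COPY's text format treats those specially.
\copy places_staging FROM 'places.txt'

-- Hypothetical target table; the positions used below are made up
CREATE TABLE places (
    state_abbr char(2),
    place_code varchar(5),
    place_name varchar(64)
);

-- Carve each line into columns at fixed character positions
INSERT INTO places (state_abbr, place_code, place_name)
SELECT substring(data from 1 for 2),
       substring(data from 3 for 5),
       trim(substring(data from 8 for 64))
FROM places_staging;
```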

Both David Fetter and Dimitri Fontaine have demonstrated other approaches of doing this as well so check theirs out.

UPDATE

David Fetter - psql, Paste, Perl: Pefficiency!
Dimitri Fontaine - Import fixed width data with pgloader

Continue reading "Import fixed width data into PostgreSQL with just PSQL"

Posted by Leo Hsu and Regina Obe in basics, beginner at 17:38 | Comment (1) | Trackbacks (0)

Saturday, April 17. 2010

PostGIS Raster its on: 10 things you can do NOW with raster

We just finished the first draft of the last chapter of our book: First look at PostGIS WKT Raster. This completes our hard-core writing and now on to more drafting, polishing all the chapters. In Chapter 13 we demonstrate how to use PostGIS WKT Raster functions by example and cross breed with PostGIS geometry functionality. I was pleasantly surprised to see how nicely the raster and geometry functions play together.

We had intended this chapter to be short, about 20 pages in length, because how much can one say about pixels and pictures? As it turns out, a lot. Rasters are more versatile than their picture portrayal on a screen. Rasters are a class of structured storage suitable for representing any numeric, cell-based data where each cell has one or more numeric properties (the bands). This covers quite a bit of the data you collect with remote sensing and other electronic instrumentation. We had to stretch to over 30 pages; even then we felt we were missing some critical examples.

There is a lot of useful functionality in PostGIS WKT Raster already, and it should make a lot of people looking for raster support in PostgreSQL very happy. Although the chapter may portray some scenes of violence and torture inflicted on elephants, you can rest assured that it is pure illusion and no real elephants or blue elephant dolls were harmed in the making of this chapter.

As a side note -- our book is now listed on Amazon: PostGIS in Action. It is not available in hard-copy yet, but you can pre-order, and of course you can order PostGIS in Action from Manning directly to get the chapter drafts we have posted, updates as we polish them, and the final book when it comes out in hard print.

The Amazon listing would have been so much more exciting, had they not stripped me of my last name or had Leo married to himself.
UPDATE: It appears I now have a last name again
In hindsight, I suppose OBE is more commonly seen as a title of honor rather than a last name, so it's only fitting that I should be stripped of mine and Tim Berners-Lee gets it tacked on at the end of his name.

To find out more about PostGIS WKT Raster, we encourage you to check out these links.

Now we'll itemize 10 things you can do now with PostGIS WKT Raster. In order to use PostGIS WKT Raster, you need PostGIS 1.3.5 or above. Preferably 1.4 or 1.5 or 2.0 alpha.

PostGIS WKT Raster is currently packaged as a separate library and we have windows binaries available.

Continue reading "PostGIS Raster its on: 10 things you can do NOW with raster"

Posted by Leo Hsu and Regina Obe in 8.3, 8.4, 9.0, contrib spotlight, gis, postgis at 17:08 | Comments (0) | Trackback (1)

Thursday, April 01. 2010

CatchMe - Microsoft SQL Server for Unix and Linux

Today Microsoft unveiled their top secret project code named CatchMe. This is their new flagship database for Linux and Unix based on predominantly the PostgreSQL 9.0 code base, but with an emulation layer that makes it behave like SQL Server 2008 R2. Unlike the Windows SQL Server 2008 R2 product, this version is completely free and open source under the Microsoft Public License (Ms-PL). Downloads for the RCs of these will be available soon. Please stay tuned.

Reporter Dat A. Base managed to get an exclusive interview with the head of the project, Quasi Modo. The transcript follows:

Continue reading "CatchMe - Microsoft SQL Server for Unix and Linux"

Posted by Leo Hsu and Regina Obe in 9.0, joke, new in postgresql, oracle, sql server at 12:42 | Comments (2) | Trackback (1)

Friday, March 05. 2010

What is New in PostGIS Land

This month we will be giving two mini-tutorials at PgCon East 2010 on Saturday, March 27th. The topic of the talks will be, you guessed it, PostGIS. We have changed our Beyond talk to PostGIS: Adding spatial support to PostgreSQL to a beginner focus instead of an intermediate focus. Topic content will be more or less the same but focused more on people new to spatial database analysis. Our web applications talk will cater more to the web developer trying to integrate PostGIS in their web applications.

Marcus Rouhani of the Federal Aviation Administration will also be talking about the Airport GIS project and migration from Oracle to PostgreSQL.

On a somewhat related note, we also hope to be finished with all the chapters of our upcoming book this month. We just completed the first draft of our Chapter 10: PostgreSQL Add-ons and ancillary tools. After some back and forth with our editor, this will be up on MEAP, available for read and comments for early book buyers. Still two more chapters to finish after that before we get to the polishing of the text, images, layout and final print version.

Our publisher Manning is running a 50% off sale this Friday (tomorrow or is it today) on any MEAP book and they have a lot of interesting ones in the pipeline (including ours).

Waiting for PostGIS 2.0

The OSGEO just completed a recent coding sprint in New York. The New York sprint was a meeting of the minds of OSGEO people from various projects -- PostGIS, Mapserver, Geoserver, OpenLayers, GDAL, and some others were represented. Sadly we were not able to attend this one. A summary of the sprint with a PostGIS bent can be found on Olivier Courtin's New York sprint summary (Original French Version) and Olivier Courtin's New York sprint summary (Google English translation) and Paul's New York sprint summary.

Continue reading "What is New in PostGIS Land"

Posted by Leo Hsu and Regina Obe in editor note, postgis at 02:09 | Comments (0) | Trackbacks (0)

Thursday, March 04. 2010

In Defense of varchar(x)

This is a rebuttal to depesz's charx, varcharx, varchar, and text and David Fetter's varchar(n) considered harmful. I respect both depesz and David and in fact enjoy reading their blogs. We just have differing opinions on the topic.

For starters, I am pretty tired of the following sentiments from some PostgreSQL people:

99% of the people who choose varchar(x) over text in PostgreSQL in most cases are just ignorant folk and don't realize that text is just as fast if not faster than varchar in PostgreSQL.
[stuff your most despised database here] compatibility is not high on my priority list.
It is unfortunate you have to work with the crappy tools you work with that can't see the beauty in PostgreSQL text implementation. Just get something better that treats PostgreSQL as the superior creature it is.

Continue reading "In Defense of varchar(x)"

Posted by Leo Hsu and Regina Obe in basics, mysql, oracle, sql server at 19:23 | Comments (15) | Trackbacks (0)

Sunday, February 14. 2010

Regular Expressions in PostgreSQL

Every programmer should embrace and use regular expressions (INCLUDING Database programmers). There are many places where regular expressions can be used to reduce a 20 line piece of code into a one-liner. Why write 20 lines of code when you can write 1?

Regular expressions are a domain language just like SQL. Just like SQL they are embedded in many places. You have them in your program editor. You see them in sed, grep, perl, PHP, Python, VB.NET, C#, in ASP.NET validators and javascript for checking correctness of input. You have them in PostgreSQL as well, where you can use them in SQL statements, domain definitions and check constraints. You can mix regular expressions with SQL. When you mix the two domain languages, you can do enchanting things with a flip of a wrist that would amaze your less informed friends. Embrace the power of domain languages and mix it up. PostgreSQL makes that much easier than any other DBMS we can think of.

For more details on using regular expressions in PostgreSQL, check out the manual pages Pattern Matching in PostgreSQL

The problem with regular expressions is that they are slightly different depending on what language environment you are running them in. Different enough to be frustrating. We'll just focus on their use in PostgreSQL, though these lessons are applicable to other environments.
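
As a small taste of the places mentioned above (SQL statements, domain definitions and check constraints), here is a sketch using the built-in match operators and regexp_replace. The domain name us_zip and the sample values are hypothetical, and the patterns deliberately avoid backslashes to sidestep string-escaping differences between configurations.

```sql
-- ~ is the case-sensitive regular expression match operator; ~* ignores case
SELECT 'PostgreSQL 8.4' ~ '^Postgre[A-Z]+ [0-9]' AS matches,        -- true
       'postgresql'     ~* '^POSTGRES'           AS matches_nocase; -- true

-- Strip everything that is not a digit; the 'g' flag replaces every match
SELECT regexp_replace('(617) 555-1234', '[^0-9]', '', 'g') AS digits;  -- 6175551234

-- A domain whose check constraint is a regular expression
CREATE DOMAIN us_zip AS text
    CHECK (VALUE ~ '^[0-9]{5}(-[0-9]{4})?$');

SELECT '02109'::us_zip;      -- accepted
-- SELECT 'ABCDE'::us_zip;   -- would fail the check constraint
```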

Continue reading "Regular Expressions in PostgreSQL"

Posted by Leo Hsu and Regina Obe in 8.4, beginner, q&a at 11:43 | Comments (12) | Trackbacks (3)
