Monday, July 05. 2010
Printer Friendly
Gathering from the number of hits we got from our What's new in PostgreSQL 9.0,
and the large number of slashdot responses we got
as well as the fair number of reddit responses,
I guess a lot of people are really excited about the upcoming PostgreSQL 9.0 or at least
have a lot of opinions about what is still missing in it.
For this discussion, we would like to point out one of the companion adminstration tools that
will be packaged in with PostgreSQL 9.0 (and currently packaged in beta 2). This is PgAdmin III, which
we will affectionately refer to as the Administrative tool for mere mortals. It is the first administrative
tool that most users new to PostgreSQL use and gives them a user-friendly interface to the
power behind PostgreSQL. I would say if it were not for this tool and its web cousin PhpPgAdmin, many
a scared newbie user would be running away at the vast unencumbered freedom that PostgreSQL/psql and sibling commandline tools offer.
Continue reading "What is new in PgAdmin III 1.12.0"
Monday, June 28. 2010
Printer Friendly
A while ago we demonstrated how to use Open Office Base to connect to a PostgreSQL server using both the native PostgreSQL SBC and the PostgreSQL JDBC driver.
The routine for doing the same in Open Office Base 3.2 is pretty much the same as it was in the 2.3 incarnation. In this excerpt, we'll demonstrate how to import data into PostgreSQL using Open Office Base, as we had promised to do in
Database Administration, Reporting, and Light Applicaton Development and some stumbling blocks to watch out for.
Use Case
Command line lovers are probably scratching there head, why you want to do this. After all stumbling your way thru a commandline and typing stuff is much more fun and you can automate it after you are done.
For our needs, we get stupid excel or some other kind of tab delimeted data
from somebody, and we just want to cut and paste that data in our database. These files are usually small (under 5000 records) and the column names are never consistent. We don't want to fiddle with writing code to do these one off type exercises.
For other people, who are used to using GUIs or training people afraid of command lines, the use cases are painfully obvious, so we won't bore you.
Importing Data with Open Office Base Using copy and paste
Open Office has this fantastic feature called Copy and Paste (no kidding), and we will demonstrate in a bit, why their copy and paste is better than Microsoft Access's Copy and Paste particularly when you want to paste into some database other than a Microsoft one.
It is worthy of a metal if I dear say.
Continue reading "Importing data into PostgreSQL using Open Office Base 3.2"
Tuesday, June 22. 2010
Printer Friendly
I know a lot has been said about this beautiful value we affectionately call NULL, which is neither here nor there and that manages to catch many of us
off guard with its casual neither here nor thereness. Database analysts who are really just back seat mathematicians in disguise like to philosophize
about the unknown and pat themselves on the back when they feel they have mastered the unknown better than any one else. Of course database spatial analysts, the worst kind of back seat mathematicians,
like to talk not only about NULL but about EMPTY and compare notes with their brethren and write dissertations
about what to do about something that is neither here nor there
but is more known than the unknown, but not quite as known as the empty string.
Okay getting to the point, one of our clients asked us about a peculiar problem they had with a query, and the strange results they were getting. We admit this still manages to catch us off guard every once in a while.
Continue reading "NOT IN NULL Uniqueness trickery"
Monday, May 17. 2010
Printer Friendly
For those people coming from Oracle, SQL Server and MySQL or other databases that have soundex functionality,
you may be puzzled, or even frustrated when you try to do
something like
WHERE soundex('Wushington') = soundex('Washington')
in PostgreSQL and get a function does not exist error.
Well it does so happen that there is a soundex function in PostgreSQL, and yes it is
also called soundex, but is offered as a contrib module and not installed by default. It also has other fuzzy string matching functions in addition to soundex.
One of my favorites, the levenshenstein distance function is included as well. In this article
we'll be covering the contrib module packaged as fuzzystrmatch.sql. Details of the module can be found in FuzzyStrMatch.
The contrib module has been around for sometime, but has changed slightly from PostgreSQL version to PostgreSQL version. We are covering the 8.4 version in this article.
For those unfamiliar with soundex, its a basic approach developed by the US Census in the 1930s as a way of sorting
names by pronounciation. Read Census and Soundex for more gory history details.
Given that it is an approach designed primarily for the English alphabet, it sort of makes sense why its not built-in to PostgreSQL,
which has more of a diverse international concern. For example if you used it to compare two words in Japanese or Chinese,
don't think it would fair too well in any of the database platforms that support this function.
The original soundex algorithm has been improved over the years. Though its still the most common used today, newer variants
exist called MetaPhone developed in the 1990s and Double Metaphone (DMetaPhone) developed in 2000 that support additional
consonants in other languages such as Slavic, Celtic, Italian, Spanish etc.
These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be
the most popularly used at least for U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built-in but not the metaphone variants.
So in a sense soundex is a more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and
are also more processor intensive to compute than soundex. We'll demonstrate some differences between them in this article.
To enable soundex and the other fuzzy string matching functions included, just run the
share/contrib/fuzzystrmatch.sql located in your PostgreSQL install folder. This library is an important piece of arsenal for geocoding and genealogy tracking particularly
the U.S. streets and surnames data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.
For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
Continue reading "Where is soundex and other warm and fuzzy string things"
Friday, April 23. 2010
Printer Friendly
Fixed width data is probably the most annoying data to import because you need some mechanism to break the columns at the column boundaries. A lot of people bring this kind of data
into a tool such as OpenOffice, Excel or MS Access, massage it into a delimeted format and then pull it in with PostgreSQL copy command or some other means. There is another way and one that doesn't require
anything else aside from what gets packaged with PostgreSQL. We will demonstrate this way.
Its quite simple. Pull each record in as a single column and then spit it into the columns you want with plain old SQL. We'll demonstrate this by importing Census data places fixed width file.
Although this technique we have is focused on PostgreSQL, its pretty easy to do the same steps in any other relational database.
Both David Fetter and Dimitri Fontaine have demonstrated other approaches of doing this as well
so check theirs out.
UPDATE
Continue reading "Import fixed width data into PostgreSQL with just PSQL"
Sunday, February 14. 2010
Printer Friendly
Every programmer should embrace and use regular expressions (INCLUDING Database programmers).
There are many places where regular expressions can be used to reduce a 20 line piece of code into a
1 liner. Why write 20 lines of code when you can write 1.
Regular expressions are a domain language just like SQL. Just like SQL they are embedded in many places. You have them in your program editor.
You see it in sed, grep, perl, PHP, Python, VB.NET, C#,
in ASP.NET validators and javascript for checking correctness of input. You have them in PostgreSQL as well where
you can use them in SQL statements, domain definitions and check constraints. You can mix
regular expressions with SQL. When you mix the two domain languages, you can do enchanting things with a flip of a wrist that
would amaze your less informed friends. Embrace the power of domain languages and mix it up. PostgreSQL
makes that much easier than any other DBMS we can think of.
For more details on using regular expressions in PostgreSQL, check out the manual pages Pattern Matching in PostgreSQL
The problem with regular expressions is that they are slightly different depending on what language environment you are
running them in. Different enough to be frustrating. We'll just focus on their use in PostgreSQL, though these lessons
are applicable to other environments.
Continue reading "Regular Expressions in PostgreSQL"
Thursday, January 21. 2010
Printer Friendly
Every once in a while, especially if you have a fairly large database, you may find the need to do select backups of certain tables.
Your criteria might be based on name or how relatively recently data has changed in the table.
Below are some of the tricks we use. Some use our favorite hack of scripting command line scripts with SQL.
Continue reading "Making backups of select tables"
Thursday, December 24. 2009
Printer Friendly
PgAdmin 1.9+ has a simple Plug-In architecture which makes it relatively simple to introduce new plugins. It is pretty much all controlled by the file plugins.ini. In that file you can register any executable you want in there. If you want the executable to get passed database configuration settings, there is an option for that and you just have to build your executable to accept commandline switches.
You can download the windows version from http://postgis.net/windows_downloads. If you just want it without the PostGIS 1.5 binaries -- just download the one labeled PostGIS ESRI Shapefile GUI.
Continue reading "PgAdmin III Plug-in Registration: PostGIS Shapefile and DBF Loader"
Saturday, November 28. 2009
Printer Friendly
In this exercise, we'll go thru our steps for upgrading a Redhat Enterprise Linux 64-bit PostgreSQL
box from PostgreSQL 8.3 to PostgreSQL 8.4. If you don't have any kind of PostgreSQL installed on your box,
you can skip the Upgrade step.
UPDATE - since Devrim's move from Command prompt - he has started a new yum repository. You may want to use
this one instead since it seems more up to date than the other. http://yum.pgrpms.org/
UPDATEWe have instructions for installing PostgreSQL 9.0 via yum.
Updgrading from PostgreSQL 8.* to PostgreSQL 8.4
If you are starting from scratch -- just skip this section.
If you are upgrading from 8.4.0 to 8.4.1 you can get away with a simple
yum update postgresql and skip the rest of this article.
If you are upgrading from PostgreSQL 8.3 to 8.4, in theory you can use PgMigrator, but in practice,
particularly with a Yum install, you are bound to run into obstacles. If you are running an older version, you must dump and restore.
Continue reading "An almost idiot's guide to Install and Upgrade to PostgreSQL 8.4 with Yum"
Monday, October 26. 2009
Printer Friendly
This is an unfortunate predicament that many people find themselves in and does cause a bit of frustration. You bring in some tables into your PostgreSQL
database using some column name preserving application, and the casings are all preserved from the source data store. So now you have to quote all the fields
everytime you need to use them. In these cases, we usually rename the columns to be all lower case using a script. There are two approaches we have seen/can think of for doing this
one to run a script that generates the appropriate alter table statements and the other is to update the pg_attribute system catalog table directly.
Continue reading "Lowercasing table and column names"
Wednesday, October 21. 2009
Printer Friendly
Vacuuming and analyzing is the process that removes dead rows and also updates the statistics of a table.
As of PostgreSQL 8.3, auto vacuuming (the process that runs around cleaning up tables), is on by default. If you are
creating a lot of tables and bulk loading data, the vacuumer sometimes gets in your way. One way to get around that is to
disable auto vacuuming on the tables you are currently working on and then reenable afterward.
You can also do this from the PgAdmin III management console.
Continue reading "Enable and Disable Vacuum per table"
Wednesday, September 09. 2009
Printer Friendly
One of the small little treats provided in PostgreSQL 8.4 is the new pg_terminate_backend function. In the
past when we wanted to kill runaway postgresql queries issued by a database or user or hmm us, we would call the pg_cancel_backend function.
The problem with that is it would simply cancel the query in the backend process, but often times the offending application would simply launch the same query again.
In PostgreSQL 8.4 a new function was introduced called pg_terminate_backend. This doesn't completely replace pg_cancel_backend, but basically does
what you do when you go into say a Windows Task manager and kill the offending postgres process or on Linux, you call a kill command on a postgres process. Its nicer
in the sense that you can do it all within PostgreSQL and you can use the pg_stat_activity query to help you out a bit. Also you don't run the risk as easily of
killing the root postgres process and killing the postgres service all together.
Continue reading "Terminating Annoying Back Ends"
Monday, September 07. 2009
Printer Friendly
One of the most common questions people ask is Which tools work with PostgreSQL. In a sense the measure of a database's
maturity/popularity are the number of vendors willing to produce management and development tools for it. Luckily there are a lot of vendors producing tools for PostgreSQL and the list is growing.
One set of tools people are interested in are Database administration, ER diagramming, Query tools, and quickie application generators (RAD).
For this issue of our product showcase, we will not talk about one product, but several that fit in the aforementioned category.
All the listed products work with PostgreSQL and can be used for database administration and/or architecting or provide some sort of
light reporting/rapid application building suite. By light reporting/application building, we mean
a tool with a simple wizard that a novice can use to build somewhat functional applications in minutes or days. This rules out all-purpose development
things like raw PHP, .NET, Visual Studio, database drivers etc. Things we consider in this realm are things like OpenOffice Base and
MS Access. Most of these tools are either free or have 30-day try before you buy options.
You can't really say one tool is absolutely better than another since each has its own strengths and caters to slightly different audiences and also
you may like the way one tool does one important thing really well, though it may be mediocre in other respects. We also left out a lot of products we are not familiar with and may have gotten
some things wrong.
If we left out your favorite product and you feel it meets these criteria, or you feel we made any errors, please let us know, and we'll add or correct it.
We will be including Free open source as well as proprietary products in this mix. If we left out what you consider an
important criteria, please let us know and we'll try to squeeze it in somewhere.
Continue reading "Database Administration, Reporting, and Light application development"
Tuesday, June 30. 2009
Printer Friendly
We have covered this briefly before, but its an important enough concept to cover again in more detail.
Problem: You are running out of disk space on the drive you keep PostgreSQL data on
Solution:
Create a new tablespace on a separate drive and move existing tables to it, or create a new tablespace and use for future tables.
What is a tablespace and how to create a tablespace
A tablespace in PostgreSQL is similar to a tablespace in Oracle and a filegroup in SQL Server. It segments a piece of physical disk space
for use by the PostgreSQL process for holding data. Below are steps to creating a new tablespace. Tablespaces have existed since PostgreSQL 8.0.
More about tablespaces in PostgreSQL is outlined in
the manual PostgreSQL 8.3 tablespaces
While it is possible to create a table index on a different tablespace from the table, we won't be covering that.
Continue reading "Managing disk space using table spaces"
Tuesday, June 09. 2009
Printer Friendly
You'll often hear the term planner statistics thrown around by database geeks. Did you update your statistics. This lingo isn't even limited
to PostgreSQL, but is part and parcel to how most decent databases work. For example in PostgreSQL you do a vacuum analyze to update your planner statistics in addition
to cleaning up dead space. In SQL Server you do an UPDATE STATISTICS . In MySQL you do an
ANALYZE TABLE or a more invasive OPTIMIZE TABLE .
Normally all this "update your stats so your planner can be happy" is usually unnecessary unless
you just did a bulk load or a bulk delete or you are noticing your queries are suddenly slowing down. These stat things are generally updated behind the scenes by most databases
on an as needed basis.
What makes SQL really interesting and a bit different from procedural languages is that it is declarative (like functional and logical programming languages) and relies on the database planner to come up with strategies for navigating the data. Its strategy is not fixed as it is in procedural languages.
A big part of this strategy is decided on by the query planner which looks at distributions of data. Given different WHERE conditions for similar queries, it could come up with vastly different strategies if one value has a significantly
higher distribution in a table than another. This is also the mystery of why it sometimes refuses to use an index on a field because it has decided a table scan is more efficient and also why some people consider HINTS evil because they pollute the imperative nature of the language.
Continue reading "Planner Statistics"
|