Friday, December 07. 2007
One of the annoying things about PostgreSQL unlike some other databases we have worked with is that simple views are not automatically updateable. There is some work involved to make views updateable. For simple views, this is annoying, but for more complex views it is a benefit to be able to control how things are updated. In a later version of PostgreSQL perhaps 8.4 or 8.5 this will be ratified and PostgreSQL will enjoy the same simplicity of creating simple updateable views currently offered by MySQL and SQL Server and other DBMSs, but still allow for defining how things should be updated for more complex views. For this exercise we are using PostgreSQL 8.2.5, but most of it should work for lower versions with slight modification.
For this exercise, we shall create a fairly complex updateable view to demonstrate how one goes about doing this.
Here is a scenario where being able to control how a view is updated comes in very handy.
We all know relational databases are great because they give you great mobility on how you slice and dice information. At times for data entry purposes, the good old simple flat file is just more user-friendly.
Problem: You are developing an inventory application for a molecular biology lab and they have the following requirements:
They have 2 projects going on. One on Multiple Sclerosis Research (MS) and one on Alzheimer's. Each is funded by different grants and for grant cost allocation purposes, they need to keep track of the supplies they use on each project.
In our system we have 2 tables for simplicity. inventory and inventory_flow. I know we should have a project lookup table or in 8.3 possibly use an ENUM, but to make this short, we are skipping that.
Now look at what happens when we insert and update our view
The slick thing about this is if you were to create a linked table in something like say Microsoft Access and designated item_id as the primary key, then the user could simply open up the table and update as normally and behind the scenes the rules would be working to do the right thing.
Thursday, December 06. 2007
PostgreSQL 8.3 is currently in Beta 4 and promises to offer some whoppingly neat features. First before we go over the new features we are excited about in this upcoming release, we'd like to briefly cover what was added in past releases.
The big 8.0 Highlights
8.3 upcoming Highlights
8.3 has numerous highlights just as previous versions, but we shall focus on our favorite ones.
Wednesday, December 05. 2007
One of the features in PostgreSQL designed to enhance index performance is the use of a clustered index. For people coming from MS SQL Server shops, this may look familiar to you and actually serves the same purpose, but is implemented differently and this implementation distinction is very important to understand and be aware of. In PostgreSQL 8.3 the preferred syntax of how you cluster has changed. For details check out 8.3 CLUSTER 8.2 CLUSTER 8.0 CLUSTER. A lot of what I'm going to say is somewhat of a regurgitation of the docs, but in slightly different words.
First in short - clustering on an index forces the physical ordering of the data to be the same as the index order of the index chosen. Since you can have only one physical order of a table, you can have only one clustered index per table and should carefully pick which index you will use to cluster on or if you even want to cluster. Unlike Microsoft SQL Server, clustering on an index in PostgreSQL does not maintain that order. You have to reapply the CLUSTER process to maintain the order. Clustering helps by reducing page seeks. Once an index search is done and found, pulling out the data on the same page is vastly faster since once you find the start point all successive data nearby is easy picking.
As a corrollary to the above, it doesn't help too much for non-range queries. E.g. if you have dummy ids for records and you are just doing single record select queries, clustering is fairly useless to you. It is only really useful if you are doing range queries like between date ranges or spatial ranges or queries where the neighboring data to an index match is likely to be pulled. For example if you have an order items table, then clustering on a compound index such as order_id,order_item_id may prove useful since neighboring data is something you likely want to pull for range and summations.
Now lets see how we create a clustered index and then talk about the pros and gotchas
Once a clustered index is created to force a recluster, you simply do this
To force a cluster on all tables that have clustered indexes, you do this
What is FillFactor and how does it affect clustering?
Again those coming from Microsoft SQL Server will recognize FILLFACTOR syntax. IBM Informix also has a FILLFACTOR syntax that serves the same purpose as the SQL Server and PostgreSQL ones. For more details here PostgreSQL docs: Create Index. FillFactor basically creates page gaps in an index page. So a Fill Factor of 80 means leave 20% of an index page empty for updates and inserts to the index so minimal reshuffering of existing data needs to happen as new records are added or indexed fields are updated in the system. This is incorporated into the index creation statement.
After an index is created on a table, this information is then used in several scenarios
Why should you care?
First for fairly static tables such as large lookup tables, that rarely change or when they change are bulk changes, there is little point in leaving blank space in pages. It takes up disk space and causes Postgres to scan thru useless air. In these cases - you basically want to set your FillFactor high to like 99.
Then there are issues of how data is inserted, if you have only one index and new data usually resides at the end of the index and the indexed field are rarely updated, again having a low fill factor is probably not terribly useful even if the data is updated often. You'll never be using that free space so why have it.
For fairly updated data that changes such that you are randomly adding 10% new data per week or so in middle of page, then a fill factor of say 90 is the general rule of thumb.
Cluster approach benefits and Gotchas
The approach PostgreSQL has taken to cluster means that unlike the SQL Server approach, there is no additional penalty during transactions of having a clustered index. It is simply used to physically order the data and all new data goes to the end of the table. In the SQL Server approach all non-clustered indexes are keyed by the clustered index, which means any change to a clustered field requires rebuilding of the index records for the other indexes and also any insert or update may require some amount of physical shuffling. There are also other consequences with how the planner uses this information that are too detailed to get into.
The Bad and the Ugly
The bad is that since there is no additional overhead aside from the usual index key creation during table inserts and updates, you need to schedule reclustering to maintain your fine order and the clustering causes a table lock. The annoying locking hopefully will be improved in later versions. Scheduling a cluster can be done with a Cron Job or the more OS agnostic PgAgent approach. In another issue, we'll cover how to use PgAgent for backup and other scheduling maintenance tasks such as this.
It is often handy to create indexes that are based on calculations of a function. One reason is that instead of storing the calculated value in the table along with the actual value, you save a bit on table scan speed since your row is thinner and also saves some disk space. It helps search speed if its a common function search.
Case in point, PostgreSQL is case sensitive so in order to do a simple search you will often resort to using upper or ILIKE. In those cases its useful to have an index on upper or lower cased text. Here is an example.
Here is another example taken from PostGIS land. Often times you provide your data in various transformation, but for space savings and row seek reasons, you want to only transform your data to the less used projections as needed. One way to do this is to create functional indexes on the commonly used transformations and create views or just write raw SQL that uses these alternative transformations.
So now when I do a select like this that lists all buildings within 100 meters of my NAD 83 MA Meter State Plane point of interest:
it will use indexes
Friday, November 30. 2007
Perhaps one of the most unique and exciting things that makes PostgreSQL stand out from other database systems, are the numerous choices of languages one can use to create database functions, triggers and define new aggregate functions with. Not only can you use various languages to write your database stored functions with, but often times the code you write lives right in the database. You have no idea how cool this is until you see it in action.
The other interesting thing about the PostgreSQL language architecture is the relative ease with which new languages can be incorporated in the system.
Native Languages of PostgreSQL
There are 3 languages that come packaged with PostgreSQL (2 non-PL ones are installed automatically and not even listed as languages (C and SQL) in the languages section of a db). The defacto PL/PgSQL procedural language is available for install in all PostgreSQL distributions, but need not be installed in a db by default .
The PL languages
Aside from PL/pgSQL there are numerous other procedural languages that one can use to create database stored functions and triggers. Some of these languages are fairly stable and even more are experimental. Some are only supported on Unix/Linux, but many are supported on Unix/Linux/MacOS/windows. In any case there are 3 key components needed before you can start using a new language:
Registering a language in a Database
For pl/pgsql items 1 and 2 are already done if you have a working PostgreSQL install. In order to accomplish item 3, you may need to do the following from psql or PgAdmin III query window.
Alternatively you can run createlang plpgsql somedb from commandline. Note createlang is a command line program that is located in the bin folder of your PostgreSQL install.
To see a list of procedural languages that you already have call handlers registered for in PostgreSQL. These are the languages you can register in your specific database - do a
A Flavor of the Procedural Languages (PLs)
In this section, we'll show a brief sampling of what functions look like written in various PLs. These are not to suggest they are the only ones that exist. For these examples, I'm going to use the $ quoting syntax introduced in PostgreSQL 8.0 which allows for not having to escape out single quotes.
SQL - the not PL language
For basic CRUD stuff,selects and simple functions, nothing hits the spot like just plain old SQL. Since this is such a common choice and often the best choice - here are 3 examples.
For details on using out parameters, check out Robert Treat's out parameter sql & plpgsql examples
PLPGSQL - a real PL Language
For more complex logic and massaging of results before sending back. You need something more powerful than standard SQL. Below are some examples using PLPGSQL.
Using PL/R a language and environment for statistics
One of my favorite PL languages to program is PL/R. The reason for this is that the R statistical environment is such a rich environment for doing statistical processing. It now is also supported on windows as well as Mac and Linux.
To learn more about R and installing PL/R. Check out our Boston GIS article PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide
Below is the classic median aggregate function in R. It uses the native median function in the R environment to create a PostgreSQL aggregate median function
We will be covering PLR in greater detail in another article.
Friday, November 23. 2007
In the next couple of sections we will outline the various things one will find in a PostgreSQL database. Many of these exist in other DBMS systems, but some of these are quite unique to PostgreSQL.
Exploring PostgreSQL with PgAdmin III
PgAdmin III is the Administrative console that comes packaged with PostgreSQL. It works equally well on most OSes - Linux, Unix, Windows, MacOS and any OS supported by WsWidgets. It is an extremely nice and capable management tool. PostgreSQL server comes packaged with this, but if you want to install this on a computer that doesn't have PostgreSQL server installed or you want the bleeding edge version or latest version, I suggest downloading from PgAdmin Site: http://www.pgadmin.org/download/. We will be exploring PostgreSQL with the newest stable releaseof PgAdmin III - 1.8.
When you first launch PgAdmin III and register your postgres server, you may be amazed at the number of things shown. In fact what is shown may not be all the objects that exist in PostgreSQL. PgAdmin III 1.8 and above hides a lot of things by default. For this exercise we will turn these settings on so we can see these objects and explore them.
To do so do the following
The Anatomy Lesson Begins
When you expand the Server tree, you will be first confronted with 4 groups of objects. As outlined below:
In the next couple of sections, we will explore these areas a little deeper.
The first thing you will notice is that there are 3 system databases (databases you did not create) and they are postgres, template0, template1. These are outlined below
Question: Why the heck are their 2 template databases?
template1 is the default template used for new databases. Most people will use template1 as a template for their databases or create more derivative templates. template0 is basically a pristine template unadulterated by any thing except the core postgres stuff. In fact you can not change template0, but you can change template1.
Tablespaces as I mentioned, represent physical locations on disk where things reside. There are 2 tablespaces installed by default:
If you look at the location property of these 2 tablespaces, you will see nothing there. That is because these are always stored in the same location as where you initialized your PostgreSQL database cluster. Tablespaces that are user created on the other hand, can be stored anywhere on any disk and these you will see location information for.
In general there is rarely a reason to create new tablespaces and such unless you are creating a system with massive numbers of users, databases, and intensive queries. Tablespaces gives you the flexibility to leverage OS space in interesting ways - e.g. fast disks for commonly used tables, ability to have multiple disks in different RAID configurations for maximum seek performance, recoverability, or stability etc. Finding Optimum tables placement in 2 tablespace situation by Hubert Lubaczewski is particularly interesting. Also check out Robert Treat's tablespace configuration variable tweaking tips.
There are a couple of facts I would like to close with on the topic of tablespaces.
Group Roles and Login Roles
Prior to PostgreSQL 8.1, there existed Users and Groups, in 8.1 these were deprecated and replaced with Roles in order to be more ANSI compliant. This is actually a simplification of the security model. For more details check out the Chapter 18. Database Roles and Privileges.
I'll summarize a few key facts about Group Roles and Login Roles
Coming Next Database Objects
In the next issue of this journal, we will go over database objects. In fact there are tons of these. I will leave you with a snapshot to wet your appetite.
Wednesday, November 21. 2007
If you have say a set of orders from customers and for each customer you wanted to return the details of the last order for each customer, how would you do it?Answer
In databases that don't support the handy SQL DISTINCT ON or equivalent clause, you'd have to resort to doing subselects and MAXes as we described in SQL Cheat Sheet: Query By Example - Part 2. If you are in PostgreSQL however you can use the much simpler to write DISTINCT ON construct - like so
Sunday, November 18. 2007
A lot of databases structures people setup seem to store dates using the Unix Timestamp format (AKA Unix Epoch). The Unix Timestamp format in short is the number of seconds elapse since january 1, 1970. The PostgreSQL timestamp is a date time format (for those coming from SQL Server and MySQL) and of course is much richer than Unix Timestamp.Question: How do you convert this goofy seconds elapsed since January 1, 1970 to a real date?
PostgreSQL has lots of nice date time functions to perform these great feats. Lets say someone handed us a unix timestamp of the form 1195374767. We can convert it to a real date time simply by doing this:
Now lets say we had a table of said goofy numbers and we prefer real date structures. We can create a view that shields us from this mess by doing the following
To convert back to unix timestamp you can use date_part:
Recommended Books: PostgreSQL Developer's Handbook PostgreSQL 8.2 SQL Language Reference PostgreSQL 8.2 Server Administration Guide PostgreSQL 8.2 Programming Guide
Welcome to PostgreSQL OnLine Journal.
PostgreSQL is an extremely rich object relational database system and has a regal lineage that dates back almost to the beginning of the existence of relational databases.
If we were to look at the family tree of PostgreSQL it would look something like this
(Ingres, System-R) Postgres Illustra Informix IBM Informix Postgres95 PostgreSQL
In fact PostgreSQL is a cousin of the databases Sybase and Microsoft SQL Server because the people that started Sybase came from UC Berkeley and worked on the Ingres and/or Postgres projects with Michael Stonebraker. Later on the source code of Sybase SQL Server was later licensed to Microsoft to produce Microsoft SQL Server.
Here is an interesting diagram done by Oleg Bartunov that shows the various relational database pedigrees.
The main focus of this journal is to educate users and potential users about the numerous capabilities and uses of this powerful database management system.
Over the years we have watched PostgreSQL grow and reach a wider audience. Each day brings newer features, more stability, more environments supported and more Off-the-Shelf (OTS) applications that support this DBMS.
This Journal is a bit of a literary experiment for us. Technology is very fast-paced and we find that most of the new information we ingest these days comes via fast-paced sources such as Blogs and Magazine/Periodical channels. I like the free form of the blog structure and ability to comment, but I also appreciate the more disciplined, carefully categorized, walk away with a booklet format of the Periodical. Our hope is to combine these two literary instruments into a blogo-periodical that has 2 faces:
Since this is what we call a blogo-periodical rather than a plain blog, we shall continually make edits to prior entries that are within the span of our editing issue in progress. So you may find if you are viewing it as a blog, that entries you have already read suddenly change.
After we complete each issue, we hope to provide each journal issue as a downloadable PDF magazine/periodical. Issues in progress or completed will always be available as html ebooks.
In each issue of this journal, we hope to cover the following areas
« previous page (Page 24 of 24, totaling 354 entries)
Syndicate This Blog
Show tagged entries