Friday, February 08. 2008
One of our favorite features of PgAdmin is the graphical explain plan feature. While a graphical explain plan is not a complete substitute for EXPLAIN or EXPLAIN ANALYZE text plans, it does provide a quick and easy to read view that can be used for further analysis. In this article, we'll walk thru using the explain plan to troubleshoot query performance.
To use the graphical explain plan feature in PgAdmin III - do the following
Continue reading "Reading PgAdmin Graphical Explain Plans"
Monday, February 04. 2008
Recommended Books: PostGIS In Action PostgreSQL 8.4 Official The SQL Language PostgreSQL 8.4 Server Administration
Question: How do you move tables and views from one schema to another?
Often times when you start a new database, you put all your tables and views in the public schema. For databases with few tables and all relatively commonly grouped data, the benefits of using schemas for logical groupings vs. the downside of having to reference it with the schema name is more trouble than its worth.
As time goes by and with organic growth, this simple database you had that does one thing suddenly starts doing a lot of other things you hadn't initially planned for. Now to keep your sanity you really need the benefit of schemas for logical groupings. How do you retroactively do this? The answer is not quite as easy as one would hope. Ideally you would want to do a RENAME from public.sometable to newschema.sometable, but that doesn't work.
Continue reading "Moving tables from one schema to another"
Friday, February 01. 2008
Recommended Books: PostGIS In Action PostgreSQL 8.4 Official The SQL Language PostgreSQL 8.4 Server Administration
Welcome to our third Issue (March 2008) of Postgres Online Journal. We folded January and February together because February is a short month and we felt we had already covered quite a bit of ground. This also gives us time to fill March.
We have made a couple of changes to the underlying PDF output structure that we would like to cover.
In this issue as mentioned we hope to cover creating a lite Representational State Transfer (REST) application to showcase the new Full Text and XML features introduced in 8.3. Someone suggested we create a Java based server version to compliment our proposed ASP.NET and PHP versions since a lot of PostgreSQL users are Java Programmers. It has been a while since we've programmed with Java Servlets and JSP so not sure if we'll embarass ourselves too much by attempting such an example, but we'll think about it.
We are looking forward to PostgreSQL 8.3 and have started to test out 8.3 RC2 with our existing apps. So far so good, except for some little glitch we had using PostGIS in conjunction with UMN Mapserver. That particular problem seems to be more an issue with the UMN Mapserver Postgis driver with the way its reading the version number in PostgreSQL than anything else (the RC seems to be throwing it off). This issue only affects our more complicated view-based layers and we suspect will be a non-issue when PostgreSQL 8.3 has a bonafide version name e.g. 8.3.0 instead of 8.3 RC2. Aside from that we have noticed speed improvements among other things.
Check out Stefan Kaltenbrunner's Feature Matrix described here and displayed here. It is really quite a useful chart for those thinking of upgrading or wondering why they should or planning to move over from some other DBMS and waiting for a specific feature to be available in Postgres before they can make the jump.
Thursday, January 31. 2008
Recommended Books: Access 2007 - The Missing Manual Access 2007 for Dummies Access 2007 VBA Programming for Dummies
Many in the PostgreSQL community use Microsoft Access as a front-end to their PostgreSQL databases. Although MS Access is strictly a windows application and PostgreSQL has its roots in Unix, the two go well together. A large part of that reason is because the PostgreSQL ODBC driver is well maintained and has frequent updates. You can expect one new ODBC driver release every 4-6 months. There exist only 32-bit production quality drivers. The 64-bit driver is of alpha quality. In addition to other front-ends to PostgreSQL that utilize the ODBC driver used by Windows developers, there is VB 6 (VB.NET/C# use the ADO.NET driver also very well maintained), Visual FoxPro, Delphi, to name a few).
People who have never used Microsoft Access or anything like it and consider themselves hard-core programmers or database purists, dismiss Microsoft Access as a dangerous child's toy, causing nothing but grief when real programmers and database administrators have to debug the disorganized mess of amateurs. They dream of the day when this nuisance is rid of and their company can be finally under the strict bureaucratic control of well-designed apps that no one cares to use.
Beneath the croft of this dinkiness/dangerous toy is a RAD and Reporting tool that
can connect to any database with an ODBC or ADO driver. It serves the unique niche of
Continue reading "Using MS Access with PostgreSQL"
Thursday, January 24. 2008
In the first part of this series, The Anatomy of PostgreSQL - Part 1, we covered PostgreSQL Server object features. In this part, we shall explore the database and dissect the parts.
Here we see a snapshot of what a standard PostgreSQL database looks like from a PgAdmin interface.
Continue reading "The Anatomy of PostgreSQL - Part 2 - Database Objects"
Friday, January 18. 2008
What is PgAgent?
PgAgent is a basic scheduling agent that comes packaged with PgAdmin III (since pre-8.0 or so) and that can be managed by PgAdmin III. PgAdmin III is the database administration tool that comes packaged with PostgreSQL. For those familiar with unix/linux cronjobs and crontab structure, PgAgent's scheduling structure should look very familiar. For those familiar with using Microsoft SQL Server Scheduling Agent or Windows Scheduling Tasks, but not used to crontab structure, the PgAdmin III Job Agent interface to PgAgent should look very welcoming, but the schedule tab may look a little unfamiliar.
PgAgent can run both PostgreSQL stored functions and sql statements as well as OS shell commands and batch tasks.
Continue reading "Setting up PgAgent and Doing Scheduled Backups"
Wednesday, January 16. 2008
Recommended Books: MySQL High Performance Optimization, Backups, Replication
MySQL and Sun?
We just read that Sun is purchasing MySQL for a little under a billion. We are a little shocked and not quite sure what to make of it or how this affects Sun's investment in PostgreSQL. Further comments on the deal on Jignesh Shah's blog and Josh Berkus blogs. Jignesh and Josh both work at Sun and do PostgreSQL work as well.
Couple of random thoughts
Overall we think the move should prove positive for both camps.
PostgreSQL 8.3 really around the corner
8.3 is now on release candidate 8.3 RC1 and as Bruce Momjian noted, it looks like there might be an RC2.
We've been playing around with the 8.3 betas and RCs and really like the integrated Full Text Indexing and XML features. The new features make it possible to do a quickie REST service-based application. In the next issue of this journal, we hope to demonstrate creating REST services using 8.3 with server side - (PHP and/or ASP.NET) and front-end Adobe FLEX. We would have liked to demonstrate SilverLight/MoonLight as well, but we want to wait till Silverlight 2.0 hits release. We'll try to use the Pagila demo database for the upcoming demo app as Robert Treat has suggested.
Saturday, January 12. 2008
How do you delete duplicate rows in a table and still maintain one copy of the duplicate?Answer:
There are a couple of ways of doing this and approaches vary based on how big your table is, whether you have constraints in place, how programming intensive you want to go, whether you have a surrogate key and whether or not you have the luxury of taking a table down. Approaches vary from using subselects, dropping a table and rebuilding using a distinct query from temp table, and using non-set based approaches such as cursors.
The approach we often use is this one:
Continue reading "Deleting Duplicate Records in a Table"
Wednesday, January 09. 2008
Question: What is the answer to SELECT 3/2?
Answer: In integer math, it is 1. A lot of people especially those coming from MySQL or MS Access backgrounds are surprised to find out that in PostgreSQL
In actuality, the fact that 3/2 = 1 and 1/3 = 0 is part of the ANSI/ISO-SQL standard that states mathematical operations between two values must be of the same data type of one of the values (not necessarily the same scale and precision though). This is not some idiosyncracy specific to PostgreSQL. If you try the same operation in SQL Server, SQLite,FireBird, and some other ANSI/ISO SQL compliant databases, you will get the same results. So it seems MySQL and MS Access are the odd-balls in this arena, but arguably more practical.
Continue reading "SQL Math Idiosyncracies"
Saturday, January 05. 2008
Below is a Thumbnail view of a PostgreSQL 8.3 Cheat Sheet that covers prior PostgreSQL constructs plus new 8.3 features. PDF version of this cheat sheet is available at PostgreSQL 8.3 Cheat sheet in PDF 8/12 by 11", PostgreSQL 8.3 Cheat sheet in PDF A4 and the PostgreSQL 8.3 Cheat sheet in HTML.
We apologize for the small size of the fonts. We tried to keep it to a minimum but had trouble deciding what to leave out to keep it to a one page sheet. So perhaps it would have been better as a 2 part cheatsheet. Anyrate we hope people find it useful.
For people who love cheat sheets, here are two other PostgreSQL cheat sheets on the web which we have found useful and cover slightly different set of things than we do in ours.
Wednesday, January 02. 2008
Welcome to the January 2008 Issue of Postgres OnLine Journal Magazine. In this issue we will have a special feature PostgreSQL 8.3 Cheatsheet to commemorate the upcoming PostgreSQL 8.3 release and the new year. This cheat sheet will look similar in format to the Postgis Cheatsheet and will cover standard PostgreSQL features as well as new features added to the 8.3 release.
In future issues we hope to provide similar cheatsheets that highlight certain PostgreSQL advanced and specialty features. Any thoughts on what topics people would like to see in a cheatsheet are welcome.
Other interesting topics that will be covered in this issue to name a few
On another note - check out Andrew Dunstan's, minimum update Trigger. It will be nice to see this make it into the PostgreSQL 8.4 release. Granted we haven't had much of a need of this feature, but when you need it, it comes in very handy as demonstrated in Hubert Lubaczewski's related article Avoiding Empty Updates. We remember the first time we started working on MySQL a long long time ago - MySQL had this built in, but you couldn't turn it off. In certain situations such as when you have triggers this feature is often a misfeature. Granted I guess there are only a few cases where having this automatically on could be annoying especially when all the other Databases you work with don't do this and there is probably some overhead involved with checking which may not always outweigh the update/logging cost. Any rate as far as check-off lists goes for people who consider this a feature, it will be nice to cross this off the list as one reason why one would choose MySQL over PostgreSQL and better yet in PostgreSQL it is optional.
Monday, December 31. 2007
Have you ever noticed that in PostgreSQL you can put set returning functions in the SELECT part of an sql statement if the function is written in language SQL or C. Try the same trick for PL written functions such as plpgsql, plperl, plr etc, and you get a slap on the wrist of the form ERROR: set-valued function called in context that cannot accept a set. For Plpgsql and other PL languages you must put the set returning function in the FROM clause.
Below is a simple example:
So it appears that PostgreSQL is not quite as democratic as we would like.
All interesting, but so what? you may ask. It is bad practice to put set returning functions in a SELECT clause. Such things are commonly mistakes and should be avoided.
Functional Row Expansion
It turns out that there are a whole class of problems in SQL where the simplest way to achieve the desired result is via a technique we shall call Functional Row Expansion. By that, we mean that for each record in a given set, we want to return another set of records that can not be expressed as a constant join expression. Basically the join expression is different for each record or the function we want to apply is too complicated to be expressed as a static join statement or join at all.
Taking the above example. Lets say for each record in test, you want to return the 4 records preceding including the current one. So basically you want to explode each row into 5 or fewer rows. Your general gut reaction would be do something as follows:
these give error: ERROR: function expression in FROM cannot refer to other relations of same query level
--But this does what you want
Keep in mind what makes the above tricky is that you want to return at most 4 of the preceding plus current. If you want to return all the preceding plus current, then you can do a trivial self join as follows:
So as you can see - its sometimes tricky to tell when you need to use this technique and when you don't.
For this trivial example, writing the function as an SQL only function works fine and is the best to use. SQL functions unfortunately lack the ability to define dynamic sql statements, among other deficiencies so resorting to using a pl language is often easier which means you lose this useful feature of sql functions. Stuffing a pl function in an SQL function just might do the trick. We haven't tried this on other pl languages except plpgsql, but we suspect it should work the same.
Friday, December 28. 2007
Recommended Books: Fundamentals of Database Design
Question: Does PostgreSQL support stored procedures?
Short Answer: Sort Of as Stored functions.
For all intents and purposes, PostgreSQL has less of a need for CREATE PROCEDURE than other databases aside from looking more like other databases. For example in SQL Server -> 2005 - although you can write functions that return tables and so forth, you have to resort to writing CLR functions marked as unsafe to actually update data in a stored function. This gets pretty messy and has its own limitations so you have no choice but to use a stored procedures, which can not be called from within an SQL query. In MySQL 5.1 the abilities of functions are even more limiting - they can't even return a dataset. In PostgreSQL, you can write a function marked as VOLATILE that updates data and that can do all sorts of wacky things that are useful but considered by some to be perverse such as the following:
Another thing stored procedures can usually do that functions can not is to return multiple result sets. PostgreSQL can simulate such behavior by creating a function that returns a set of refcursors. See this .NET example Getting full results in a DataSet object: Using refcursors way down the page, that demonstrates creating a postgresql function that returns a set of refcursors to return multiple result sets using the Npgsql driver.
Prior to PostgreSQL 8.1, people could yell and scream, but PostgreSQL doesn't support Output Parameters. As weird as it is for a function to support such a thing, PostgreSQL 8.1+ do support output parameters and ODBC drivers and such can even use the standard CALL interface to grab those values.
At a glance it appears that PostgreSQL functions do all that stored procedures do plus more. So the question is, is there any reason for PostgreSQL to support bonafide stored procedures aside from the obvious To be more compatible with other databases and not have to answer the philosophical question, But you really don't support stored procedures?.
There must be some efficiency benefits to declaring something as a store procedure and having it called in that way. Not quite sure if anyone has done benchmarks on that. So for the time being PostgreSQL functions have the uncanny role of having a beak like a duck and the flexibility of a beaver, but having the makeup of a Platypus.
Thursday, December 27. 2007
Recommended Books: PostgreSQL 8.4 Internals and Appendixes (contribs) SQL Queries for Mere Mortals SQL Visual Quick Start
The generic way of doing cross tabs (sometimes called PIVOT queries) in an ANSI-SQL database such as PostgreSQL is to use CASE statements which we have documented in the article What is a crosstab query and how do you create one using a relational database?.
In this particular issue, we will introduce creating crosstab queries using PostgreSQL tablefunc contrib.
Tablefunc is a contrib that comes packaged with all PostgreSQL installations - we believe from versions 7.4.1 up (possibly earlier). We will be assuming the one that comes with 8.2 for this exercise. Note in prior versions, tablefunc was not documented in the standard postgresql docs, but the new 8.3 seems to have it documented at http://www.postgresql.org/docs/8.3/static/tablefunc.html.
Often when you create crosstab queries, you do it in conjunction with GROUP BY and so forth. While the astute reader may conclude this from the docs, none of the examples in the docs specifically demonstrate that and the more useful example of crosstab(source_sql,category_sql) is left till the end of the documentation.
To install tablefunc simply open up the share\contrib\tablefunc.sql in pgadmin and run the sql file. Keep in mind that the functions are installed by default in the public schema.
If you want to install in a different schema - change the first line that reads
Alternatively you can use psql to install tablefunc using something like the following command:
We will be covering the following functions
There are a couple of key points to keep in mind which apply to both crosstab functions.
Setting up our test data
For our test data, we will be using our familiar inventory, inventory flow example. Code to generate structure and test data is shown below.
Using crosstab(source_sql, category_sql)
For this example we want to show the monthly usage of each inventory item for the year 2007 regardless of project. The crosstab we wish to achieve would have columns as follows: item_name, jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec
--Resulting crosstab query --Note: For this we don't need the order by month since the order of the columns is determined by the category_sql row order
The output of the above crosstab looks as follows:
crosstab(source_sql) is much trickier to understand and use than the crosstab(source_sql, category_sql) variant, but in certain situations and certain cases is faster and just as effective. The reason why is that crosstab(source_sql) is not guaranteed to put same named buckets in the same columns especially for sparsely populated data. For example - lets say you have data for CSCL for Jan Mar Apr and data for Phenol for Apr. Then Phenols Apr bucket will be in the same column as CSCL Jan's bucket. This in most cases is not terribly useful and is confusing.
To skirt around this inconvenience one can write an SQL statement that guarantees you have a row for each permutation of Item, Month by doing a cross join. Below is the above written so item month usage fall in the appropriate buckets.
In actuality the above query if you have an index on action_date is probably more efficient for larger datasets than the crosstab(source, category) example since it utilizes a date range condition for each month match.
There are a couple of situations that come to mind where the standard behavior of crosstab of not putting like items in same column is useful. One example is when its not necessary to distiguish bucket names, but order of cell buckets is important such as when doing column rank reports. For example if you wanted to know for each item, which projects has it been used most in and you want the column order of projects to be based on highest usage. You would have simple labels like item_name, project_rank_1, project_rank_2, project_rank_3 and the actual project names would be displayed in project_rank_1, project_rank_2, project_rank_3 columns.
Output of the above looks like:
Tricking crosstab to give you more than one row header column
Recall we said that crosstab requires exactly 3 columns output in the sql source statement. No more and No less. So what do you do when you want your month crosstab by Item, Project, and months columns. One approach is to stuff more than one Item in the item slot by either using a delimeter or using an Array. We shall show the array approach below.
Result of the above looks as follows:
Building your own custom crosstab function
If month tabulations are something you do often, you will quickly become tired of writing out all the months. One way to get around this inconvenience - is to define a type and crosstab alias that returns the well-defined type something like below:
Then you can write the above query as
Adding a Total column to the crosstab query
Adding a total column to a crosstab query using crosstab function is a bit tricky. Recall we said the source sql should have exactly
3 columns (row header, bucket, bucketvalue). Well that wasn't entirely accurate. The crosstab(source_sql, category_sql) variant of the function
allows for a source that has columns row_header, extraneous columns, bucket, bucketvalue.
Don't get extraneous columns confused with row headers. They are not the same and if you try to use it as we did for creating multi row columns, you will
be leaving out data. For simplicity here is a fast rule to remember.
Resulting output of our cross tabulation with total column looks like this:
If per chance you wanted to have a total row as well you could do it with a union query in your source sql. Unfotunately PostgreSQL does not support windowing functions that would make the row total not require a union. We'll leave that one as an exercise to figure out.
Another not so obvious observation. You can define a type that say returns 20 bucket columns, but your actual crosstab need not return up to 20 buckets. It can return less and whatever buckets that are not specified will be left blank. With that in mind, you can create a generic type that returns generic names and then in your application code - set the heading based on the category source. Also if you have fewer buckets in your type definition than what is returned, the right most buckets are just left off. This allows you to do things like list the top 5 colors of a garment etc.
Tuesday, December 18. 2007
Recommended Books: Getting Started with OpenOffice.org 3 written by OO.org group Database Programming with OO Base
For those who are not familiar with OpenOffice Base. OpenOffice Base is the equivalent of Microsoft Access in the OpenOffice Open source suite. While it is not as feature rich as Microsoft Access, it has been getting increasingly better and has some unique features that even Microsoft Access lacks. Unfortuantely you can't just convert an access mdb to its format like you can with other Open office suite products - Word to Writer Writer to Word etc. However you can open MS Access databases in OOBase, but you can't take advantage of the forms and reports in an MS Access Database.
One thing I always liked about Microsoft Access was the ease with which you could link to various different kinds of datasources and generate rapid queries and so forth. Microsoft Access has a particular feature called Access Projects which ties it very closely with Microsoft SQL Server. What an MS Access Project does is connect you with a specific SQL Server database and allow you to browse all the objects, create forms and reports etc against the objects etc. Unfortunately MS Access Project only works with SQL Server. For other datasources you need to use linked tables and can't make design changes and browse a database as you can with Access Projects.
We had looked at Openoffice Base a while ago and thought they are making progress, but still not quite good enough to put to daily use. When we revisited Open Office Base recently, we were surprised to find a couple of neat nuggets.
In the next couple of sections we'll lay out how to quickly setup OpenOffice, install the native PostgreSQL driver and JDBC PostgreSQL driver and connect to a PostgreSQL database in OpenOffice Base. Please forgive us for using Windows paths in this. We figured it would be easier for people to follow since most users coming to this site are windows users and a lot of Linux users already use OO and paths are too different from Linux/Mac OSX installs.
Continue reading "Using OpenOffice Base 2.3.1 with PostgreSQL"
Syndicate This Blog
Show tagged entries