Friday, December 24. 2010
Recommended Books: PostgreSQL 9 SQL Reference Manual SQL Server MVP Deep Dives High Performance MySQL
Question: You have a table of people and a table that specifies the activities each person is involved in. You want to return a result that has one record per person and a column that has a listing of activities for each person separated by semicolons and alphabetically sorted by activity. You also want the whole set alphabetically sorted by person's name.
This is a question we are always asked and since we mentor on various flavors of databases, we need to be able to switch gears and provide an answer that works on the client's database. Most often the additional requirement is that you can't install new functions in the database. This means that for PostgreSQL/SQL Server that both support defining custom aggregates, that is out as an option.
Normally we try to come up with an answer that works in most databases, but sadly the only solution that works in most is to push the problem off to the client front end and throw up your hands and proclaim -- "This ain't something that should be done in the database and is a reporting problem." That is in fact what many database purists do, and all I can say to them is wake up and smell the coffee before you are out of a job. We feel that data transformation is an important function of a database, and if your database is incapable of massaging the data into a format your various client apps can easily digest, WELL THAT's A PROBLEM.
We shall now document this answer rather than trying to answer for the nteenth time. For starter's PostgreSQL has a lot of answers to this question, probably more so than any other, though some are easier to execute than others and many depend on the version of PostgreSQL you are using. SQL Server has 2 classes of answers neither of which is terribly appealing, but we'll go over the ones that don't require you to be able to install .NET stored functions in your database since we said that is often a requirement. MySQL has a fairly simple, elegant and very portable way that it has had for a really long time.
Continue reading "String Aggregation in PostgreSQL, SQL Server, and MySQL"
Posted by Leo Hsu and Regina Obe in 8.2, 8.3, 8.4, 9.0, cte, db2, intermediate, mysql, oracle, postgresql versions, q&a, sql server, window functions at 11:24 | Comments (14) | Trackbacks (0)
Monday, December 20. 2010
Recommended Books: Free R reference card / Cheat Sheet PostGIS In Action R in Action R In a Nutshell
In Part 2 of PL/R we covered how to build PL/R functions that take arrays and output textual outputs of generated R objects. We then used this in an aggregate SQL query using array_agg. Often when you are building PL/R functions you'll have R functions that you want to reuse many times either inside a single PL/R function or across various PL/R functions.
Unfortunately, if you wanted to call a PL/R function from another PL/R function, this is not possible unless you are doing it from spi.execute call. There is another way to embed reusable R code in a PostgreSQL database. In order to be able to share databases stored R code across various PL/R functions, PL/R has a feature called a plr_module. In this tutorial we'll learn how to create and register shareable R functions with plr_module. In the next part of this series we'll start to explore generating graphs with PL/R.
Continue reading "PL/R Part 3: Sharing Functions across PL/R functions with plr_module"
Friday, December 10. 2010
Recommended Books: Free R reference card / Cheat Sheet PostGIS In Action R in Action R In a Nutshell
In Intro to PL/R and R, we covered how to enable PL/R language in the database and wrote some PL/R functions that rendered plain text reports using the R environment. What makes combining R and PostgreSQL in PL/R most powerful is when you can start writing SQL summary queries that use R functions like any other SQL function.
In this next example, we'll be using PostGIS test runs from tests we autogenerated from the Official PostGIS documentation (Documentation Driven Testing (DDT)) as described in the Garden Test section of the PostGIS Developer wiki. We've also updated some of our logging generator and test patterns so future results may not represent what we demonstrated in the last article.
On a side note: Among the changes in the tests was to introduce more variants of the Empty Geometry now supported by PostGIS 2.0. Our beloved PostGIS 2.0 trunk is at the moment somewhat unstable when working with these new forms of emptiness and stuffing geometries in inappropriate places. At the moment it doesn't survive through the mindless machine gun battery of tests we have mercilessly inflicted. It's been great fun trying to build a better dummy while watching Paul run around patching holes to make the software more dummy proof as the dummy stumbles across questionable but amusing PostGIS use cases not gracefully handled by his new serialization and empty logic.
On yet another side note, it's nice to see that others are doing similar wonderful things with documentation. Check out Euler's comment on catalog tables where he uses the PostgreSQL SGML documentation to autogenerate PostgreSQL catalog table comments using OpenJade's OSX to convert the SGML to XML and then XSL similar to what we did with PostGIS documentation to autogenerate PostGIS function/type comments and as a platform for our test generator.
For our next exercises we'll be using the power of aggregation to push data into R instead of pg.spi.execute. This will make our functions far more reusable and versatile.
Continue reading "PL/R Part 2: Functions that take arguments and the power of aggregation"
Sunday, November 28. 2010
Recommended Books: Free R reference card / Cheat Sheet PostGIS In Action R in Action R In a Nutshell
In this article we'll provide a summary of what PL/R is and how to get running with it. Since we don't like repeating ourselves, we'll refer you to an article we wrote a while ago which is still fairly relevant today called Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide and just fill in the parts that have changed. We should note that particular series was more geared toward the spatial database programmer (PostGIS in particular). There is a lot of overlap between the PL/R, R, and PostGIS user-base which is comprised of many environmental scientists and researchers in need of powerful charting and stats tools to analyse their data who are high on the smart but low on the money human spectrum.
This series will be more of a general PL/R user perspective. We'll follow more of the same style we did with Quick Intro to PL/Python. We'll end our series with a PL/R cheatsheet similar to what we had for PL/Python.
As stated in our State of PostGIS article, we'll be using log files we generated from our PostGIS stress tests. These stress tests were auto-generated from the PostGIS official documentation. The raster tests are comprised of 2,095 query executions exercising all the pixel types supported. The geometry/geograpy tests are comprised of 65,892 spatial SQL queries exercising every PostGIS geometry/geography supported in PostGIS 2.0 -- yes this includes TINS, Triangles,Polyhedral Surfaces, Curved geometries and all dimensions of them. Most queries are unique. If you are curious to see what these log tables look like or want to follow along with these exercises, you can download the tables from here.
What is R and PL/R and why should you care?
R is both a language and an environment for doing statistics and generating graphs and plots. It is GNU-licensed and a common favorite of Universities and Research institutions. PL/R is a procedural language for PostgreSQL that allows you to write database stored functions in R. R is a set-based and domain specific language similar to SQL except unlike the way relational databases treat data, it thinks of data as matrices, lists and vectors. I tend to think of it as a cross between LISP and SQL though more experienced Lisp and R users will probably disagree with me on that. This makes it easier in many cases to tabulate data both across columns as well as across rows. The examples we will show in these exercises, could be done in SQL, but they are much more succinct to write in R. In addition to the language itself, there are a whole wealth of statistical and graphing functions available in R that you will not find in any relational database. These functions are growing as more people contribute packages. Its packaging system called Comprehensive R Archive (CRAN) is similar in concept to Perl's CPAN and the in the works PGXN for PostgreSQL.
Continue reading "Quick Intro to R and PL/R - Part 1"
Tuesday, November 23. 2010
I've always enjoyed dismantling things. Deconstruction was a good way of analyzing how things were built by cataloging all the ways I could dismantle or destroy them. I experimented with mechanical systems, electrical circuitry, chemicals and biological systems sometimes coming close to bodily harm. In later years I decided to play it safe and just stick with programming and computer simulation as a convenient channel to enjoy my destructive pursuits. Now getting to the point of this article.
In later articles, I'll start to demonstrate the use of PL/R, the procedural language for PostgreSQL that allows you to program functions in the statistical language and Environment R. To make these examples more useful, I'll be analyzing data generated from PostGIS tests I've been working on for stress testing the upcoming PostGIS 2.0. PostGIS 2.0 is a major and probably the most exciting release for us. Paul Ramsey did a summary talk recently of Past, Present, Future of PostGIS at State of PostGIS FOSS4G Japan http://www.ustream.tv/recorded/10667125 which provides a brief glimpse of what's in store in 2.0.
Continue reading "The State of PostGIS, Joys of Testing, and PLR the Prequel"
Sunday, November 21. 2010
Backup and Restore is probably the most important thing to know how to do when you have a database with data you care about.
The utilities in PostgreSQL that accomplish these tasks are pg_restore, pg_dump, pg_dumpall, and for restore of plain text dumps - psql.
A lot of the switches used by pg_dump, pg_restore, pg_dumpall are common to all three. You use pg_dump to do backups of a single database or select database objects and pg_restore to restore it either to another database or to recover portions of a database. You use pg_dumpall to dump all your databases in plain text format.
Rather than trying to keep track of which switch works with which, we decided to combine all into a single cheat sheet with a column denoting which utility the switch is supported in. Pretty much all the text is compiled from the --help switch of each.
We created a similar Backup and Restore cheatsheet for PostgreSQL 8.3 and since then some new features have been added such as the jobs parallel restore feature in 8.4. We have now created an updated sheet to comprise all features present in PostgreSQL 9.0 packaged pg_dump, pg_restore, pg_dumpall command line utilities.
As usual please let us know if you find any errors or omissions and we'll be happy to correct.
Sunday, November 14. 2010
Many people have been concerned with Oracle's stewardship of past Sun Microsystems open source projects. There are Java, MySQL, OpenSolaris to name a few. Why are people concerned? Perhaps the abandoning of projects such as OpenSolaris, the suing of Google over Java infringements, the marshalling out of many frontline contributors of core Open Source projects from Oracle, the idea of forking over license rights to a single company so they can relicense your code. We have no idea. All we know is that there is an awful lot of forking going on.
To Oracle's defense, many do feel that they have done a good job with progressing the advancements of some of the Open Source projects they have shepherded. For example getting MySQL patches more quickly in place etc. For some projects where there is not much of a monetary incentive, many feel they have at best neglected e.g. OpenSolaris. Perhaps it's more Oracle's size and the size that Sun was before takeover that has made people take notice that no Open Source project is in stable hands when its ecosystem is predominantly controlled by the whims of one big gorilla.
One new fork we were quite interested to hear about is LibreOffice, which is a fork of OpenOffice. In addition to the fork, there is a new organization called Document Foundation to cradle the new project. Document Foundation is backed by many OpenOffice developers and corporate entities (Google, Novell,Canonical) to name a few. The Document Foundation mission statement is outlined here. There is even a document foundation planet for LibreOfficerians to call home.
The LibreOffice starter screen looks similar to the OpenOffice starter screen, except instead of the flashy Oracle logo we have come to love and fear, it has a simple text Document Foundation below the basic multi-colored Libre Office title. Much the same tools found in OpenOffice are present. The project has not forked too much in a user-centric way from its OpenOffice ancestor yet. The main changes so far are the promise of not having to hand over license assignment rights to a single company as described in LibreOffice - A fresh page for OpenOffice as well as some general cleanup and introduction of plugins that had copy assignment issues such as some from RedHat and Go-OO. My favorite quote listed in the above article is It feels like Oracle is "a mother who loves her child but is not aware that her child wants to walk alone." by André Schnabel. So perhaps Oracle's greatest contribution and legacy to Open Source and perhaps the biggest that any for-profit company can make for an Open Source project is to force its offspring to grow feet to walk away.
In later posts we'll test drive Libreoffice with PostgreSQL to see how it compares to its OO ancestor and what additional surprises it has in store.
Slight correction: Seems unlikely that Oracle will donate the OpenOffice name or join the Document Foundation movement. Details on Oracle kicks LibreOffice supporters out of OpenOffice
Friday, November 05. 2010
Recommended Books: PostgreSQL 8.4 Official The SQL Language MySQL Administrator's Bible SQL Server MVP Deep Dives Oracle 11G Essentials
Problem: You have a set of numbers, or characters or whatever and you are trying to find the max or min of this set?
If the values are separate records in a table or query, the answer is well known and respected across all relational databases -- use the aggregate MAX and MIN functions.
But what if you have a set of free wheeling numbers or text not in separate records, and you want the max or min of each. Here is where the least and greatest functions come in handy.
PostgreSQL has had these functions for as far back as I can remember and is not the only database to sport these marvelous functions. Our beloved MySQL and Oracle database have these functions as well. Sadly our more beloved SQL Server even in the SQL Server 2008 variant - lacks these functions.
Okay how to use these functions -- you use it like this:
We would classify these functions along the lines of COALESCE. They are like COALESCE because they take an arbitrary number of arguments and the datatype that is returned is highest datatype that all arguments in the function can be autocast to. If there is no autocast then well you get an error. To demonstrate, guess what happens when you do this:
Well do this in PostgreSQL at least in 8.3+, you get a nice slap if you haven't installed any deprecated autocasts:ERROR: invalid input syntax for integer: "Kitty" LINE 1: SELECT least('Kitty', -1)
Do this in MySQL - so friendly and forgiving, and great reader of minds and you get:
I apologize for the ambiguous sarcasm, its just sometimes I want my mind read and sometimes I don't and I just can't figure out whether today is one of those days or the other day.
Wednesday, November 03. 2010
I was excited when PostgreSQL 9 Admin Cookbook by Simon Riggs and Hannu Krosing and the companion book PostgreSQL 9 High Performance, by Greg Smith were available. All three authors are well known experts in the PostgreSQL community and consultants at 2ndQuadrant, so you are sure to learn a lot from both books. Both books are published by Packt Publishing and can be bought directly from Packt Publishing or via Amazon. Packt is currently running a 50% off sale if you buy both books (e-Book version) directly from Packt. In addition Packt offers free shipping for US, UK, Europe and select Asian countries. The pair of books make attractive companions.
The main thing I felt missing in this duo was a book dedicated to PostgreSQL: The platform that would cover all the various PL languages and the various neat ways PostgreSQL is used and has been extended by many to do things one would not normally expect of a database. Some day perhaps someone will write such a book.
This article is a review about PostgreSQL 9 Admin Cookbook and we'll be following up later with PostgreSQL 9 High Performance.
This is my first book review. I have a lot of patience for writing, but little patience when it comes to reading. That said, I found PostgreSQL 9 Admin Cookbook an easy and enjoyable read, and a book that I managed to learn more tricks from than I care to admit. It is a handy book to have for reference regardless of if you consider yourself a novice, intermediate or advanced user.
As the book title suggests, it's a cookbook, but a cookbook that combines a question and answer style with a discussion style of writing. The tasks are neatly categorized into 12 chapters and each task smoothly builds on previous tasks discussed. It is still categorized in such a way that you can jump to a particular task you are currently having problems with without having read the other parts of the book.
Although it is titled PostgreSQL 9 -- it covers earlier versions as well.
Continue reading "PostgreSQL 9 Admin Cookbook Book Review"
Friday, October 29. 2010
pgAdmin has this feature called a pgScript. Its a very simple scripting language for running tasks in a pgAdmin SQL window. The documentation is PgScript manual.
Why would you use it over say writing a plpgsql function?
One main reason we use it is to run quick ad-hoc batch jobs such as geocoding addresses and so forth. The benefit it has over running a stored function is that you don't have to run it as a single transaction.
This is important for certain kinds of tasks where you just want to run something in a loop and have each loop commit separately. To us the syntax with the @ resembles SQL Server Transact-SQL more than it does any PostgreSQL language. WhenI first saw pgScript I thought Wow PgAdmin talks Transact-SQL; now -- what will they think of next :).
Continue reading "pgAdmin pgScript"
Monday, October 11. 2010
Recommended Books: PostGIS in Action
One of the neat changes already present in the PgAdmin III 1.13dev, is the change in plugin architecture. Version 1.13 dev allows for multiple plugin*.ini files. How does this work. Well if you have a plugins.d folder in your PgAdmin III version folder, it will read all the inis in that folder and load them as plugins.
Recall in PgAdmin III Plug-in Registration: PostGIS Shapefile and DBF Loader, we demonstrated how to load the PostGIS shapefile and dbf loader as a plugin in PgAdmin III, well this time we will demonstrate how to do it using PgAdmin version 1.13. Better yet, we'll show you the new and improved PgAdmin III Shapefile and DBF Loader in the works for PostGIS 2.0.
Continue reading "PgAdmin III 1.13 - change in plugin architecture and PostGIS Plugins"
Thursday, October 07. 2010
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap. There are various open standards for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything. Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
Continue reading "Universal Unique Identifiers PostgreSQL SQL Server Compare"
Tuesday, September 28. 2010
The PostGIS development team is proud to announce the availability of PostGIS 1.5.2. Further details are on the postgis.org website PostGIS 1.5.2 release.
Leo and I are still working on the windows builds. As stated before the 32-bit builds will be out first. We are still preparing our 64-bit test environment on our new 64-bit laptop.
For those who are on 64-bit windows. Sami has some PostGIS 64-bit binaries for PostgreSQL 9.0 windows available on his blog. Though he just has the binaries available so you will need to use the other files from the 32-bit install.
To answer Sami's question, since he has asked it more than once:
I really can't understand why PostGIS developers still want to compile the whole stuff using msys/mingw and that kind of stuff. We have Visual C++ (yes, the compiler is available for free), everything compiles with it and you don't have to whine about how hard it is to compile stuff for Windows because it's not GNU.
Okay we have whined enough. There are talks in the PostGIS and GEOS group of switching to CMake and to have a process that builds said make / project files so that we can more easily support GNU and VS without hopefully not adding too much extra work on anyone's plate. We will see how that goes. Will we compile the 64-bit version under Msys64 or VS -- we would like to do both and compare the 2. :)
UPDATE: We have changed the licensing on the cheat sheet to Creative Commons per request
To celebrate the arrival of the long awaited PostgreSQL 9.0, we have prepared a multi-page PostgreSQL 9.0 cheat sheet that covers prior PostgreSQL constructs plus new 9.0 features. PDF version of this cheat sheet is available at PostgreSQL 9.0 Cheat sheet in PDF 8/12 by 11", PostgreSQL 9.0 Cheat sheet in PDF A4 and the PostgreSQL 9.0 Cheat sheet in HTML.
We took some advice from several readers and this time broke the cheatsheet into multiple pages. Hopefully you won't need magnifying glasses to read this one. We also switched to landscape and put all the examples at the end.
If you find any errors or anything major we left out, please let us know and we'll amend.
Monday, September 20. 2010
Recommended Books: PostGIS In Action
Over the past two weeks, the PostGIS development team has been working hard to get out PostGIS 1.5.2 in time for the PostgreSQL 9.0 release. This release contains fixes allowing PostGIS to compile against 9.0. Due to an unfortunate turn of events, we missed the cut by a couple of days and are currently experiencing technical difficulties with the postgis.org website. These should be resolved soon and barring no further difficulties, we should have the final PostGIS 1.5.2 ready late this week.
On the plus side, we do have a PostGIS 1.5.2 rc1 available for download from our PostGIS Wiki Release Candidate Downloads section. Please feel free to test these out so that we have a smooth release.
Paul's related post is here
Here are the details of what is fixed:
- This is a bug fix release, addressing issues that have been filed since the 1.5.1 release. - Bug Fixes - Loader: fix handling of empty (0-verticed) geometries in shapefiles. (Sandro Santilli) - #536, Geography ST_Intersects, ST_Covers, ST_CoveredBy and Geometry ST_Equals not using spatial index (Regina Obe, Nicklas Aven) - #573, Improvement to ST_Contains geography - Loader: Add support for command-q shutdown in Mac GTK build (Paul Ramsey) - #393, Loader: Add temporary patch for large DBF files (Maxime Guillaud, Paul Ramsey) - #507, Fix wrong OGC URN in GeoJSON and GML output (Olivier Courtin) - spatial_ref_sys.sql Add datum conversion for projection SRID 3021 (Paul Ramsey) - Geography - remove crash for case when all geographies are out of the estimate (Paul Ramsey) - #469, Fix for array_aggregation error (Greg Stark, Paul Ramsey) - #532, Temporary geography tables showing up in other user sessions (Paul Ramsey) - #562, ST_Dwithin errors for large geographies (Paul Ramsey) - #513, shape loading GUI tries to make spatial index when loading DBF only mode (Paul Ramsey) - #527, shape loading GUI should always append log messages (Mark Cave-Ayland) - #504 shp2pgsql should rename xmin/xmax fields (Sandro Santilli) - #458 postgis_comments being installed in contrib instead of version folder (Mark Cave-Ayland) - #474 Analyzing a table with geography column crashes server (Paul Ramsey) - #581 LWGEOM-expand produces inconsistent results (Mark Cave-Ayland) - #471 DocBook dtd errors (Olivier Courtin) - Fix further build issues against PostgreSQL 9.0 (Mark Cave-Ayland) - #572 Password whitespace for Shape File to PostGIS Import not supported (Mark Cave-Ayland) - #603 shp2pgsql: "-w" produces invalid WKT for MULTI* objects. (Mark Cave-Ayland) - Enhancement - #513 Add dbf filter to shp2pgsql-gui and allow uploading dbf only (Paul Ramsey)
We should have windows binaries available a short time after release. Unfortunately we do not have the 64-bit windows build ready yet, so you still have to use the 32-bit version of PostgreSQL 9.0 if you need PostGIS on windows.
Syndicate This Blog
Show tagged entries