Saturday, February 19. 2011
Printer Friendly
QuestionYou have a system of products and categories and you want a product to be allowed to be in multiple categories, but you want a product to only be allowed to be in one main category.
How do you enforce this rule in the database?
Some people will say -- why can't you just deal with this in your application logic. Our general reason is that much of our updating doesn't happen at our application level. We like enforcing rules at the database
level because it saves us from ourselves. We are in the business of massaging data. For this particular kind of example we wanted to make sure the database would provide us a nice safety net so that
we wouldn't accidentally assign a product in two main categories.
Answer
There are two approaches we thought of. One is the obvious have a primary category column and a bridge table that has secondary categories. That is an ugly solution because when you do a query you have to do a union
and always treat the secondary categories as different from the main. For most use-cases we don't usually care about distinguisihing primary from secondary category.
The solution we finally settled on was to have one bridge table with a boolean field for if its the main category. We enforce the only one main category requirement using a partial index. Now not all databases support partial indexes
This is one major value of using PostgreSQL that you have so many more options for implementing logic.
As some people noted in the comments and the reddit entry. SQL Server 2008 has a similar feature called Filtered Indexes. Though PostgreSQL has had partial indexes for as far back as I can remember. This brings up an interesting point which I have observed -- if you were using PostgreSQL before, you would already know how to use the Filtered Indexes, Multi row inserts introduced in SQL Server 2008, and the SEQUENCES feature coming in SQL Server 2010. So we should all use PostgreSQL, because it teaches us how to use the newer versions of SQL Server before they come out. :)
So how does the partial index solution look: NOTE for simplicity, we are leaving out all the complimentary tables and the foreign key constraints that we also have in place.
CREATE TABLE products_categories
(
category_id integer NOT NULL,
product_id integer NOT NULL,
main boolean NOT NULL DEFAULT false,
orderby integer NOT NULL DEFAULT 0,
CONSTRAINT products_categories_pkey PRIMARY KEY (category_id, product_id)
);
CREATE UNIQUE INDEX idx_products_categories_primary
ON products_categories
USING btree
(product_id)
WHERE main = true;
Testing it out. It saves us and gives us a nice informative message to boot.
INSERT INTO products_categories(category_id, product_id, main)
VALUES (1,2,true), (3,2,false), (3,3,true), (4,2,true);
ERROR: duplicate key value violates unique constraint "idx_products_categories_primary"
DETAIL: Key (product_id)=(2) already exists.
Tuesday, January 18. 2011
Printer Friendly
In our last article we talked about String Aggregation implementing in PostgreSQL, SQL Server, and MySQL. This task is one that makes purist relational database programmers
a bit squeamish. In this article we'll talk about the reverse of that, how do you deal with data that someone hands you delimeted in a single field and that you are asked to explode or re-sort based on some lookup table.
What are the benefits of having a structure such as? : p_name | activities
--------+--------------------------------
Dopey | Tumbling
Humpty | Cracking;Seating;Tumbling
Jack | Fishing;Hiking;Skiing
Jill | Bear Hunting;Hiking
Well for the casual programmer or simple text file database that knows nothing about JOINS and so forth, it makes it simple to pull a list of people who like Tumbling.
You simply do a WHERE ';' || activities || ';' LIKE '%;Tumbling;%'. It's great for security too because you can determine security with a simple like check and also list all the security groups a member belongs in without doing anything.
Quite easy for even the least data-skilled of programmers to work with because most procedural languages have a split function that can easily parse these into an array useful for stuffing into drop down lists and so forth. As a consultant of semi-techie people
I'm often torn by the dilemma of "What is the way I would program for myself vs. the way that provides the most autonomy to the client". By that I mean
for example I try to avoid heavy-weight things like Wizards that add additional bloated dependencies or slow the speed down of an application. These bloated dependencies may provide ease to the client but make my debugging life harder. So I weight the options
and figure out which way works now and also provides me an easy escape route should things like speed or complexity become more of an issue.
This brings us to the topic of, what is wrong with this model? It can be slow because the LIKE condition you have can't easily take advantage of an index unless using a full text index so not ideal where this is the primary filtering factor. It's also prone to pollution because
you can't easily validate that the values in the field are in your valid set of lookups or if your lookup changes, the text can be forced to change with a CASCADE UPDATE/DELETE RULE etc. In cases where this is of minor consequence
which is many if referential integrity is not high on your list of requirements, this design is not bad. It might make a purist throw up but oh well there is always dramamine to fall back on. As long as you have done your cost benefit analysis, I don't think there should be any shame of following this
less than respected route.
While you may despise this model, it has its place and it's a fact of life that one day someone will hand it to you and you may need to flip it around a bit. We shall demonstrate how to do that in this article.
Continue reading "Reverse String Aggregation: Explode concatenated data into separate rows In PostgreSQL, SQL Server and MySQL"
Sunday, November 21. 2010
Printer Friendly
Backup and Restore is probably the most important thing to know how to do when you have a database with data you care about.
The utilities in PostgreSQL that accomplish these tasks are pg_restore, pg_dump, pg_dumpall, and for restore of plain text dumps - psql.
A lot of the switches used by pg_dump, pg_restore, pg_dumpall are common to all three. You use pg_dump to do backups of a single database or select database objects and pg_restore to restore it either to another database or to recover portions of a database. You use pg_dumpall to dump all your databases in plain text format.
Rather than trying to keep track of which switch works with which, we decided to combine all into a single cheat sheet with a column denoting which utility the switch is supported in.
Pretty much all the text is compiled from the --help switch of each.
We created a similar Backup and Restore cheatsheet for PostgreSQL 8.3 and since then some new features have been added such as the jobs parallel restore feature in 8.4. We have now created an updated sheet to comprise all features present in PostgreSQL 9.0 packaged pg_dump, pg_restore, pg_dumpall command line utilities.
PDF Portrait version 8.5 x 11" of this cheatsheet is available at PostgreSQL 9.0 Dump Restore 8.5 x 11 and also available in
PDF A4 format and HTML.
As usual please let us know if you find any errors or omissions and we'll be happy to correct.
Friday, November 05. 2010
Printer Friendly
Problem: You have a set of numbers, or characters or whatever and you are trying to find the max or min of this set?
If the values are separate records in a table or query, the answer is well known and respected across all relational databases -- use the aggregate MAX and MIN functions.
But what if you have a set of free wheeling numbers or text not in separate records, and you want the max or min of each. Here is where the
least and greatest functions come in handy.
PostgreSQL has had these functions for as far back as I can remember and is not the only database to sport these marvelous functions. Our beloved MySQL and Oracle database have these functions as well. Sadly our more beloved SQL Server even in the SQL Server 2008
variant - lacks these functions.
Okay how to use these functions -- you use it like this:
SELECT least(1,-2,5) As num_least, greatest('Bobby', 'Catty', 'Kitty') As greatest_cat;
Yields:
num_least| greatest_cat
----------+----------
-2 | Kitty
We would classify these functions along the lines of COALESCE. They are like COALESCE because they take an arbitrary number of arguments and the datatype that is returned
is highest datatype that all arguments in the function can be autocast to. If there is no autocast then well you get an error. To demonstrate, guess what happens when you do this:
SELECT least(-1, 'Kitty');
Well do this in PostgreSQL at least in 8.3+, you get a nice slap if you haven't installed any deprecated autocasts:
ERROR: invalid input syntax for integer: "Kitty"
LINE 1: SELECT least('Kitty', -1)
Do this in MySQL - so friendly and forgiving, and great reader of minds and you get:
-1
OF COURSE
I apologize for the ambiguous sarcasm, its just sometimes I want my mind read and sometimes I don't and I just can't figure out whether today is one of those days or the other day.
Monday, October 11. 2010
Printer Friendly
One of the neat changes already present in the PgAdmin III 1.13dev, is the change in plugin architecture. Version 1.13 dev allows for multiple plugin*.ini files. How does this work.
Well if you have a plugins.d folder in your PgAdmin III version folder, it will read all the inis in that folder and load them as plugins.
Recall in PgAdmin III Plug-in Registration: PostGIS Shapefile and DBF Loader,
we demonstrated how to load the PostGIS shapefile and dbf loader as a plugin in PgAdmin III, well this time we will demonstrate how to do it using PgAdmin version 1.13. Better yet, we'll show you the new and improved
PgAdmin III Shapefile and DBF Loader in the works for PostGIS 2.0.
Continue reading "PgAdmin III 1.13 - change in plugin architecture and PostGIS Plugins"
Thursday, October 07. 2010
Printer Friendly
Universal Unique Identifiers are 16-byte / 32-hexadecimal digit (with 4 -s for separation)
identifiers standardized by the Open Software Foundation.
The main use as far as databases go is to ensure uniqueness of keys across databases. This is important if you have multiple servers or disperate systems that need to replicate or share
data and each can generate data on its own end. You want some non-centralized mechanism to ensure the ids generated from each server will never overlap.
There are various open standards
for generating these ids and each standard will tie the id based on some unique identifier of the computer or a namespace or just a purely random generator algorithm not tied to anything.
Since this is a question often asked by users coming from Microsoft SQL Server, we will demonstrate in this article the same concept in Microsoft SQL Server and how you would achieve similar functionality in PostgreSQL.
Continue reading "Universal Unique Identifiers PostgreSQL SQL Server Compare"
Thursday, August 12. 2010
Printer Friendly
This is a question that comes up quite often by windows users, so thought we would share how we normally do it. The question is can you run a PostgreSQL server on your windows desktop/server box without having to install anything?
The answer is yes and quite easily. Why would you need to do this. There are a couple of cases -- one you are developing a single user app that you want users to be able to run from anywhere without having to install it first.
The other common reason is, you aren't allowed to install anything on a user's pc and you also want to package along a database you already have created.
For our purposes, many of our developers develop on portable WAMP like things, and for some of our applications, they need to work in both MySQL and PostgreSQL, so we need an easy way during development to swap one out for the other.
Continue reading "Starting PostgreSQL in windows without install"
Thursday, July 29. 2010
Printer Friendly
One of the new features of PostgreSQL 9.0 is the ability to specify the format of an explain plan.
In prior versions your only choice was text (and graphic explain with tools like PgAdmin III and other GUIS), but in 9.0 on, you have the additional options of
Javascript Object Notation (JSON) which some people have a thing or two to say about them, YAML Ain't Markup Language (YAML) or eXtended Markup Language (XML). The new explain options are itemized in PostgreSQL 9.0 EXPLAIN.
The main benefit of the JSON, XML, YAML formats is that they are easier
to machine parse than the default text version. This will allow for creative renderings of planner trees with minimal coding.
In Part 1 of this series, we'll demonstrate how to output the plans in these various formats and what they look like.
In later parts of this series -- we'll demonstrate how to use Javascript, XSL and other scripting/markup languages
to transform these into works of art you can hang on your wall.
-- START POSTGIS IN ACTION ASIDE --
We just submitted the third major revision of Chapter 3 Data Modeling
of our upcoming PostGIS in Action book.
The second major revision we never submitted and threw it out because it wasn't worldly enough and was too involved. We may
use it later on for an example.
Chapter 3 should be up on Manning Early Access Program (MEAP) soon. If you haven't bought the book yet Buy now.
You don't want to miss out on a major masterpiece in the making. Okay we exaggerate a bit.
-- END POSTGIS IN ACTION ASIDE --
Continue reading "Explain Plans PostgreSQL 9.0 Text, JSON, XML, YAML - Part 1: You Choose"
Friday, July 23. 2010
Printer Friendly
When it comes to naming things in databases and languages, there are various common standards. For many languages the
camel family of namings is very popular. For unix based databases
usually UPPER or lower _ is the choice and for databases such as SQL Server and MySQL which allow you to name your columns with mixed casing
but couldn't care less what case you express them in selects, you get a mish mush of styles depending on what camp the database user originated from.
So to summarize the key styles and the family of people
- camelCase : lastName - employed by SmallTalk, Java, Flex, C++ and various C derivative languages.
- Pascal Case: (a variant of Camel Case) -- LastName which is employed by C#, VB.NET, Pascal (and Delphi), and SQL Server (and some MySQL windows converts). Also often used for class names by languages that use standard camelCase for function names.
- lower case _ last_name : often found in C, a favorite among PostgreSQL database users. (some MySQL)
- upper case _ LAST_NAME : a favorite among Oracle Users (some MySQL Oracle defectors)
Being at the cross roads of all the above, we often have to deal with the various above as well as having internal schizophrenic strife and external fights.
The internal turmoil is the worst and is worse than an ambidextrous person trying to figure out which hand to use in battle. For these exercises, we'll demonstrate one way how to convert between the various conventions. These
are the first thoughts that came to our mind, so may not be the most elegant.
Continue reading "Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case"
Monday, July 05. 2010
Printer Friendly
Gathering from the number of hits we got from our What's new in PostgreSQL 9.0,
and the large number of slashdot responses we got
as well as the fair number of reddit responses,
I guess a lot of people are really excited about the upcoming PostgreSQL 9.0 or at least
have a lot of opinions about what is still missing in it.
For this discussion, we would like to point out one of the companion adminstration tools that
will be packaged in with PostgreSQL 9.0 (and currently packaged in beta 2). This is PgAdmin III, which
we will affectionately refer to as the Administrative tool for mere mortals. It is the first administrative
tool that most users new to PostgreSQL use and gives them a user-friendly interface to the
power behind PostgreSQL. I would say if it were not for this tool and its web cousin PhpPgAdmin, many
a scared newbie user would be running away at the vast unencumbered freedom that PostgreSQL/psql and sibling commandline tools offer.
Continue reading "What is new in PgAdmin III 1.12.0"
Monday, June 28. 2010
Printer Friendly
A while ago we demonstrated how to use Open Office Base to connect to a PostgreSQL server using both the native PostgreSQL SBC and the PostgreSQL JDBC driver.
The routine for doing the same in Open Office Base 3.2 is pretty much the same as it was in the 2.3 incarnation. In this excerpt, we'll demonstrate how to import data into PostgreSQL using Open Office Base, as we had promised to do in
Database Administration, Reporting, and Light Applicaton Development and some stumbling blocks to watch out for.
Use Case
Command line lovers are probably scratching there head, why you want to do this. After all stumbling your way thru a commandline and typing stuff is much more fun and you can automate it after you are done.
For our needs, we get stupid excel or some other kind of tab delimeted data
from somebody, and we just want to cut and paste that data in our database. These files are usually small (under 5000 records) and the column names are never consistent. We don't want to fiddle with writing code to do these one off type exercises.
For other people, who are used to using GUIs or training people afraid of command lines, the use cases are painfully obvious, so we won't bore you.
Importing Data with Open Office Base Using copy and paste
Open Office has this fantastic feature called Copy and Paste (no kidding), and we will demonstrate in a bit, why their copy and paste is better than Microsoft Access's Copy and Paste particularly when you want to paste into some database other than a Microsoft one.
It is worthy of a metal if I dear say.
Continue reading "Importing data into PostgreSQL using Open Office Base 3.2"
Tuesday, June 22. 2010
Printer Friendly
I know a lot has been said about this beautiful value we affectionately call NULL, which is neither here nor there and that manages to catch many of us
off guard with its casual neither here nor thereness. Database analysts who are really just back seat mathematicians in disguise like to philosophize
about the unknown and pat themselves on the back when they feel they have mastered the unknown better than any one else. Of course database spatial analysts, the worst kind of back seat mathematicians,
like to talk not only about NULL but about EMPTY and compare notes with their brethren and write dissertations
about what to do about something that is neither here nor there
but is more known than the unknown, but not quite as known as the empty string.
Okay getting to the point, one of our clients asked us about a peculiar problem they had with a query, and the strange results they were getting. We admit this still manages to catch us off guard every once in a while.
Continue reading "NOT IN NULL Uniqueness trickery"
Monday, May 17. 2010
Printer Friendly
For those people coming from Oracle, SQL Server and MySQL or other databases that have soundex functionality,
you may be puzzled, or even frustrated when you try to do
something like
WHERE soundex('Wushington') = soundex('Washington')
in PostgreSQL and get a function does not exist error.
Well it does so happen that there is a soundex function in PostgreSQL, and yes it is
also called soundex, but is offered as a contrib module and not installed by default. It also has other fuzzy string matching functions in addition to soundex.
One of my favorites, the levenshenstein distance function is included as well. In this article
we'll be covering the contrib module packaged as fuzzystrmatch.sql. Details of the module can be found in FuzzyStrMatch.
The contrib module has been around for sometime, but has changed slightly from PostgreSQL version to PostgreSQL version. We are covering the 8.4 version in this article.
For those unfamiliar with soundex, its a basic approach developed by the US Census in the 1930s as a way of sorting
names by pronounciation. Read Census and Soundex for more gory history details.
Given that it is an approach designed primarily for the English alphabet, it sort of makes sense why its not built-in to PostgreSQL,
which has more of a diverse international concern. For example if you used it to compare two words in Japanese or Chinese,
don't think it would fair too well in any of the database platforms that support this function.
The original soundex algorithm has been improved over the years. Though its still the most common used today, newer variants
exist called MetaPhone developed in the 1990s and Double Metaphone (DMetaPhone) developed in 2000 that support additional
consonants in other languages such as Slavic, Celtic, Italian, Spanish etc.
These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be
the most popularly used at least for U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built-in but not the metaphone variants.
So in a sense soundex is a more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and
are also more processor intensive to compute than soundex. We'll demonstrate some differences between them in this article.
To enable soundex and the other fuzzy string matching functions included, just run the
share/contrib/fuzzystrmatch.sql located in your PostgreSQL install folder. This library is an important piece of arsenal for geocoding and genealogy tracking particularly
the U.S. streets and surnames data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.
For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
Continue reading "Where is soundex and other warm and fuzzy string things"
Friday, April 23. 2010
Printer Friendly
Fixed width data is probably the most annoying data to import because you need some mechanism to break the columns at the column boundaries. A lot of people bring this kind of data
into a tool such as OpenOffice, Excel or MS Access, massage it into a delimeted format and then pull it in with PostgreSQL copy command or some other means. There is another way and one that doesn't require
anything else aside from what gets packaged with PostgreSQL. We will demonstrate this way.
Its quite simple. Pull each record in as a single column and then spit it into the columns you want with plain old SQL. We'll demonstrate this by importing Census data places fixed width file.
Although this technique we have is focused on PostgreSQL, its pretty easy to do the same steps in any other relational database.
Both David Fetter and Dimitri Fontaine have demonstrated other approaches of doing this as well
so check theirs out.
UPDATE
Continue reading "Import fixed width data into PostgreSQL with just PSQL"
Sunday, February 14. 2010
Printer Friendly
Every programmer should embrace and use regular expressions (INCLUDING Database programmers).
There are many places where regular expressions can be used to reduce a 20 line piece of code into a
1 liner. Why write 20 lines of code when you can write 1.
Regular expressions are a domain language just like SQL. Just like SQL they are embedded in many places. You have them in your program editor.
You see it in sed, grep, perl, PHP, Python, VB.NET, C#,
in ASP.NET validators and javascript for checking correctness of input. You have them in PostgreSQL as well where
you can use them in SQL statements, domain definitions and check constraints. You can mix
regular expressions with SQL. When you mix the two domain languages, you can do enchanting things with a flip of a wrist that
would amaze your less informed friends. Embrace the power of domain languages and mix it up. PostgreSQL
makes that much easier than any other DBMS we can think of.
For more details on using regular expressions in PostgreSQL, check out the manual pages Pattern Matching in PostgreSQL
The problem with regular expressions is that they are slightly different depending on what language environment you are
running them in. Different enough to be frustrating. We'll just focus on their use in PostgreSQL, though these lessons
are applicable to other environments.
Continue reading "Regular Expressions in PostgreSQL"
|