Table Of Contents
PostgreSQL Q & A
Of Camels and People: Converting back and forth from Camel Case, Pascal Case to underscore lower case Beginner
When it comes to naming things in databases and languages, there are various common standards. For many languages the camel family of namings is very popular. For unix based databases usually UPPER or lower _ is the choice and for databases such as SQL Server and MySQL which allow you to name your columns with mixed casing but couldn't care less what case you express them in selects, you get a mish mush of styles depending on what camp the database user originated from.
So to summarize the key styles and the family of people
- camelCase : lastName - employed by SmallTalk, Java, Flex, C++ and various C derivative languages.
- Pascal Case: (a variant of Camel Case) -- LastName which is employed by C#, VB.NET, Pascal (and Delphi), and SQL Server (and some MySQL windows converts). Also often used for class names by languages that use standard camelCase for function names.
- lower case _ last_name : often found in C, a favorite among PostgreSQL database users. (some MySQL)
- upper case _ LAST_NAME : a favorite among Oracle Users (some MySQL Oracle defectors)
Being at the cross roads of all the above, we often have to deal with the various above as well as having internal schizophrenic strife and external fights. The internal turmoil is the worst and is worse than an ambidextrous person trying to figure out which hand to use in battle. For these exercises, we'll demonstrate one way how to convert between the various conventions. These are the first thoughts that came to our mind, so may not be the most elegant.
Converting from PostgreSQL common field names to camel case and others
For these exercises, lets pretend we have a table in PostgreSQL called people
CREATE TABLE people(person_id serial PRIMARY KEY,
first_name varchar(75), last_name varchar(75));
Question: How do you convert from lower case _ to Pascal case?
Answer:-- pascal case
SELECT column_name, replace(initcap(replace(column_name, '_', ' ')), ' ', '') As pascal_case
FROM information_schema.columns
WHERE table_name = 'people';
column_name | pascal_case
------------+-------------
person_id | PersonId
first_name | FirstName
last_name | LastName
Question: How do you convert from lower case _ to camel case?
Answer:SELECT column_name, lower(substring(pascal_case,1,1)) || substring(pascal_case,2) As camel_case
FROM (
SELECT column_name, replace(initcap(replace(column_name, '_', ' ')), ' ', '') As pascal_case
FROM information_schema.columns
WHERE table_name = 'people' ) As foo;
column_name | camel_case
------------+------------
person_id | personId
first_name | firstName
last_name | lastName
Converting from PostgreSQL lower _ to Oracle upper _
Of course this one is trivial, but we include for completeness.
Answer:
-- PostgreSQL to Oracle
SELECT column_name, upper(column_name) As oracle_name
FROM information_schema.columns
WHERE table_name = 'people';
column_name | oracle_name
-------------+-------------
person_id | PERSON_ID
first_name | FIRST_NAME
last_name | LAST_NAME
Converting from camel case and pascal case to lowercase underscore
Of course you often have the problem of some camel lover creating table columns in your database. You can fix this with SQL DDL hacks and PostgreSQL regular expressions.
CREATE TABLE "People"("personId" serial PRIMARY KEY,
"firstName" varchar(75), "lastName" varchar(75));
For converting the other way, using regular expressions (thank goodness PostgreSQL has this feature) seems to be the shortest way we can think to do this.
-- camel case to lower underscore
-- take all capital letters A-Z (we wrap a () so we can backreference
-- replace each capital with _ and original capital, g for greedy replace
-- then lower case it all
SELECT column_name,
lower(regexp_replace(column_name, E'([A-Z])', E'\_\\1','g')) As regular_pgstyle
FROM information_schema.columns
WHERE table_name = 'People';
column_name | regular_pgstyle
-------------+-----------------
personId | person_id
firstName | first_name
lastName | last_name
Basics
What is new in PgAdmin III 1.12.0 Beginner
Gathering from the number of hits we got from our What's new in PostgreSQL 9.0, and the large number of slashdot responses we got as well as the fair number of reddit responses, I guess a lot of people are really excited about the upcoming PostgreSQL 9.0 or at least have a lot of opinions about what is still missing in it.
For this discussion, we would like to point out one of the companion adminstration tools that will be packaged in with PostgreSQL 9.0 (and currently packaged in beta 2). This is PgAdmin III, which we will affectionately refer to as the Administrative tool for mere mortals. It is the first administrative tool that most users new to PostgreSQL use and gives them a user-friendly interface to the power behind PostgreSQL. I would say if it were not for this tool and its web cousin PhpPgAdmin, many a scared newbie user would be running away at the vast unencumbered freedom that PostgreSQL/psql and sibling commandline tools offer.
Yes I can hear the die-hard PostgreSQL super users screaming what about PSQL, PSQL is the coolest. Yes PSQL is cool and has new features too. I will say as far as commandline database tools go, PSQL, Pg_restore and Pg_Dump are the best we've ever seen packaged in a database product. MySQL and SQL Server should be ashamed of themselves for packaging comparitively much crappier commandline tools. We haven't used Oracle much so can't speak for Oracle tools. We'll get to PSQL and its commandline friends in another article. I will say though, that even if for whatever reason you can't upgrade to PostgreSQL 9.0 which many will not be able to because of backward compatibility reasons, you can still enjoy many of the new enhancements in PgAdmin III 1.12.0 and the new PSQL and related command line clients that will be packaged with PostgreSQL 9.0.
PhpPgAdmin is cool too. This is the web interface similar to phpMyAdmin and patterned after that except with bells and whistles specific to PostgreSQL (such as schemas, tablespaces, create types etc). We'll cover that in another article.
Okay what's new in PgAdmin III
All the new features in PgAdmin III 1.12.0 are outlined at PgAdmin III 1.12 Visual Tour. Here we'll give a run down of our favorites in priority.
- Support for PostgreSQL 9.0. You can't read 9.0 database with the older PgAdmin III's
- The new backup interface. It gives you so much power on what you can backup and compression and so forth. Since
we are writing a book, where we want to selectively include some tables and not others, this is especially nice.
Note that these features have been available in the pg_dump commandline for a while, but its really nice to have a graphical interface to those features
so you don't have to figure out how to write out those command line switches and paths. The objects tab is our favorite
tab on backup.
- Restore options - this allows you to set the number of jobs and better interface for selecting what you want to restore. Its a bit tricky, so we'll show it in the next section of this article.
- Connection Handling - client losses connection to server you get the option to reconnect now.
- Support for new column level triggers
- Server groups -- ability to group registered PostgreSQL servers
- Ability to quickly save graphic explain plans as a PNG file.
- SQL Editor automatically remembers past run queries and allows you to toggle to them to rerun them
- Support for the new 9.0 extended priviledges (DEFAULT priviledges)
Selective Restore
For compressed backups and tars, you can selectively restore just like you can with the pg_restore commandline. Here are the steps.
- Select the backup file. You can set the number of parallel jobs you want to use for restore.

- Switch to the Objects tab.
and click the View button. - Double-click on the backup file name and you should see all the schemas and if you expand a schema the objects in that schema

- Then selectively select (if you don't want to retore the whole thing) what you want to restore and click the OK button
Using PostgreSQL Contribs
Fuzzy string matching with Trigram and Trigraphs Intermediate
In an earlier article Where is Soundex and other Fuzzy string things we covered the PostgreSQL contrib module fuzzstrmatch which contains the very popular function soundex that is found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and dmetaphone functions included in fuzzstrmatch, but rarely found in other relational databases.
As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeves. This time we will cover the contrib module pg_trgm which was introduced in PostgreSQL 8.3. pgtrgm uses a concept called trigrams for doing string comparisons. The pg_trgm module has several functions and gist/gin operators. Like other contrib modules, you just need to run the /share/contrib/pg_trgm.sql file packaged in your PostgreSQL install to enable it in your database.
For this set of exercises, we'll use trigrams to compare words using the same set of data we tested with soundex and metaphones. For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
The most useful are the similarity function and the % operator. The % operator allows for using a GIST/GIN index and the similarity function allows for narrowing your filter similar to what levenshtein did for us in fuzzstrmatch.
What do PostgreSQL trigrams look like
A text string is composed of n number of trigrams which in PostgreSQL is displayed as an array of 3 character text values stored which consist of all 3 character consecutive groups that can be formed from the text. You can inspect this with the show_trgm function.
Its probably easiest to get a sense of what they are by looking at some. Below are some texts in random order with accompanying trigrams for each text. We chose only locations of less than 10 characters so that we could easily display the trigrams.
One of the simplest and my favorite is that for Leo.
SELECT show_trgm('Leo');
Which yields:
{" l"," le","eo ",leo}
SELECT loc_name, show_trgm(loc_name) As tg
FROM places
WHERE length(loc_name) < 10
ORDER BY random() limit 10;
-- Result is much more appealing
loc_name | tg
-----------+-------------------------------------------------------
Spade CDP | {" c"," s"," cd"," sp",ade,cdp,"de ","dp ",pad,spa}
Espy CDP | {" c"," e"," cd"," es",cdp,"dp ",esp,"py ",spy}
Paia CDP | {" c"," p"," cd"," pa",aia,cdp,"dp ","ia ",pai}
Keys CDP | {" c"," k"," cd"," ke",cdp,"dp ",eys,key,"ys "}
Ulen town | {" t"," u"," to"," ul","en ",len,own,tow,ule,"wn "}
Lead city | {" c"," l"," ci"," le","ad ",cit,ead,ity,lea,"ty "}
Wray city | {" c"," w"," ci"," wr","ay ",cit,ity,ray,"ty ",wra}
Mona town | {" m"," t"," mo"," to",mon,"na ",ona,own,tow,"wn "}
Derby CDP | {" c"," d"," cd"," de","by ",cdp,der,"dp ",erb,rby}
Sells CDP | {" c"," s"," cd"," se",cdp,"dp ",ell,lls,"ls ",sel}
Building GIST/GIN indexes for trigram
Before we can take full advantage of these functions and opertors, we need to put an index on our text field. pgtrgm supports both GIST and GIN very similar in concept to the TSearch. The main difference between the two indexes is that GIST is slower to query, but faster to build and GIN gives better search performance but slower to build. for our relatively small dataset of 35,000 some odd records, the performance between the two is about the same and took between 1 and 2 seconds to index all entries, with no real noticeable difference between using GIST or GIN in index speed on our PostgreSQL 9.0 beta install.
CREATE INDEX idx_places_trgm_gist_loc_name
ON places USING gist (loc_name gist_trgm_ops);
-- we chose to build with gin
CREATE INDEX idx_places_trgm_gin_loc_name
ON places USING gin(loc_name gin_trgm_ops);
Doing comparisons
Just like soundex, the % operator and similarity functions are case insensitive. The pg_trigram approach seems especially good at catching misspellings and allows you to easily control your tolerance for a misspelling. It doesn't suffer from the same issues fuzzystrmatch does when dealing with unicode characters, but will also not catch similar pronouciations as effectively.
Now to use we can use the % operator in conjunction with the similarity function or by itself. The similarity function will always return a number between 0 and 1 which is a measure of how similar two text strings are ; with 1 having same set of trigrams.
SELECT loc_name, similarity(loc_name, 'abeville') As sim_score
FROM places
WHERE loc_name % 'abeville' AND similarity(loc_name, 'abeville') > 0.35
ORDER BY loc_name;
-- output --
loc_name | sim_score
----------------+-----------
Abbeville city | 0.5
Abbeville city | 0.5
Abbeville city | 0.5
Abbeville city | 0.5
Abbeville town | 0.5
Let us try a similar example to what we tried with soundex.
SELECT loc_name, similarity(loc_name, 'rising sun') As sim_score
FROM places
WHERE loc_name % 'rising sun' AND similarity(loc_name, 'rising sun') > 0.35
ORDER BY similarity(loc_name, 'rising sun') DESC, loc_name;
loc_name | sim_score
------------------------+-----------
Rising Sun city | 0.6875
Rising Sun town | 0.6875
Rising Sun-Lebanon CDP | 0.478261
Rising Star town | 0.4
Risingsun village | 0.380952
Controlling default weights
What would happen if we used % without the added similarity filter? It would behave as if the similarity filter were set at the default weight. To see the default weight we use this statement.
SELECT show_limit();
--output --
0.3
You can lower or increase the weight with set_limit();
As mentioned. Trigrams are good for catching misspellings, but not as good as soundex and dmetaphone for catching same sound words spelled differently. To demonstrate, if we picked our El Cajon example from before where our dmetaphone excelled and return El Cajon.
SELECT loc_name, similarity(loc_name, 'Elle Cahon') As sim_score
FROM places
WHERE loc_name % 'Elle Cahon'
ORDER BY similarity(loc_name, 'Elle Cahon') DESC, loc_name;
loc_name | sim_score
----------------+-----------
Ellenton CDP | 0.333333
Ellaville city | 0.3
Ellendale city | 0.3
Ellendale city | 0.3
It is obviously not as good as our dmetaphone.