pgtrgm

Wednesday, October 12. 2011

Improving speed of GIST indexes in PostgreSQL 9.2


This is about improvements to GiST indexes that I hope to see in PostgreSQL 9.2. One is a patch proposed for PostgreSQL 9.2 called SP-GiST (Space-Partitioned GiST), created by Teodor Sigaev and Oleg Bartunov, whose basic technique is described in SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees. For those who don't know Teodor and Oleg, they are the great fellows who brought us many other GiST and GIN goodies that many specialty PostgreSQL extensions enjoy: PostGIS, trigrams, ltree, pgsphere, hstore, and full-text search, to name a few.

Another is a patch recently committed by Alexander Korotkov, which I only just found out about via New node splitting algorithm for GiST and admit I don't know enough about to judge. I'm pretty clueless when it comes to the innards of index implementations, so don't ask me for technical details. It's one of those shortcomings among the trillion others I have that I have learned to accept will probably never change.

What the SP-GiST patch will provide in terms of performance was outlined in the PGCon 2011 talk SP-GiST - a new indexing infrastructure for PostgreSQL: Space-Partitioning trees in PostgreSQL.

What it provides specifically for PostGIS is summarized in Paul's call for action noted below. As a passionate user of PostGIS, ltree, tsearch, and hstore, I'm pretty excited about these patches and other GiST and general index enhancements and their potential use in GiST-dependent extensions. I'm hoping to see these spring to life in PostgreSQL 9.2 and think they will help to further push the envelope of where PostgreSQL can go as a de facto platform for cutting-edge technology and scientific research. I think one of PostgreSQL's greatest strengths is its extensible index API.

Paul's PostGIS newsgroup note about seeking funding for faster GiST indexes, the work done so far on SP-GiST, and his call for further action is rebroadcast in its entirety here.

Thanks to the sponsorship of Michigan Technological University, we now
have 50% of the work complete. There is a working patch at the
commitfest https://commitfest.postgresql.org/action/patch_view?id=631
which provides quad-tree and kd-tree indexes.

However, there is a problem: unless the patch is reviewed and goes
through more QA/QC, it'll never get into PostgreSQL proper. In case
you think I am kidding: we had a patch for KNN searching ready for the
9.0 release, but it wasn't reviewed in time, so we had to wait all the
way through the 9.1 cycle to get it.

I am looking for sponsors in the $5K to $10K range to complete this
work. If you use PostgreSQL in your business, this is a chance to add
a basic capability that may help you in all kinds of ways you don't
expect. We're talking about faster geospatial indexes here, but this
facility will also radically speed any partitioned space. (For
example, the suffix-tree, which can search through URLs incredibly
fast. Another example, you can use a suffix tree to very efficiently
index geohash strings. Interesting.)

If you think there's a possibility, please contact me and I will send
you a prospectus you can take to your manager. Let's make this happen
folks!

Paul


Posted by Leo Hsu and Regina Obe in 9.2, editor note, gis, hstore, intermediate, ltree, pgtrgm, postgis, postgresql versions, tsearch at 18:24 | Comments (0) | Trackbacks (0)

Monday, June 06. 2011

PostgreSQL 9.1 Trigrams teaching LIKE and ILIKE new tricks


There once existed programmers who were asked to explain this snippet of code: 1 + 2

The C programmer explained "It's a common mathematical expression."
The C++, Java, C# and other impure object-oriented programmers said "We concur. It's a common mathematical expression."
The Smalltalk programmer explained "1 adds 2."
The Lisp programmer stood up, a bit in disgust, and said, "No no! You are doing it all wrong!"
The Lisp programmer then pulled out a Polish calculator, punched in + 1 2, and with a very serious face, explained
"+ should be pushing those other two around."

I find this episode interesting because while I feel the Lisp programmer is more right, the Smalltalk programmer has managed to follow the rest of the crowd and still stick to her core principle. So what does this have to do with trigrams in PostgreSQL 9.1? Well, just as 1 + 2 is a common mathematical expression, abc LIKE '%b%' is a common relational database expression that we have long taken for granted as not indexable in most databases (I can't think of any other database where it is) until PostgreSQL 9.1, which can utilize trigram indexes (the Lisp programmer behind the curtain) to make it fast.

There are two main enhancements happening with trigrams in PostgreSQL 9.1, both of which depesz has already touched on in FASTER LIKE/ILIKE and KNNGIST. This means you can have an even faster trigram search than you have ever had before, and you can do it in a fashion that doesn't require any PostgreSQL trigram-specific syntax. So while PostgreSQL 9.1 understands LIKE much like all the other databases you work with, if you have a trigram index in place, it will do the job a little faster, and sometimes a lot faster, using the more clever PostgreSQL 9.1 planner. This is one example of how you can use applications designed for many databases and still be able to utilize advanced features in your database of choice. In this article we'll demonstrate.
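To give a feel for why a trigram index can help with a pattern such as LIKE '%main%', here is a toy sketch in Python. It is not how PostgreSQL implements this internally; the function names (word_trigrams, pattern_trigrams, build_index, ilike_contains) and the four street names are made up for illustration, and it assumes ASCII text and case-insensitive (ILIKE-style) matching. The idea: look up every trigram of the search text in an inverted index to get candidate rows, then recheck each candidate against the real pattern.

```python
import re
from collections import defaultdict


def word_trigrams(text):
    """Trigrams of each word, padded with two leading and one trailing blank."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def pattern_trigrams(literal):
    """Unpadded trigrams of the text between the % wildcards.

    No padding here: with a wildcard on both sides we cannot assume the
    literal starts or ends at a word boundary.
    """
    grams = set()
    for chunk in re.findall(r"[a-z0-9]+", literal.lower()):
        for i in range(len(chunk) - 2):
            grams.add(chunk[i:i + 3])
    return grams


def build_index(rows):
    """Toy inverted index: trigram -> set of row numbers containing it."""
    index = defaultdict(set)
    for row_id, name in enumerate(rows):
        for gram in word_trigrams(name):
            index[gram].add(row_id)
    return index


def ilike_contains(rows, index, literal):
    """Emulate: name ILIKE '%literal%' using the index, then recheck."""
    grams = pattern_trigrams(literal)
    if grams:
        # A matching row must contain every trigram of the literal.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Fewer than 3 characters: nothing to look up, so scan everything.
        candidates = set(range(len(rows)))
    needle = literal.lower()
    return [rows[i] for i in sorted(candidates) if needle in rows[i].lower()]


# Hypothetical sample data, not taken from the TIGER data set.
streets = ["Main St", "Remainder Rd", "Maple Ave", "Domain Way"]
index = build_index(streets)
matches = ilike_contains(streets, index, "main")
print(matches)
```

Note the fallback branch: a pattern like '%b%' has no full trigram to look up, so an index of this kind cannot narrow the search and every row has to be rechecked.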

For this example we'll use a table of some 490,000-odd records consisting of Massachusetts street segments and their names excerpted from TIGER 2010 data. You can download the trimmed data set from here if you want to play along.


Posted by Leo Hsu and Regina Obe in 9.1, basics, contrib spotlight, intermediate, pgtrgm, postgresql versions at 01:23 | Comment (1) | Trackbacks (0)

Wednesday, July 21. 2010

Fuzzy string matching with Trigram and Trigraphs


In an earlier article, Where is Soundex and other Fuzzy string things, we covered the PostgreSQL contrib module fuzzystrmatch, which contains the very popular function soundex that is found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and dmetaphone functions included in fuzzystrmatch, but rarely found in other relational databases.

As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeves. This time we will cover the contrib module pg_trgm, which was introduced in PostgreSQL 8.3. pg_trgm uses a concept called trigrams for doing string comparisons. The pg_trgm module has several functions and GiST/GIN operators. As with other contrib modules, you just need to run the /share/contrib/pg_trgm.sql file packaged in your PostgreSQL install to enable it in your database.

For the first set of exercises, we'll use trigrams to compare words using the same set of data we tested with soundex and metaphones. For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.

The most useful pieces of pg_trgm are the similarity function and the % operator. The % operator allows for using a GiST/GIN index, and the similarity function allows for narrowing your filter, similar to what levenshtein did for us in fuzzystrmatch.
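To make the idea of trigrams concrete, here is a small Python sketch that mimics what similarity and % compute. It is an approximation for illustration, not pg_trgm's actual code: it assumes ASCII text, treats any run of letters and digits as a word, pads each word with two leading blanks and one trailing blank before cutting it into three-character pieces, and scores two strings as shared trigrams divided by total distinct trigrams. The 0.3 cutoff stands in for the module's default similarity threshold; the Python function names are our own.

```python
import re


def trigrams(text):
    """Set of trigrams for a string, one padded word at a time."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two blanks in front, one behind
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_similar(a, b, threshold=0.3):
    """Rough stand-in for the % operator: similarity at or above a cutoff."""
    return similarity(a, b) >= threshold


cat_grams = trigrams("cat")
score = similarity("word", "two words")
print(sorted(cat_grams))
print(round(score, 4))
print(is_similar("word", "two words"))
```

Here "word" yields 5 trigrams and "two words" yields 10, of which 4 are shared, so the score works out to 4/11.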


Posted by Leo Hsu and Regina Obe in 8.3, 8.4, 9.0, contrib spotlight, fuzzystrmatch, intermediate, pgtrgm, postgresql versions at 18:20 | Comments (3) | Trackback (1)

Postgres OnLine Journal
