In an earlier article Where is Soundex and other Fuzzy string things we covered the PostgreSQL contrib module fuzzstrmatch which contains the very popular function
soundex that is found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and
dmetaphone functions included in fuzzstrmatch, but rarely found in other relational databases.
As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeves. This time we will cover
the contrib module pg_trgm which was introduced in PostgreSQL 8.3. pgtrgm uses a concept called trigrams for doing string comparisons. The pg_trgm module has several functions and gist/gin operators.
Like other contrib modules, you just need to run the /share/contrib/pg_trgm.sql file packaged in your PostgreSQL install to enable it in your database.
For this set of exercises, we'll use trigrams to compare words using the same set of data we tested
with soundex and metaphones. For the next set of exercises, we will be using the places dataset we created in Importing Fixed width data into PostgreSQL with just PSQL.
The most useful are the similarity function and the
% operator. The % operator allows for using a GIST/GIN index and the similarity function allows for narrowing your filter similar to what
levenshtein did for us in fuzzstrmatch.