<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   >
<channel>
    
    <title>Postgres OnLine Journal - fuzzystrmatch</title>
    <link>https://www.postgresonline.com/journal/</link>
    <description>Tips and tricks for PostgreSQL</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 2.3.5 - http://www.s9y.org/</generator>
    <pubDate>Sat, 07 May 2011 06:14:55 GMT</pubDate>

    <image>
    <url>https://www.postgresonline.com/journal/templates/default/img/s9y_banner_small.png</url>
    <title>RSS: Postgres OnLine Journal - fuzzystrmatch - Tips and tricks for PostgreSQL</title>
    <link>https://www.postgresonline.com/journal/</link>
    <width>100</width>
    <height>21</height>
</image>

<item>
    <title>Fuzzy string matching with Trigram and Trigraphs</title>
    <link>https://www.postgresonline.com/journal/index.php?/archives/169-Fuzzy-string-matching-with-Trigram-and-Trigraphs.html</link>
            <category>8.3</category>
            <category>8.4</category>
            <category>9.0</category>
            <category>contrib spotlight</category>
            <category>fuzzystrmatch</category>
            <category>intermediate</category>
            <category>pgtrgm</category>
            <category>postgresql versions</category>
    
    <comments>https://www.postgresonline.com/journal/index.php?/archives/169-Fuzzy-string-matching-with-Trigram-and-Trigraphs.html#comments</comments>
    <wfw:comment>https://www.postgresonline.com/journal/wfwcomment.php?cid=169</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>https://www.postgresonline.com/journal/rss.php?version=2.0&amp;type=comments&amp;cid=169</wfw:commentRss>
    

    <author>nospam@example.com (Leo Hsu and Regina Obe)</author>
    <content:encoded>
    &lt;p&gt;In an earlier article, &lt;a href=&quot;https://www.postgresonline.com/journal/archives/158-Where-is-soundex-and-other-warm-and-fuzzy-string-things.html&quot; target=&quot;_blank&quot;&gt;Where is Soundex and other Fuzzy string things&lt;/a&gt;, we covered the PostgreSQL contrib module fuzzystrmatch, which contains the popular soundex function
also found in other popular relational databases. We also covered the more powerful levenshtein distance, metaphone and 
dmetaphone functions, which are included in fuzzystrmatch but rarely found in other relational databases.&lt;/p&gt;

&lt;p&gt;As far as fuzzy string matching goes, PostgreSQL has other functions up its sleeve. This time we will cover
the contrib module &lt;a href=&quot;http://www.postgresql.org/docs/8.4/interactive/pgtrgm.html&quot; target=&quot;_blank&quot;&gt;pg_trgm&lt;/a&gt;, which was introduced in PostgreSQL 8.3. pg_trgm compares strings using trigrams: groups of three consecutive characters taken from each string. The module provides several functions plus operators that can use GiST/GIN indexes.
Like other contrib modules, you just need to run the &lt;b&gt;/share/contrib/pg_trgm.sql&lt;/b&gt; file packaged in your PostgreSQL install to enable it in your database. 
&lt;/p&gt;
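&lt;p&gt;To make the trigram idea concrete, here is a minimal sketch, assuming pg_trgm is already enabled in the current database. The &lt;b&gt;places_demo&lt;/b&gt; table, its &lt;b&gt;loc_name&lt;/b&gt; column and the index name are hypothetical stand-ins, not the actual places dataset. On PostgreSQL 9.1 and later, &lt;code&gt;CREATE EXTENSION pg_trgm;&lt;/code&gt; takes the place of running the script.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- Show the trigrams pg_trgm extracts from a word
SELECT show_trgm('Washington');

-- Similarity score between 0 (no shared trigrams) and 1 (identical trigram sets)
SELECT similarity('Wushington', 'Washington');

-- Hypothetical demo table (not the real places dataset)
CREATE TEMP TABLE places_demo (loc_name text);
INSERT INTO places_demo (loc_name)
VALUES ('Washington'), ('Wilmington'), ('Boston');

-- GiST index with the trigram operator class, so the % operator can use an index
CREATE INDEX idx_places_demo_trgm
    ON places_demo USING gist (loc_name gist_trgm_ops);

-- % is true when similarity exceeds the current threshold (see show_limit())
SELECT loc_name, similarity(loc_name, 'Wushington') AS sim
FROM places_demo
WHERE loc_name % 'Wushington'
ORDER BY sim DESC;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On a table this small the planner will probably ignore the index; it is shown only to illustrate the operator class.&lt;/p&gt;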
&lt;p&gt;For this set of exercises, we&#039;ll use trigrams to compare words, using the same data we tested 
with soundex and metaphone: the places dataset we created in &lt;a href=&quot;https://www.postgresonline.com/journal/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html&quot; target=&quot;_blank&quot;&gt;Importing Fixed width data into PostgreSQL with just PSQL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most useful pieces of pg_trgm are the &lt;b&gt;similarity&lt;/b&gt; function and the
&lt;b&gt;%&lt;/b&gt; operator. The &lt;b&gt;%&lt;/b&gt; operator can take advantage of a GiST/GIN index, and the similarity function lets you narrow your filter, much as
levenshtein did for us in fuzzystrmatch.&lt;/p&gt; &lt;a class=&quot;block_level&quot; href=&quot;https://www.postgresonline.com/journal/index.php?/archives/169-Fuzzy-string-matching-with-Trigram-and-Trigraphs.html#extended&quot;&gt;Continue reading &quot;Fuzzy string matching with Trigram and Trigraphs&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Wed, 21 Jul 2010 18:20:00 -0400</pubDate>
    <guid isPermaLink="false">https://www.postgresonline.com/journal/index.php?/archives/169-guid.html</guid>
    
</item>
<item>
    <title>Where is soundex and other warm and fuzzy string things</title>
    <link>https://www.postgresonline.com/journal/index.php?/archives/158-Where-is-soundex-and-other-warm-and-fuzzy-string-things.html</link>
            <category>8.2</category>
            <category>8.3</category>
            <category>8.4</category>
            <category>9.0</category>
            <category>beginner</category>
            <category>contrib spotlight</category>
            <category>fuzzystrmatch</category>
            <category>mysql</category>
            <category>oracle</category>
            <category>postgresql versions</category>
            <category>sql server</category>
    
    <comments>https://www.postgresonline.com/journal/index.php?/archives/158-Where-is-soundex-and-other-warm-and-fuzzy-string-things.html#comments</comments>
    <wfw:comment>https://www.postgresonline.com/journal/wfwcomment.php?cid=158</wfw:comment>

    <slash:comments>2</slash:comments>
    <wfw:commentRss>https://www.postgresonline.com/journal/rss.php?version=2.0&amp;type=comments&amp;cid=158</wfw:commentRss>
    

    <author>nospam@example.com (Leo Hsu and Regina Obe)</author>
    <content:encoded>
    &lt;p&gt;If you are coming from Oracle, SQL Server, MySQL or another database that has soundex functionality, 
you may be puzzled, or even frustrated, when you try to do 
something like &lt;br /&gt;&lt;code&gt;WHERE soundex(&#039;Wushington&#039;) = soundex(&#039;Washington&#039;)&lt;/code&gt; 
&lt;br /&gt; in PostgreSQL and get a &quot;function does not exist&quot; error.&lt;/p&gt;

&lt;p&gt;Well, it so happens that there is a soundex function in PostgreSQL, and yes, it is 
also called &lt;b&gt;soundex&lt;/b&gt;, but it is offered as a contrib module and not installed by default. The module also has other fuzzy string matching functions in addition to soundex. 
One of my favorites, the &lt;b&gt;levenshtein&lt;/b&gt; distance function, is included as well. In this article
we&#039;ll be covering the contrib module packaged as &lt;b&gt;fuzzystrmatch.sql&lt;/b&gt;. Details of the module can be found in &lt;a href=&quot;http://www.postgresql.org/docs/8.4/static/fuzzystrmatch.html&quot; target=&quot;_blank&quot;&gt;FuzzyStrMatch&lt;/a&gt;.
The contrib module has been around for some time, but has changed slightly from one PostgreSQL version to the next. We are covering the 8.4 version in this article.&lt;/p&gt;

&lt;p&gt;For those unfamiliar with soundex, it&#039;s a basic approach used by the US Census in the 1930s as a way of sorting
names by pronunciation. Read &lt;a href=&quot;http://www.fcgsc.org/forms/CensusAndSoundex.pdf&quot; target=&quot;_blank&quot;&gt;Census and Soundex&lt;/a&gt; for more of the gory historical details.&lt;/p&gt;
&lt;p&gt;Given that it is an approach designed primarily for the English alphabet, it sort of makes sense that it&#039;s not built into PostgreSQL,
which has more of a diverse international concern. For example, if you used it to compare two words in Japanese or Chinese,
I don&#039;t think it would fare too well on any of the database platforms that support this function.&lt;/p&gt;
&lt;p&gt;The original soundex algorithm has been improved over the years. Though it&#039;s still the most commonly used today, newer variants 
exist called &lt;a href=&quot;http://en.wikipedia.org/wiki/Metaphone&quot; target=&quot;_blank&quot;&gt;MetaPhone&lt;/a&gt;, developed in the 1990s, and &lt;a href=&quot;http://en.wikipedia.org/wiki/Double_Metaphone&quot; target=&quot;_blank&quot;&gt;Double Metaphone (DMetaPhone)&lt;/a&gt;, developed in 2000, that support additional
consonants in other languages such as Slavic, Celtic, Italian and Spanish.  
These two variants are also included in the fuzzystrmatch contrib library. The soundex function still seems to be 
the most popular, at least in the U.S. This is perhaps because most of the other databases (Oracle, SQL Server, MySQL) have soundex built in but not the metaphone variants,
so in a sense soundex is the more portable function. The other reason is that metaphone and dmetaphone take up a bit more space and
are also more processor intensive to compute than soundex. We&#039;ll demonstrate some differences between them in this article.&lt;/p&gt;

&lt;p&gt;To enable soundex and the other fuzzy string matching functions included, just run the 
&lt;b&gt;share/contrib/fuzzystrmatch.sql&lt;/b&gt; script located in your PostgreSQL install folder. This library is an important part of the arsenal for geocoding and genealogy tracking, particularly
with U.S. street and surname data sets. I come from a long line of Minors, Miners, Burnettes and Burnets.&lt;/p&gt;
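&lt;p&gt;Once the script has been run, the functions can be tried directly. This is a minimal sketch, assuming fuzzystrmatch is enabled in the current database; the sample words are only illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- Phonetic codes: words that sound alike should get the same code
SELECT soundex('Wushington') AS wush, soundex('Washington') AS wash;

-- The comparison from the opening paragraph, now that soundex exists
SELECT soundex('Wushington') = soundex('Washington') AS sounds_alike;

-- difference() counts matching soundex code positions, from 0 to 4
SELECT difference('Miner', 'Minor');

-- Edit distance: number of single-character changes between the two strings
SELECT levenshtein('Wushington', 'Washington');

-- metaphone takes a maximum output length; dmetaphone returns the primary code
-- and dmetaphone_alt the alternate one
SELECT metaphone('Burnette', 6), dmetaphone('Burnette'), dmetaphone_alt('Burnette');&lt;/code&gt;&lt;/pre&gt;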

&lt;p&gt;For the next set of exercises, we will be using the places dataset we created in &lt;a href=&quot;https://www.postgresonline.com/journal/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html&quot; target=&quot;_blank&quot;&gt;Importing Fixed width data into PostgreSQL with just PSQL&lt;/a&gt;.&lt;/p&gt;
 &lt;a class=&quot;block_level&quot; href=&quot;https://www.postgresonline.com/journal/index.php?/archives/158-Where-is-soundex-and-other-warm-and-fuzzy-string-things.html#extended&quot;&gt;Continue reading &quot;Where is soundex and other warm and fuzzy string things&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Mon, 17 May 2010 16:53:00 -0400</pubDate>
    <guid isPermaLink="false">https://www.postgresonline.com/journal/index.php?/archives/158-guid.html</guid>
    
</item>

</channel>
</rss>
