<?xml version="1.0" encoding="windows-1252"?>
<node id="996537" title="Re: Comparing sets of phrases stored in a database?" created="2012-09-30 16:33:12" updated="2012-09-30 16:33:12">
<type id="11">
note</type>
<author id="399498">
erix</author>
<data>
<field name="doctext">
&lt;p&gt;Interesting question.

&lt;p&gt;PostgreSQL already has some text-matching tools that may be adequate for such a database:

&lt;p&gt;The built-in full-text search (includes indexing, parsing, stemming, ranking). &lt;c&gt;[1]&lt;/c&gt;
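&lt;p&gt;To give an idea of what the parse/normalize/rank steps do, here is a toy Python sketch. It is not how PostgreSQL implements them. The stop-word list and the suffix-stripping "stemmer" are made-up stand-ins for the real text search configurations, Snowball dictionaries and ts_rank. The score is simply the fraction of the query's lexemes that appear in the phrase.

```python
import re

# Hypothetical tiny stop-word list; PostgreSQL's english configuration uses a much larger one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}
# Crude suffix stripping, standing in for a real stemmer such as Snowball.
SUFFIXES = ("ing", "ed", "s")


def lexemes(text: str) -> set[str]:
    """Parse text into normalized lexemes: lowercase ASCII words,
    stop words dropped, one common suffix stripped per word."""
    result: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # keep at least 3 characters so short words are not mangled
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        result.add(word)
    return result


def rank(document: str, query: str) -> float:
    """Fraction of the query's lexemes found in the document (0.0 to 1.0)."""
    wanted = lexemes(query)
    if not wanted:
        return 0.0
    return len(wanted.intersection(lexemes(document))) / len(wanted)
```

&lt;p&gt;For example, rank("Cats running in the garden", "running cat") returns 1.0. Both sides reduce to the lexemes "cat" and "runn", and "in" and "the" are discarded as stop words.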

&lt;p&gt;The pg_trgm extension (trigram matching). It provides similarity functions, and its trigrams can be indexed to speed up similarity searches. &lt;c&gt;[2]&lt;/c&gt;
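&lt;p&gt;Here is a minimal Python sketch of the trigram idea, following the description in the pg_trgm documentation. It lowercases the text, splits it into words, pads each word with two spaces in front and one behind, and compares the resulting trigram sets. The score is shared trigrams divided by all distinct trigrams. Treat it as an approximation of the extension's similarity() function rather than a reimplementation; for one thing, this sketch only keeps ASCII letters and digits.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set of text: lowercase, split on anything that is not an
    ASCII letter or digit, pad each word as "  word " before slicing."""
    result: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams in either string (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta.intersection(tb)) / len(union)
```

&lt;p&gt;For example, trigrams("cat") gives "  c", " ca", "cat" and "at ". similarity("word", "words") is 4/7 (about 0.571), because the two words share 4 of their 7 distinct trigrams.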

&lt;p&gt;The fuzzystrmatch extension (soundex, levenshtein, etc.). &lt;c&gt;[3]&lt;/c&gt;
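&lt;p&gt;For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below computes it in Python with unit costs, using the usual two-row dynamic programming. It accepts any sequence, so passing lists of words gives a word-level distance between phrases, which may be closer to what you want. This is an illustration, not fuzzystrmatch's own code.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance with unit costs for insert, delete and substitute.

    Works on strings (character level) or lists of words (word level)."""
    # previous[j] = distance between the first i-1 items of a and the first j items of b
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

&lt;p&gt;For example, levenshtein("kitten", "sitting") is 3, and levenshtein("the cat sat".split(), "the cat sat down".split()) is 1.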


&lt;p&gt;&lt;c&gt;[1]&lt;/c&gt; [http://www.postgresql.org/docs/current/static/textsearch.html]

&lt;p&gt;&lt;c&gt;[2]&lt;/c&gt; [http://www.postgresql.org/docs/current/static/pgtrgm.html]

&lt;p&gt;&lt;c&gt;[3]&lt;/c&gt; [http://www.postgresql.org/docs/current/static/fuzzystrmatch.html]</field>
<field name="root_node">
996530</field>
<field name="parent_node">
996530</field>
</data>
</node>
