What is TSearch?
TSearch is a full-text search engine that is packaged with PostgreSQL. The key developers of TSearch are Oleg Bartunov and Teodor Sigaev, who have also done extensive
work on the GiST and GIN indexes used by PostGIS, PgSphere and other projects. For more about how TSearch and OpenFTS got started, check out A Brief History of FTS in PostgreSQL.
Check out the TSearch Official Site if you are interested in related TSearch tips or in donating to this very worthy project.
TSearch differs from regular string searching in
PostgreSQL in several key ways.
- It is well suited to searching large blobs of text, since each word is indexed using a Generalized Inverted Index (GIN) or Generalized Search Tree (GiST) index and searched using text search vectors. GIN is generally the index of choice. Search vectors
break the text at word and phrase boundaries.
- TSearch has a concept of linguistic significance, using various language dictionaries, Ispell, thesauri, stop words, etc., so it can ignore common words and
equate terms and phrases of like meaning.
- TSearch is for the most part case insensitive.
- While various dictionaries and configurations are available out of the box with TSearch, one can create new ones and further customize existing ones to
cater to specific niches within industries, e.g. medicine, pharmaceuticals, physics, chemistry, biology, legal matters.
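The points above can be sketched in a few statements. This is a minimal illustration rather than a recommended setup: the articles table, the index name and the med_english configuration name are hypothetical and chosen only for this example. It assumes PostgreSQL 8.3 or later with the built-in english configuration.

```sql
-- Hypothetical table for illustration only.
CREATE TEMP TABLE articles (id serial PRIMARY KEY, body text);
INSERT INTO articles (body) VALUES
  ('The Foxes are JUMPING over the lazy dogs'),
  ('PostgreSQL ships with full-text search');

-- Inspect the search vectors: stop words such as "the" are dropped and
-- the remaining words are lower-cased and reduced to stems.
SELECT id, to_tsvector('english', body) FROM articles;

-- Matching is case insensitive and stem based, so the query terms
-- fox and jump match "Foxes" and "JUMPING" in the first row.
SELECT id
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'fox & jump');

-- A GIN expression index so that searches like the one above
-- do not have to scan every row of a large table.
CREATE INDEX idx_articles_body_fts
  ON articles USING gin (to_tsvector('english', body));

-- Starting point for a niche-specific configuration: copy the stock
-- english configuration, then adjust its dictionaries as needed.
CREATE TEXT SEARCH CONFIGURATION public.med_english (COPY = pg_catalog.english);
```

The index expression must match the expression used in the WHERE clause, including the explicit 'english' configuration, for the planner to be able to use it.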
Prior to PostgreSQL 8.3, TSearch was a contrib module
located in the share/contrib folder. As of PostgreSQL 8.3 it is fully integrated into the PostgreSQL core.
The official documentation for using TSearch in 8.3 is located in
Chapter 12. Full Text Search of the official
PostgreSQL documentation.
In this article we shall provide a quick primer on using TSearch in 8.3.
In next month's issue of the Postgres OnLine Journal we shall provide a TSearch cheat sheet similar to our PostgreSQL 8.3 cheat sheet.