http://www.perlmonks.org?node_id=806545


in reply to Re: What DB style to use with search engine
in thread What DB style to use with search engine

There is a good reason why search engines don't allow full regex searches--they are simply too slow.

Yep. The way we typically implement a regex query against an inverted index is...

  1. Scan over the whole term dictionary looking for terms that match the regex.
  2. Iterate over the posting lists (enumeration of doc ids that match) for all matching terms.
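To make those two steps concrete, here is a minimal sketch in Python. It assumes a toy in-memory index where terms are just whitespace-separated, lowercased words. The names `build_index` and `regex_search` are made up for illustration and don't come from any particular search library.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: term -> sorted posting list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def regex_search(index, pattern):
    """Run a regex query against the index, one term at a time."""
    rx = re.compile(pattern)
    # Step 1: scan the whole term dictionary for terms matching the regex.
    matching_terms = [term for term in index if rx.fullmatch(term)]
    # Step 2: iterate over the posting lists of all matching terms
    # and collect the union of their doc ids.
    hits = set()
    for term in matching_terms:
        hits.update(index[term])
    return sorted(hits), matching_terms

# Made-up sample documents, for illustration only.
docs = {
    1: "perl regex engine",
    2: "search engines index terms",
    3: "regexes are flexible",
}
index = build_index(docs)
doc_ids, terms = regex_search(index, r"regex.*")
print(doc_ids, terms)
```

Note that step 1 costs time proportional to the number of distinct terms no matter how selective the pattern is, and step 2 grows with the number of terms that happen to match.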

If too many terms match, that could end up being slower than a full table scan. Depending on implementation and index size, you could also end up running out of memory (e.g. if the posting lists are all iterated concurrently). Furthermore, that algo limits the scope of regex matches to individual terms, so a pattern can never match across a term boundary.
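The term-scope limitation is easy to see with a small sketch. This one again assumes whitespace tokenization, and the sample documents are invented. A pattern spanning two words matches the raw text of a document but matches no single entry in the term dictionary.

```python
import re

# Made-up sample documents, for illustration only.
docs = {1: "the quick brown fox", 2: "brown paper bag"}

# Toy term dictionary: every distinct whitespace-separated word.
term_dict = {term for text in docs.values() for term in text.split()}

rx = re.compile(r"quick\s+brown")

# Matching against individual terms, as the index-based approach does.
term_hits = sorted(t for t in term_dict if rx.search(t))

# Matching against the full text, as a brute-force scan would.
scan_hits = sorted(doc_id for doc_id, text in docs.items() if rx.search(text))

print(term_hits)  # no single term contains "quick brown"
print(scan_hits)  # the full-text scan still finds the document
```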

Getting good performance out of indexed data is all about planning what queries you need in advance. Regexes are so flexible that they're hard to plan for.