http://www.perlmonks.org?node_id=187351


in reply to Newest Super Search

This is excellent. Thank you!

As an aside, I just searched for "IDE" (without quotes), " IDE " (sans quotes), and " IDE " (with quotes). Note the attempts to include leading/trailing spaces. In every case, most of the results contained "IDE" only as part of a larger word, not as a standalone word.

I don't know anything about Everything :) so don't know what it looks like in the query. Plus, as usual, I can get the data other ways (Google, different terms, etc.), but I thought I'd point it out.

Thanks again! It is very flexible and fast, much appreciated.

(tye)Re4: Newest Super Search
by tye (Sage) on Aug 03, 2002 at 19:05 UTC

    To quote from Super Search:

    Match text containing [______________________________] (separate strings with [__] -- default is spaces)

    and then, quoting from Super Search results after each of your above search attempts:

    where any text contains all of "IDE"

    where any text contains all of "IDE"

    where any text contains all of """, "IDE", """

    So splitting on space (the default) turns your three attempts into the search-term lists ('IDE'), ('', 'IDE', ''), and ('"', 'IDE', '"'). Empty search terms are, of course, ignored. That explains the results above. The code that does this is even posted in public[1]. (:
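    A minimal Python sketch of the splitting rule just described, assuming a plain split on the separator followed by dropping empty terms. The name split_terms is hypothetical; this is not the actual Super Search code.

```python
def split_terms(search: str, separator: str = " ") -> list[str]:
    """Split a search string into terms and drop the empty ones.

    Hypothetical helper that mirrors the behavior described in the post,
    not the real PerlMonks implementation.
    """
    return [term for term in search.split(separator) if term != ""]


# The three attempts from the original question, using the default separator.
for attempt in ['IDE', ' IDE ', '" IDE "']:
    print(repr(attempt), "->", split_terms(attempt))
# 'IDE' -> ['IDE']
# ' IDE ' -> ['IDE']
# '" IDE "' -> ['"', 'IDE', '"']
```

    The spaces in the second and third attempts only ever act as separators, so no term retains them.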

    So if you want to search for spaces, you have to make sure spaces don't separate search terms. In the second field, just enter any character (or string) that doesn't appear in your single search string.
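    To illustrate picking such a separator, here is a small sketch. The helper pick_separator and its candidate list are hypothetical assumptions chosen for illustration. Once the separator is absent from the search string, the whole string survives as one term, leading and trailing spaces included.

```python
def pick_separator(search: str, candidates=("|", "~", "^", "#", ";;")) -> str:
    """Return the first candidate separator that does not occur in `search`.

    Hypothetical helper; the candidate list is an arbitrary assumption.
    """
    for candidate in candidates:
        if candidate not in search:
            return candidate
    raise ValueError("no candidate separator is absent from the search string")


search = " IDE "
sep = pick_separator(search)
# The separator never occurs in the string, so splitting yields a single term
# and its leading/trailing spaces are kept.
terms = [t for t in search.split(sep) if t != ""]
print(repr(sep), terms)  # '|' [' IDE ']
```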

    There are no "special characters" when searching on strings. Backslashes, spaces, quotes, stars, parens, brackets, ampersands, etc. all just match what you type (case is ignored). You pick the separator you want. The default is space because people are used to listing several space-separated words. Searching for punctuation is important on a Perl site, so the search doesn't treat any characters as "special" other than the delimiter string you choose.
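    The matching rule ("where any text contains all of ...") can be sketched as a literal, case-insensitive substring test that every term must pass. This is a hypothetical Python illustration: case folding here uses str.casefold, which is an assumption and not necessarily identical to the database's collation.

```python
def matches_all(text: str, terms: list[str]) -> bool:
    """True if `text` contains every term as a literal, case-insensitive substring.

    No character is special: backslashes, quotes, stars, parens and so on
    just match themselves. Hypothetical sketch, not the site's code.
    """
    haystack = text.casefold()
    return all(term.casefold() in haystack for term in terms)


print(matches_all("Which IDE do you use?", [" IDE "]))    # True
print(matches_all("IDEs and guides", [" IDE "]))          # False
print(matches_all("use $x =~ s/\\(//;", ["=~", "\\("]))  # True
```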

    As I've already said, some help pages need to be added.

    I also hope to add support for some limited regular expressions. MySQL supports regular expressions, and I had initially assumed that they were fairly minimal ones that would be quick to run (this being a database server and all), but they are full-blown regular expressions from Henry Spencer. This means that entering a (perhaps intentionally) "bad" regular expression could burn a lot of SQL daemon CPU time. The strings being matched against are node contents, which can be quite long, leaving lots of room for pathological backtracking to get way out of hand.

    So to support regular expressions I'm going to have to pick a subset of regular expression features (or perhaps some other wildcarding scheme such as what glob uses, ...) that I can translate to MySQL regular expressions so that I can be sure that no one can enter a "pattern" that locks up the site.
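    One possible shape for such a subset, sketched in Python under stated assumptions: only glob-style * and ? are wildcards, everything else is literal, runs of * are collapsed, and the number of * wildcards is capped (max_stars is a hypothetical parameter). Escaping here uses Python's re.escape; a real translation to MySQL REGEXP would need MySQL's own escaping rules, which this sketch does not cover.

```python
import re


def glob_to_regex(pattern: str, max_stars: int = 3) -> str:
    """Translate a restricted glob (only `*` and `?`) into a regex string.

    Hypothetical sketch: `*` -> `.*`, `?` -> `.`, everything else literal.
    Runs of `*` are collapsed and the number of `*` wildcards is capped,
    since each extra `.*` multiplies the backtracking a failed match can cause.
    """
    parts = []
    stars = 0
    prev_star = False
    for ch in pattern:
        if ch == "*":
            if prev_star:
                continue  # "**" means the same as "*"
            stars += 1
            if stars > max_stars:
                raise ValueError(f"at most {max_stars} '*' wildcards allowed")
            parts.append(".*")
            prev_star = True
        else:
            parts.append("." if ch == "?" else re.escape(ch))
            prev_star = False
    return "".join(parts)


regex = glob_to_regex("foo*bar?")
print(regex)  # foo.*bar.
print(bool(re.search(regex, "my_foo_and_bar2", re.IGNORECASE)))  # True
```

    The generated patterns contain no alternation, nesting, or backreferences, so the only source of backtracking is the small, bounded number of .* wildcards.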

    I don't know anything about Everything :) so don't know what it looks like in the query

    But that is easy enough to find out. Simply "view source" on the results page and search for "SELECT", and there you'll see (one of) the queries spelled out in full SQL. :)

    (Minor updates applied.)

            - tye (but my friends call me "Tye")

    [1] Leading and trailing spaces are stripped from the separator string (but not from the search string), and an empty separator string is interpreted as a single space. This is because spaces don't really "show" on the web page.

    So you can't set your separator to, for example, ", " or " + " (the surrounding spaces would be stripped), but you can search for those strings as long as your separator string doesn't conflict with them.
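    The separator rules in the footnote can be sketched as follows. normalize_separator is a hypothetical name, and stripping only the space character (rather than all whitespace) is an assumption based on the footnote's wording.

```python
def normalize_separator(separator: str) -> str:
    """Apply the footnote's rules to the separator field.

    Leading/trailing spaces are stripped; an empty result means a single
    space. Hypothetical sketch of the described behavior.
    """
    stripped = separator.strip(" ")
    return stripped if stripped else " "


print(repr(normalize_separator(", ")))   # ','
print(repr(normalize_separator(" + ")))  # '+'
print(repr(normalize_separator("")))     # ' '

# The search string itself is not stripped, so ", " can still be searched for
# as long as the separator does not occur in it.
search = "foo, bar"
terms = [t for t in search.split(normalize_separator(" | ")) if t]
print(terms)  # ['foo, bar']
```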

      Perfect. I changed the separator to something else, searched for the <space>IDE<space> string, and got quick and accurate results.

      Thanks some more.