Actually, the code to handle \b was written long ago, but it isn't available for public use because it requires MySQL regexes, which are huge CPU hogs. It used to be that Saints could use MySQL regexes for searching on node titles, but that feature was doomed from the beginning to go away eventually (which finally happened as a prelude to introducing the new levels): even restricted to Saints, it required too much DB server CPU to be used except rarely, and it offered a superb denial-of-service attack vector.

sauoq shows the best that is available in Super Search and, unfortunately, MySQL makes it unreasonable to provide anything much more useful. The only ways MySQL provides for matching whole words are regexes (too much CPU) and "full-text searches" (which we tried for a while, but they had their own problems, the worst but not the only one being that they couldn't be prevented from bringing the DB server to its knees if your search matched too many nodes). It is too bad that MySQL's LIKE operator doesn't allow something as simple as '%[^a-z]map[^a-z]%', which is a fairly common feature IME.

There are some fairly minor improvements that would be possible such as the ability to not ignore letter case, the ability to anchor searches, the ability to "OR" terms in a single search, or the ability to use concat(' ',title,' ') like '% map %' (which is used if you search for a one-letter word in the simple title search). But none of those are a huge "win" over what we already have.
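To make the space-padding trick concrete, here is a minimal sketch. It is not the site's code: it is Python with an in-memory SQLite database standing in for MySQL (SQLite spells concat() as ||), and the node table, the sample titles, and the two helper functions are all made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the real node table. SQLite is used only because
# it runs in-memory; the actual site uses MySQL and concat(' ',title,' ').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (title TEXT)")
titles = [
    "Using map in void context",
    "Bitmap manipulation",
    "grep vs map",
    "map, grep and sort",
    "Mapping hashes",
]
conn.executemany("INSERT INTO node (title) VALUES (?)", [(t,) for t in titles])


def plain_like(word):
    """Substring search: LIKE '%word%' (case-insensitive for ASCII in SQLite)."""
    sql = "SELECT title FROM node WHERE title LIKE ?"
    return sorted(row[0] for row in conn.execute(sql, ("%" + word + "%",)))


def padded_like(word):
    """Pad the title with spaces so a word at either end can still match '% word %'."""
    sql = "SELECT title FROM node WHERE ' ' || title || ' ' LIKE ?"
    return sorted(row[0] for row in conn.execute(sql, ("% " + word + " %",)))


print(plain_like("map"))   # every title that contains "map" anywhere
print(padded_like("map"))  # only titles where "map" is delimited by spaces
```

In this sample, the padded form drops "Bitmap manipulation" and "Mapping hashes", but it also drops "map, grep and sort", because the comma is not a space. That is why it only counts as a minor improvement.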

The best answer (other than somehow getting MySQL to upgrade to one of the newer regex engines that aren't CPU hogs) is probably to allow post-filtering of search results using limited Perl regexes. The trick is that good keywords need to be provided in the SQL for the search to be effective. So I can't just let the user search for /\bmap\b/; I have to have them search for LIKE '%map%' in the SQL and then post-filter on /\bmap\b/. Perhaps that would be done automatically by having a field that searches for "full words"...
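A rough sketch of that two-stage idea, under stated assumptions: it is written in Python against an in-memory SQLite table rather than Perl against MySQL, and the node table, the sample titles, and the search_whole_word function are hypothetical names invented here, not anything that exists in Super Search.

```python
import re
import sqlite3

# Hypothetical stand-in for the node table (the real site uses MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (title TEXT)")
conn.executemany(
    "INSERT INTO node (title) VALUES (?)",
    [
        ("Using map in void context",),
        ("Bitmap manipulation",),
        ("grep vs map",),
        ("map, grep and sort",),
        ("Mapping hashes",),
    ],
)


def search_whole_word(conn, word, limit=50):
    """Coarse LIKE '%word%' in SQL, then a whole-word regex filter in the client."""
    # Assumption: only plain alphanumeric words are accepted, so the keyword
    # cannot smuggle in LIKE wildcards (% or _) or regex metacharacters.
    if not word.isalnum():
        raise ValueError("expected a single alphanumeric word")

    # Stage 1: the database narrows the candidates with a cheap substring match.
    rows = conn.execute(
        "SELECT title FROM node WHERE title LIKE ? ORDER BY rowid",
        ("%" + word + "%",),
    )

    # Stage 2: the application keeps only titles where the word stands alone.
    whole_word = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for (title,) in rows:
        if whole_word.search(title):
            hits.append(title)
            if len(hits) >= limit:
                break
    return hits


print(search_whole_word(conn, "map"))
```

Here the SQL stage returns all five sample titles and the regex stage throws out "Bitmap manipulation" and "Mapping hashes". Unlike the space-padded LIKE, it keeps "map, grep and sort". The regex cost lands on the web server instead of the DB server, and the keyword in the LIKE is what keeps the candidate set small enough for that to be reasonable.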

- tye        


In reply to Re^3: Exact words in super search (MySQL) by tye
in thread Exact words in super search by LucaPette