PerlMonks  

Re^2: Exact words in super search

by radiantmatrix (Parson)
on Nov 04, 2005 at 21:07 UTC (#505872)


in reply to Re: Exact words in super search
in thread Exact words in super search

There is no regex capability which would allow you to specify e.g. word boundaries. "Tough beans." :-)

Yes, there is. It's called \b, and it matches at "word boundaries". That said, I suspect that implementing the code to wrap each regex search term correctly might be less than trivial: I haven't looked at that code yet, so I don't know whether the implementation makes it difficult.
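
For readers unfamiliar with \b, here is a minimal sketch of what a word-boundary match does. It is written in Python rather than Perl (the \b assertion behaves the same way for this case), and the titles are made up for illustration; nothing here comes from the Super Search code.

```python
import re

# Hypothetical node titles, invented for this example.
titles = [
    "Using map in void context",
    "Bitmap manipulation",
    "Sitemaps and mapping",
    "map vs. grep",
]

# \b matches between a word character (\w) and a non-word character,
# or at the start/end of the string, so "Bitmap" and "mapping" are rejected.
whole_word = re.compile(r"\bmap\b", re.IGNORECASE)


def matching_titles(pattern, candidates):
    """Return the candidates in which the compiled pattern is found."""
    return [title for title in candidates if pattern.search(title)]


matches = matching_titles(whole_word, titles)
print(matches)
```

Only the first and last titles survive, because in the other two "map" is glued to neighbouring letters.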

<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
"In any sufficiently large group of people, most are idiots" - Kaa's Law


Re^3: Exact words in super search
by jdporter (Canon) on Nov 04, 2005 at 22:52 UTC

    Really? Are you sure? Any explanation as to why \b doesn't work for me right now? Does the use of this search feature require permission that you have but I don't?

    We're building the house of the future together.

      You are mixing terms. I didn't say you could type '\b' in the search box and have it work. I said that there is a way, using regex, to check for word boundaries. This was in reply to the statement "There is no regex capability which would allow you to specify e.g. word boundaries."

      There certainly is such a regex capability. There is not, however, a way to use that capability in the current incarnation of Super Search. The reason for this is explained by tye in a node below.

      <-radiant.matrix->
      A collection of thoughts and links from the minds of geeks
      The Code that can be seen is not the true Code
      "In any sufficiently large group of people, most are idiots" - Kaa's Law

        I'm not sure what terms you think I'm mixing. I stayed on-context. You, by abandoning context, have brought a lot of heat and confusion to this thread.

        The reason for this is explained by tye in a node below.

        I know. I discussed it at length with tye and others in the CB (chatterbox) before posting my previous replies. Even so, I tried to phrase it so as to give you the maximum benefit of the doubt.

        We're building the house of the future together.
Re^3: Exact words in super search (MySQL)
by tye (Cardinal) on Nov 05, 2005 at 01:03 UTC

    Actually, the code to handle \b was written long ago, but it isn't available for public use because it requires MySQL regexes, which are huge CPU hogs. Saints used to be able to use MySQL regexes for searching on node titles, but that feature was doomed from the beginning to go away eventually (which finally happened as a prelude to introducing the new levels): it required too much DB server CPU to be used except rarely, and it offered a superb denial-of-service attack vector.

    sauoq shows the best that is available in Super Search and, unfortunately, MySQL makes it unreasonable to provide anything much more useful. The only ways MySQL provides for matching whole words are regexes (too much CPU) or "full-text searches" (which we tried for a while, but they had their own problems, the worst, though not the only one, being that they couldn't be prevented from bringing the DB server to its knees if your search matched too many nodes). It is too bad that MySQL's LIKE operator doesn't allow something as simple as '%[^a-z]map[^a-z]%', which is a fairly common feature IME.

    There are some fairly minor improvements that would be possible such as the ability to not ignore letter case, the ability to anchor searches, the ability to "OR" terms in a single search, or the ability to use concat(' ',title,' ') like '% map %' (which is used if you search for a one-letter word in the simple title search). But none of those are a huge "win" over what we already have.
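
The space-padding trick mentioned above can be tried out with a small sketch. This one is an assumption-laden stand-in: it uses Python's built-in sqlite3 module instead of MySQL, SQLite's || operator in place of MySQL's concat(), and a hypothetical one-column node table with invented titles.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (title TEXT)")  # hypothetical table layout
conn.executemany(
    "INSERT INTO node (title) VALUES (?)",
    [
        ("map",),
        ("Bitmap manipulation",),
        ("Using map in void context",),
        ("Is map() faster?",),
    ],
)

# Plain substring search: matches "map" anywhere, including inside "Bitmap".
substring_hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM node WHERE title LIKE '%map%' ORDER BY rowid"
    )
]

# Pad the title with spaces so a word at the very start or end still has a
# space on each side, then search for the space-delimited word.
padded_hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM node "
        "WHERE ' ' || title || ' ' LIKE '% map %' ORDER BY rowid"
    )
]

print(substring_hits)
print(padded_hits)
```

The padded query drops "Bitmap manipulation" but also drops "Is map() faster?", since only spaces count as delimiters. That limitation is consistent with calling it a minor improvement.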

    The best answer (other than somehow getting MySQL to upgrade to one of the newer regex engines that aren't CPU hogs) is probably to allow post-filtering of search results using limited Perl regexes. The trick is that good keywords need to be provided in the SQL for the search to be effective. So I can't just let the user search for /\bmap\b/, I have to have them search for LIKE '%map%' in the SQL and post-filter on /\bmap\b/. Perhaps that would be done automatically by having a field that searches for "full words"...
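
The two-stage idea described above (a cheap LIKE in the SQL, then a regex applied to the rows that come back) might look roughly like the following. This is only a sketch under stated assumptions, not the Super Search implementation: it is in Python with an in-memory SQLite database standing in for MySQL, and the node table, the titles, and the search_whole_word helper are all hypothetical.

```python
import re
import sqlite3

# In-memory SQLite database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (title TEXT)")  # hypothetical table layout
conn.executemany(
    "INSERT INTO node (title) VALUES (?)",
    [
        ("map",),
        ("Bitmap manipulation",),
        ("Using map in void context",),
        ("Is map() faster?",),
        ("grep tricks",),
    ],
)


def search_whole_word(db, word):
    """Hypothetical helper: LIKE pre-filter in SQL, regex post-filter in code."""
    # Restrict input to plain letters and digits so it contains no LIKE
    # wildcards (% or _) and no regex metacharacters.
    if not re.fullmatch(r"[A-Za-z0-9]+", word):
        raise ValueError("search word must be alphanumeric")

    # Stage 1: the database narrows the candidates with a substring match.
    rows = db.execute(
        "SELECT title FROM node WHERE title LIKE ? ORDER BY rowid",
        (f"%{word}%",),
    )

    # Stage 2: the application keeps only whole-word matches.
    boundary = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [title for (title,) in rows if boundary.search(title)]


results = search_whole_word(conn, "map")
print(results)
```

The database only ever sees the LIKE pattern, so the regex cost is paid by the application and only on rows that already contain the keyword.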

    - tye        
