PerlMonks Search Regex

by mt2k (Hermit)
on Jun 08, 2002 at 22:55 UTC (#172854=monkdiscuss)

I did a quick perlmonks search for "search regex" and "perlmonks search" but came up with nothing on this subject. If I missed it somewhere, please post a link where I can read up on it.

My proposition is this: Yes, Super Search is cool, and the general search is usually enough to find what one is looking for, but how hard would it be to incorporate the use of a regex in the search engine?

Imagine how much power we would have in searching for a particular post/question/answer/user or whatever else we are looking for on the site. Pretend you are pepik_knize's friend, but you just can't seem to remember how to spell that whole dang name (pepsi_knife?). Let's say you remember that it starts with "pep". Super Search could almost handle this, but that would first require finding the dang thing, and then you remember that it can only handle words of 4 letters or longer.

Wouldn't it be so much more funner/faster/programmatic to simply type "/^pep/" in the normal search box? Of course, this is just the simplest of regexes. If you *really* know what you are looking for, you could throw together quite a regex to find the specific post(s) you want, rather than scrolling forever down the whole list brought up when searching for anything.
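To make the "/^pep/" idea concrete, here is a minimal sketch of how a search box might treat slash-delimited input as a regex. It is not how the PerlMonks search works; the `pattern_from_query` helper and the user list are made up for illustration (only pepik_knize comes from the post).

```perl
use strict;
use warnings;

# Hypothetical user list; a real search would query the site's database.
my @users = qw(pepik_knize mt2k belg4mit tye Zaxo);

# Turn search-box text such as "/^pep/" into a compiled pattern.
# Returns undef if the text is not wrapped in slashes or does not compile.
sub pattern_from_query {
    my ($query) = @_;
    my ($body) = $query =~ m{\A/(.+)/\z}s or return;
    my $re = eval { qr/$body/i };    # eval traps compile errors
    return $re;
}

my $re   = pattern_from_query('/^pep/');
my @hits = defined $re ? grep { $_ =~ $re } @users : ();
print "$_\n" for @hits;
```

Input that is not wrapped in slashes would simply fall through to the ordinary keyword search.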

I think the main debate here is how much strain it would put on the server to run a customized regex. Would it actually run faster, or would it bog everything down? Then there is always the fact that someone might search for "/.*?/" or something similar, which would be quite a problem.... Would it return every post on PM? Comments please!
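One way to blunt the "/.*?/" problem is to vet a pattern before running it against any posts. The sketch below is only an assumption about how that could look: `vet_pattern` and the 64-character limit are invented for the example, and the empty-string check is a heuristic. It does nothing about patterns that backtrack badly, so server load would still need a separate limit.

```perl
use strict;
use warnings;

use constant MAX_PATTERN_LENGTH => 64;    # arbitrary limit, for illustration

# Returns (compiled_regex, undef) on success or (undef, reason) on rejection.
sub vet_pattern {
    my ($body) = @_;
    return (undef, 'empty pattern')
        unless defined $body && length $body;
    return (undef, 'pattern too long')
        if length($body) > MAX_PATTERN_LENGTH;

    my $re = eval { qr/$body/ };          # eval traps syntax errors
    return (undef, 'pattern does not compile') unless defined $re;

    # Heuristic: a pattern that matches the empty string (such as .*?)
    # is usually a match-everything pattern and narrows nothing.
    return (undef, 'pattern matches everything') if '' =~ $re;

    return ($re, undef);
}

for my $body ('^pep', '.*?', '(unclosed') {
    my ($re, $why) = vet_pattern($body);
    printf "%-10s %s\n", $body, defined($re) ? 'accepted' : "rejected: $why";
}
```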

Re: PerlMonks Search Regex
by belg4mit (Prior) on Jun 09, 2002 at 01:06 UTC
    Firstly, and I could be wrong, but it's my experience that the searches here ignore the second word. Well, actually a search is performed for each word and the result lists are merged.

    A regexp search, while probably not too hard since this site uses MySQL, which has RLIKE, could be mighty dangerous. Search is already kind of heavy on resources, and I could foresee an improperly constructed regexp running out of control.

    --
    perl -pew "s/\b;([mnst])/'$1/g"

      The "new" version of simple search uses MySQL "full text search" (in hopes of reducing DB load) which doesn't work very well IMHO. The previous version of simple search did not act as you describe. The next version won't either. See also (tye)Re: Searching and Typing.

              - tye (but my friends call me "Tye")
Re: PerlMonks Search Regex
by BUU (Prior) on Jun 09, 2002 at 01:16 UTC
    I think security would be a huge concern; a 'bad' regex could easily end up doing lots and lots of bad things.
      Well, concerning security issues, would using \Q and \E not do plenty? The only real problem would be deciding what kind of regexes are allowed and how to enforce them... As I first stated and as belg4mit reinforced, a malicious user could come up with some regex to do so much searching... maybe even something that would kind of loop...
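For anyone weighing the \Q and \E suggestion, a small sketch of what they actually do may help. The variable names are just for illustration. \Q...\E applies quotemeta to the interpolated text, which makes the input safe but also turns it into a plain literal, so the regex power is gone.

```perl
use strict;
use warnings;

my $input = '^pep';            # what a user might type in the search box
my $name  = 'pepik_knize';

# Interpolated as-is, the input is a live regex: ^ anchors at the start.
my $as_regex = $name =~ /$input/ ? 1 : 0;

# Wrapped in \Q...\E, every metacharacter is escaped, so the pattern
# only matches the literal four characters "^pep".
my $as_literal = $name =~ /\Q$input\E/ ? 1 : 0;

print "as regex:   $as_regex\n";
print "as literal: $as_literal\n";
print "quoted:     ", quotemeta($input), "\n";
```

So quoting is all or nothing: it suits a plain keyword search, while a real regex search still needs some other way to decide which patterns are allowed.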
(proxy for tye) Re: PerlMonks Search Regex
by Zaxo (Archbishop) on Jun 09, 2002 at 07:42 UTC

    Relay from tye via cb:

    "I started writing the regex PM search last week and now it is just a matter of (my) time."
    For details, see Tye's Search.

    He is currently having technical trouble which prevents direct posting.

    After Compline,
    Zaxo

Re: PerlMonks Search Regex
by gumby (Scribe) on Jun 12, 2002 at 19:26 UTC
    You could always use Text::Soundex for finding names that sound similar.
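As a rough idea of what a Soundex comparison does, here is a hand-rolled, simplified version of the algorithm. It is a stand-in written for this example, not the Text::Soundex module itself, and the module may differ in edge cases such as h/w handling. The name `simple_soundex` and the misspelling "pepick_knize" are made up.

```perl
use strict;
use warnings;

# Simplified American Soundex: keep the first letter, encode the remaining
# consonants as digits, collapse adjacent duplicates, pad or trim to 4 chars.
my %CODE;
@CODE{qw(b f p v)}         = (1) x 4;
@CODE{qw(c g j k q s x z)} = (2) x 8;
@CODE{qw(d t)}             = (3) x 2;
$CODE{l}                   = 4;
@CODE{qw(m n)}             = (5) x 2;
$CODE{r}                   = 6;

sub simple_soundex {
    my ($word) = @_;
    (my $letters = lc $word) =~ s/[^a-z]//g;    # drop digits, underscores, etc.
    return '' unless length $letters;

    my @chars = split //, $letters;
    my $out   = uc $chars[0];
    my $prev  = $CODE{ $chars[0] } // 0;
    for my $ch (@chars[1 .. $#chars]) {
        my $code = $CODE{$ch} // 0;             # 0 = vowel or h/w/y: not encoded
        $out .= $code if $code && $code != $prev;
        $prev = $code;
    }
    return substr($out . '000', 0, 4);
}

for my $name (qw(pepik_knize pepick_knize pepsi_knife)) {
    printf "%-14s %s\n", $name, simple_soundex($name);
}
```

Two names count as "sounding similar" when their codes are equal. Whether a particular misspelling lands on the same code depends on which letters changed, so this complements a regex search and does not replace it.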

Node Type: monkdiscuss [id://172854]
Approved by theorbtwo
Front-paged by silent11