Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
What do people do with such problems? [...] groups based on common prefix

Yes, this is exactly what I am doing. If you don't care which pattern matched, just that one pattern among many matched, then you can do just that.

I am in the middle of writing a module that will let you assemble an arbitrary number of regular expressions, and combine them into trie structure, and from there produce an expression that will produce a minimal amount of backtracking behaviour. (Things like /a.*b/ notwithstanding).

The basic stuff works and tests okay, I'm adding code to shorten the length of the generated regular expression.

The main problem with using a brute force solution of just |ing the different patterns together is that on a target string that does not match, the engine will have to do a significant amount of work before working out that this is in fact the case.

In the 5.10 TODO list, Nicholas Clark talks about the fact that it would be nice to have the RE compiler to do this for you, but having already battled doing it in Perl, I think I'd rather gouge my eyes out with a blunt stick than to recast the algorithm in C.

... and as far as I know (and have asked) no, no-one appears to have done this yet, which sort of surprised me.

update: urg! I missed the part about the fact that you did want to know which one matched. I'm using this in relation to spam, which is why I don't care what matched, just that something matched and therefore triggers a different course of events.

Thinking about how to extend the approach, I think that all I need to do is to add code into the end of each RE with the (?{ code }) construct that sets a variable to record which rule matched. Hmm, I'll put that in the TODO list, but first off I'd like to get the damned thing out the door.

- another intruder with the mooring of the heat of the Perl


In reply to Re: Matching against list of patterns by grinder
in thread Matching against list of patterns by Eyck

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others wandering the Monastery: (4)
    As of 2014-08-02 04:43 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Who would be the most fun to work for?















      Results (54 votes), past polls