What do people do with such problems? [...] groups based on common prefix

Yes, this is exactly what I am doing. If you don't care which pattern matched, just that one pattern among many matched, then you can do just that.

I am in the middle of writing a module that will let you assemble an arbitrary number of regular expressions, combine them into a trie structure, and from there produce a single expression that keeps backtracking to a minimum. (Things like /a.*b/ notwithstanding.)

The basic stuff works and tests okay; I'm now adding code to shorten the generated regular expression.
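To give a rough idea of the trie approach, here is a minimal sketch. It is not the module's code: it handles literal strings only (no metacharacters, no arbitrary sub-expressions), and the names build_trie and trie_to_regex are made up for the example.

```perl
use strict;
use warnings;

# Insert each literal string into a nested-hash trie.
# The empty-string key marks "a word ends at this node".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        for my $char (split //, $word) {
            $node->{$char} //= {};
            $node = $node->{$char};
        }
        $node->{''} = 1;
    }
    return \%trie;
}

# Walk the trie and emit a pattern in which every shared prefix
# appears exactly once.
sub trie_to_regex {
    my ($node) = @_;
    my $ends_here = exists $node->{''};
    my @alts;
    for my $char (sort grep { $_ ne '' } keys %$node) {
        push @alts, quotemeta($char) . trie_to_regex($node->{$char});
    }
    return '' unless @alts;
    my $group = '(?:' . join('|', @alts) . ')';
    # A word may stop at this node, so whatever follows is optional.
    return "$group?" if $ends_here;
    return @alts == 1 ? $alts[0] : $group;
}

my $pattern = trie_to_regex(build_trie(qw(foobar foobaz fool foo)));
print "$pattern\n";    # foo(?:ba(?:r|z)|l)?
```

The engine only has to try "foo" once, instead of once per alternative as it would with foobar|foobaz|fool|foo.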

The main problem with using a brute force solution of just |ing the different patterns together is that on a target string that does not match, the engine will have to do a significant amount of work before working out that this is in fact the case.
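For comparison, the brute force version looks something like the sketch below. The patterns are invented for the example. On a string that matches nothing, the engine may have to try every alternative at every starting position before it can give up.

```perl
use strict;
use warnings;

# Hypothetical patterns, for illustration only.
my @patterns = ('viagra', 'v[i1]agra', 'free\s+money');

# Brute force: wrap each pattern in a non-capturing group and join with |.
my $alternation = join '|', map { "(?:$_)" } @patterns;
my $re = qr/$alternation/i;

for my $line ('Get FREE   money now', 'Meeting at noon') {
    printf "%-22s %s\n", $line, ($line =~ $re ? 'match' : 'no match');
}
```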

In the 5.10 TODO list, Nicholas Clark talks about the fact that it would be nice to have the RE compiler do this for you, but having already battled doing it in Perl, I think I'd rather gouge my eyes out with a blunt stick than recast the algorithm in C.

... and as far as I know (and have asked) no, no-one appears to have done this yet, which sort of surprised me.

update: urg! I missed the part about the fact that you did want to know which one matched. I'm using this in relation to spam, which is why I don't care what matched, just that something matched and therefore triggers a different course of events.

Thinking about how to extend the approach, I think that all I need to do is to add code into the end of each RE with the (?{ code }) construct that sets a variable to record which rule matched. Hmm, I'll put that in the TODO list, but first off I'd like to get the damned thing out the door.
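Roughly what I have in mind, as an untested-in-anger sketch rather than anything from the module: the rules, the variable $matched_rule and the sub which_rule are all made up for the example. It assumes plain alternation with nothing following the code blocks, so a block only runs once its alternative has matched completely. Note that perlre attaches caveats to (?{ code }), and that use re 'eval' is needed because the blocks reach the pattern through interpolation.

```perl
use strict;
use warnings;
use re 'eval';    # the (?{ ... }) blocks arrive via an interpolated string

# Hypothetical rules, for illustration only.
my @rules = ('free\s+money', 'v[i1]agra', 'act\s+now');

# A package variable is simpler to reason about than a lexical in (?{ }).
our $matched_rule;

# Tack a code block onto the end of each alternative that records its index.
my $source = join '|',
    map { "(?:$rules[$_])(?{ \$main::matched_rule = $_ })" } 0 .. $#rules;
my $re = qr/$source/i;

# Return the index of the rule that matched, or undef if none did.
sub which_rule {
    my ($text) = @_;
    $matched_rule = undef;    # forget any previous result
    return $text =~ $re ? $matched_rule : undef;
}

for my $text ('Buy V1AGRA today', 'Lunch on Friday?') {
    my $idx = which_rule($text);
    printf "%-18s %s\n", $text,
        defined $idx ? "rule $idx: $rules[$idx]" : 'no rule matched';
}
```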

- another intruder with the mooring of the heat of the Perl

In reply to Re: Matching against list of patterns by grinder
in thread Matching against list of patterns by Eyck
