Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
What do people do with such problems? [...] groups based on common prefix

Yes, this is exactly what I am doing. If you don't care which pattern matched, just that one pattern among many matched, then you can do just that.

I am in the middle of writing a module that will let you assemble an arbitrary number of regular expressions, and combine them into trie structure, and from there produce an expression that will produce a minimal amount of backtracking behaviour. (Things like /a.*b/ notwithstanding).

The basic stuff works and tests okay, I'm adding code to shorten the length of the generated regular expression.

The main problem with using a brute force solution of just |ing the different patterns together is that on a target string that does not match, the engine will have to do a significant amount of work before working out that this is in fact the case.

In the 5.10 TODO list, Nicholas Clark talks about the fact that it would be nice to have the RE compiler to do this for you, but having already battled doing it in Perl, I think I'd rather gouge my eyes out with a blunt stick than to recast the algorithm in C.

... and as far as I know (and have asked) no, no-one appears to have done this yet, which sort of surprised me.

update: urg! I missed the part about the fact that you did want to know which one matched. I'm using this in relation to spam, which is why I don't care what matched, just that something matched and therefore triggers a different course of events.

Thinking about how to extend the approach, I think that all I need to do is to add code into the end of each RE with the (?{ code }) construct that sets a variable to record which rule matched. Hmm, I'll put that in the TODO list, but first off I'd like to get the damned thing out the door.

- another intruder with the mooring of the heat of the Perl


In reply to Re: Matching against list of patterns by grinder
in thread Matching against list of patterns by Eyck

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (4)
As of 2024-03-29 09:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found