Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re: Matching against list of patterns

by grinder (Bishop)
on Sep 16, 2004 at 13:43 UTC ( [id://391441]=note: print w/replies, xml ) Need Help??


in reply to Matching against list of patterns

What do people do with such problems? [...] groups based on common prefix

Yes, this is exactly what I am doing. If you don't care which pattern matched, just that one pattern among many matched, then you can do just that.

I am in the middle of writing a module that will let you assemble an arbitrary number of regular expressions, and combine them into trie structure, and from there produce an expression that will produce a minimal amount of backtracking behaviour. (Things like /a.*b/ notwithstanding).

The basic stuff works and tests okay, I'm adding code to shorten the length of the generated regular expression.

The main problem with using a brute force solution of just |ing the different patterns together is that on a target string that does not match, the engine will have to do a significant amount of work before working out that this is in fact the case.

In the 5.10 TODO list, Nicholas Clark talks about the fact that it would be nice to have the RE compiler to do this for you, but having already battled doing it in Perl, I think I'd rather gouge my eyes out with a blunt stick than to recast the algorithm in C.

... and as far as I know (and have asked) no, no-one appears to have done this yet, which sort of surprised me.

update: urg! I missed the part about the fact that you did want to know which one matched. I'm using this in relation to spam, which is why I don't care what matched, just that something matched and therefore triggers a different course of events.

Thinking about how to extend the approach, I think that all I need to do is to add code into the end of each RE with the (?{ code }) construct that sets a variable to record which rule matched. Hmm, I'll put that in the TODO list, but first off I'd like to get the damned thing out the door.

- another intruder with the mooring of the heat of the Perl

Replies are listed 'Best First'.
Re^2: Matching against list of patterns
by Eyck (Priest) on Sep 16, 2004 at 14:01 UTC

    I care about which pattern matched, that's the whole point

    Where can I find more about what you're working on?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://391441]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2024-04-24 11:18 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found