
Re^3: How to use "less than" and "greater than" inside a regex for a $variable number

by AnomalousMonk (Monsignor)
on Oct 02, 2012 at 03:18 UTC (#996788)


in reply to Re^2: How to use "less than" and "greater than" inside a regex for a $variable number
in thread How to use "less than" and "greater than" inside a regex for a $variable number

The (*F) verb (short for (*FAIL)) was introduced with Perl 5.10. Prior to that, (?!) can be used to the same effect.

>perl -wMstrict -le
"for my $n (4 .. 9) {
   my $str = qq{I have 5 apples, $n oranges, and 8 limes.};
   print qq{'$str'};
   next unless $str =~
      m{ (\d+) \s+ apples \D+ (\d+) \s+ oranges \D+ (\d+) \s+ limes
         (?(?{ $1 < $2 && $2 < $3 }) | (*F) )
       }xms;
   print qq{'$2'};
   }
"
'I have 5 apples, 4 oranges, and 8 limes.'
'I have 5 apples, 5 oranges, and 8 limes.'
'I have 5 apples, 6 oranges, and 8 limes.'
'6'
'I have 5 apples, 7 oranges, and 8 limes.'
'7'
'I have 5 apples, 8 oranges, and 8 limes.'
'I have 5 apples, 9 oranges, and 8 limes.'
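For anyone stuck on a pre-5.10 perl, here is a rough sketch of the same test with (?!) in place of (*F). The sub name middle_count is made up for illustration and is not part of the code above; the sample sentence is the one used there.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns the orange
# count when apples < oranges < limes, otherwise undef.
sub middle_count {
    my ($str) = @_;
    return $str =~ m{
        (\d+) \s+ apples  \D+
        (\d+) \s+ oranges \D+
        (\d+) \s+ limes
        # conditional: empty "yes" branch, always-failing "no" branch
        (?(?{ $1 < $2 && $2 < $3 }) | (?!) )
    }xms ? $2 : undef;
}

for my $n (4 .. 9) {
    my $got = middle_count("I have 5 apples, $n oranges, and 8 limes.");
    print defined $got ? "$n oranges: matched '$got'\n"
                       : "$n oranges: no match\n";
}
```

The empty negative lookahead (?!) can never succeed, so it forces a failure in the "no" branch just as (*F) does.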


Re^4: How to use "less than" and "greater than" inside a regex for a $variable number
by Polyglot (Monk) on Oct 04, 2012 at 19:42 UTC
    I've implemented this approach, as it seems fairly close to the sort of solution I was looking for. Unfortunately, it is still rather slow. I started the process 2.5 days ago now (it's been running over 60 hours) and it is about half-way through the material. So it appears with this method it will take 5 days of 100% CPU on one of four cores of my Dell PowerEdge server. That's a little disappointing. My ugly approach, which may be slightly less thorough, finished after about three days. So it was 40% quicker.

    Given the complexity of the regex, I suppose I cannot blame Perl or the program itself; it's just the way it is. But without the attempt to narrow the search to numbers that fall between their respective forerunners and postrunners, the whole search completes in less than five minutes.

    Anyway, at least I have learned something and I much appreciate your patience in demonstrating this method for me. I may still be able to use this as a final check over a long weekend or something, or perhaps I can limit the amount of material to be checked at a time (~130 books total). Thank you!

    Blessings,

    ~Polyglot~

      Polyglot: I don't know if the following will be of any use to you, but I was curious to play with some different approaches to what I conceive to be your problem. You may as well have the results. All these work (for some definition of 'work').

      The first new approach is a variation on something I've already posted: two different replacement strings for the sequential versus non-sequential page number cases. In the case of sequential page numbers, the replacement string is the empty string, which may be something the regex engine can effectively 'optimize away' at run time.
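      A minimal sketch of what such a two-replacement substitution might look like. Everything specific here is an assumption made for illustration: page markers written as <12>, the flag text, the sub name flag_pages, and the rule that a number is "in sequence" when it is greater than the last accepted number and less than the number that follows it.

```perl
use strict;
use warnings;

# Hypothetical sketch: page markers are assumed to look like <12>.
sub flag_pages {
    my ($text) = @_;
    my $prev;    # last page number accepted as in sequence
    $text =~ s{
        (?= < (\d+) >                  # $1: page number at this position
            (?: [^<]* < (\d+) > )?     # $2: the following page number, if any
        )
    }{
        my ($cur, $next) = ($1, $2);
        my $ok = (!defined $prev || $prev < $cur)
              && (!defined $next || $cur  < $next);
        $prev = $cur if $ok;
        $ok ? '' : '[OUT OF SEQUENCE]';    # empty string when sequential
    }xmsge;
    return $text;
}

print flag_pages('<1> a <2> b <9> c <3> d <4> e'), "\n";
```

      The pattern is zero-width, so in the sequential case nothing is consumed and nothing is inserted. Whether the regex engine actually optimizes that case away is untested.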

      The second new approach is to try to avoid altogether the replacement clause of the substitution in the case of sequential page numbers. This approach uses some of the newer, more exotic regex constructs introduced with 5.10. The problem with these is that their newness means that they may not be as efficiently recognized and optimized by the regex compiler, hence slower overall. I have done no benchmarking whatsoever.
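      A rough sketch of the idea, again under invented assumptions (the <12> marker format, the flag text, the sub name flag_between). It uses \K and (*F), both of which need 5.10 or later. The match is forced to fail whenever a number lies strictly between its two neighbours, so the replacement clause is only reached for numbers that are out of sequence.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical sketch: page markers are assumed to look like <12>.
sub flag_between {
    my ($text) = @_;
    $text =~ s{
        < (\d+) > [^<]*                      # $1: preceding page number
        \K                                   # keep everything matched so far
        (?= < (\d+) > [^<]* < (\d+) > )      # $2: this number, $3: the next
        (?(?{ $1 < $2 && $2 < $3 }) (*F) )   # in sequence: fail, no replacement
    }{[OUT OF SEQUENCE]}xmsg;
    return $text;
}

say flag_between('<1> a <2> b <9> c <3> d <4> e');
```

      Because each number is tested against both neighbours, a stray number also causes its successor to be flagged: in the sample string both <9> and <3> are marked. The first and last markers have only one neighbour and are never tested.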

