PerlMonks  

Re^5: How to use "less than" and "greater than" inside a regex for a $variable number

by AnomalousMonk (Monsignor)
on Oct 06, 2012 at 10:41 UTC (#997613)


in reply to Re^4: How to use "less than" and "greater than" inside a regex for a $variable number
in thread How to use "less than" and "greater than" inside a regex for a $variable number

Polyglot: I don't know if the following will be of any use to you, but I was curious to play with some different approaches to what I conceive to be your problem. You may as well have the results. All these work (for some definition of 'work').

The first new approach is a variation on something I've already posted: two different replacement strings for the sequential versus non-sequential page number cases. In the case of sequential page numbers, the replacement string is the empty string, which may be something the regex engine can effectively 'optimize away' at run time.
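To make the first idea concrete, here is a cut-down sketch of it. The sample string and the '(GAP) ' marker are made up for illustration and are not Polyglot's data; the full script further down uses a missing() function instead. It only shows the \K-plus-lookahead shape, with an empty replacement when the page numbers are sequential.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the thread.
my $text = 'pg. 1 a pg. 2 b pg. 4 c';

my $pn = qr{ pg[.] \s+ (\d+) }xms;   # CAUTION: embedded capture

(my $fixed = $text) =~ s{
    $pn .*? \K      # $1 = current page number; \K keeps everything matched so far
    (?= $pn)        # $2 = next page number, looked at but not consumed
    }
    { $2 - $1 > 1 ? '(GAP) ' : '' }xmsge;   # '' when the pages are sequential

print "$fixed\n";   # prints: pg. 1 a pg. 2 b (GAP) pg. 4 c
```

Because of the \K, the only thing ever replaced is the zero-width spot just before the next page marker, so the page text itself is never copied back into the string.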

The second new approach is to try to avoid altogether the replacement clause of the substitution in the case of sequential page numbers. This approach uses some of the newer, more exotic regex constructs introduced with 5.10. The problem with these is that their newness means that they may not be as efficiently recognized and optimized by the regex compiler, hence slower overall. I have done no benchmarking whatsoever.
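Since I have no timings, here is a rough sketch of how the two approaches could be compared with the core Benchmark module. Everything in it is an assumption for illustration: the generated input (200 pages with every tenth page dropped), the gap() helper, and the sub names always_replace and skip_sequential. It is not a measurement, and results will depend on the perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use re 'eval';   # allows (?{ ... }) next to the interpolated $pn on older perls

# Hypothetical input: pages 1..200 with every tenth page removed.
my $text = join ' ', map { "pg. $_ some text" } grep { $_ % 10 } 1 .. 200;

my $pn = qr{ pg[.] \s+ (\d+) }xms;   # CAUTION: embedded capture

# Illustrative stand-in for missing(): a marker for a gap, '' otherwise.
sub gap { my ($i, $j) = @_; return $j - $i > 1 ? '(GAP) ' : '' }

# Approach 1: always substitute, using '' for sequential pages.
sub always_replace {
    my ($s) = @_;
    $s =~ s{ $pn .*? \K (?= $pn) }{ gap($1, $2) }xmsge;
    return $s;
}

# Approach 2: fail the match for sequential pages, so the
# replacement code runs only where a gap exists.
sub skip_sequential {
    my ($s) = @_;
    $s =~ s{
        $pn .*? \K (?= $pn)
        (?(?{ $2 - $1 == 1 }) (*SKIP) (*FAIL) )
        }
        { gap($1, $2) }xmsge;
    return $s;
}

# Sanity check: both approaches should produce the same text.
die "results differ" unless always_replace($text) eq skip_sequential($text);

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    always_replace  => sub { always_replace($text) },
    skip_sequential => sub { skip_sequential($text) },
});
```

Changing the proportion of missing pages in the generated input would show whether the relative cost depends on how common gaps are.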

use warnings FATAL => 'all' ;
use strict;

use constant DEBUG => 0;

my $book = <<'ENDBOOK';
pg. 1 one two
pg. 2 two three four
pg. 4 four five
pg. 5 five six
pg. 6 six seven eight nine
pg. 9 nine ten
pg. 10 ten eleven twelve thirteen fourteen
pg. 14 fourteen fifteen
pg. 15 fifteen sixteen seventeen
pg. 17 seventeen eighteen nineteen
pg. 19 nineteen and out
ENDBOOK

print qq{[[$book]] \n\n};

# all these solutions use \K of 5.10+

# # works
# # this solution works (insofar as i understand what Polyglot
# # wants), but is 'inefficient' in that it involves substitution
# # of a substring with an identical substring in most cases
# # (assuming sequential page numbers are the most common case).
# #
# my $pn = qr{ pg[.] \s+ }xms;
# $book =~
#     s{ $pn (\d+) \K (.*?) (?= $pn (\d+)) }
#      { my $m = missing($1, $3); $m ? qq{$2$m } : $2; }xmsge;

# # works. extracts/classifies pg. number/matter ok. subst. ok.
# # this solution works (with caveat given above), but in the case
# # of sequential page numbers will insert an empty string into
# # the target string, which may or may not be 'efficient'.
# my $pn = qr{ pg[.] \s+ (\d+) }xms;  # CAUTION: embedded capture
# $book =~ s{
#     $pn       # capture pg. number to $1
#     .*? \K    # ignore pg. number/matter in replace
#     (?= $pn)  # overlap capture next pg. number to $2
#     }
#     { my $m = missing($1, $2);
#       print "rr'$1' '$2' s/${^MATCH}/$m/rr \n" if DEBUG;
#       $m;
#     }xmspge;

# use exotic 5.10+ regex constructs to avoid 'useless' substitution.

# # works. extracts/classifies pg. number/matter ok. subst. ok.
# $book =~ s{
#     pg[.] \s+ (\d+)       # capture pg. number to $1
#     .*? \K                # ignore pg. number/matter in replace
#     (?= pg[.] \s+ (\d+))  # overlap capture next pg. number to $2
#     (?(?{ $2 - $1 == 1 })  # sequential pages?
#         # sequential: no replacement, advance to next pg.
#         (?{ print "++'$1' '$2'++ \n" if DEBUG; })
#         (*SKIP) (*FAIL)
#         |
#         # non-sequential: replace/insert missing pg(s)., advance
#         (?{ print "--'$1' '$2'-- \n" if DEBUG; })
#         # null regex always true
#     )
#     }
#     { my $m = missing($1, $2);
#       print "rr'$1' '$2' s/${^MATCH}/$m/rr \n" if DEBUG;
#       $m;
#     }xmspge;

# # works. extracts/classifies pg. number/matter ok. subst. ok.
# my $pn = qr{ pg[.] \s+ (\d+) }xms;  # CAUTION: embedded capture
# use re 'eval';
# $book =~ s{
#     $pn       # capture pg. number to $1
#     .*? \K    # ignore pg. number/matter in replace
#     (?= $pn)  # overlap capture next pg. number to $2
#     (?(?{ $2 - $1 == 1 })  # sequential pages?
#         # sequential: no replacement, advance to next pg.
#         (?{ print "++'$1' '$2'++ \n" if DEBUG; })
#         (*SKIP) (*FAIL)
#         |
#         # non-sequential: replace/insert missing pg(s)., advance
#         (?{ print "--'$1' '$2'-- \n" if DEBUG; })
#         # null regex always true
#     )
#     }
#     { my $m = missing($1, $2);
#       print "rr'$1' '$2' s/${^MATCH}/$m/rr \n" if DEBUG;
#       $m;
#     }xmspge;

# works. extracts/classifies pg. number/matter ok. subst. ok.
my $pn = qr{ pg[.] \s+ (\d+) }xms;  # CAUTION: embedded capture
use re 'eval';
$book =~ s{
    $pn       # capture pg. number to $1
    .*? \K    # ignore pg. number/matter in replace
    # advance (i.e., skip) matching to this point if pages sequential
    (?= $pn)  # overlap capture next pg. number to $2
    (?(?{ $2 - $1 == 1 })  # sequential pages?
        # sequential: no replacement, advance to next pg.
        (?{ print "++'$1' '$2'++ \n" if DEBUG; })
        (*SKIP)  # skip past current page on failure
        (*FAIL)  # fail the match: no replacement
    )
    }
    { my $m = missing($1, $2);
      print "rr'$1' '$2' s/${^MATCH}/$m/rr \n" if DEBUG;
      $m;
    }xmspge;

print "\n";
print "(($book)) \n";

sub missing {
    my ($i, $j) = @_;

    die "bad page sequence $i-$j" if $i >= $j;

    return '' if $j - $i < 2;  # no missing page(s)

    my ($ii, $jj) = ($i + 1, $j - 1);  # figure the gap
    return $ii == $jj
        ? qq{(PAGE $ii MISSING) }         # just one page missing
        : qq{(PAGES $ii - $jj MISSING) }  # multiple pages missing
        ;
    }

Output:

c:\@Work\Perl\monks\Polyglot>perl non_sequential_pages_1.pl
[[pg. 1 one two
pg. 2 two three four
pg. 4 four five
pg. 5 five six
pg. 6 six seven eight nine
pg. 9 nine ten
pg. 10 ten eleven twelve thirteen fourteen
pg. 14 fourteen fifteen
pg. 15 fifteen sixteen seventeen
pg. 17 seventeen eighteen nineteen
pg. 19 nineteen and out
]]


((pg. 1 one two
pg. 2 two three four
(PAGE 3 MISSING) pg. 4 four five
pg. 5 five six
pg. 6 six seven eight nine
(PAGES 7 - 8 MISSING) pg. 9 nine ten
pg. 10 ten eleven twelve thirteen fourteen
(PAGES 11 - 13 MISSING) pg. 14 fourteen fifteen
pg. 15 fifteen sixteen seventeen
(PAGE 16 MISSING) pg. 17 seventeen eighteen nineteen
(PAGE 18 MISSING) pg. 19 nineteen and out
))

