Regexp::Assemble hangs with a certain case

by kimmel (Beadle)
on Nov 15, 2012 at 16:41 UTC (#1004038=perlquestion)
kimmel has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I am trying to figure out why 'one' works but 'two' just hangs the script. I read perlre and perlretut again to see if the answer would jump out at me, and it didn't.

Edit: I forgot to paste part of the code below. See my follow-up comment for the correct full program.

#!/usr/bin/perl
use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;
my $fname   = 'dracula.txt';
my $content = read_file($fname);
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

my @patterns = read_file('sample_patterns');
chomp @patterns;

my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;

cmpthese(
    -5,
    {
        'one' => sub {
            $seen{$1}++ while $content =~ /$regex/g;
        },
        'two' => sub {
            $seen2{$1}++ while $content =~ /$regex/;
        },
    }
);

The source text is Bram Stoker's Dracula, an 836 KB file with 16,248 lines. The sample_patterns file contains 4,000 patterns, one per line. The only difference between 'one' and 'two' is the g modifier on the regexp.

Re: Regexp::Assemble hangs with a certain case
by golux (Pilgrim) on Nov 15, 2012 at 16:44 UTC
    Hi Kimmel,

    It's because in the second case, $content matches the $regex (at the same location each time), so you're never changing the condition; hence never exiting the loop. Try changing "while" to "if", perhaps?
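
    A minimal sketch of the two ways out of the loop, using only core Perl. The sample string and the three patterns are hypothetical stand-ins for dracula.txt and sample_patterns, and the lc call is added here only so the hash keys are predictable.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Dracula text and the pattern file.
my $content  = 'the count saw the wolf and the bat';
my @patterns = qw( the wolf bat );

my $alternation = join '|', map { quotemeta } @patterns;
my $regex       = qr/\b($alternation)\b/i;

# With /g in scalar context each iteration resumes where the previous
# match ended, so the loop ends once no further match is found.
my %seen;
$seen{ lc $1 }++ while $content =~ /$regex/g;

# With "if" and no /g there is exactly one attempt, hence at most one hit.
my %first;
$first{ lc $1 }++ if $content =~ /$regex/;

print join( ', ', map { "$_=$seen{$_}" } sort keys %seen ),   "\n";
print join( ', ', map { "$_=$first{$_}" } sort keys %first ), "\n";
```

    The "while ... /g" form counts every occurrence; the "if" form only tells you whether the text matches at all, so it is not a drop-in replacement when the goal is counting.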

    say  substr+lc crypt(qw $i3 SI$),4,5
Re: Regexp::Assemble hangs with a certain case
by Anonymous Monk on Nov 16, 2012 at 03:18 UTC

    Run this

    perl -Mre=debug -le " 1 while q{234} =~ /\d/g "

    Compare with one or two pages from this infinite loop

    perl -Mre=debug -le " 1 while q{234} =~ /\d/ "

    You should notice that without the g in m//g the position doesn't advance: you're always matching against the 2, the match is always true, and the loop never ends, because 2 always matches \d.
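
    The same behaviour can be seen without the debug output by recording where each match lands. This is only a sketch: the $guard counter is an artificial addition so the second loop stops after three rounds instead of running forever.

```perl
use strict;
use warnings;

my $text = '234';

# With /g, pos($text) advances past each match, so the loop terminates.
my @with_g;
while ( $text =~ /\d/g ) {
    push @with_g, pos($text);
}

# Without /g every attempt restarts at offset 0 and matches the "2" again.
# The counter is an artificial guard so this sketch terminates.
my @without_g;
my $guard = 0;
while ( $text =~ /\d/ ) {
    push @without_g, $-[0];    # start offset of the last successful match
    last if ++$guard >= 3;
}

print "with /g, pos after each match: @with_g\n";
print "without /g, match start offsets: @without_g\n";
```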

    Regexp::Assemble hangs with a certain case

    A regex produced by Regexp::Assemble is not Regexp::Assemble itself, and Regexp::Assemble is not hanging. In any case, you're not even using Regexp::Assemble to assemble a regex, so it has nothing to do with it.

      Oops, I did not check the code I posted, again. I really need to stop doing that. Here is what the program should have looked like.

      use v5.16;
      use warnings;
      use autodie qw( :all );
      use utf8::all;
      use File::Slurp qw( read_file );
      use Regexp::Assemble;
      use Benchmark qw( cmpthese :hireswallclock );

      my %seen;
      my %seen2;
      my $fname   = 'dracula.txt';
      my $content = read_file($fname);
      $content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

      my @patterns = read_file('sample_patterns');
      chomp @patterns;

      my $regex = join '|', map {quotemeta} @patterns;
      $regex = qr/\b($regex)\b/ixms;

      my $regex2 = Regexp::Assemble->new->add(@patterns);
      $regex2->anchor_word(1);
      $regex2->flags('ixms');
      $regex2->re();

      cmpthese(
          -5,
          {
              'one' => sub {
                  $seen{$1}++ while $content =~ /$regex/g;
              },
              'two' => sub {
                  $seen2{$regex2->mvar(1)}++ while $content =~ /$regex2/;
              },
          }
      );

      I understand now why it was acting the way it was.
