
Regexp::Assemble hangs with a certain case

by kimmel (Beadle)
on Nov 15, 2012 at 16:41 UTC (#1004038=perlquestion)
kimmel has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I am trying to figure out why 'one' works but 'two' just hangs the script. I read perlre and perlretut again to see if the answer would jump out at me, but it didn't.

Edit: I forgot to paste part of the code below. See my follow-up comment for the correct full program.

#!/usr/bin/perl
use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;
my $fname   = 'dracula.txt';
my $content = read_file($fname);
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

my @patterns = read_file('sample_patterns');
chomp @patterns;

my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;

cmpthese(
    -5,
    {
        'one' => sub {
            $seen{$1}++ while $content =~ /$regex/g;
        },
        'two' => sub {
            $seen2{$1}++ while $content =~ /$regex/;
        },
    }
);

The source text is Bram Stoker's Dracula, an 836KB file with 16,248 lines. The sample_patterns file contains 4,000 patterns, one per line. The only difference between 'one' and 'two' is the g modifier on the regexp.

Replies are listed 'Best First'.
Re: Regexp::Assemble hangs with a certain case
by golux (Hermit) on Nov 15, 2012 at 16:44 UTC
    Hi Kimmel,

    It's because in the second case, $content matches the $regex (at the same location each time), so you're never changing the condition; hence never exiting the loop. Try changing "while" to "if", perhaps?

    say  substr+lc crypt(qw $i3 SI$),4,5
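
A minimal sketch of the point above, using only core Perl. The sample text, the two-word pattern, and the iteration cap are assumptions made up for illustration; they are not from the original program. It shows the loop ending with /g, the same loop spinning without it (stopped here by a cap), and "if" as the alternative when only the first match is wanted.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Dracula text and the 4,000-pattern regex.
my $text = 'the count and the castle';
my $re   = qr/\b(count|castle)\b/i;

# With /g, each scalar-context match resumes where the last one ended,
# so the condition eventually becomes false and the loop exits.
my %seen;
$seen{$1}++ while $text =~ /$re/g;

# Without /g, every iteration matches at the same location again.
# The cap is a safety net for this demo only; the original had none.
my $iterations = 0;
while ( $text =~ /$re/ ) {
    last if ++$iterations >= 5;
}

# "if" is enough when only the first match matters.
my $first;
$first = $1 if $text =~ /$re/;

print join( ',', map {"$_=$seen{$_}"} sort keys %seen ), "\n";
print "iterations=$iterations first=$first\n";
```

Without the cap, the second loop would run until the script was killed, which is the hang described in the question.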
Re: Regexp::Assemble hangs with a certain case
by Anonymous Monk on Nov 16, 2012 at 03:18 UTC

    Run this

    perl -Mre=debug -le " 1 while q{234} =~ /\d/g "

    Compare with one or two pages from this infinite loop

    perl -Mre=debug -le " 1 while q{234} =~ /\d/ "

    You should notice that without the g modifier the pos-ition doesn't advance: you're always matching against 2, the match is always true, and the loop never ends, because 2 always matches \d
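
If the re=debug output is too noisy, the same behaviour can be seen by recording offsets directly. This is a rough sketch using core Perl only; the array names and the three-iteration limit are assumptions for illustration. pos() reports where the next /g match will resume, and $-[0] is the start offset of the last successful match.

```perl
use strict;
use warnings;

my $str = '234';

# Scalar-context /g: pos($str) moves forward after each successful match,
# and the loop stops once no digit is left to match.
my @positions;
while ( $str =~ /\d/g ) {
    push @positions, pos($str);
}

# Without /g the match always starts at offset 0, so it keeps finding
# the same "2". The loop is limited to three passes for the demo.
my @starts;
for ( 1 .. 3 ) {
    $str =~ /\d/ or last;
    push @starts, $-[0];
}

print "pos with /g: @positions\n";             # pos with /g: 1 2 3
print "match start without /g: @starts\n";     # match start without /g: 0 0 0
```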

    "Regexp::Assemble hangs with a certain case"

    A regex produced by Regexp::Assemble is not Regexp::Assemble itself, and Regexp::Assemble is not hanging. But you're not even using Regexp::Assemble to assemble a regex, so it's got nothing to do with it

      Oops, I did not check the code I posted, again. I really need to stop doing that. Here is what the program should have looked like.

      use v5.16;
      use warnings;
      use autodie qw( :all );
      use utf8::all;
      use File::Slurp qw( read_file );
      use Regexp::Assemble;
      use Benchmark qw( cmpthese :hireswallclock );

      my %seen;
      my %seen2;
      my $fname   = 'dracula.txt';
      my $content = read_file($fname);
      $content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

      my @patterns = read_file('sample_patterns');
      chomp @patterns;

      my $regex = join '|', map {quotemeta} @patterns;
      $regex = qr/\b($regex)\b/ixms;

      my $regex2 = Regexp::Assemble->new->add(@patterns);
      $regex2->anchor_word(1);
      $regex2->flags('ixms');
      $regex2->re();

      cmpthese(
          -5,
          {
              'one' => sub {
                  $seen{$1}++ while $content =~ /$regex/g;
              },
              'two' => sub {
                  $seen2{$regex2->mvar(1)}++ while $content =~ /$regex2/;
              },
          }
      );

      I understand now why it was acting the way it was.

Node Type: perlquestion [id://1004038]
Front-paged by Arunbear