PerlMonks  

Regexp::Assemble hangs with a certain case

by kimmel (Scribe)
on Nov 15, 2012 at 16:41 UTC [id://1004038]

kimmel has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I am trying to figure out why 'one' works but 'two' just hangs the script. I reread perlre and perlretut to see if the answer would jump out at me, but it didn't.

Edit: I forgot to paste part of the code below. See my follow-up comment for the correct full program.

#!/usr/bin/perl
use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;
my $fname   = 'dracula.txt';
my $content = read_file($fname);

# Turn every ASCII punctuation character into a space.
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

# Build one case-insensitive, word-anchored alternation from the pattern list.
my @patterns = read_file('sample_patterns');
chomp @patterns;
my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;

# Run each counting loop for at least 5 seconds and compare their rates.
cmpthese( -5, {
    'one' => sub { $seen{$1}++ while $content =~ /$regex/g; },
    'two' => sub { $seen2{$1}++ while $content =~ /$regex/; },
} );

The source text is Bram Stoker's Dracula, an 836 KB file with 16,248 lines. The sample_patterns file contains 4,000 patterns, one per line. The only difference between 'one' and 'two' is the g modifier on the regexp.
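A minimal, self-contained sketch of the working 'one' loop, for readers who don't have the original files. The short sample sentence and the three-entry pattern list are hypothetical stand-ins for dracula.txt and sample_patterns, and Benchmark is left out. Each scalar-context //g match resumes where the previous one ended, so the loop visits every occurrence once and stops when the match finally fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for dracula.txt and sample_patterns.
my $content  = "Jonathan met the Count. The Count greeted Jonathan; Mina wrote to Jonathan.";
my @patterns = ( 'Jonathan', 'Count', 'Mina' );

# Same punctuation-to-space translation as the original script.
$content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

# Literal alternation; quotemeta keeps pattern text from acting as regex syntax.
my $alt   = join '|', map {quotemeta} @patterns;
my $regex = qr/\b($alt)\b/ixms;

# Version 'one': //g in scalar context resumes from pos($content) each time,
# and the final failed match ends the loop.
my %seen;
$seen{$1}++ while $content =~ /$regex/g;

print "$_: $seen{$_}\n" for sort keys %seen;
```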

Re: Regexp::Assemble hangs with a certain case
by golux (Chaplain) on Nov 15, 2012 at 16:44 UTC
    Hi Kimmel,

    It's because in the second case, $content matches $regex at the same location every time: without /g, each test of the loop condition starts over at the beginning of the string, so the condition never changes and the loop never exits. Try changing "while" to "if", perhaps?
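To see this without hanging the terminal, here is a minimal sketch. The string, the two-word pattern and the 1,000-iteration cap are hypothetical, added only so the demo terminates; the cap is not part of any fix. Without /g every test of the condition restarts at offset 0, finds the same first word, and stays true.

```perl
use strict;
use warnings;

my $content = "Jonathan met the Count. The Count greeted Jonathan.";
my $regex   = qr/\b(Jonathan|Count)\b/i;

# Version 'two' without /g: every test restarts at offset 0, so the
# condition stays true forever. A hypothetical cap stops the demo.
my %seen2;
my $iterations = 0;
while ( $content =~ /$regex/ ) {
    $seen2{$1}++;
    last if ++$iterations >= 1_000;    # safety valve, not part of the fix
}
print "without /g: $iterations iterations, all matching '",
      join( ',', keys %seen2 ), "'\n";

# The 'if' suggestion: the match is tested exactly once.
my %seen_if;
$seen_if{$1}++ if $content =~ /$regex/;
print "with if: ", join( ',', map {"$_=$seen_if{$_}"} keys %seen_if ), "\n";
```

Switching to "if" stops the hang but records only the first match; adding /g, as in 'one', is what lets a while loop walk through every match.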

Re: Regexp::Assemble hangs with a certain case
by Anonymous Monk on Nov 16, 2012 at 03:18 UTC

    Run this:

    perl -Mre=debug -le " 1 while q{234} =~ /\d/g "

    Compare that with the first page or two of output from this infinite loop:

    perl -Mre=debug -le " 1 while q{234} =~ /\d/ "

    You should notice that without the g in m//g the pos-ition doesn't advance: you're always matching against the 2, the match is always true, and the loop never ends, because 2 is always \d.
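The same behaviour can be observed without re=debug by printing pos() after each match. This is a minimal sketch using the '234' string from the one-liners above; the array names are hypothetical, and the second loop is deliberately bounded at three iterations because an unbounded while would never end.

```perl
use strict;
use warnings;

my $str = '234';

# With /g in scalar context: each successful match stores its end offset
# in pos($str) and the next attempt resumes there. The failing attempt resets pos.
my @with_g;
while ( $str =~ /(\d)/g ) {
    push @with_g, $1 . '@' . pos($str);
}
print "with /g:    @with_g\n";    # three matches, then the loop ends

# Without /g: pos($str) is never set, so every attempt starts at offset 0.
my @without_g;
for ( 1 .. 3 ) {                   # bounded here; a bare 'while' would spin forever
    $str =~ /(\d)/ or last;
    push @without_g, $1 . '@' . ( defined pos($str) ? pos($str) : 'undef' );
}
print "without /g: @without_g\n";
```

The failing fourth //g attempt resets pos($str) to undef, which is why a while (//g) loop terminates and can be run again from the start of the string.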

    Regarding your title, "Regexp::Assemble hangs with a certain case":

    A regex produced by Regexp::Assemble is not Regexp::Assemble itself, so Regexp::Assemble is not what's hanging. And in the code you posted you're not even using Regexp::Assemble to assemble a regex, so it has nothing to do with the problem.

      Oops, I didn't check the code I posted, again. I really need to stop doing that. Here is what the program should have looked like:

      use v5.16;
      use warnings;
      use autodie qw( :all );
      use utf8::all;
      use File::Slurp qw( read_file );
      use Regexp::Assemble;
      use Benchmark qw( cmpthese :hireswallclock );

      my %seen;
      my %seen2;
      my $fname   = 'dracula.txt';
      my $content = read_file($fname);
      $content =~ tr/!"#$%&'()*+,\-.\/:;<=>?@\[\\]^_`{|}~/ /;

      my @patterns = read_file('sample_patterns');
      chomp @patterns;
      my $regex = join '|', map {quotemeta} @patterns;
      $regex = qr/\b($regex)\b/ixms;

      my $regex2 = Regexp::Assemble->new->add(@patterns);
      $regex2->anchor_word(1);
      $regex2->flags('ixms');
      $regex2->re();

      cmpthese( -5, {
          'one' => sub { $seen{$1}++ while $content =~ /$regex/g; },
          'two' => sub { $seen2{$regex2->mvar(1)}++ while $content =~ /$regex2/; },
      } );

      I understand now why it was acting the way it was.

Node Type: perlquestion [id://1004038]
Front-paged by Arunbear