<?xml version="1.0" encoding="windows-1252"?>
<node id="1004038" title="Regexp::Assemble hangs with a certain case" created="2012-11-15 11:41:08" updated="2012-11-15 11:41:08">
<type id="115">
perlquestion</type>
<author id="910459">
kimmel</author>
<data>
<field name="doctext">
&lt;p&gt;Okay, I am trying to figure out why 'one' works but 'two' just hangs the script. I reread perlre and perlretut to see if the answer would jump out at me, and it didn't.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Edit: I forgot to paste part of the code below. See my follow-up comment for the correct full program.&lt;/b&gt;&lt;/p&gt;

&lt;code&gt;
#!/usr/bin/perl

use v5.16;
use warnings;
use autodie qw( :all );
use utf8::all;
use File::Slurp qw( read_file );
use Regexp::Assemble;
use Benchmark qw( cmpthese :hireswallclock );

my %seen;
my %seen2;

my $fname   = 'dracula.txt';
my $content = read_file($fname);
$content =~ tr/!"#$%&amp;'()*+,\-.\/:;&lt;=&gt;?@\[\\]^_`{|}~/ /;

my @patterns = read_file('sample_patterns');
chomp @patterns;
my $regex = join '|', map {quotemeta} @patterns;
$regex = qr/\b($regex)\b/ixms;


cmpthese(
    -5,
    {
        'one' =&gt; sub {
            $seen{$1}++ while $content =~ /$regex/g;
        },
        'two' =&gt; sub {
            $seen2{$1}++ while $content =~ /$regex/;
        },
    }
);
&lt;/code&gt;

&lt;p&gt;
The source text is Bram Stoker's Dracula, an 836 KB file with 16,248 lines. The sample_patterns file contains 4,000 patterns, one per line. The only difference between 'one' and 'two' is the /g modifier on the regexp.
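&lt;/p&gt;

&lt;p&gt;
The same difference can be reduced to a much smaller case that does not need the input files. This is only a sketch: the string 'count the count', the single pattern, and the iteration cap are made up for illustration, and it may not be the whole story for the full program.
&lt;/p&gt;

&lt;code&gt;
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for dracula.txt: the word 'count' occurs twice.
my $text  = 'count the count';
my $regex = qr/\b(count)\b/;

# With /g in scalar context, each successful match moves pos($text)
# forward, so the loop ends once the matches are used up.
my $with_g = 0;
$with_g++ while $text =~ /$regex/g;
print "with /g: $with_g matches\n";

# Without /g, every evaluation starts again at offset 0, so the same
# first match is found each time and the condition never becomes false.
# The cap is an arbitrary safety limit added only to stop this demo.
my $without_g = 0;
my $cap       = 1000;
while ( $text =~ /$regex/ ) {
    last if ++$without_g == $cap;
}
print "without /g: stopped at the cap of $without_g\n";
&lt;/code&gt;

&lt;p&gt;
Here the first loop counts 2 matches and stops, while the second only stops because of the cap of 1000. Without the cap it would spin forever, which looks like the hang in 'two'.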
&lt;/p&gt;</field>
</data>
</node>
