my @groups = ();
my $pred = -1;
my $j = 0;
while ($string =~ m/$regex/g){
$pred == $-[0] or $j++;
$groups[$j] += $+[0] - $-[0];
$pred = $+[0];
}
And reduced regex to:
qr{
# start of full pattern
\( # match an opening parenthesis
(?0)* # recurse to capture full pattern
\) # match a closing parenthesis
# end of full pattern
}x;
UPDATE: seems that using recursive regex is dangerous, because in rare random "unlucky cases" it takes very much time, although in often "lucky cases" it takes little amount of time. I found seemingly better and faster (or say more stable) way - just glue long regex and use+modify through iterations against the target string. Code: #use strict;
use warnings;
use Time::HiRes;
$|++;
srand;
my @lines = <>;
my $n = 50000;
my $long = '';
$long .= qw{( )}[ int(rand() * 2) ] for 1 .. $n;
push @lines, $long;
my @lines2 = @lines;
my $start = Time::HiRes::time();
my $depth = 10; # some depth parameter
my $pre_regex2 =
'(?:\((?:\d+,)*(?:\)|' x $depth . '(?!)' . '\)))+' x $depth;
# if $depth == 1 then looks: '(?:\((?:\d+,)*(?:\)|x\)))+';
# (?:
# \(
# (?:\d+,)* (?:
# \)
# |
# (?!)\)
# )
# )+
my $regex2 = qr/$pre_regex2/;
my $regex3 = qr/(?:\d+,){2,}/;
for my $string (@lines2){
while (
$string =~ s/$regex2/
$a=$&,
$b=()=$a=~m{[)(]}g,
$a=~s{\d+,}{$b+=$&,''}eg,
"$b,"
/eg
or
$string =~ s/$regex3/
$a=$&,
$b=0,
$a=~s{\d+,}{$b+=$&,''}eg,
"$b,"
/eg
)
{}
my @groups = $string =~ m/\d+/g;
my $max=0;
for my $group (@groups){
$max < $group and $max = $group
}
my $occ = 0;
for my $group (@groups){
$occ += $max == $group
}
$max or $occ=1;
print "$max $occ\n";
}
print Time::HiRes::time() - $start, $/;
$start = Time::HiRes::time();
my $regex =
qr{
\( # match an opening parenthesis
(?0)* # recurse to capture full pattern
\) # match a closing parenthesis
}x;
for my $string (@lines)
{
my $max = 0;
my $occ = 1;
my @groups = ();
my $pred = -1;
my $i = 0;
while ($string =~ m/$regex/g){
$pred == $-[0] or $i++;
$groups[$i] += $+[0] - $-[0];
$pred = $+[0];
}
if (@groups)
{
@groups = sort { $a <=> $b } @groups;
$max = $groups[-1];
for my $i (reverse 0 .. $#groups - 1)
{
if ( $groups[$i] == $groups[-1])
{
++$occ;
}
else
{
last;
}
}
}
print "$max $occ\n";
}
print Time::HiRes::time() - $start, $/;
( http://ideone.com/K12225 )
Usually first block takes ~0,5 sec, when second block (recursive regex) takes from 0.01 sec to more than 5 sec. |