http://www.perlmonks.org?node_id=429705


in reply to Re: Massive regexp search and replace
in thread Massive regexp search and replace

Thanks, Hena. I will try what you suggest and see whether it reduces the processing time enough.

As for your assumptions, a sample list of replacement patterns (REGEX) could be:
    \b([a-z])([a-z]*)ung\b    \u$1\l$2ung
    Treecontrol               Tree Control
    [Tt]abreiter              Reiterelement
    [Tt]ile                   Teilbild
And a sample input text (INPUT) for the replacements could be:
Die Segnung ist gestern erfolgt. Die segnung ist gestern erfolgt. Die Rechnung wird geschickt. Die rechnung wird geschickt. Die Treecontrol. Die Tabreiter. Die tabreiter. Die Tile. Die tile.
I wonder if this changes anything in what you suggest...
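For reference, here is a minimal sketch of applying the sample list above to the sample input with the usual s///ee idiom. It assumes the rules are hard-coded in an ordered array instead of being read from REGEX (hash key order is not defined, so an array keeps the order predictable), and `apply_rules` is a hypothetical helper name. Each replacement is wrapped in double quotes so that the second /e interpolates $1, $2 and the \u / \l case escapes.

```perl
use strict;
use warnings;

# Sample pattern/replacement pairs, kept in an array so they are
# applied in a fixed order (hypothetical hard-coded copy of REGEX).
my @rules = (
    [ '\b([a-z])([a-z]*)ung\b', '\u$1\l$2ung' ],
    [ 'Treecontrol',            'Tree Control' ],
    [ '[Tt]abreiter',           'Reiterelement' ],
    [ '[Tt]ile',                'Teilbild' ],
);

# Compile each pattern once; quote each replacement so that s///ee
# first yields the quoted string, then evaluates it as a Perl string.
my @compiled = map { [ qr/$_->[0]/, '"' . $_->[1] . '"' ] } @rules;

sub apply_rules {
    my ($text) = @_;
    for my $rule (@compiled) {
        my ($re, $rep) = @$rule;
        $text =~ s/$re/$rep/gee;
    }
    return $text;
}

my $input = 'Die Segnung ist gestern erfolgt. Die segnung ist gestern erfolgt. '
          . 'Die Rechnung wird geschickt. Die rechnung wird geschickt. '
          . 'Die Treecontrol. Die Tabreiter. Die tabreiter. Die Tile. Die tile.';

# prints: Die Segnung ist gestern erfolgt. Die Segnung ist gestern erfolgt.
#         Die Rechnung wird geschickt. Die Rechnung wird geschickt. Die Tree Control.
#         Die Reiterelement. Die Reiterelement. Die Teilbild. Die Teilbild.
print apply_rules($input), "\n";
```

Because the replacements go through eval, the REGEX file has to be trusted, and a replacement containing a literal `"` or `@` would need escaping first.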

Re^3: Massive regexp search and replace
by Hena (Friar) on Feb 10, 2005 at 14:05 UTC
    Well, plain text-to-text translations (rows with no regex metacharacters) might be handled faster... but unless there are a lot of them compared to the real patterns, it probably won't help (it might even be slower). Whether it actually helps would have to be tested, as this is pure speculation :).

    Basically, build two hashes instead of one, something like this:
        my (%simple, %regex);
        while (<REGEX>) {
            chomp;
            my ($key, $value) = split /\t/, $_;
            if ($key =~ /^\w+$/) {
                $simple{$key} = $value;          # plain word: direct lookup
            } else {
                $regex{$key} = "\"$value\"";     # quoted so s///ee interpolates $1 etc.
            }
        }
        while (<INPUT>) {
            foreach my $key (keys %regex) {
                s/$key/$regex{$key}/gee;
            }
            my @line;
            # note: words with attached punctuation (e.g. "Tile.") won't match
            foreach my $word (split /\s+/, $_) {
                push @line, exists $simple{$word} ? $simple{$word} : $word;
            }
            print OUT "@line\n";
        }
    Note that in the given example, you could write the '[Tt]ile' pattern out as separate 'Tile' and 'tile' rows, which would move it from the slower pattern group to the faster one. But as I said, I'm not sure how much this would help.
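A variation on the two-hash idea, sketched under some assumptions: the sample rows are hard-coded, with both '[Tt]ile' and '[Tt]abreiter' written out as literal rows, and instead of splitting on whitespace the literal words are looked up in one s///ge pass over runs of \w characters, so a trailing '.' as in 'Die Tile.' does not defeat the lookup. `translate` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample rows; '[Tt]ile' and '[Tt]abreiter' are expanded
# into literal rows so they land in the fast %simple group.
my @rows = (
    [ '\b([a-z])([a-z]*)ung\b', '\u$1\l$2ung' ],
    [ 'Treecontrol',            'Tree Control' ],
    [ 'Tabreiter',              'Reiterelement' ],
    [ 'tabreiter',              'Reiterelement' ],
    [ 'Tile',                   'Teilbild' ],
    [ 'tile',                   'Teilbild' ],
);

my (%simple, @regex);
for my $row (@rows) {
    my ($key, $value) = @$row;
    if ($key =~ /^\w+$/) {
        $simple{$key} = $value;                   # literal word: no eval needed
    } else {
        push @regex, [ qr/$key/, qq{"$value"} ];  # quoted for s///ee
    }
}

sub translate {
    my ($line) = @_;
    # Real patterns first, in list order.
    for my $r (@regex) {
        my ($re, $rep) = @$r;
        $line =~ s/$re/$rep/gee;
    }
    # Then one pass over the words; punctuation around a word is left alone.
    $line =~ s/\b(\w+)\b/exists $simple{$1} ? $simple{$1} : $1/ge;
    return $line;
}

# prints: Die Tree Control. Die Reiterelement. Die Segnung. Die Teilbild.
print translate('Die Treecontrol. Die tabreiter. Die segnung. Die Tile.'), "\n";
```

One behavioral difference to check before switching: a literal row now only matches whole words, whereas the unanchored '[Tt]ile' regex would also rewrite 'tile' inside a longer word.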
Re^3: Massive regexp search and replace
by hsinclai (Deacon) on Feb 10, 2005 at 14:07 UTC
    Expanding on Hena's idea, I wonder whether it would be even more efficient to use Tie::File to walk through the file, writing the replacements in place as you go (untested):
        use Tie::File;

        my $inputfile = "samplein.txt";
        replacer($inputfile);

        sub replacer {
            my ($file) = @_;
            tie my @currentfile, 'Tie::File', $file or die "Cannot tie $file: $!";
            # foreach aliases each record, so editing $inputline writes it back to the file
            foreach my $inputline (@currentfile) {
                foreach my $key (keys %regex) {    # %regex built as in Hena's code
                    $inputline =~ s/$key/$regex{$key}/gee;
                }
            }
            untie @currentfile;
        }
        ## Totally untested

    It seems the write step would be faster with Tie::File.
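A self-contained sketch of the Tie::File approach, assuming an ordered list of two precompiled sample rules and a throwaway file from File::Temp in place of samplein.txt; `replace_in_place` is a hypothetical name. As in the `for (@array) { s/.../.../ }` example from the Tie::File documentation, the loop aliases each record, so a substitution is written straight back to the file. Whether this beats streaming to a new output file would need measuring: when a record changes length, the rest of the file may have to be shifted.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical ordered rules: precompiled pattern, pre-quoted replacement.
my @rules = (
    [ qr/\b([a-z])([a-z]*)ung\b/, '"\u$1\l$2ung"' ],
    [ qr/Treecontrol/,            '"Tree Control"' ],
);

sub replace_in_place {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
    # $_ aliases each tied record; changing it stores the record back.
    for (@lines) {
        for my $rule (@rules) {
            my ($re, $rep) = @$rule;
            s/$re/$rep/gee;
        }
    }
    untie @lines;
}

# Demo on a temporary file that is removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "Die segnung ist gestern erfolgt.\nDie Treecontrol.\n";
close $fh;

replace_in_place($path);

open my $in, '<', $path or die "Cannot read $path: $!";
print while <$in>;    # Die Segnung ist gestern erfolgt. / Die Tree Control.
close $in;
```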