Re^2: Massive regexp search and replace

by albert.llorens (Initiate)
on Feb 10, 2005 at 13:31 UTC (#429705)


in reply to Re: Massive regexp search and replace
in thread Massive regexp search and replace

Thanks, Hena. I will try what you suggest and see whether it reduces processing time sufficiently.

As for your assumptions, a sample list of replacement patterns (REGEX) could be:

\b([a-z])([a-z]*)ung\b    \u$1\l$2ung
Treecontrol               Tree Control
[Tt]abreiter              Reiterelement
[Tt]ile                   Teilbild

And a sample input text (INPUT) for the replacements could be:

Die Segnung ist gestern erfolgt.
Die segnung ist gestern erfolgt.
Die Rechnung wird geschickt.
Die rechnung wird geschickt.
Die Treecontrol.
Die Tabreiter.
Die tabreiter.
Die Tile.
Die tile.

I wonder if this changes anything in what you suggest...


Re^3: Massive regexp search and replace
by Hena (Friar) on Feb 10, 2005 at 14:05 UTC
    Well, all direct text translations (literal word-for-word replacements) might be handled faster... but unless there are a lot of them compared to the regex patterns, it probably won't help (it might actually be slower). The actual gain is best measured, as this is pure speculation :).

    Basically, make two hashes instead of one. Something like this:
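
        use strict;
        use warnings;

        # Assumes the REGEX, INPUT and OUT filehandles are already open.
        my (%simple, %regex);
        while (<REGEX>) {
            chomp;
            my ($key, $value) = split /\t/, $_, 2;
            if ($key =~ /^\w+$/) {
                $simple{$key} = $value;         # literal word: plain hash lookup
            }
            else {
                $regex{$key} = "\"$value\"";    # quoted so /ee expands $1, \u, \l
            }
        }

        while (<INPUT>) {
            # Slow group: apply every real regex to the whole line.
            foreach my $key (keys %regex) {
                s/$key/$regex{$key}/gee;
            }
            # Fast group: look up each whitespace-separated token.
            # A token with punctuation attached ("Treecontrol.") will not match.
            my @line;
            foreach my $word (split /\s+/, $_) {
                push @line, exists $simple{$word} ? $simple{$word} : $word;
            }
            print OUT "@line\n";
        }
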
    Note that in the given examples, you could write the '[Tt]ile' pattern out as separate Tile and tile rows, which would move it from the slower pattern group to the faster one. But as I said, I'm not sure how much this would help.
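
    One catch with looking tokens up after a whitespace split is that the sample input has words followed by punctuation ("Die Tile."), which a plain hash lookup on the token would miss. A minimal sketch of one way around that, assuming the literal keys consist only of word characters: join them into a single alternation and let \b find the word edges. The %simple contents and the helper name replace_simple are made up for illustration, and whether this beats the per-token lookup is untested.

```perl
use strict;
use warnings;

# Hypothetical literal replacements (the "simple" group), with '[Tt]ile'
# written out as separate Tile and tile rows.
my %simple = (
    Treecontrol => 'Tree Control',
    Tile        => 'Teilbild',
    tile        => 'Teilbild',
);

# Build one alternation from all literal keys, longest first, so a longer
# key is tried before a shorter key that is its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %simple;
my $simple_re = qr/\b($alternation)\b/;

# replace_simple is an illustrative helper name, not part of the thread's code.
sub replace_simple {
    my ($text) = @_;
    # One pass over the text; the matched word selects its replacement.
    $text =~ s/$simple_re/$simple{$1}/g;
    return $text;
}

print replace_simple("Die Treecontrol. Die Tile. Die tile."), "\n";
```

    The regex is compiled once with qr//, so each input line costs one substitution for the whole literal group instead of one per key.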
Re^3: Massive regexp search and replace
by hsinclai (Deacon) on Feb 10, 2005 at 14:07 UTC
    Expanding on Hena's idea I wonder if it would be even more efficient to use Tie::File to go through, writing replacements as you go (untested):

        use strict;
        use warnings;
        use Tie::File;

        my %regex;    # pattern => quoted replacement, assumed filled as in Hena's loop

        my $inputfile = "samplein.txt";
        replacer($inputfile);

        sub replacer {
            my ($file) = @_;
            tie my @currentfile, 'Tie::File', $file or die "$!";
            # Each element aliases one line of the file; changing it rewrites that line.
            foreach my $inputline (@currentfile) {
                foreach my $key (keys %regex) {
                    $inputline =~ s/$key/$regex{$key}/gee;
                }
            }
            untie @currentfile;
        }
        ## Totally untested

    Seems like the write operation would be faster with Tie::File
