PerlMonks  

Re^3: Large file processing

by arivu198314 (Sexton)
on Oct 01, 2010 at 04:37 UTC (#862904)


in reply to Re^2: Large file processing
in thread Large file processing

My code is here.
#Usage: <Support-File> <Input-File>

undef $/;

open(TEXT, $ARGV[0]) or die $!;
my $text = <TEXT>;
close TEXT;

open(XML, $ARGV[1]) or die $!;
my $xml = <XML>;
close XML;

while($text =~ m/\/(.+)\/([^\/]+)\/[^\/]+$/mgi)
{
    my $FindWord = $1;
    my $ReplaceWord = $2;
    $xml =~ s/(>[^>]*\b)\Q$FindWord\E(\b[^>]*<)/$1$ReplaceWord$2/gi;
}

open(OUTXML, ">$ARGV[1]") or die $!;
print OUTXML $xml;
close OUTXML;

Replies are listed 'Best First'.
Re^4: Large file processing
by ikegami (Pope) on Oct 01, 2010 at 07:00 UTC
    my %subs;
    while (<TEXT>) {
        my ($s,$r) = (split qr{/})[1,2];
        $subs{lc($s)} = $r;
    }

    my $pat = join '|', map quotemeta, keys(%subs);

    #$xml =~ s/>[^>]*\b\K($pat)(?=\b[^>]*<)/$subs{lc($1)}/gi;
    #$xml =~ s/>[^>]*(?<=\W)\K($pat)(?=\W)(?=[^>]*<)/$subs{lc($1)}/gi;
    $xml =~ s/>[^>]*(?<=\W)\K($pat)(?=\W)/$subs{lc($1)}/gi;

    \K requires 5.10, but you could rewrite it without \K. The important bit is to create one pattern.
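    As a rough sketch of the rewrite without \K mentioned above: capture the prefix that \K would have discarded and put it back in the replacement. The table contents and the sample string are made up for illustration; only the two substitutions mirror the pattern in the post.

```perl
use strict;
use warnings;

# Hypothetical replacement table; keys are lower-cased, as in the post.
my %subs = (test => 'Sample', a => 'X');

# One alternation built from every search term.
my $pat = join '|', map quotemeta, sort keys %subs;

# Made-up input with one search term per text segment.
my $input = '<p>This is the Test.</p><q>Take A break</q>';

# With \K (Perl 5.10+): text before \K is matched but left out of the replacement.
(my $with_k = $input) =~ s/>[^>]*(?<=\W)\K($pat)(?=\W)/$subs{lc($1)}/gi;

# Without \K: capture the prefix and re-emit it as $1; the word is now $2.
(my $without_k = $input) =~ s/(>[^>]*(?<=\W))($pat)(?=\W)/$1$subs{lc($2)}/gi;

print "$without_k\n";   # <p>This is the Sample.</p><q>Take X break</q>
```

    Both forms should give the same string here; the capture-group form just does a little more copying per match.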

    It assumes you don't have inputs of the form /A/B/, /B/C/
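    To illustrate why chained rules such as /A/B/, /B/C/ matter, here is a small sketch comparing the one-rule-at-a-time loop with a single-pass alternation. The rule list and the sample text are invented, and the simplified \b-based patterns are an assumption made to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical rules where the output of the first is the input of the second.
my @rules = ([A => 'B'], [B => 'C']);

# Sequential application: later rules see the output of earlier ones.
my $sequential = 'A and B';
for my $rule (@rules) {
    my ($find, $replace) = @$rule;
    $sequential =~ s/\b\Q$find\E\b/$replace/gi;
}

# Single pass: every word is looked up once and replaced text is not rescanned.
my %subs;
$subs{ lc $_->[0] } = $_->[1] for @rules;
my $pat = join '|', map quotemeta, keys %subs;

my $single = 'A and B';
$single =~ s/\b($pat)\b/$subs{lc($1)}/gi;

print "sequential: $sequential\n";   # sequential: C and C
print "single:     $single\n";       # single:     B and C
```

    If the support file relies on such chaining, the single-pattern approach changes the result and the rules would need to be resolved beforehand.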

    Update: Added required calls to lc.

      great shot!!!
Re^4: Large file processing
by GrandFather (Sage) on Oct 01, 2010 at 07:02 UTC

    Ok, I can see why that would be slow. Without knowing more about the nature of the find and replace strings it's hard to suggest a specific solution that is guaranteed to help, but the following technique may help:

    use strict;
    use warnings;

    my $source = <<STR;
    I have a file with 50,000 lines of find and replace string. For example
    /Test/Sample/
    /A/X/
    Now i want to process the file with input file. I have tried with usual method,
    it takes more than 1 hour. Please advice.
    STR

    my $matches = <<STR;
    /advice/advise/
    /usual/the usual/
    /lines/wobbles/
    /file/flibble/
    STR

    my %replace;

    open my $repIn, '<', \$matches;
    while (<$repIn>) {
        next if ! m{/([^/]+)/([^/]+)/};
        $replace{lc $1} = $2;
    }
    close $repIn;

    my $match = join '|', keys %replace;

    open my $srcIn, '<', \$source;
    while (<$srcIn>) {
        s/($match)/$replace{lc $1}/eig;
        print;
    }

    Prints:

    I have a flibble with 50,000 wobbles of find and replace string. For example
    /Test/Sample/
    /A/X/
    Now i want to process the flibble with input flibble. I have tried with the usual method,
    it takes more than 1 hour. Please advise.
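    Depending on the nature of the real find strings, two small precautions may be worth adding when building the alternation: escape each key with quotemeta, and put longer keys first so that a short key cannot win over a longer one that starts with it. The table and sample line below are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical table with a regex metacharacter and overlapping keys.
my %replace = (
    'c++'      => 'C plus plus',
    'file'     => 'flibble',
    'filename' => 'flibblename',
);

# Longest keys first, each escaped so that '+' is matched literally.
my $match = join '|',
    map quotemeta,
    sort { length($b) <=> length($a) } keys %replace;

my $line = "Put the filename of the C++ file here.\n";
$line =~ s/($match)/$replace{lc($1)}/ig;

print $line;   # Put the flibblename of the C plus plus flibble here.
```

    Perl tries alternatives left to right at each position, so without the length sort the outcome for "filename" would depend on hash key order.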
    True laziness is hard work
      /e isn't needed
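      A quick way to check this: the replacement side of s/// is interpolated like a double-quoted string, so a hash lookup there is evaluated even without /e. The sketch below uses a cut-down, made-up table and compares both forms.

```perl
use strict;
use warnings;

# Hypothetical subset of the replacement table from the post.
my %replace = (advice => 'advise', usual => 'the usual');
my $match   = join '|', map quotemeta, keys %replace;

my $text = 'As Usual, some advice';

# With /e the replacement is evaluated as a Perl expression.
(my $with_e = $text) =~ s/($match)/$replace{lc $1}/eig;

# Without /e the replacement is interpolated as a string; the hash
# subscript is still evaluated during interpolation.
(my $without_e = $text) =~ s/($match)/$replace{lc $1}/ig;

print "$with_e\n";      # As the usual, some advise
print "$without_e\n";   # As the usual, some advise
```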
