
Re: Replace string in one file using input from another file

by AnomalousMonk (Chancellor)
on Feb 07, 2018 at 17:16 UTC (node #1208640)

in reply to Replace string in one file using input from another file

Try something like this:

    c:\@Work\Perl\monks\lewars>perl -wMstrict -le
    "use Data::Dump qw(dd);
    ;;
    my $xlate_file = 'lookup.dat';
    open my $fh_xlate, '<', $xlate_file or die qq{opening '$xlate_file': $!};
    ;;
    my %xlate = map { m{ \A (\S+) \s+ (.+?) \s+ \z }xms } <$fh_xlate>;
    dd \%xlate;
    close $fh_xlate or die qq{closing '$xlate_file': $!};
    ;;
    my ($rx_ref) =
      map qr{ \b (?: $_) \b }xms,
      join ' | ',
      map quotemeta,
      reverse sort keys %xlate
      ;
    print $rx_ref;
    ;;
    my $master_file = 'master.dat';
    open my $fh_master, '<', $master_file or die qq{opening '$master_file': $!};
    ;;
    my $master = do { local $/; <$fh_master> };
    close $fh_master or die qq{closing '$master_file': $!};
    ;;
    $master =~ s{ ($rx_ref) }{$xlate{$1}}xmsg;
    print qq{[[$master]]};
    "
    {
      Ref00004 => "cc;ACS=0",
      Ref00005 => ";ACS=0;REL=0",
      Ref00006 => ";ACS=0;REL=0",
      Ref00007 => "https:///siteminderagent/cert/smgetcred.scc?cert",
      Ref00008 => ";ACS=0;REL=0",
      Ref00009 => ";ACS=0;REL=0",
    }
    (?msx-i: \b (?: Ref00009 | Ref00008 | Ref00007 | Ref00006 | Ref00005 | Ref00004) \b )
    [[<Property Name="CA.SM::AuthScheme.IsUsedbyAdmin">
    <BooleanValue>false</BooleanValue>
    </Property>
    <Property Name="CA.SM::AuthScheme.Desc">
    <StringValue>TCP portal auth scheme</StringValue>
    </Property>
    <Property Name="CA.SM::AuthScheme.Level">
    <NumberValue>5</NumberValue>
    </Property>
    <Property Name="CA.SM::AuthScheme.IsTemplate">
    <BooleanValue>false</BooleanValue>
    </Property>
    <Property Name="CA.SM::AuthScheme.Param">
    <LinkValue><XREF>;ACS=0;REL=0</XREF></LinkValue>
    </Property>
    <Property Name="CA.SM::AuthScheme.Library">
    ]]
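The same steps can also be run as an ordinary script rather than a command-line one-liner. The sketch below is not the original code: the lookup records and master text are hypothetical stand-ins held in memory instead of being read from lookup.dat and master.dat, and the example.com URLs are made up. It also uses \s* rather than \s+ before \z when parsing records, because with \s+ a last lookup line that lacks a trailing newline fails to match and is silently dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for lookup.dat and master.dat.
my $lookup_text = <<'END_LOOKUP';
Ref00004 https://example.com/login.fcc;ACS=0
Ref00007 https://example.com/siteminderagent/cert/smgetcred.scc?cert
END_LOOKUP

my $master = <<'END_MASTER';
<Property Name="CA.SM::AuthScheme.Param">
<LinkValue><XREF>Ref00004</XREF></LinkValue>
</Property>
END_MASTER

# Build the placeholder => URL table, one "key value" record per line.
my %xlate;
open my $fh_xlate, '<', \$lookup_text or die "opening lookup data: $!";
while (my $line = <$fh_xlate>) {
    # \s* (not \s+) before \z also accepts a last line with no newline.
    my ($ref, $url) = $line =~ m{ \A (\S+) \s+ (.+?) \s* \z }xms
        or next;    # silently skip lines that do not look like "key value"
    $xlate{$ref} = $url;
}
close $fh_xlate or die "closing lookup data: $!";

# One regex matching any key; reverse sort tries a longer key before any
# shorter key that is its prefix.
my ($rx_ref) =
    map qr{ \b (?: $_ ) \b }xms,
    join ' | ',
    map quotemeta,
    reverse sort keys %xlate;

# Substitute into a copy so the original text is kept for comparison.
(my $result = $master) =~ s{ ($rx_ref) }{$xlate{$1}}xmsg;
print $result;
```

To use it on real files, replace the two here-documents with reads from the actual lookup and master files, as the one-liner does.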
  • This approach slurps the entire master file into memory, so it should work fine with a 38 MB or even a 380 MB file, but it will not scale indefinitely to ever-larger files.
  • The regex for matching references relies on \b, so it assumes each reference string is bounded on both sides by a non-\w character (or by the start or end of the text). If this is not the case, adjust as needed.
  • The substitution replaces Ref00004-like strings anywhere and everywhere in the file. If you need the replacement done only between certain tags, for example, adjust the match regex or perhaps use an XML parser.
  • The example code only prints to standard output; adjust as needed.
  • Update: No validation is done on the content of the lookup.dat file (malformed records, duplicate references, etc.). It might be wise to consider this.
  • Update: I think the regex for extracting URLs from the lookup data file will support embedded whitespace in the URL, but I haven't tested this. Caveat Programmor.
  • Update: The regex for extracting reference placeholders and URLs from records in the lookup file is very naive. For instance, a bare \S+ is used to match a reference placeholder. Personally, I would feel better with a more specific match, maybe something like
        qr{ (?<! [[:alpha:]]) Ref \d{5} (?! \d) }xms
    Likewise, I'm sure there are canned regexes for matching URLs available.
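Several of the notes above (memory use, lookup validation, a stricter placeholder regex) can be addressed together. The following is a hedged sketch, not a tested drop-in replacement for the one-liner: it uses the stricter placeholder pattern suggested above, dies on malformed or duplicate lookup records, and rewrites the master data one line at a time so memory use does not grow with file size. That last choice assumes no placeholder is ever split across a line break. The helper names read_lookup and translate_stream and all sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stricter placeholder pattern: "Ref" plus exactly five digits, not
# preceded by a letter and not followed by another digit.
my $rx_placeholder = qr{ (?<! [[:alpha:]]) Ref \d{5} (?! \d) }xms;

# Hypothetical helper: read "placeholder URL" records, dying on bad input.
sub read_lookup {
    my ($fh) = @_;
    my %xlate;
    while (my $line = <$fh>) {
        next if $line =~ m{ \A \s* \z }xms;    # ignore blank lines
        my ($ref, $url) = $line =~ m{ \A ($rx_placeholder) \s+ (\S.*?) \s* \z }xms
            or die "lookup line $.: malformed record: $line";
        die "lookup line $.: duplicate placeholder '$ref'\n"
            if exists $xlate{$ref};
        $xlate{$ref} = $url;
    }
    return \%xlate;
}

# Hypothetical helper: copy $in to $out line by line, replacing known
# placeholders; unknown placeholders are left unchanged.
sub translate_stream {
    my ($in, $out, $xlate) = @_;
    while (my $line = <$in>) {
        $line =~ s{ ($rx_placeholder) }
                  { exists $xlate->{$1} ? $xlate->{$1} : $1 }xmsge;
        print {$out} $line;
    }
    return;
}

# Usage with in-memory handles and made-up sample data; real code would
# open lookup.dat, master.dat and an output file instead.
my $lookup_text = "Ref00004 https://example.com/a.fcc;ACS=0\n"
                . "Ref00005 https://example.com/b;ACS=0;REL=0\n";
open my $fh_lookup, '<', \$lookup_text or die "opening lookup data: $!";
my $xlate = read_lookup($fh_lookup);

my $master_text = "<XREF>Ref00004</XREF>\n<XREF>Ref00005</XREF>\n<XREF>Ref00099</XREF>\n";
open my $fh_in, '<', \$master_text or die "opening master data: $!";
my $output = '';
open my $fh_out, '>', \$output or die "opening output buffer: $!";
translate_stream($fh_in, $fh_out, $xlate);
close $fh_out or die "closing output buffer: $!";
print $output;
```

Leaving unknown placeholders untouched (rather than dying) is a design choice; depending on the data, reporting them might be preferable.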

Update: For a good discussion of the technique used above to build the $rx_ref regex matching object, see Building Regex Alternations Dynamically by haukex.
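As a small illustration of why the keys are reverse-sorted before being joined: Perl's regex engine takes the first alternative that matches at a given position, not the longest one. In the code above the \b anchors largely make the order irrelevant for whole-word keys like Ref00004, but without such anchors a shorter key that is a prefix of a longer key would win. The keys and replacement values below are made up, and the alternation helper is hypothetical. Reverse sorting works because a string always sorts after its own prefixes; sorting by descending length is another common way to get the same effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up keys where one is a prefix of the other.
my %xlate = (foo => 'A', foobar => 'B');

# Hypothetical helper: alternation of the given keys, in the given order,
# with no \b anchors.
sub alternation {
    my $alt = join '|', map quotemeta, @_;
    return qr{($alt)};
}

my $text = 'foobar';

# Ascending sort tries 'foo' first; it matches, so 'foobar' never does.
my $rx_short_first = alternation(sort keys %xlate);
(my $short_first = $text) =~ s{$rx_short_first}{$xlate{$1}}g;

# Reverse sort (as in the one-liner) tries 'foobar' first.
my $rx_long_first = alternation(reverse sort keys %xlate);
(my $long_first = $text) =~ s{$rx_long_first}{$xlate{$1}}g;

print "$short_first\n";    # Abar
print "$long_first\n";     # B
```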

Give a man a fish:  <%-{-{-{-<

Re^2: Replace string in one file using input from another file
by lewars (Initiate) on Feb 08, 2018 at 15:00 UTC

    Thanks so much! This works flawlessly!
