PerlMonks  

Re^3: Search and Replace in one file based upon contents of another

by Marshall (Canon)
on Oct 26, 2016 at 18:08 UTC


in reply to Re^2: Search and Replace in one file based upon contents of another
in thread Search and Replace in one file based upon contents of another

Based upon your meager information, here is a framework of code (untested).
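#!/usr/bin/perl
use strict;
use warnings;
#### untested ####

# Build a lookup table from the translation file: NAT IP => real IP
open my $fh_transfile, '<', "filename"
    or die "unable to open translation file $!";
my %table;
while (my $line = <$fh_transfile>) {
    my ($nat_ip, $real_ip) = split ' ', $line;
    next unless defined $real_ip;          # skip blank or malformed lines
    $table{$nat_ip} = $real_ip;
}
close $fh_transfile;

open my $fh_bigfile, '<', "filename"
    or die "unable to open big input file $!";
open my $fh_out, '>', "filenameout"
    or die "unable to open big output file $!";
while (my $line = <$fh_bigfile>) {
    chomp $line;
    my @tokens = split ',', $line, -1;     # -1 keeps trailing empty fields
    # Replace field 9 (index 8) when the translation table has an entry for it
    if (defined $tokens[8] and exists $table{$tokens[8]}) {
        $tokens[8] = $table{$tokens[8]};
    }
    print $fh_out join(",", @tokens), "\n";
}
close $fh_bigfile;
close $fh_out or die "unable to close big output file $!";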
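Since the framework is untested, one way to exercise the same logic without real files is to open references to scalars as in-memory filehandles. This is a sketch only: the sample data is made up, and it assumes the translation file has two whitespace-separated columns (NAT IP, real IP) and that the IP to translate is field 9 (index 8) of the CSV, as in the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, not real input
my $trans_data = "10.0.0.1 192.168.1.1\n10.0.0.2 192.168.1.2\n";
my $big_data   = "a,b,c,d,e,f,g,h,10.0.0.1,j\n"
               . "a,b,c,d,e,f,g,h,10.9.9.9,j\n";

# Build the lookup table, reading from a string instead of a file
open my $fh_transfile, '<', \$trans_data or die "cannot open translation data: $!";
my %table;
while (my $line = <$fh_transfile>) {
    my ($nat_ip, $real_ip) = split ' ', $line;
    next unless defined $real_ip;
    $table{$nat_ip} = $real_ip;
}
close $fh_transfile;

# Translate field 9 and write the result into $out instead of a file
my $out = '';
open my $fh_bigfile, '<', \$big_data or die "cannot open big data: $!";
open my $fh_out,     '>', \$out      or die "cannot open output buffer: $!";
while (my $line = <$fh_bigfile>) {
    chomp $line;
    my @tokens = split ',', $line, -1;
    if (defined $tokens[8] and exists $table{$tokens[8]}) {
        $tokens[8] = $table{$tokens[8]};
    }
    print $fh_out join(',', @tokens), "\n";
}
close $fh_bigfile;
close $fh_out;

print $out;
```

Running it should print the first line with 10.0.0.1 replaced by 192.168.1.1 and the second line unchanged, because 10.9.9.9 has no entry in %table.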
Note: If the CSV file can have commas within a field (like "Smith,Sr"), then you will need a module such as Text::CSV to help with the parsing. A simple split on comma will not work! Of course, I was not able to test with real data. I'm sure I've made some error or missed some detail, but this is the general idea.
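A quick way to see why a plain split breaks on quoted fields: the line below is a made-up example whose second field is quoted and contains a comma. The split produces one field too many, so every index after the quoted field (including index 8 in the real data) would point at the wrong column.

```perl
use strict;
use warnings;

# Hypothetical CSV line: the second field is quoted and contains a comma
my $line   = 'Jones,"Smith,Sr",10.0.0.5';
my @tokens = split ',', $line;

# A plain split sees four fields instead of the intended three
print scalar(@tokens), " fields: ", join(' | ', @tokens), "\n";
# prints: 4 fields: Jones | "Smith | Sr" | 10.0.0.5
```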

Update: I got a msg asking why there is a "chomp $line;" in the second while loop but not in the first. I'll put the answer here, as others may have the same question...

The purpose of chomp() is to remove the line ending, represented as "\n" in Perl. Without the chomp() in the second while loop, the last element of @tokens would still contain the line ending after the split on ','. Since the print statement appends its own "\n", every output line would then be followed by a blank line.
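A small demo of the effect, using a made-up line: Data::Dumper with Useqq turned on shows the newline as an escaped "\n", so it is easy to spot which version keeps it in the last field.

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq  = 1;   # show "\n" as an escape instead of a real newline
$Data::Dumper::Terse  = 1;   # omit the "$VAR1 = " prefix
$Data::Dumper::Indent = 0;   # print each array on one line

my $line = "a,b,c\n";

my @no_chomp = split ',', $line;   # last field is "c\n"

my $copy = $line;
chomp $copy;                       # remove the trailing "\n"
my @chomped = split ',', $copy;    # last field is "c"

print Dumper(\@no_chomp), "\n";
print Dumper(\@chomped), "\n";
```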

In the first while loop, split ' ', $line; does NOT mean split on the single "space" character. This is a special case built into Perl that means: split on any run of whitespace characters (space, tab, newline, form feed, and so on). So in the first while loop, the line ending is just another separator, and because split discards trailing empty fields, it never ends up in $nat_ip or $real_ip. A chomp() before that split would not hurt, but it is not necessary.
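To contrast the special ' ' pattern with a regex that really does match one space character, here is a sketch using a hypothetical translation-file line with two spaces between the columns and a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical translation-file line: two spaces between fields, newline at end
my $line = "10.0.0.5  192.168.1.5\n";

my @special = split ' ', $line;   # special case: any run of whitespace
my @literal = split / /, $line;   # regex: exactly one space character

# @special is ('10.0.0.5', '192.168.1.5')          - clean, no "\n"
# @literal is ('10.0.0.5', '', "192.168.1.5\n")    - empty field, newline kept
printf "split ' ' gave %d fields, split / / gave %d fields\n",
    scalar @special, scalar @literal;
```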

The difference between split /\s+/, $line; and split ' ', $line; is that the second version ignores any whitespace at the start of the line, while in the first version leading whitespace makes the first element of @tokens an empty string (''). This is easier to demo than to explain further in English:

use strict;
use warnings;
use Data::Dumper;

my $line = " X Y \tZ A \n";

my @tokens = split ' ', $line;
print Dumper \@tokens;

@tokens = split /\s+/, $line;
print Dumper \@tokens;

__END__
$VAR1 = [          # split ' ' version
          'X',
          'Y',
          'Z',
          'A'      # note: line ending removed
        ];
$VAR1 = [          # split /\s+/ version
          '',      # note: empty first field from the leading whitespace
          'X',
          'Y',
          'Z',
          'A'      # note: line ending removed
        ];
I also added "my" to the file handle open statements.
