karey3341 has asked for the wisdom of the Perl Monks concerning the following question:
If I have two input files:
file1:
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1
file2:
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3
How do I merge those two files to get the output below?
output file:
0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4
I have the following code to read an input file:
while (<FILE1>) {
    next unless s/^(.*?):\s*//;
    $word = $1;
    for $field (split) {
        ($site, $count) = split /,/, $field;
        $file1{$word}{$site} = $count;
    }
}
Re: merge two files
by GrandFather (Saint) on Apr 17, 2009 at 04:01 UTC
You are heading in the right direction. If you wrap the essence of your code in a for loop that iterates over the files you need to process and change the hash access from an assignment to a += then you've done the input part. Generating the output is then a matter of two nested loops. Consider:
use strict;
use warnings;
my $file1 = <<END_FILE1;
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1
END_FILE1
my $file2 = <<END_FILE2;
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3
END_FILE2
my %sites;
for my $source (\$file1, \$file2) {
    open my $inFile, '<', $source or die "Failed to open $source: $!\n";
    while (my $line = <$inFile>) {
        chomp $line;
        next if !($line =~ s/^([^:]+):\s*//);
        my $word = $1;
        for my $field (split ' ', $line) {
            my ($site, $count) = split /,/, $field;
            $sites{$word}{$site} += $count;
        }
    }
}
for my $word (sort {$a <=> $b} keys %sites) {
    print "$word: ";
    my $wordSites = $sites{$word};
    for my $site (sort {$a <=> $b} keys %$wordSites) {
        print "$site,$wordSites->{$site} ";
    }
    print "\n";
}
Prints:
0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4
There are a few things you ought to do to save yourself time in the future. First off, always use strictures (use strict; use warnings;). Use the three parameter version of open (you didn't, did you?) and always test the result of file opens. Use lexical file handles (open my $infile ...).
Note the indentation and bracketing style I've used in my sample. It is pretty common in Perl circles and is probably worth adopting. There is a really good tool called Perl Tidy that is well worth getting if you are at all interested in generating consistently formatted code.
True laziness is hard work
Re: merge two files
by graff (Chancellor) on Apr 17, 2009 at 03:46 UTC
Please put <code> and </code> around your perl snippet and data samples.
You have a good start on reading the first file. Now you just need to use the same hash when reading the second file, and do:
$file1{$word}{$site} += $count;
instead of just the straight "=" assignment. That will make sure that when the second file contains a "word" and "site" combination that was also in the first file, you'll be adding together the two "count" values.
You should also start with use strict; and it will make sense to have your file reading logic in a subroutine that you call for each file, just to avoid repeating code unnecessarily:
use strict;
my %data;
my @filenames = ...; # @ARGV or literal strings or whatever
for my $fname ( @filenames ) {
    open my $input, "<", $fname or die "$fname: $!";
    load_data( $input, \%data );
}
# doing stuff with %data is left as an exercise...

sub load_data
{
    my ( $fh, $href ) = @_;
    while ( <$fh> ) {
        next unless ( s/^(.*?):\s*// );
        my $word = $1;
        for my $field ( split ) {
            my ( $site, $count ) = split /,/, $field;
            $$href{$word}{$site} += $count;
        }
    }
}
(not tested)
Re: merge two files
by Utilitarian (Vicar) on Apr 17, 2009 at 09:26 UTC
Update: I should take my own advice; I misread your intention entirely. I didn't realise the counts (the "y co-ordinate") needed to be summed.
Look at the structure of your data. You want to sort a list of numbers where each list is associated with a unique key
What do you think is the data structure you need to put the data into?
Something like:
%records{$number_before_colon => @list_of_numbers_separated_by_commas}
So read each file, splitting each line into an index and an array, and add the array to a hash keyed on the index.
while (<INFILE>) {
    chomp;
    ($index, @record) = split /[:,]/;
    push @{$records{$index}}, @record;
}
When you need to do data conversion, half the battle is examining the relationship between the data structures.
Now you can access the data in a sorted order:
for $index (sort {$a <=> $b} (keys %records)) {
    print $OUTFILE "$index: ", join(",", (sort {$a <=> $b} @{$records{$index}})), "\n";
}
Wrapping it up
#!/usr/bin/perl
use strict;
use warnings;
my @files = qw(temp.data temp1.data temp2.data);
my ($file, $index, %records, $INFILE, $OUTFILE);
for $file (@files) {
    open($INFILE, "<", $file) || die "Failed to open $file: $!";
    while (<$INFILE>) {
        my @record;
        chomp;
        ($index, @record) = split /[:,]/;
        push @{$records{$index}}, @record;
    }
    close $INFILE;
}
open($OUTFILE, ">", "newfile.data") || die "Failed to open newfile.data: $!";
for $index (sort {$a <=> $b} (keys %records)) {
    print $OUTFILE "$index: ", join(",", (sort {$a <=> $b} @{$records{$index}})), "\n";
}
close $OUTFILE;
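Since the push-based script above collects the raw pairs rather than summing the counts per site, here is one possible corrected sketch, not the author's posted solution: it uses a hash of hashes keyed on index and site (as in the other replies) and reads the question's two samples from in-memory filehandles so it is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file1 = <<'END';
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1
END
my $file2 = <<'END';
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3
END

my %records;
for my $source (\$file1, \$file2) {
    open my $in, '<', $source or die "Failed to open input: $!";
    while (<$in>) {
        chomp;
        next unless s/^([^:]+):\s*//;    # capture the index before the colon
        my $index = $1;
        for my $pair (split) {
            my ($site, $count) = split /,/, $pair;
            $records{$index}{$site} += $count;    # "+=" sums duplicate sites
        }
    }
}

for my $index (sort { $a <=> $b } keys %records) {
    my $sites = $records{$index};
    print "$index: ",
        join(' ', map { "$_,$sites->{$_}" } sort { $a <=> $b } keys %$sites),
        "\n";
}
```

With the sample data this prints the merged output shown in the question, e.g. site 0 of word 0 becomes 11 + 10 = 21.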