Multiple File handling and merging records from 2 files

kris1511 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Multiple File handling and merging records from 2 files by stevieb (Canon) on Aug 10, 2017 at 18:26 UTC
Yes, that's definitely one way to populate a nested hash. Here's a full-blown example that includes reading the files. I've written it much more verbosely than I normally would, in the hope that it makes things clearer. It does not go on to sorting and lookups; it shows purely how to populate a hash from the files.

```perl
use warnings;
use strict;

use Data::Dumper;

my %app_map;

open my $fh1, '<', 'app.txt' or die $!;

while (my $line = <$fh1>){
    my ($app, $mem, $lang) = split /\s+/, $line;
    $app_map{$app}{mem} = $mem;
    $app_map{$app}{lang} = $lang;
}

close $fh1 or die $!;

open my $fh2, '<', 'app2.txt' or die $!;

while (my $line = <$fh2>){
    my ($app, $cpu, $cores) = split /\s+/, $line;
    $app_map{$app}{cpu} = $cpu;
    $app_map{$app}{cores} = $cores;
}

close $fh2 or die $!;

print Dumper \%app_map;
```

...and given the two files look like this (`app.txt`):

```
App1 4 Perl
App2 8 Java
App3 8 Java
App4 4 PHP
App5 8 C#
```

...and `app2.txt`:

```
App1 1.5 2
App2 2.5 4
App3 2.8 4
App4 2.8 2
App5 2.8 2
```

Output:

```
$VAR1 = {
          'App5' => {
                      'lang' => 'C#',
                      'cores' => '2',
                      'cpu' => '2.8',
                      'mem' => '8'
                    },
          'App1' => {
                      'cores' => '2',
                      'cpu' => '1.5',
                      'mem' => '4',
                      'lang' => 'Perl'
                    },
          'App4' => {
                      'cpu' => '2.8',
                      'cores' => '2',
                      'mem' => '4',
                      'lang' => 'PHP'
                    },
          'App3' => {
                      'cpu' => '2.8',
                      'cores' => '4',
                      'mem' => '8',
                      'lang' => 'Java'
                    },
          'App2' => {
                      'mem' => '8',
                      'cores' => '4',
                      'cpu' => '2.5',
                      'lang' => 'Java'
                    }
        };
```
Re: Multiple File handling and merging records from 2 files by johngg (Canon) on Aug 10, 2017 at 18:34 UTC
A self-contained example solution using the HoH you were thinking of.

```
johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
open my $file1FH, q{<}, \ <<__EOD1__ or die $!;
App1,4,Perl
App2,8,Java
App3,8,Java
App4,4,PHP
App5,8,C#
__EOD1__

my %apps;
while ( <$file1FH> )
{
    chomp;
    my( $key, $mem, $lang ) = split m{,};
    $apps{ $key }->{ mem } = $mem;
    $apps{ $key }->{ lang } = $lang;
}
close $file1FH or die $!;

open my $file2FH, q{<}, \ <<__EOD2__ or die $!;
App1,1.5,2
App2,2.5,4
App3,2.8,4
App4,2.8,2
App5,2.8,2
__EOD2__

while ( <$file2FH> )
{
    chomp;
    my( $key, $cpu, $cores ) = split m{,};
    $apps{ $key }->{ cpu } = $cpu;
    $apps{ $key }->{ cores } = $cores;
}

say for
    sort { $apps{ $a }->{ mem } <=> $apps{ $b }->{ mem } }
    grep { $apps{ $_ }->{ cores } eq q{2} }
    sort keys %apps;'
App1
App4
App5
```

I hope this is useful.

Update: Corrected spelling mistake and added the following note:

The reason I sort the keys before passing them into the grep is that the order returned by the keys function is essentially random. Since Perl's sort is "stable", it would not change the order in which "App1" and "App4" appeared, as they have the same memory value, so without the initial sort a result of

```
App4
App1
App5
```

could happen.

Cheers,

JohnGG
Re: Multiple File handling and merging records from 2 files by Laurent_R (Canon) on Aug 10, 2017 at 22:00 UTC
Hi kris1511,

You can use a nested hash if you want to, but it can be much simpler (and faster if your dataset is large). Think about it: you're only interested in apps that have two cores (I guess you meant two cores, not two CPUs). Start by reading the second file and just make a list of the apps with two cores. I would use a simple hash for that list, with the app name/number as key and whatever you like (say 1) as a temporary value. Then read the first file, discard the apps that are not in the hash, and populate the hash values with the memory size. Finally, sort the hash by its values.
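The two-pass idea described above could be sketched roughly as follows. This is only an illustration, not code from the post: the sample data, the sub name `apps_by_memory`, and the comma-separated format are assumptions, and it keeps the "wanted" list and the memory sizes in two small hashes (a slight variation on reusing one hash) so that an app missing from the first file is not left with a placeholder value.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
my $app_data = <<'END';
App1,4,Perl
App2,8,Java
App3,8,Java
App4,4,PHP
App5,8,C#
END

my $cpu_data = <<'END';
App1,1.5,2
App2,2.5,4
App3,2.8,4
App4,2.8,2
App5,2.8,2
END

# Returns the names of the apps that have $wanted_cores cores,
# sorted by memory size and then by name so ties are predictable.
sub apps_by_memory {
    my ($cpu_text, $app_text, $wanted_cores) = @_;

    # Pass 1: remember only the apps with the wanted number of cores.
    my %wanted;
    open my $cpu_fh, '<', \$cpu_text or die "Cannot read CPU data: $!";
    while (my $line = <$cpu_fh>) {
        chomp $line;
        my ($app, $cpu, $cores) = split /,/, $line;
        next unless defined $cores;
        $wanted{$app} = 1 if $cores == $wanted_cores;
    }
    close $cpu_fh;

    # Pass 2: record the memory size, skipping every other app.
    my %mem_for;
    open my $app_fh, '<', \$app_text or die "Cannot read app data: $!";
    while (my $line = <$app_fh>) {
        chomp $line;
        my ($app, $mem, $lang) = split /,/, $line;
        next unless defined $mem && $wanted{$app};
        $mem_for{$app} = $mem;
    }
    close $app_fh;

    my @sorted = sort { $mem_for{$a} <=> $mem_for{$b} or $a cmp $b }
                 keys %mem_for;
    return @sorted;
}

print "$_\n" for apps_by_memory($cpu_data, $app_data, 2);
```

With real files, the in-memory handles (`\$cpu_text`, `\$app_text`) would be replaced by the file names; nothing else should need to change.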
Re: Multiple File handling and merging records from 2 files by Marshall (Canon) on Aug 11, 2017 at 03:06 UTC
To demo one thing about HoH structures: checking for the existence of a 2D hash key can result in the autovivification of the first dimension. I don't think that this will matter in your application, but be aware that this can and does happen:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

## This shows that if you test a multi-dimensional hash,
## Perl will "auto-vivify" the first dimension in the
## process of checking if the 2nd dimension exists.

my %apps;

$apps{"App1"}{Memory}   = 4;
$apps{"App1"}{Language} = 'Perl';
$apps{"App1"}{CPU}      = '1.4';
$apps{"App1"}{Cores}    = '2';

print "No App8 Cores\n" if !exists $apps{App8}{Cores};

print Dumper \%apps;
__END__
No App8 Cores
$VAR1 = {
          'App8' => {},   <-- the exists() created this!!!
          'App1' => {
                      'Memory' => 4,
                      'Cores' => '2',
                      'Language' => 'Perl',
                      'CPU' => '1.4'
                    }
        };
```

Instead of a HoH, another way is of course a HoA. This idea can more closely follow your input data format as csv files. Which is better is debatable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Data::Dump qw/pp/;

use constant {
    MEMORY   => 0,
    LANGUAGE => 1,
    CPU      => 2,
    CORES    => 3,
};

my $file1 =<<FILE1;
App1, 4 Perl
App2 8 Java
App3 ,8 Java
App4,4 PHP
App5 8 C#
FILE1

my $file2 =<<FILE2;
App1,1.5 2
App2 2.5 4
App3 2.8 4
App4 2.8 2
App5 2.8 2
FILE2

my %apps;

# There is certainly some error handling code missing
# but an obvious way to make a Hash of Array from these 2 files
# that maintains a "csv" array presentation...

open my $file, '<', \$file1 or die "$!";
while (my $line = <$file>)
{
    my ($app, @tokens) = split /[,\s]+/,$line;
    push @{$apps{$app}},@tokens
}

open $file, '<', \$file2 or die "$!";
while (my $line = <$file>)
{
    my ($app, @tokens) = split /[,\s]+/,$line;
    push @{$apps{$app}},@tokens
}
close $file;

# Op's requirement:
# Need to print name of apps that have CPU value 2, sorted by their
# memory size e.g App1, App4 and App5

# pp \%apps;  ### UNCOMMENT this to see the Hash of Array

# filter using grep to find the apps that use only 2 CORES
my @core2Apps = grep{$apps{$_}[CORES]==2} keys %apps;

# Now sort the results
my @sortedAppsbyMemory = sort{$apps{$a}[MEMORY] <=> $apps{$b}[MEMORY]}@core2Apps;

foreach my $application (@sortedAppsbyMemory)
{
    print "$application @{$apps{$application}}\n";
}
__END__
Prints:
App4 4 PHP 2.8 2
App1 4 Perl 1.5 2
App5 8 C# 2.8 2

Note: App1 and App4 are equal in terms of CORES and MEMORY.
Their order is unpredictable without further specification
or considerations involving previous sorts. Above, App4 sorts
above App1. How to force App1 to be above App4 in all
situations is an exercise left to the reader.
```
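One possible answer to that exercise, together with a guard against the autovivification shown earlier, is sketched below. Nothing here is from the original post: the sub names `two_core_apps_by_memory` and `has_cores` and the sample data are made up for illustration, and the approach assumed is simply to fall back to a name comparison when the memory values tie, and to test the first hash level before the second.

```perl
use strict;
use warnings;

use constant { MEMORY => 0, LANGUAGE => 1, CPU => 2, CORES => 3 };

# Hypothetical data in the same Hash of Arrays layout as above.
my %apps = (
    App4 => [ 4, 'PHP',  2.8, 2 ],
    App1 => [ 4, 'Perl', 1.5, 2 ],
    App5 => [ 8, 'C#',   2.8, 2 ],
    App2 => [ 8, 'Java', 2.5, 4 ],
);

# Sort by memory; when two apps tie, fall back to the app name,
# so the result no longer depends on the order returned by keys().
sub two_core_apps_by_memory {
    my ($apps) = @_;
    my @sorted =
        sort { $apps->{$a}[MEMORY] <=> $apps->{$b}[MEMORY] or $a cmp $b }
        grep { $apps->{$_}[CORES] == 2 }
        keys %$apps;
    return @sorted;
}

# For a Hash of Hashes: test the first level before the second, so
# the check cannot create an empty entry for an unknown app.
sub has_cores {
    my ($hoh, $app) = @_;
    return exists $hoh->{$app} && exists $hoh->{$app}{Cores} ? 1 : 0;
}

print "$_\n" for two_core_apps_by_memory(\%apps);

my %hoh = ( App1 => { Cores => 2 } );
print "No App8 Cores\n"         unless has_cores(\%hoh, 'App8');
print "App8 was not created\n"  unless exists $hoh{App8};
```

The `or $a cmp $b` part only runs when `<=>` returns 0, so the primary ordering by memory is unchanged.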
Re: Multiple File handling and merging records from 2 files by kcott (Archbishop) on Aug 11, 2017 at 06:18 UTC
G'day kris1511,

"Is this the right way to read file into nested hash ?"

Yes, that's a perfectly legitimate and acceptable way to represent that data.

"Not able to wrap my head around the problem"

Whenever you have a problem involving CSV (including tab-, pipe-, or other-separated data), reach for Text::CSV in the first instance. It will generally do what you want and it has already solved most of the problem cases you're likely to encounter with this type of data. It's also well documented. If you also have Text::CSV_XS installed, it will run faster.

Here's the guts of the code to do what you want.

```perl
#!/usr/bin/env perl -l

use strict;
use warnings;

use Text::CSV;
use Inline::Files;
use Data::Dump;

my %data;
my $csv = Text::CSV::->new;

while (my $row = $csv->getline(\CSV1)) {
    @{$data{$row->[0]}}{qw{mem lang}} = @$row[1,2];
}

while (my $row = $csv->getline(\CSV2)) {
    @{$data{$row->[0]}}{qw{cpu cores}} = @$row[1,2];
}

print 'CSV data merged into hash:';
dd \%data;

print 'Query data: apps with 2 cores:';
print $_ for grep { $data{$_}{cores} == 2 } sort keys %data;

__CSV1__
App1,4,Perl
App2,8,Java
App3,8,Java
App4,4,PHP
App5,8,C#
__CSV2__
App1,1.5,2
App2,2.5,4
App3,2.8,4
App4,2.8,2
App5,2.8,2
```

Output:

```
CSV data merged into hash:
{
  App1 => { cores => 2, cpu => 1.5, lang => "Perl", mem => 4 },
  App2 => { cores => 4, cpu => 2.5, lang => "Java", mem => 8 },
  App3 => { cores => 4, cpu => 2.8, lang => "Java", mem => 8 },
  App4 => { cores => 2, cpu => 2.8, lang => "PHP", mem => 4 },
  App5 => { cores => 2, cpu => 2.8, lang => "C#", mem => 8 },
}
Query data: apps with 2 cores:
App1
App4
App5
```

I've used Inline::Files for demonstration purposes. You'll probably want to open disk files, replacing "`\CSV1`" with something like "`$csv_fh1`" (ditto for "`\CSV2`").

You may want to look at "perldata: Slices" if you don't recognise the syntax within the `while` loops. If you have a recent version of Perl, look at "perlref: Postfix Reference Slicing": I find that syntax easier to read and harder to make mistakes with; you may too.

See also Data::Dump if you're unfamiliar with that module.

-- Ken
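For anyone curious what the postfix slice syntax looks like when reading from ordinary filehandles, here is a rough sketch. It is not from the post above: the sub name `merge_fields` and the sample strings are invented, it assumes Perl 5.24 or later, and it uses a plain `split` purely to stay free of non-core modules (with Text::CSV you would pass the opened handle to `$csv->getline($fh)` instead).

```perl
use v5.24;    # postfix dereference is available by default from 5.24
use warnings;

# Merge "key,value,value" lines from $fh into %$data, storing the
# values under the given field names.
sub merge_fields {
    my ($data, $fh, @names) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($key, @values) = split /,/, $line;
        # Postfix hash slice; the older spelling would be
        # @{ $data->{$key} }{@names} = @values[0 .. $#names];
        $data->{$key}->@{@names} = @values[0 .. $#names];
    }
    return $data;
}

# Hypothetical stand-ins for the two disk files.
my $csv1 = "App1,4,Perl\nApp4,4,PHP\n";
my $csv2 = "App1,1.5,2\nApp4,2.8,2\n";

my %data;
for my $source ([ \$csv1, qw(mem lang) ], [ \$csv2, qw(cpu cores) ]) {
    my ($file, @names) = @$source;
    # With real files, $file would be a path such as 'app.txt'
    # rather than a reference to a string.
    open my $fh, '<', $file or die "Cannot open input: $!";
    merge_fields(\%data, $fh, @names);
    close $fh or die "Cannot close input: $!";
}

say join ' ', $_, $data{$_}->@{qw(mem cores)} for sort keys %data;
```

The slice assignment creates the inner hash for a new key automatically, so no separate initialisation step is needed.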
Re: Multiple File handling and merging records from 2 files by Not_a_Number (Prior) on Aug 10, 2017 at 18:23 UTC
None of your apps has CPU value 2. Do you mean "apps that have cores value 2"? Or something else?