PerlMonks  

Multiple File handling and merging records from 2 files

by kris1511 (Acolyte)
on Aug 10, 2017 at 17:58 UTC ( [id://1197196]=perlquestion )

kris1511 has asked for the wisdom of the Perl Monks concerning the following question:

I am working through practice questions from different sites. I came across this one and am not sure how to go about it. There are 2 CSV files that have the following information.

File 1:
Application Memory (GB) Language
App1 4 Perl
App2 8 Java
App3 8 Java
App4 4 PHP
App5 8 C#

File 2:
Application CPU Cores
App1 1.5 2
App2 2.5 4
App3 2.8 4
App4 2.8 2
App5 2.8 2
I need to print the names of the apps that have CPU value 2, sorted by their memory size, e.g. App1, App4 and App5. I was thinking of using a nested hash to hold the file elements:

my %apps;
$apps{"App1"}{Memory} = 4;
$apps{"App1"}{Language} = 'Perl';
$apps{"App1"}{CPU} = '1.4';
$apps{"App1"}{Cores} = '2';

... and so on. Is this the right way to read a file into a nested hash? I am not able to wrap my head around the problem.

Replies are listed 'Best First'.
Re: Multiple File handling and merging records from 2 files
by stevieb (Canon) on Aug 10, 2017 at 18:26 UTC

    Yes, that's definitely one way to populate a nested hash. Here's a full-blown example that includes reading the files. I've written the example much more verbosely than I would normally in hopes it makes it more clear. This does not go on with sorting and lookups, just purely how to populate a hash from the files.

    use warnings;
    use strict;

    use Data::Dumper;

    my %app_map;

    open my $fh1, '<', 'app.txt' or die $!;

    while (my $line = <$fh1>){
        my ($app, $mem, $lang) = split /\s+/, $line;
        $app_map{$app}{mem} = $mem;
        $app_map{$app}{lang} = $lang;
    }

    close $fh1 or die $!;

    open my $fh2, '<', 'app2.txt' or die $!;

    while (my $line = <$fh2>){
        my ($app, $cpu, $cores) = split /\s+/, $line;
        $app_map{$app}{cpu} = $cpu;
        $app_map{$app}{cores} = $cores;
    }

    close $fh2 or die $!;

    print Dumper \%app_map;

    ...and given the two files look like this (app.txt):

    App1 4 Perl
    App2 8 Java
    App3 8 Java
    App4 4 PHP
    App5 8 C#

    ...and app2.txt:

    App1 1.5 2
    App2 2.5 4
    App3 2.8 4
    App4 2.8 2
    App5 2.8 2

    Output:

    $VAR1 = {
              'App5' => {
                          'lang' => 'C#',
                          'cores' => '2',
                          'cpu' => '2.8',
                          'mem' => '8'
                        },
              'App1' => {
                          'cores' => '2',
                          'cpu' => '1.5',
                          'mem' => '4',
                          'lang' => 'Perl'
                        },
              'App4' => {
                          'cpu' => '2.8',
                          'cores' => '2',
                          'mem' => '4',
                          'lang' => 'PHP'
                        },
              'App3' => {
                          'cpu' => '2.8',
                          'cores' => '4',
                          'mem' => '8',
                          'lang' => 'Java'
                        },
              'App2' => {
                          'mem' => '8',
                          'cores' => '4',
                          'cpu' => '2.5',
                          'lang' => 'Java'
                        }
            };
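The example above stops once the hash is populated. As a rough sketch of the remaining step, the lookup and sort could go something like this. The hash literal is simply copied from the Dumper output so that the snippet runs on its own, and the names @two_core and @sorted are made up for illustration.

```perl
use strict;
use warnings;

# Same shape as the Dumper output above; values copied from it.
my %app_map = (
    App1 => { mem => 4, lang => 'Perl', cpu => 1.5, cores => 2 },
    App2 => { mem => 8, lang => 'Java', cpu => 2.5, cores => 4 },
    App3 => { mem => 8, lang => 'Java', cpu => 2.8, cores => 4 },
    App4 => { mem => 4, lang => 'PHP',  cpu => 2.8, cores => 2 },
    App5 => { mem => 8, lang => 'C#',   cpu => 2.8, cores => 2 },
);

# Keep only the names of the apps that have 2 cores ...
my @two_core = grep { $app_map{$_}{cores} == 2 } keys %app_map;

# ... then order those names by memory size, smallest first.
my @sorted = sort { $app_map{$a}{mem} <=> $app_map{$b}{mem} } @two_core;

print "$_ ($app_map{$_}{mem} GB)\n" for @sorted;
```

App1 and App4 both have 4 GB, so their relative order here depends on the order in which keys returns them.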
Re: Multiple File handling and merging records from 2 files
by johngg (Canon) on Aug 10, 2017 at 18:34 UTC

    A self-contained example solution using the HoH you were thinking of.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    open my $file1FH, q{<}, \ <<__EOD1__ or die $!;
    App1,4,Perl
    App2,8,Java
    App3,8,Java
    App4,4,PHP
    App5,8,C#
    __EOD1__

    my %apps;
    while ( <$file1FH> )
    {
        chomp;
        my( $key, $mem, $lang ) = split m{,};
        $apps{ $key }->{ mem } = $mem;
        $apps{ $key }->{ lang } = $lang;
    }
    close $file1FH or die $!;

    open my $file2FH, q{<}, \ <<__EOD2__ or die $!;
    App1,1.5,2
    App2,2.5,4
    App3,2.8,4
    App4,2.8,2
    App5,2.8,2
    __EOD2__

    while ( <$file2FH> )
    {
        chomp;
        my( $key, $cpu, $cores ) = split m{,};
        $apps{ $key }->{ cpu } = $cpu;
        $apps{ $key }->{ cores } = $cores;
    }

    say for
        sort { $apps{ $a }->{ mem } <=> $apps{ $b }->{ mem } }
        grep { $apps{ $_ }->{ cores } eq q{2} }
        sort keys %apps;'
    App1
    App4
    App5

    I hope this is useful.

    Update: Corrected spelling mistake and added the following note:-

    The reason I sort the keys before passing them into the grep is that the order returned by the keys function is essentially random. Since Perl's sort is "stable" it would not change the order in which "App1" and "App4" appeared as they have the same memory value so a result of

    App4
    App1
    App5

    could happen.
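An alternative to pre-sorting the keys, offered only as a sketch, is to put the tie-break into the comparison itself, so that the result no longer depends on the incoming order. The %mem hash and the deliberately shuffled @unordered list below are made up for the illustration.

```perl
use strict;
use warnings;

# Memory sizes of the three 2-core apps from the example data.
my %mem = ( App1 => 4, App4 => 4, App5 => 8 );

# Start from an "unlucky" order on purpose.
my @unordered = qw(App5 App4 App1);

# Compare memory first; when that is equal (<=> returns 0),
# fall through to comparing the names.
my @sorted = sort { $mem{$a} <=> $mem{$b} || $a cmp $b } @unordered;

print "@sorted\n";    # App1 App4 App5
```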

    Cheers,

    JohnGG

Re: Multiple File handling and merging records from 2 files
by Laurent_R (Canon) on Aug 10, 2017 at 22:00 UTC
    Hi kris1511,

    You can use a nested hash if you want to. But it can be quite a bit simpler (and faster if your dataset is large).

    Think about it. You're interested only in apps that have two cores (I guess you meant two cores, not two CPUs). Start by reading the second file and just make a list of the apps with two cores. I would use a simple hash for that list (with the app name/number as key, and whatever (say 1) as a temporary value).

    Then read the first file, just discard the apps not in the hash and populate the values in the hash with the size. Finally sort the hash by its values.
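A minimal sketch of that two-pass idea follows. It assumes whitespace-separated files with no header row, and uses in-memory strings in place of real files so that it runs as-is; the name %mem_for and the name-based tie-break in the sort are additions for the illustration, not part of the description above.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files (assumed: no header line).
my $file1 = "App1 4 Perl\nApp2 8 Java\nApp3 8 Java\nApp4 4 PHP\nApp5 8 C#\n";
my $file2 = "App1 1.5 2\nApp2 2.5 4\nApp3 2.8 4\nApp4 2.8 2\nApp5 2.8 2\n";

# Pass 1: read the second file and remember only the two-core apps,
# with 1 as a temporary placeholder value.
my %mem_for;
open my $fh2, '<', \$file2 or die $!;
while ( my $line = <$fh2> ) {
    my ( $app, $cpu, $cores ) = split ' ', $line;
    $mem_for{$app} = 1 if defined $cores && $cores == 2;
}
close $fh2;

# Pass 2: read the first file, skip apps that were not kept,
# and replace the placeholder with the memory size.
open my $fh1, '<', \$file1 or die $!;
while ( my $line = <$fh1> ) {
    my ( $app, $mem ) = split ' ', $line;
    next unless defined $app && exists $mem_for{$app};
    $mem_for{$app} = $mem;
}
close $fh1;

# Sort the hash keys by their values (memory), then by name on ties.
my @sorted = sort { $mem_for{$a} <=> $mem_for{$b} || $a cmp $b } keys %mem_for;
print "$_ ($mem_for{$_} GB)\n" for @sorted;
```

Only one flat hash is ever built, and apps with other core counts are never stored at all.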

Re: Multiple File handling and merging records from 2 files
by Marshall (Canon) on Aug 11, 2017 at 03:06 UTC
    To demo one thing about HoH structures: checking for the existence of a 2D hash key can result in the autovivification of the first dimension. I don't think that this will matter in your application, but be aware that this can and does happen:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    ## This shows that if you test a multi-dimensional hash,
    ## Perl will "auto-vivify" the first dimension in the
    ## process of checking if the 2nd dimension exists.

    my %apps;
    $apps{"App1"}{Memory} = 4;
    $apps{"App1"}{Language} = 'Perl';
    $apps{"App1"}{CPU} = '1.4';
    $apps{"App1"}{Cores} = '2';

    print "No App8 Cores\n" if !exists $apps{App8}{Cores};

    print Dumper \%apps;
    __END__
    No App8 Cores
    $VAR1 = {
              'App8' => {},   <-- the exists() created this!!!
              'App1' => {
                          'Memory' => 4,
                          'Cores' => '2',
                          'Language' => 'Perl',
                          'CPU' => '1.4'
                        }
            };
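    If the stray 'App8' entry would be a problem, one common way around it is to test the first level before touching the second. This is only a sketch with a cut-down %apps hash:

```perl
use strict;
use warnings;

# Cut-down version of the hash from the demo above.
my %apps = ( App1 => { Cores => 2 } );

# Check the outer key first. Because && short-circuits, the inner
# lookup is never evaluated for an app that is not in the hash,
# so nothing gets created.
if ( !( exists $apps{App8} && exists $apps{App8}{Cores} ) ) {
    print "No App8 Cores\n";
}

print join( ', ', sort keys %apps ), "\n";    # App1
```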
    Instead of a HoH, of course another way is a HoA. This idea can more closely follow your input data format as csv files. Which is better, is debatable.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    use Data::Dump qw/pp/;

    use constant {
        MEMORY   => 0,
        LANGUAGE => 1,
        CPU      => 2,
        CORES    => 3,
    };

    my $file1 =<<FILE1;
    App1, 4 Perl
    App2 8 Java
    App3 ,8 Java
    App4,4 PHP
    App5 8 C#
    FILE1

    my $file2 =<<FILE2;
    App1,1.5 2
    App2 2.5 4
    App3 2.8 4
    App4 2.8 2
    App5 2.8 2
    FILE2

    my %apps;

    # There is certainly some error handling code missing
    # but an obvious way to make a Hash of Array from these 2 files
    # that maintains a "csv" array presentation...

    open my $file, '<', \$file1 or die "$!";
    while (my $line = <$file>)
    {
        my ($app, @tokens) = split /[,\s]+/,$line;
        push @{$apps{$app}},@tokens
    }

    open $file, '<', \$file2 or die "$!";
    while (my $line = <$file>)
    {
        my ($app, @tokens) = split /[,\s]+/,$line;
        push @{$apps{$app}},@tokens
    }
    close $file;

    # Op's requirement:
    # Need to print name of apps that have CPU value 2, sorted by their
    # memory size e.g App1, App4 and App5

    # pp \%apps;  ### UNCOMMENT this to see the Hash of Array

    # filter using grep to find the apps that use only 2 CORES
    my @core2Apps = grep{$apps{$_}[CORES]==2} keys %apps;

    # Now sort the results
    my @sortedAppsbyMemory = sort{$apps{$a}[MEMORY] <=> $apps{$b}[MEMORY]}@core2Apps;

    foreach my $application (@sortedAppsbyMemory)
    {
        print "$application @{$apps{$application}}\n";
    }
    __END__
    Prints:
    App4 4 PHP 2.8 2
    App1 4 Perl 1.5 2
    App5 8 C# 2.8 2

    Note: App1 and App4 are equal in terms of CORES and MEMORY.
    Their order is unpredictable without further specification
    or considerations involving previous sorts.
    Above App4 sorts above App1.
    How to force App1 to be above App4 in all situations is
    an exercise left to the reader.
Re: Multiple File handling and merging records from 2 files
by kcott (Archbishop) on Aug 11, 2017 at 06:18 UTC

    G'day kris1511,

    "Is this the right way to read file into nested hash ?"

    Yes, that's a perfectly legitimate and acceptable way to represent that data.

    "Not able to wrap my head around the problem"

    Whenever you have a problem involving CSV (including tab-, pipe-, other-separated data) reach for Text::CSV in the first instance. It will generally do what you want and it's already solved most of the problem cases you're likely to encounter with this type of data. It's also well documented. If you also have Text::CSV_XS installed, it will run faster.

    Here's the guts of the code to do what you want.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    use Text::CSV;
    use Inline::Files;
    use Data::Dump;

    my %data;
    my $csv = Text::CSV::->new;

    while (my $row = $csv->getline(\*CSV1)) {
        @{$data{$row->[0]}}{qw{mem lang}} = @$row[1,2];
    }

    while (my $row = $csv->getline(\*CSV2)) {
        @{$data{$row->[0]}}{qw{cpu cores}} = @$row[1,2];
    }

    print 'CSV data merged into hash:';
    dd \%data;

    print 'Query data: apps with 2 cores:';
    print $_ for grep { $data{$_}{cores} == 2 } sort keys %data;

    __CSV1__
    App1,4,Perl
    App2,8,Java
    App3,8,Java
    App4,4,PHP
    App5,8,C#
    __CSV2__
    App1,1.5,2
    App2,2.5,4
    App3,2.8,4
    App4,2.8,2
    App5,2.8,2

    Output:

    CSV data merged into hash:
    {
      App1 => { cores => 2, cpu => 1.5, lang => "Perl", mem => 4 },
      App2 => { cores => 4, cpu => 2.5, lang => "Java", mem => 8 },
      App3 => { cores => 4, cpu => 2.8, lang => "Java", mem => 8 },
      App4 => { cores => 2, cpu => 2.8, lang => "PHP", mem => 4 },
      App5 => { cores => 2, cpu => 2.8, lang => "C#", mem => 8 },
    }
    Query data: apps with 2 cores:
    App1
    App4
    App5

    I've used Inline::Files for demonstration purposes. You'll probably want to open disk files, replacing "\*CSV1" with something like "$csv_fh1" (ditto for "\*CSV2").

    You may want to look at "perldata: Slices" if you don't recognise the syntax within the while loops. If you have a recent version of Perl, look at "perlref: Postfix Reference Slicing": I find that syntax is easier to read and less easy to make mistakes with - you may too.
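    For comparison, here is a small sketch of the two slice syntaxes side by side. The @row1 and @row2 arrays are made-up stand-ins for rows read from the two files, and the "use v5.24" line is there because postfix dereferencing needs no feature flag from that version on.

```perl
use v5.24;    # postfix dereference works without a feature flag from 5.24
use strict;
use warnings;

my %data;
my @row1 = ( 'App1', 4,   'Perl' );    # stand-in for a row from the first file
my @row2 = ( 'App1', 1.5, 2 );         # stand-in for the matching row from the second

# Classic hash-slice syntax, as used in the while loops above:
@{ $data{ $row1[0] } }{qw(mem lang)} = @row1[ 1, 2 ];

# The postfix slice form doing the same job for the second row:
$data{ $row2[0] }->@{qw(cpu cores)} = @row2[ 1, 2 ];

for my $field ( sort keys $data{App1}->%* ) {
    say "$field = $data{App1}{$field}";
}
```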

    See also Data::Dump if you're unfamiliar with that module.

    — Ken

Re: Multiple File handling and merging records from 2 files
by Not_a_Number (Prior) on Aug 10, 2017 at 18:23 UTC

    None of your apps has CPU value 2.

    Do you mean "apps that have cores value 2"?

    Or something else?
