kris1511 has asked for the wisdom of the Perl Monks concerning the following question:
I am doing questions from different sites. I came across this one and am not sure how to go about it.
There are two CSV files with the following information:
File 1:
Application | Memory (GB) | Language
App1        | 4           | Perl
App2        | 8           | Java
App3        | 8           | Java
App4        | 4           | PHP
App5        | 8           | C#
File 2:
Application | CPU | Cores
App1        | 1.5 | 2
App2        | 2.5 | 4
App3        | 2.8 | 4
App4        | 2.8 | 2
App5        | 2.8 | 2
Need to print the names of apps that have CPU value 2, sorted by their memory size,
e.g. App1, App4 and App5.
I was thinking of using a nested hash to hold the file contents:
my %apps;
$apps{"App1"}{Memory} = 4;
$apps{"App1"}{Language} = 'Perl';
$apps{"App1"}{CPU} = '1.4';
$apps{"App1"}{Cores} = '2';
.... and so on. Is this the right way to read a file into a nested hash?
I'm not able to wrap my head around the problem.
Re: Multiple File handling and merging records from 2 files
by stevieb (Canon) on Aug 10, 2017 at 18:26 UTC
Yes, that's definitely one way to populate a nested hash. Here's a full-blown example that includes reading the files. I've written it much more verbosely than I normally would, in the hope that it makes things clearer. It doesn't go on to the sorting and lookups; it's purely about populating a hash from the files.
use warnings;
use strict;

use Data::Dumper;

my %app_map;

open my $fh1, '<', 'app.txt' or die $!;

while (my $line = <$fh1>){
    my ($app, $mem, $lang) = split /\s+/, $line;
    $app_map{$app}{mem} = $mem;
    $app_map{$app}{lang} = $lang;
}

close $fh1 or die $!;

open my $fh2, '<', 'app2.txt' or die $!;

while (my $line = <$fh2>){
    my ($app, $cpu, $cores) = split /\s+/, $line;
    $app_map{$app}{cpu} = $cpu;
    $app_map{$app}{cores} = $cores;
}

close $fh2 or die $!;

print Dumper \%app_map;
...and given the two files look like this (app.txt):
App1 4 Perl
App2 8 Java
App3 8 Java
App4 4 PHP
App5 8 C#
...and app2.txt:
App1 1.5 2
App2 2.5 4
App3 2.8 4
App4 2.8 2
App5 2.8 2
Output:
$VAR1 = {
          'App5' => {
                      'lang' => 'C#',
                      'cores' => '2',
                      'cpu' => '2.8',
                      'mem' => '8'
                    },
          'App1' => {
                      'cores' => '2',
                      'cpu' => '1.5',
                      'mem' => '4',
                      'lang' => 'Perl'
                    },
          'App4' => {
                      'cpu' => '2.8',
                      'cores' => '2',
                      'mem' => '4',
                      'lang' => 'PHP'
                    },
          'App3' => {
                      'cpu' => '2.8',
                      'cores' => '4',
                      'mem' => '8',
                      'lang' => 'Java'
                    },
          'App2' => {
                      'mem' => '8',
                      'cores' => '4',
                      'cpu' => '2.5',
                      'lang' => 'Java'
                    }
        };
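To carry this through to the OP's actual task, a filter-and-sort step over the same hash layout could look like the sketch below. The hash here is hard-coded with the thread's sample values so it runs standalone, and the app name is used as a tie-breaker, which is an addition of mine, not part of the original reply:

```perl
use strict;
use warnings;

# Same shape as the %app_map built above; values copied from the thread's data.
my %app_map = (
    App1 => { mem => 4, lang => 'Perl', cpu => 1.5, cores => 2 },
    App2 => { mem => 8, lang => 'Java', cpu => 2.5, cores => 4 },
    App3 => { mem => 8, lang => 'Java', cpu => 2.8, cores => 4 },
    App4 => { mem => 4, lang => 'PHP',  cpu => 2.8, cores => 2 },
    App5 => { mem => 8, lang => 'C#',   cpu => 2.8, cores => 2 },
);

# Keep only apps with two cores, then sort by memory size;
# ties on memory fall back to the app name for a deterministic order.
my @wanted =
    sort {
        $app_map{$a}{mem} <=> $app_map{$b}{mem}
            or $a cmp $b
    }
    grep { $app_map{$_}{cores} == 2 }
    keys %app_map;

print "$_\n" for @wanted;    # App1, App4, App5
```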
Re: Multiple File handling and merging records from 2 files
by johngg (Canon) on Aug 10, 2017 at 18:34 UTC
johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
open my $file1FH, q{<}, \ <<__EOD1__ or die $!;
App1,4,Perl
App2,8,Java
App3,8,Java
App4,4,PHP
App5,8,C#
__EOD1__
my %apps;
while ( <$file1FH> )
{
    chomp;
    my( $key, $mem, $lang ) = split m{,};
    $apps{ $key }->{ mem }  = $mem;
    $apps{ $key }->{ lang } = $lang;
}
close $file1FH or die $!;
open my $file2FH, q{<}, \ <<__EOD2__ or die $!;
App1,1.5,2
App2,2.5,4
App3,2.8,4
App4,2.8,2
App5,2.8,2
__EOD2__
while ( <$file2FH> )
{
    chomp;
    my( $key, $cpu, $cores ) = split m{,};
    $apps{ $key }->{ cpu }   = $cpu;
    $apps{ $key }->{ cores } = $cores;
}
say for
    sort { $apps{ $a }->{ mem } <=> $apps{ $b }->{ mem } }
    grep { $apps{ $_ }->{ cores } eq q{2} }
    sort keys %apps;'
App1
App4
App5
I hope this is useful.
Update: Corrected a spelling mistake and added the following note:
The reason I sort the keys before passing them into the grep is that the order returned by the keys function is essentially random. Since Perl's sort is "stable", it would not change the relative order of "App1" and "App4", as they have the same memory value; so without the pre-sort, a result of
App4
App1
App5
could happen.
Re: Multiple File handling and merging records from 2 files
by Laurent_R (Canon) on Aug 10, 2017 at 22:00 UTC
Hi kris1511,
You can use a nested hash if you want to, but it can be quite a bit simpler (and faster, if your dataset is large).
Think about it. You're interested only in apps that have two cores (I guess you meant two cores, not two CPUs). Start by reading the second file and just make a list of the apps with two cores. I would use a simple hash for that list (with the app name/number as key, and whatever, say 1, as a temporary value).
Then read the first file, discard the apps not in the hash, and populate the hash values with the memory size. Finally, sort the hash by its values.
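The two-pass approach described above could be sketched roughly as follows. For a self-contained demo the "files" are in-memory strings holding the thread's sample data; in real use you would open the two CSV files from disk instead, and the comma-separated layout is an assumption based on the sample data:

```perl
use strict;
use warnings;

# In-memory stand-ins for the two CSV files (thread's sample data).
my $file2_data = "App1,1.5,2\nApp2,2.5,4\nApp3,2.8,4\nApp4,2.8,2\nApp5,2.8,2\n";
my $file1_data = "App1,4,Perl\nApp2,8,Java\nApp3,8,Java\nApp4,4,PHP\nApp5,8,C#\n";

my %mem_of;    # app name => memory size, for two-core apps only

# Pass 1: remember the two-core apps (Application,CPU,Cores).
open my $fh2, '<', \$file2_data or die $!;
while ( my $line = <$fh2> ) {
    chomp $line;
    my ( $app, $cpu, $cores ) = split /,/, $line;
    $mem_of{$app} = 1 if $cores == 2;    # temporary placeholder value
}
close $fh2;

# Pass 2: fill in memory sizes (Application,Memory,Language),
# skipping apps that did not make the cut in pass 1.
open my $fh1, '<', \$file1_data or die $!;
while ( my $line = <$fh1> ) {
    chomp $line;
    my ( $app, $mem ) = split /,/, $line;
    $mem_of{$app} = $mem if exists $mem_of{$app};
}
close $fh1;

# Finally, sort the surviving apps by their memory values.
my @sorted = sort { $mem_of{$a} <=> $mem_of{$b} } keys %mem_of;
print "$_\n" for @sorted;
```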
Re: Multiple File handling and merging records from 2 files
by Marshall (Canon) on Aug 11, 2017 at 03:06 UTC
To demo one thing about HoH structures:
Checking for the existence of a 2D hash key can result in the auto-vivification of the first dimension. I don't think this will matter in your application, but be aware that this can and does happen:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
## This shows that if you test a multi-dimensional hash,
## Perl will "auto-vivify" the first dimension in the
## process of checking if the 2nd dimension exists.
my %apps;
$apps{"App1"}{Memory} = 4;
$apps{"App1"}{Language} = 'Perl';
$apps{"App1"}{CPU} = '1.4';
$apps{"App1"}{Cores} = '2';
print "No App8 Cores\n" if !exists $apps{App8}{Cores};
print Dumper \%apps;
__END__
No App8 Cores
$VAR1 = {
          'App8' => {},    # <-- the exists() created this!!!
          'App1' => {
                      'Memory' => 4,
                      'Cores' => '2',
                      'Language' => 'Perl',
                      'CPU' => '1.4'
                    }
        };
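A common way to avoid that side effect, which the reply above does not show, is to test each level in turn, so the inner lookup is never evaluated unless the outer entry already exists. A minimal sketch:

```perl
use strict;
use warnings;
use Data::Dumper;

my %apps;
$apps{App1}{Cores} = 2;

# Test the outer key first; thanks to short-circuiting, the inner exists()
# only runs when the outer entry is already there, so nothing auto-vivifies.
print "No App8 Cores\n"
    if !( exists $apps{App8} && exists $apps{App8}{Cores} );

print Dumper \%apps;    # no 'App8' => {} entry this time
```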
Instead of a HoH, of course, another way is a HoA. This idea more closely follows your input data format as CSV files. Which is better is debatable.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Data::Dump qw/pp/;
use constant { MEMORY => 0,
LANGUAGE => 1,
CPU => 2,
CORES => 3,
};
my $file1 =<<FILE1;
App1, 4 Perl
App2 8 Java
App3 ,8 Java
App4,4 PHP
App5 8 C#
FILE1
my $file2 =<<FILE2;
App1,1.5 2
App2 2.5 4
App3 2.8 4
App4 2.8 2
App5 2.8 2
FILE2
my %apps;
# There is certainly some error handling code missing
# but an obvious way to make a Hash of Array from these 2 files
# that maintains a "csv" array presentation...
open my $file, '<', \$file1 or die "$!";
while (my $line = <$file>)
{
    my ($app, @tokens) = split /[,\s]+/, $line;
    push @{$apps{$app}}, @tokens;
}
open $file, '<', \$file2 or die "$!";
while (my $line = <$file>)
{
    my ($app, @tokens) = split /[,\s]+/, $line;
    push @{$apps{$app}}, @tokens;
}
close $file;
# Op's requirement:
# Need to print name of apps that have CPU value 2, sorted by their
# memory size e.g App1, App4 and App5
# pp \%apps; ### UNCOMMENT this to see the Hash of Array
# filter using grep to find the apps that use only 2 CORES
my @core2Apps = grep { $apps{$_}[CORES] == 2 } keys %apps;
# Now sort the results
my @sortedAppsbyMemory =
    sort { $apps{$a}[MEMORY] <=> $apps{$b}[MEMORY] } @core2Apps;
foreach my $application (@sortedAppsbyMemory)
{
    print "$application @{$apps{$application}}\n";
}
__END__
Prints:
App4 4 PHP 2.8 2
App1 4 Perl 1.5 2
App5 8 C# 2.8 2
Note: App1 and App4 are equal in terms of CORES and MEMORY. Their order is unpredictable without further specification or considerations involving previous sorts. Above, App4 sorts above App1. How to force App1 to be above App4 in all situations is an exercise left to the reader.
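One way to work that exercise, not part of the original reply, is to add the app name as a secondary sort key so that ties on memory fall back to lexical order. A sketch using the same HoA layout and index constants as above, with the hash hard-coded so it runs standalone:

```perl
use strict;
use warnings;
use constant { MEMORY => 0, LANGUAGE => 1, CPU => 2, CORES => 3 };

# Same Hash-of-Array shape as built above (two-core apps from the thread's data).
my %apps = (
    App1 => [ 4, 'Perl', 1.5, 2 ],
    App4 => [ 4, 'PHP',  2.8, 2 ],
    App5 => [ 8, 'C#',   2.8, 2 ],
);

# Compare memory first; on a tie, fall back to the app name, so
# App1 now always sorts above App4 regardless of hash order.
my @sorted = sort {
    $apps{$a}[MEMORY] <=> $apps{$b}[MEMORY]
        or $a cmp $b
} keys %apps;

print "$_\n" for @sorted;    # App1, App4, App5
```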
Re: Multiple File handling and merging records from 2 files
by kcott (Archbishop) on Aug 11, 2017 at 06:18 UTC
G'day kris1511,
"Is this the right way to read file into nested hash ?"
Yes, that's a perfectly legitimate and acceptable way to represent that data.
"Not able to wrap my head around the problem"
Whenever you have a problem involving CSV (including tab-, pipe-, or other-separated data), reach for Text::CSV in the first instance. It will generally do what you want, and it has already solved most of the problem cases you're likely to encounter with this type of data. It's also well documented. If you also have Text::CSV_XS installed, it will run faster.
Here's the guts of the code to do what you want.
#!/usr/bin/env perl -l
use strict;
use warnings;
use Text::CSV;
use Inline::Files;
use Data::Dump;
my %data;
my $csv = Text::CSV::->new;

while (my $row = $csv->getline(\*CSV1)) {
    @{$data{$row->[0]}}{qw{mem lang}} = @$row[1,2];
}

while (my $row = $csv->getline(\*CSV2)) {
    @{$data{$row->[0]}}{qw{cpu cores}} = @$row[1,2];
}

print 'CSV data merged into hash:';
dd \%data;

print 'Query data: apps with 2 cores:';
print $_ for grep { $data{$_}{cores} == 2 } sort keys %data;
__CSV1__
App1,4,Perl
App2,8,Java
App3,8,Java
App4,4,PHP
App5,8,C#
__CSV2__
App1,1.5,2
App2,2.5,4
App3,2.8,4
App4,2.8,2
App5,2.8,2
Output:
CSV data merged into hash:
{
App1 => { cores => 2, cpu => 1.5, lang => "Perl", mem => 4 },
App2 => { cores => 4, cpu => 2.5, lang => "Java", mem => 8 },
App3 => { cores => 4, cpu => 2.8, lang => "Java", mem => 8 },
App4 => { cores => 2, cpu => 2.8, lang => "PHP", mem => 4 },
App5 => { cores => 2, cpu => 2.8, lang => "C#", mem => 8 },
}
Query data: apps with 2 cores:
App1
App4
App5
I've used Inline::Files for demonstration purposes. You'll probably want to open disk files, replacing "\*CSV1" with something like "$csv_fh1" (ditto for "\*CSV2").
You may want to look at "perldata: Slices" if you don't recognise the syntax within the while loops. If you have a recent version of Perl, look at "perlref: Postfix Reference Slicing": I find that syntax easier to read and less easy to make mistakes with; you may too.
See also Data::Dump if you're unfamiliar with that module.
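For anyone who hasn't met the postfix form mentioned above, here is a small side-by-side sketch. The single hard-coded $row stands in for what Text::CSV's getline() would return; postfix dereference syntax is stable from Perl 5.24 onwards:

```perl
use strict;
use warnings;
use v5.24;    # postfix dereference is non-experimental from this version

my %data;
my $row = [ 'App1', 4, 'Perl' ];    # stand-in for a row from getline()

# Classic hash slice through a dereference, as in the code above:
@{ $data{ $row->[0] } }{qw{mem lang}} = @$row[1, 2];

# Equivalent postfix reference slice; reads left to right:
$data{ $row->[0] }->@{qw{mem lang}} = $row->@[1, 2];

say "$data{App1}{mem} $data{App1}{lang}";    # 4 Perl
```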
Re: Multiple File handling and merging records from 2 files
by Not_a_Number (Prior) on Aug 10, 2017 at 18:23 UTC
None of your apps has CPU value 2.
Do you mean "apps that have cores value 2"?
Or something else?