|
jaypal has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks!
I am practicing perl by picking up text parsing questions from Stack Overflow. I came across a question where I had to transpose column such that following dataset
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
would become
id1 name1 cat1:catname1, cat2:catname2
id2 name2 cat3:catname3
id3 name3 cat1:catname1, cat4:catname4
I wrote a perl script, and even though it works as expected, I am in need of some wisdom. I wanted to know if there is a more idiomatic way of doing:
my @val = split(" ",$val);
my $v = join (":", @val);
and
if (exists $hash{$key}) {
$hash{$key} .= $sep.$v;
}
else {
$hash{$key} = $v;
}
Here is my complete perl script. Any advice and or suggestion to improve this will be greatly appreciated. This is my first post here at perlmonks, so if I missed on any crucial details, I apologize for that.
Looking forward to be hearing from perlmonks.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash;
my $sep = ", ";
while (<DATA>) {
chomp;
(my $key, my $val) = /(\S*\s*\S*\s*)(\S*\s*\S*\s*)/;
my @val = split(" ",$val);
my $v = join (":", @val);
if (exists $hash{$key}) {
$hash{$key} .= $sep.$v;
}
else {
$hash{$key} = $v;
}
}
foreach my $k (sort keys %hash) {
print $k.$hash{$k}."\n";
}
#print Dumper \%h;
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
I could have used better variable names but this was just for practice so didn't really focus on that part.
Re: Appending values to existing hash key
by GrandFather (Saint) on Mar 21, 2014 at 03:08 UTC
|
A number of minor things, most of which are personal preference:
- I tend to use statement modifiers where there would only otherwise be one small statement in a controlled block.
- remove the trailing white space from the captured strings and require at least one of each character type in the regex to avoid some nasty edge cases
- Avoid intermediate variables where the calculation is trivial (@val for example)
- Use double quote interpolation instead of concatenation
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<DATA>) {
(my $key, my $value) = /(\S+\s+\S+)\s+(\S+\s+\S+)/;
$value = join ":", split " ", $value;
$hash{$key} .= ", " if exists $hash{$key};
$hash{$key} .= $value;
}
print "$_ $hash{$_}\n" for sort keys %hash;
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
Prints:
id1 name1 cat1:catname1, cat2:catname2
id2 name2 cat3:catname3
id3 name3 cat1:catname1, cat4:catname4
If the code changes take longer than the time saved, it's fast enough already.
| [reply] [d/l] [select] |
|
|
$value = join ":", split " ", $value; # This is awesome
$hash{$key} .= ", " if exists $hash{$key}; # I was thinking more i
+n lines of a ternary op but this is short and concise too
$hash{$key} .= $value;
}
print "$_ $hash{$_}\n" for sort keys %hash; # I definitely like this a
+pproach
Thank you once again! | [reply] [d/l] |
Re: Appending values to existing hash key
by NetWallah (Canon) on Mar 21, 2014 at 03:18 UTC
|
Arguably more idomatic for the split/join:
$val=~s/\s+/:/g;
But this modifies the $val.
For the second question of appending to hash values, I'd suggest a better structure would be a hash of arrays"
push @{ $hash{$key} }, $v;
That way, you do not need to check for existance - it is auto-vivified.
The difference is - for output, you will have to "join" with $sep:
... output for each $key ....
join $sep,@{ $hash{$key} }
What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
-Larry Wall, 1992
| [reply] [d/l] [select] |
|
|
chomp;
(my $k, my @v) = /(\S*\s*\S*\s*)(\S*\s*\S*\s*)/;
push @{$h{$k}} , shift @v;
But dropped the idea because I had to iterate twice:
foreach my $key (sort keys %h) {
print "$key";
foreach my $a (@{$h{$key}}) {
print "$a ";
}
print "\n";
}
But I see how in a bigger data set this should be the way to go. Checking for existence in a larger hash could be counter productive. | [reply] [d/l] [select] |
|
|
use warnings;
use strict;
my %hash;
while (<DATA>) {
my @data = split;
push @{ $hash{ $data[0] }{ $data[1] } } => "$data[2]: $data[3]";
}
for my $key ( sort keys %hash ) {
print $key, ' ',
join( ', ' => map { $_, @{ $hash{$key}{$_} } } keys %{ $hash{$ke
+y} } ), $/;
}
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
UPDATE:
Code updated as indicated by NetWallah.
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
| [reply] [d/l] |
|
|
|
|
|
|
|
|
| [reply] |
|
|
| [reply] |
Re: Appending values to existing hash key
by wind (Priest) on Mar 21, 2014 at 02:10 UTC
|
One potential alternative, ensuring keys stay in the same order:
#!/usr/bin/perl
use strict;
use warnings;
my @keys;
my %vals;
while (<DATA>) {
chomp;
my @data = split;
my $key = join ' ', splice @data, 0, 2;
push @keys, $key if !$vals{$key};
push @{$vals{$key}}, join ':', @data;
}
for (@keys) {
print "$_ " . join(', ', @{$vals{$_}}) . "\n";
}
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
Outputs:
id1 name1 cat1:catname1, cat2:catname2
id2 name2 cat3:catname3
id3 name3 cat1:catname1, cat4:catname4
- Miller | [reply] [d/l] [select] |
|
|
| [reply] |
Re: Appending values to existing hash key
by graff (Chancellor) on Mar 21, 2014 at 03:52 UTC
|
$val =~ s/\s+/:/g
As for the other snippet, you could use a hash of arrays instead of a hash of concatenated strings, then join the array elements for each hash key after the whole set has been loaded. Here are the relevant loops (with a few other changes):
my %hash;
while (<DATA>) {
s/\s+$//; # remove all trailing whitespace (not just newline)
s/(\S+\s+\S+\s+)//; # remove (and capture) first two fields
my $key = $1;
s/\s+/:/g; # join remaining fields with ":"
push @{$hash{$key}}, $_;
}
foreach my $k (sort keys %hash) {
print $k.join( ', ', @{$hash{$k}} )."\n";
}
There's nothing wrong with the way you originally did it - it works, it's clear, it's not wasteful, and that's all good. | [reply] [d/l] [select] |
|
|
| [reply] |
Re: Appending values to existing hash key
by kcott (Archbishop) on Mar 21, 2014 at 05:50 UTC
|
G'day jaypal,
Welcome to the monastery.
Probably the only part of this code that I would seriously question is the regex:
-
Instead of matching zero or more times throughout (i.e. with '*'),
matching one or more times throughout (i.e. with '+') may be more appropriate.
-
Would '\w+' be a better choice than '\S+':
both exhibit the same behaviour with the sample data shown but the former might be more useful in finding bad data (in a real application).
-
The final '\s*' is questionable.
Is that something you really want to capture?
Also, note that leaving that out of the regex means that you don't need the chomp on the preceding line.
Your sample data could also have a little more description.
The code suggests that spaces are significant in some places and not in others:
-
Spaces between idN & nameN and nameN & catN are significant for the hash key.
Is this correct?
Should records starting with 'id1 name1' and 'id1 name1' use different hash keys?
-
Spaces between catN & catnameN and after catnameN are not significant for the output.
Do you need to capture these?
[This is slightly off-topic for your specific questions; however, if you're parsing fixed-width data, consider using unpack (there's a tutorial for this: perlpacktut).]
Your two lines
my @val = split(" ",$val);
my $v = join (":", @val);
could be written as a single line not requiring the intermediary @val:
my $v = join ':', split ' ', $val;
If you're not already familiar with this, take a look at the split documentation to see the difference between the "split /PATTERN/"s ' ', / / and /\s+/.
In the script below, I've modified the regex so that an explicit split isn't required and the join is performed as part of the assignment to $val.
This gets rid of both of those two lines as well as both of the @val and $v variables.
Your code
if (exists $hash{$key}) {
$hash{$key} .= $sep.$v;
}
else {
$hash{$key} = $v;
}
looks fine as it is.
It's easy to both read and maintain.
You can often replace this construct by using a ternary operator (aka "Conditional Operator"); however, I think the following looks clunky and is inferior to what you've already written (with respect to readability and maintainability):
$hash{$key} = exists $hash{$key} ? $hash{$key} . $sep . $v : $v;
or even
$hash{$key} = exists $hash{$key} ? "$hash{$key}$sep$v" : $v;
In the script below, I haven't needed that construct because I've allowed $hash{$key} to autovivify (with push @{$data{$key}}, $val;) and performed the concatenation as part of the final print.
[Side Note: print takes a list.
There's no need to explicitly concatenate its arguments.
In the specific case of your code here, you could have written print $k.$hash{$k}."\n"; as just print "$k$hash{$k}\n";; although, that's probably less readable than print $k, $hash{$k}, "\n";.]
Here's the script I've referred to a couple of times above.
#!/usr/bin/env perl -l
use strict;
use warnings;
my %data;
my $re = qr{^(\w+\s+\w+)\s+(\w+)\s+(\w+)};
while (<DATA>) {
my ($key, $val) = map { $_->[0], join ':', @$_[1,2] } [/$re/];
push @{$data{$key}}, $val;
}
print "$_ ", join ', ', @{$data{$_}} for sort keys %data;
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
Output:
id1 name1 cat1:catname1, cat2:catname2
id2 name2 cat3:catname3
id3 name3 cat1:catname1, cat4:catname4
| [reply] [d/l] [select] |
|
|
print "$_ ", join ', ', @{$data{$_}} for sort keys %data;
to do what I was very verbose with. map function is something I am still struggling with. Hopefully with little more practice, I can wrap the concept around my head.
Thanks again, your answer was very enlightening. | [reply] [d/l] |
Re: Appending values to existing hash key
by Lennotoecom (Pilgrim) on Mar 21, 2014 at 06:56 UTC
|
maybe similar to something posted above, anyway
while (@t = split /\s+/, <DATA>){
push (@{$h{$t[0]}{$t[1]}}, "$t[2]:$t[3]");
}
foreach $k (sort keys %h){
print "$k ", join(' ', map {$_, @{$h{$k}{$_}}} sort keys %{$h{$k}}
+),"\n";
}
__DATA__
id1 name1 cat1 catname1
id1 name1 cat2 catname2
id2 name2 cat3 catname3
id3 name3 cat1 catname1
id3 name3 cat4 catname4
| [reply] [d/l] |
Re: Appending values to existing hash key
by locked_user sundialsvc4 (Abbot) on Mar 21, 2014 at 14:05 UTC
|
As others have said, I approach these problems by looking for a representative data-structure, first; a way to get the data into that structure, second; a way to produce the desired output from that structure, third. Perl’s “auto-vivification” features are already designed to make the construction of such data-structures very easy.
If you try to go too-directly from input to output, soon enough someone will want you to add another output, or to do something else with the data between the input and output stages.
| |
|
|