http://www.perlmonks.org?node_id=11106779

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a (small) dataset in a array of hashes, like this:

@AoH = ( { 'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1', }, { 'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => '1', }, { 'origin' => 'IWWF', 'targetL' => 'gonfler', 'count' => '1', }, { 'origin' => 'IWWF', 'targetL' => 'due', 'count' => '1', }, { 'origin' => 'IWWF', 'targetL' => 'due', 'count' => '1', }, );

I need to 'merge' the hashes that have the same value in key 'targetL' and keep the count of how many of them have been merged, incrementing the 'count' value. Plus, the 'origin' keys of the merged hashes should be concatenated. With the above input I need to get:

@AoHfinal = ( { 'targetL' => 'foisonnement', 'origin' => 'AMG IDBR', 'count' => '2', }, { 'origin' => 'IWWF', 'targetL' => 'gonfler', 'count' => '1', }, { 'origin' => 'IWWF', 'targetL' => 'due', 'count' => '2', }, );

I am quite new to Perl. I came out with a very bulky solution with too many loops and recreation of intermediate array of hashes. But, as far I can see, this is something that should be done in a few lines of code. In which direction should I move?

Replies are listed 'Best First'.
Re: Counting elements in array of cases
by tybalt89 (Monsignor) on Sep 27, 2019 at 15:24 UTC

    Preserves order of both targetL and origin.

    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106779 use warnings; use List::Util qw( uniq ); my @AoH = ( { count => 1, origin => "AMG", targetL => "foisonnement" }, { count => 1, origin => "IDBR", targetL => "foisonnement" }, { count => 1, origin => "IWWF", targetL => "gonfler" }, { count => 1, origin => "IWWF", targetL => "due" }, { count => 1, origin => "IWWF", targetL => "due" }, ); my %seen; my @AoHfinal = grep { if( my $prev = $seen{$_->{targetL}} ) { $prev->{count} += $_->{count}; $prev->{origin} = join ' ', uniq split(' ', $prev->{origin}), $_-> +{origin}; 0 # skip duplicate } else { $seen{$_->{targetL}} = $_; } } @AoH; use Data::Dump 'dd'; dd @AoHfinal;

    Outputs:

    ( { count => 2, origin => "AMG IDBR", targetL => "foisonnement" }, { count => 1, origin => "IWWF", targetL => "gonfler" }, { count => 2, origin => "IWWF", targetL => "due" }, )
      Interesting approach! ++ :)

      Allow me some side notes:

      1. it's destructive.

      Copying the first hash with

      $seen{$_->{targetL}} = {%$_};

      should fix it.(can't test)

      2. Also are the repeated splits not really efficient... Not sure how best to change that.

      3. For better clarity I'd rather use a map and not a grep.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        It's destructive => There was no requirement to not do that. I'll resort to my old saying "If it passes all the test cases, it's correct :)"

        Repeated splits => Just concatenate as we go, and fix up right before the end. Is it really faster? I don't know and I don't care :)

        Like so:

        #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11106779 use warnings; use List::Util qw( uniq ); my @AoH = ( { count => 1, origin => "AMG", targetL => "foisonnement" }, { count => 1, origin => "IDBR", targetL => "foisonnement" }, { count => 1, origin => "IWWF", targetL => "gonfler" }, { count => 1, origin => "IWWF", targetL => "due" }, { count => 1, origin => "IWWF", targetL => "due" }, ); my %seen; my @AoHfinal = grep { $_->{origin} = join ' ', uniq split ' ', $_->{or +igin} } map { if( my $prev = $seen{$_->{targetL}} ) { $prev->{count} += $_->{count}; $prev->{origin} .= ' ' . $_->{origin}; () # skip duplicate } else { $seen{$_->{targetL}} = { %$_ }; } } @AoH; use Data::Dump 'dd'; dd @AoHfinal;
        my take on it...
        #!/usr/bin/perl -w # AoH_map.pl --- Perlmonks: Counting elements in array of cases # Link: https://perlmonks.org/?node_id=11106779 # Author: <LanX> # Base: <tybalt89> https://perlmonks.org/?node_id=11106792 # Created: 28 Sep 2019 # Version: 0.03 use warnings; use strict; use Test::More; sub transform { my %prev; my %origin; return map { my $target = $_->{targetL}; my $origin = $_->{origin}; if ( my $prev = $prev{$target} ) { $prev->{count} += $_->{count}; $prev->{origin} .= " $origin" unless $origin{$target}{$origin}++; # + ++ FIXED () # skip duplicate } else { $origin{$target}{$origin} = 1; $prev{$target} = { %$_ }; # return first (clone) } } @_; } # ---------- tests is_deeply ( [ transform( {'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1 +'}, {'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => ' +1'}, {'targetL' => 'gonfler', 'origin' => 'IWWF', 'count' => '1'}, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, ) ], [ {'targetL' => 'foisonnement','origin' => 'AMG IDBR','count'=>' +2'}, {'targetL' => 'gonfler','origin' => 'IWWF','count' => '1'}, {'targetL' => 'due','origin' => 'IWWF','count' => '2'}, ] ); is_deeply ( [ transform( {'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1 +'}, {'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => ' +1'}, {'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1 +'}, {'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => ' +1'}, ) ], [ {'targetL' => 'foisonnement','origin' => 'AMG IDBR','count'=>' +4'}, ] ); done_testing;

        C:/Perl_524/bin\perl.exe d:/tmp/pm/AoH_map.pl ok 1 ok 2 1..2 Compilation finished at Sun Sep 29 00:44:35

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        update

        added missing ++ and testcase, see reply from rsFalse for details.

Re: Counting elements in array of cases
by haukex (Archbishop) on Sep 27, 2019 at 08:32 UTC

    Show your code anyway?

    use warnings; use strict; use Test::More; my @AoH = ( {'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1'}, {'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => '1'}, {'targetL' => 'gonfler', 'origin' => 'IWWF', 'count' => '1'}, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, ); my @AoHfinal; my %targets; for my $h (@AoH) { push @AoHfinal, ($targets{$$h{targetL}}={targetL=>$$h{targetL}}) unless $targets{$$h{targetL}}; $targets{$$h{targetL}}{origin}{$$h{origin}}++; $targets{$$h{targetL}}{count}++; } $$_{origin} = join ' ', sort keys %{$$_{origin}} for values %targets; is_deeply \@AoHfinal, [ {'targetL' => 'foisonnement','origin' => 'AMG IDBR','count'=>'2'}, {'targetL' => 'gonfler','origin' => 'IWWF','count' => '1'}, {'targetL' => 'due','origin' => 'IWWF','count' => '2'}, ] or diag explain \@AoHfinal; done_testing;

    Note you didn't specify the order of the resulting values, both @AoHfinal (so I kept the original order from @AoH), or of the origin values (so I sorted them because it was easier).

      Aww you fell for it and did OPs homework
Re: Counting elements in array of cases (updated)
by LanX (Saint) on Sep 27, 2019 at 09:44 UTC
    stealing HaukeX' test suite. (++)

    This

    • keeps order and
    • enforces unique origins (something you didn't describe but showed in your expected output)
    turned out to be very similar to Hauke's idea.

    #!/usr/bin/perl -w # AoH_count.pl --- Perlmonks: Counting elements in array of cases # Link: https://perlmonks.org/?node_id=11106779 # Author: <LanX> # Created: 27 Sep 2019 # Version: 0.01 use warnings; use strict; use List::Util qw/uniq/; use Test::More; #use Data::Dump qw/pp/; my @AoH = ( {'targetL' => 'foisonnement', 'origin' => 'AMG', 'count' => '1'}, {'targetL' => 'foisonnement', 'origin' => 'IDBR', 'count' => '1'}, {'targetL' => 'gonfler', 'origin' => 'IWWF', 'count' => '1'}, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '1' }, ); my @AoHfinal; my %count; my %origin; my @order; for my $h (@AoH) { my $target = $h->{targetL}; push @order, $target unless exists $count{$target}; $count{ $target }++; push @{ $origin{$target} }, $h->{origin}; } #pp '\%count,\%origin,\@order: ', \%count,\%origin,\@order; for my $target ( @order ) { push @AoHfinal, { targetL => $target, origin => join ( " ", uniq @{ $origin{$target} } ), count => $count{$target}, }; } #pp '\@AoHfinal: ', \@AoHfinal; is_deeply \@AoHfinal, [ {'targetL' => 'foisonnement','origin' => 'AMG IDBR','count'=>'2'}, {'targetL' => 'gonfler','origin' => 'IWWF','count' => '1'}, {'targetL' => 'due','origin' => 'IWWF','count' => '2'}, ] or diag explain \@AoHfinal; done_testing;

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

    update

    here an even more readable version with a second uniq and dealing with varying input-count values

Re: Counting elements in array of cases
by 1nickt (Canon) on Sep 27, 2019 at 13:56 UTC

    Hi, welcome to Perl, the One True Religion. There is more than one way to do it. You should use a database for this.

    If you install DBD::SQLite, the Perl driver for sqlite, you get, as the doc says, "a Perl DBI driver for SQLite, that includes the entire thing in the distribution. So in order to get a fast transaction capable RDBMS working for your perl project you simply have to install this module, and nothing else."

    In the example below, I simply create a database from the data, and then use the SQLite client on my computer to run a query. You could of course extend the Perl script to run the query after loading the data (and if so, you may not need to write a DB file at all, see :memory: as a database name) ... adding such queries to the script is left as an exercise for the reader.

    $ cat 11106779.pl
    use strict; use warnings; use DBI; my @AoH = ({ targetL => 'foisonnement', origin => 'AMG', count => '1', }, { targetL => 'foisonnement', origin => 'IDBR', count => '1', }, { origin => 'IWWF', targetL => 'gonfler', count => '1', }, { origin => 'IWWF', targetL => 'due', count => '1', }, { origin => 'IWWF', targetL => 'due', count => '1', }); my $dbh = DBI->connect('dbi:SQLite:dbname=11106779.db','','', { RaiseE +rror => 1 }); $dbh->do('create table data(targetL varchar(32), origin varchar(16))') +; my $sql = 'insert into data (targetL, origin) values (?, ?)'; my $sth = $dbh->prepare($sql); $sth->execute($_->{targetL}, $_->{origin}) for @AoH; __END__
    $ perl 11106779.pl
    ( ^ note no output; no errors )
    $ sqlite3 11106779.db
    SQLite version 3.24.0 2018-06-04 14:10:15 Enter ".help" for usage hints. sqlite> .headers on sqlite> select origin, targetL, count(origin) count from data group by + origin, targetL order by count desc; origin|targetL |count IWWF |due |2 AMG |foisonnement|1 IDBR |foisonnement|1 IWWF |gonfler |1

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Counting elements in array of cases
by jcb (Parson) on Sep 27, 2019 at 22:35 UTC

    And here is another solution:

    #!/usr/bin/perl use strict; use warnings; my @AoH = ({qw/ targetL foisonnement origin AMG count 1 /}, {qw/ targetL foisonnement origin IDBR count 1 /}, {qw/ origin IWWF targetL gonfler count 1 /}, {qw/ origin IWWF targetL due count 1 /}, {qw/ origin IWWF targetL due count 1 /}); my @AoHfinal = ({'targetL' => 'foisonnement', 'origin' => 'AMG IDBR', 'count' => '2 +'}, {'targetL' => 'gonfler', 'origin' => 'IWWF', 'count' => '1' +}, {'targetL' => 'due', 'origin' => 'IWWF', 'count' => '2' +}); my %out = (); my %seen = (); my @out = (); foreach (@AoH) { if ($out{$_->{targetL}}) { push @{$out{$_->{targetL}}->{origin}}, $_->{origin} unless $seen{$_->{targetL}}{$_->{origin}}++; $out{$_->{targetL}}{count} += $_->{count}; } else { my $row = {targetL => $_->{targetL}, origin => [$_->{origin}], count => $_->{count}}; $out{$_->{targetL}} = $row; push @out, $row; $seen{$_->{targetL}}{$_->{origin}} = 1; } } $_->{origin} = join(' ', @{$_->{origin}}) for @out; use Test::More; is_deeply(\@out, \@AoHfinal); done_testing(); __END__

    I think that the most notable difference in my solution is keeping the running origin values as an array until the very end, instead of repeating string manipulations. This preserves the order of both the targets and the origins while using only the core language — and a core module to verify the result.

      > while using only the core language — and a core module to verify the result.

      As far as I can see all solutions so far - with the exception of the sqlite "solution" - used core modules.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice