Removing duplicate values for a hash of arrays

by perlvroom (Acolyte)
on Nov 21, 2013 at 17:42 UTC

perlvroom has asked for the wisdom of the Perl Monks concerning the following question:

I was able to do this for a regular hash with scalar values, but I'm having trouble with a hash of arrays. Suppose I have a hash like:
my %hash = ( 'a' => ['1', '2'], 'b' => ['2', '3'], 'c' => ['1', '2'] );
I want to iterate through the hash and remove the key/value pair for a or c (it doesn't matter which one). The best thing I could come up with was something that identified which keys have duplicate values, but I don't know how to delete all but 1.

Replies are listed 'Best First'.
Re: Removing duplicate values for a hash of arrays
by Kenosis (Priest) on Nov 21, 2013 at 17:51 UTC

    The best thing I could come up with was something that identified which keys have duplicate values, but I don't know how to delete all but 1.

    Can you share your code?

    Edit - Here's one option:

    use strict;
    use warnings;
    use Array::Compare;
    use Data::Dumper;

    my %hash = ( 'a' => [ '1', '2' ], 'b' => [ '2', '3' ], 'c' => [ '1', '2' ] );
    my $comp = Array::Compare->new;

    for my $key1 ( keys %hash ) {
        for my $key2 ( keys %hash ) {
            next if $key1 eq $key2 or !$hash{$key1};
            delete $hash{$key2}
                if $comp->compare( \@{ $hash{$key1} }, \@{ $hash{$key2} } );
        }
    }

    print Dumper \%hash;

    Output:

    $VAR1 = { 'c' => [ '1', '2' ], 'b' => [ '2', '3' ] };

    The or !$hash{$key1} notation avoids autovivification in cases where the key no longer exists.

      Thank you. This is what I needed. To answer another question from below: they are different arrays that happen to contain the same content.
Re: Removing duplicate values for a hash of arrays
by Eily (Monsignor) on Nov 21, 2013 at 18:27 UTC

    With a regular hash you just have to do:

    %reverseHash = reverse %hash;
    %hash = reverse %reverseHash;

    to eliminate duplicate values. Which of course doesn't work in your case, because the arrays in @hash{'a', 'c'} are not the same, though they contain the same values, and so have different references. Well, you could always turn those arrays into something identical. If you have someFunc(PARAM) that returns the same thing if and only if the values in the data structure PARAM are identical, you can do:

    my %reverseHash;
    while ( my ($key, $value) = each %hash ) {
        my $processedValue = someFunc($value);
        $reverseHash{$processedValue} = $key;
    }
    my @keys = values %reverseHash;
    my %newHash;
    @newHash{@keys} = @hash{@keys};

    The thing is, there is an easy solution for someFunc : Data::Dumper (and this time I'll use map)

    use Data::Dumper;

    my %hash = ( a => [1, 2], b => [2, 3], c => [1, 2] );
    my %reverseHash = map { Dumper($hash{$_}) => $_ } keys %hash;
    %hash = map { $_ => $hash{$_} } values %reverseHash;
    print Dumper \%hash;

    Edit: this works well for nested arrays, even blessed ones. But with hashes, you probably have to activate Data::Dumper's key sorting ($Data::Dumper::Sortkeys = 1); whether you need it depends on the version of Perl you are using.
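A minimal sketch of the key-sorting point above, assuming the values are hash references rather than arrays. The data and the `fingerprint` helper name are made up for illustration. Setting `$Data::Dumper::Sortkeys` should make two hashes with equal content serialize to the same string regardless of internal key order; `$Data::Dumper::Indent = 0` only keeps the strings short.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data: values are hash refs instead of array refs.
my %hash = (
    a => { x => 1, y => 2 },
    b => { x => 2, y => 3 },
    c => { y => 2, x => 1 },    # same content as 'a'
);

# Hypothetical helper: serialize a value with hash keys in sorted order.
sub fingerprint {
    my ($value) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    local $Data::Dumper::Indent   = 0;    # single-line output
    return Dumper($value);
}

# Later keys overwrite earlier ones, so the last key (in sorted order)
# of each group of duplicates is the one that survives.
my %reverseHash;
$reverseHash{ fingerprint( $hash{$_} ) } = $_ for sort keys %hash;

%hash = map { $_ => $hash{$_} } values %reverseHash;

print join( ', ', sort keys %hash ), "\n";    # b, c
```

Iterating over `sort keys` is a choice made here so that the surviving key is predictable; the original snippet leaves that to hash order.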

Re: Removing duplicate values for a hash of arrays
by trippledubs (Deacon) on Nov 21, 2013 at 18:08 UTC
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        a => [ '1','2', ],
        b => [ '2','3', ],
        c => [ '1','2', ],
    );
    print Dumper \%hash;

    print "Delete whole key containing aref\n\n";
    delete $hash{a};
    print Dumper \%hash;

    print "Will turn to undef, perldoc -f delete says dont do it\n\n";
    delete $hash{b}->[0];
    print Dumper \%hash;

    print "Delete an element\n\n";
    splice @{$hash{c}},0,1;
    print Dumper \%hash;

    # Returns
    $VAR1 = { 'c' => [ '1', '2' ], 'a' => [ '1', '2' ], 'b' => [ '2', '3' ] };
    Delete whole key containing aref

    $VAR1 = { 'c' => [ '1', '2' ], 'b' => [ '2', '3' ] };
    Will turn to undef, perldoc -f delete says dont do it

    $VAR1 = { 'c' => [ '1', '2' ], 'b' => [ undef, '3' ] };
    Delete an element

    $VAR1 = { 'c' => [ '2' ], 'b' => [ undef, '3' ] };
Re: Removing duplicate values for a hash of arrays
by kcott (Archbishop) on Nov 22, 2013 at 05:59 UTC

    G'day perlvroom,

    This does what you want:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dump;

    my %hash = ('a' => ['1', '2'], 'b' => ['2', '3'], 'c' => ['1', '2']);
    dd \%hash;

    my %seen;
    $seen{join $; => @{$hash{$_}}}++ and delete $hash{$_} for keys %hash;
    dd \%hash;

    Output:

    { a => [1, 2], b => [2, 3], c => [1, 2] }
    { a => [1, 2], b => [2, 3] }

    You can use whatever you want as the join separator. I've used $; as it's probably unlikely to occur in your arrayref data; obviously, you'll have a better idea of this than me. It used to be used in Perl4 to emulate multidimensional arrays but it's rarely seen in Perl5 code: I occasionally find it useful in this sort of situation. Read about it in perlvar (aka $SUBSCRIPT_SEPARATOR and $SUBSEP).

    -- Ken
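A minimal sketch of the same `%seen` idea written as an explicit loop, assuming you want a predictable survivor. Iterating over `sort keys` (an addition, not part of the one-liner above) means the first key in sorted order is the one kept from each group of duplicates, instead of whichever key the hash happens to return first.

```perl
use strict;
use warnings;

my %hash = ( a => [ '1', '2' ], b => [ '2', '3' ], c => [ '1', '2' ] );

my %seen;
for my $key ( sort keys %hash ) {
    # Join the elements with $; to get one string describing the content.
    my $content = join $;, @{ $hash{$key} };

    # The first key with this content is kept; later ones are deleted.
    delete $hash{$key} if $seen{$content}++;
}

print join( ', ', sort keys %hash ), "\n";    # a, b
```

Deleting inside the loop is safe here because `sort keys %hash` builds the full key list before the loop body runs.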

Re: Removing duplicate values for a hash of arrays
by Laurent_R (Canon) on Nov 21, 2013 at 18:21 UTC

    In this:

    my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], );
    are a and c pointing to the same array ref or are they two different arrays just happening to have the same content? This might change the best strategy to remove the duplicates.

    Generally, the best way to remove duplicates is to use a hash. So in this case you have to construct the reverse hash to identify the keys of the original hash that you want to keep. And the best way to do this will depend on the answer to my first question above.

      my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], );
      say "$_ => $hash{$_}" for keys %hash;

      c => ARRAY(0x1abd8f0)
      a => ARRAY(0x1a9f998)
      b => ARRAY(0x1abd7a0)
      So the two arrays are two different arrays, and you can't compare them with a simple cmp.

        Yes, if the hash is built this way, these will be different array refs, but we don't know how the hash was originally constructed. This is why I asked the question.
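To illustrate the distinction being discussed, here is a small sketch with made-up data in which two keys share one array reference and a third holds a separate array with equal content. Comparing plain (non-overloaded) references with `==` compares their addresses, so it can tell the two cases apart.

```perl
use strict;
use warnings;

my $shared = [ '1', '2' ];

# Hypothetical data for illustration only.
my %hash = (
    a => $shared,          # same array as 'c'
    b => [ '1', '2' ],     # separate array with equal content
    c => $shared,
);

# == on plain references compares addresses, not content.
my $same_ref = $hash{a} == $hash{c} ? 1 : 0;
my $diff_ref = $hash{a} == $hash{b} ? 1 : 0;

print "a and c are the same array: $same_ref\n";    # 1
print "a and b are the same array: $diff_ref\n";    # 0
```

If the duplicates were always shared references, a reverse hash keyed on the stringified reference would be enough; for separate arrays with equal content, the content itself has to be compared or serialized.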
Re: Removing duplicate values for a hash of arrays
by trippledubs (Deacon) on Nov 21, 2013 at 18:31 UTC
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        a => [ '1','2', ],
        b => [ '2','3', ],
        c => [ '1','2', ],
    );

    my $testing_now = 'a';
    for (keys %hash) {
        next if ($testing_now eq $_);
        if ($hash{$testing_now} ~~ $hash{$_}) {
            print "$testing_now is duplicate of $_\n";
        }
        #This is an update! Forgot this next line in initial post
        $testing_now = $_
    }

    # Returns
    a is duplicate of c
    c is duplicate of a
    (.02 version)

    I did not see the part about duplicate values earlier. I'm not sure how to do it without the smart match easily! I think for me it would be easier to compile a new version of Perl! reverse!
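For readers who want to avoid the smart match operator, here is one possible sketch that compares the arrays element by element with core Perl only. The `arrays_equal` helper is a made-up name, and it assumes flat arrays of defined values compared as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both array refs hold the same strings
# in the same order. Assumes flat arrays with no undef elements.
sub arrays_equal {
    my ( $x, $y ) = @_;
    return 0 unless @$x == @$y;
    for my $i ( 0 .. $#$x ) {
        return 0 unless $x->[$i] eq $y->[$i];
    }
    return 1;
}

my %hash = ( a => [ '1', '2' ], b => [ '2', '3' ], c => [ '1', '2' ] );

# Compare every pair of keys once, in sorted order.
my @keys = sort keys %hash;
my @report;
for my $i ( 0 .. $#keys ) {
    for my $j ( $i + 1 .. $#keys ) {
        push @report, "$keys[$i] is duplicate of $keys[$j]"
            if arrays_equal( $hash{ $keys[$i] }, $hash{ $keys[$j] } );
    }
}

print "$_\n" for @report;    # a is duplicate of c
```

Unlike the loop above, this checks each pair exactly once, so each duplicate is reported a single time.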

Re^2: Removing duplicate values for a hash of arrays
by Happy-the-monk (Canon) on Nov 21, 2013 at 18:03 UTC

    but I don't know how to delete all but 1

    Again this might be funnier with some code to show, but you could delete all and re-enter one of them.

    Cheers, Sören

    Créateur des bugs mobiles - let loose once, run everywhere.
    (hooked on the Perl Programming language)

Re: Removing duplicate values for a hash of arrays
by Random_Walk (Prior) on Nov 21, 2013 at 21:52 UTC

    This relies on having some separator ('x' here) that is sure not to appear in your data. YMMV

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = ( a => ['1', '2'], b => ['2', '3'], c => ['1', '2'] );

    my %temp    = map {(join 'x', @{$hash{$_}}) => $_} keys %hash;
    my %d_duped = map {$temp{$_}, [split 'x', $_]} keys %temp;

    print Dumper \%d_duped;

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
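The separator caveat can be made concrete with a small sketch. The data below is made up so that the separator does appear in an element, and `content_key` is a hypothetical helper that prefixes each element with its length instead of relying on a separator. This is one possible workaround under those assumptions, not something proposed in the thread.

```perl
use strict;
use warnings;

# Made-up data: 'a' holds one element containing the separator,
# 'b' holds two elements. The contents differ.
my %hash = ( a => ['1x2'], b => [ '1', '2' ] );

# Hypothetical helper: length-prefix every element, so element
# boundaries stay unambiguous whatever characters the data contains.
sub content_key {
    my ($aref) = @_;
    return join '', map { length($_) . ':' . $_ } @$aref;
}

my ( %plain, %prefixed );
for my $key ( keys %hash ) {
    $plain{ join 'x', @{ $hash{$key} } }      = $key;    # both become '1x2'
    $prefixed{ content_key( $hash{$key} ) }   = $key;    # '3:1x2' vs '1:11:2'
}

print "joined with 'x':  ", scalar( keys %plain ),    " distinct\n";    # 1
print "length-prefixed:  ", scalar( keys %prefixed ), " distinct\n";    # 2
```

With the plain join the two different arrays collapse into one entry, so one of them would be wrongly dropped as a duplicate.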