Removing duplicate values for a hash of arrays

perlvroom has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Removing duplicate values for a hash of arrays by Kenosis (Priest) on Nov 21, 2013 at 17:51 UTC
"The best thing I could come up with was something that identified which keys have duplicate values, but I don't know how to delete all but 1." Can you share your code? Edit - Here's one option: `use strict; use warnings; use Array::Compare; use Data::Dumper; my %hash = ( 'a' => [ '1', '2' ], 'b' => [ '2', '3' ], 'c' => [ '1', '2' ] ); my $comp = Array::Compare->new; for my $key1 ( keys %hash ) { for my $key2 ( keys %hash ) { next if $key1 eq $key2 or !$hash{$key1}; delete $hash{$key2} if $comp->compare( \@{ $hash{$key1} }, \@{ $hash{$key2} } ); } } print Dumper \%hash;` Output: `$VAR1 = { 'c' => [ '1', '2' ], 'b' => [ '2', '3' ] };` The `or !$hash{$key1}` test skips any `$key1` that an earlier pass has already deleted; without it, `\@{ $hash{$key1} }` would autovivify the missing entry.
Re^2: Removing duplicate values for a hash of arrays by perlvroom (Acolyte) on Nov 21, 2013 at 19:47 UTC
Thank you. This is what I needed. To answer another question from below, they are different arrays that happen to contain the same content.
Re: Removing duplicate values for a hash of arrays by Eily (Monsignor) on Nov 21, 2013 at 18:27 UTC
With a regular hash you just have to do: `%reverseHash = reverse %hash; %hash = reverse %reverseHash;` to eliminate duplicate values. Of course, that doesn't work in your case, because the arrays in @hash{'a', 'c'} are not the same: they contain the same values but are different references. Well, you could always turn those arrays into something identical. If you have someFunc(PARAM) that returns the same thing if and only if the values in the data structure PARAM are identical, you can do: `my %reverseHash; while (my ($key, $value) = each %hash) { my $processedValue = someFunc($value); $reverseHash{$processedValue} = $key; } my @keys = values %reverseHash; my %newHash; @newHash{@keys} = @hash{@keys};` The thing is, there is an easy solution for someFunc: Data::Dumper (and this time I'll use map) `use Data::Dumper; my %hash = ( a => [1, 2], b => [2, 3], c => [1, 2] ); my %reverseHash = map { Dumper($hash{$_}) => $_ } keys %hash; %hash = map { $_ => $hash{$_} } values %reverseHash; print Dumper \%hash;` Edit: this works well for nested arrays, even blessed ones. But with hashes, you probably have to activate Data::Dumper's key sorting (it depends on the version of Perl you are using).
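A minimal sketch of the Data::Dumper idea with key sorting switched on, so that nested hashes with the same content produce the same string. The helper names `fingerprint` and `dedupe_by_content` are made up for illustration; `$Data::Dumper::Sortkeys` and `$Data::Dumper::Indent` are the module's documented settings. Keys are visited in sorted order so that the surviving key is predictable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: return a canonical string for a nested structure.
# Sortkeys gives hash keys a stable order; Indent = 0 keeps the string compact.
sub fingerprint {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    return Dumper($data);
}

# Hypothetical helper: keep one key per distinct value.
# The lowest key (in string order) wins.
sub dedupe_by_content {
    my ($hash) = @_;
    my (%seen, %result);
    for my $key (sort keys %$hash) {
        next if $seen{ fingerprint($hash->{$key}) }++;
        $result{$key} = $hash->{$key};
    }
    return %result;
}

my %hash = (
    a => { x => 1, y => 2 },
    b => { x => 2, y => 3 },
    c => { y => 2, x => 1 },    # same content as 'a'
);
my %unique = dedupe_by_content(\%hash);
print join(',', sort keys %unique), "\n";    # prints "a,b"
```

One caveat with any dump-based comparison: values that are numerically equal but stored differently (the number 1 versus the string '1') may dump differently, so it is safest when the data was built consistently.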
Re: Removing duplicate values for a hash of arrays by trippledubs (Deacon) on Nov 21, 2013 at 18:08 UTC
`#!/usr/bin/env perl use strict; use warnings; use Data::Dumper; my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], ); print Dumper \%hash; print "Delete whole key containing aref\n\n"; delete $hash{a}; print Dumper \%hash; print "Will turn to undef, perldoc -f delete says dont do it\n\n"; delete $hash{b}->[0]; print Dumper \%hash; print "Delete an element\n\n"; splice @{$hash{c}},0,1; print Dumper \%hash;` Returns: `$VAR1 = { 'c' => [ '1', '2' ], 'a' => [ '1', '2' ], 'b' => [ '2', '3' ] }; Delete whole key containing aref $VAR1 = { 'c' => [ '1', '2' ], 'b' => [ '2', '3' ] }; Will turn to undef, perldoc -f delete says dont do it $VAR1 = { 'c' => [ '1', '2' ], 'b' => [ undef, '3' ] }; Delete an element $VAR1 = { 'c' => [ '2' ], 'b' => [ undef, '3' ] };`
Re: Removing duplicate values for a hash of arrays by kcott (Archbishop) on Nov 22, 2013 at 05:59 UTC
G'day perlvroom, This does what you want: `#!/usr/bin/env perl use strict; use warnings; use Data::Dump; my %hash = ('a' => ['1', '2'], 'b' => ['2', '3'], 'c' => ['1', '2']); dd \%hash; my %seen; $seen{join $; => @{$hash{$_}}}++ and delete $hash{$_} for keys %hash; dd \%hash;` Output: `{ a => [1, 2], b => [2, 3], c => [1, 2] } { a => [1, 2], b => [2, 3] }` You can use whatever you want as the `join` separator. I've used `$;` as it's unlikely to occur in your arrayref data; obviously, you'll have a better idea of this than me. It was used in Perl 4 to emulate multidimensional arrays but it's rarely seen in Perl 5 code: I occasionally find it useful in this sort of situation. Read about it in perlvar (aka `$SUBSCRIPT_SEPARATOR` and `$SUBSEP`). -- Ken
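Because `keys` returns hash keys in no guaranteed order, the one-liner above may keep either 'a' or 'c' from run to run. The following is a sketch of the same `%seen` technique with the keys sorted first, so the survivor is always the lowest key; the sub name `dedupe_in_place` is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every key whose array content was already
# seen under an earlier key. Keys are visited in sorted order, so the
# surviving key is predictable.
sub dedupe_in_place {
    my ($hash) = @_;
    my %seen;
    for my $key (sort keys %$hash) {
        # $; is the subscript separator (see perlvar), used here as a
        # join character that is unlikely to appear in the data.
        my $signature = join($;, @{ $hash->{$key} });
        delete $hash->{$key} if $seen{$signature}++;
    }
    return $hash;
}

my %hash = (a => ['1', '2'], b => ['2', '3'], c => ['1', '2']);
dedupe_in_place(\%hash);
print join(' ', sort keys %hash), "\n";    # prints "a b"
```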
Re: Removing duplicate values for a hash of arrays by Laurent_R (Canon) on Nov 21, 2013 at 18:21 UTC
In this: `my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], );` are a and c pointing to the same array ref, or are they two different arrays that just happen to have the same content? This might change the best strategy to remove the duplicates. Generally, the best way to remove duplicates is to use a hash. So in this case you have to construct the reverse hash to identify the keys of the original hash that you want to keep. And the best way to do this will depend on the answer to my first question above.
Re^2: Removing duplicate values for a hash of arrays by Eily (Monsignor) on Nov 21, 2013 at 18:30 UTC
`my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], ); say "$_ => $hash{$_}" for keys %hash;` Output: `c => ARRAY(0x1abd8f0) a => ARRAY(0x1a9f998) b => ARRAY(0x1abd7a0)` So the two arrays are two different arrays, and you can't compare them with a simple cmp.
Re^3: Removing duplicate values for a hash of arrays by Laurent_R (Canon) on Nov 21, 2013 at 18:55 UTC
Yes, if the hash is built this way, these will be different array refs, but we don't know how the hash was originally constructed. This is why I asked the question.
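For the other case raised in this sub-thread, where several keys really do point at the same array, duplicates can be found by reference identity instead of content. A minimal sketch, assuming that situation; `dedupe_by_identity` is a made-up name, and `refaddr` comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: keep one key per distinct array *reference*
# (identity, not content). The lowest key in string order wins.
sub dedupe_by_identity {
    my ($hash) = @_;
    my (%seen, %result);
    for my $key (sort keys %$hash) {
        next if $seen{ refaddr($hash->{$key}) }++;
        $result{$key} = $hash->{$key};
    }
    return %result;
}

my $shared = ['1', '2'];
my %hash = (
    a => $shared,
    b => ['2', '3'],
    c => $shared,        # same reference as 'a'
    d => ['1', '2'],     # same content as 'a', but a different array
);
my %unique = dedupe_by_identity(\%hash);
print join(' ', sort keys %unique), "\n";    # prints "a b d"
```

Note that 'd' survives here: it has the same content as 'a' but is a separate array, which is exactly the situation the original poster described, so a content-based comparison is what that case needs.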
Re: Removing duplicate values for a hash of arrays by trippledubs (Deacon) on Nov 21, 2013 at 18:31 UTC
`#!/usr/bin/env perl use strict; use warnings; use Data::Dumper; my %hash = ( a => [ '1','2', ], b => [ '2','3', ], c => [ '1','2', ], ); my $testing_now = 'a'; for (keys %hash) { next if ($testing_now eq $_); if ($hash{$testing_now} ~~ $hash{$_}) { print "$testing_now is duplicate of $_\n"; } #This is an update! Forgot this next line in initial post $testing_now = $_ } # Returns a is duplicate of c c is duplicate of a (.02 version)` I did not see the part about duplicate values earlier. ~~I'm not sure how to do it without the smart match easily! I think for me it would be easier to compile a new version of Perl!~~ reverse!
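The `~~` smartmatch operator used above has been flagged experimental since Perl 5.18 and was later deprecated, so a plain element-by-element comparison may be the safer choice. The sketch below assumes flat arrays of defined scalars compared as strings; `arrays_equal` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two array refs hold the same strings in
# the same order. Flat arrays of defined scalars only; nested structures
# are not handled.
sub arrays_equal {
    my ($left, $right) = @_;
    return 0 unless @$left == @$right;
    for my $i (0 .. $#$left) {
        return 0 if $left->[$i] ne $right->[$i];
    }
    return 1;
}

my %hash = (a => ['1', '2'], b => ['2', '3'], c => ['1', '2']);
my @keys = sort keys %hash;

# Compare each pair of keys exactly once.
for my $i (0 .. $#keys) {
    for my $j ($i + 1 .. $#keys) {
        print "$keys[$j] is duplicate of $keys[$i]\n"
            if arrays_equal($hash{ $keys[$i] }, $hash{ $keys[$j] });
    }
}
# prints "c is duplicate of a"
```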
Re^2: Removing duplicate values for a hash of arrays by Happy-the-monk (Canon) on Nov 21, 2013 at 18:03 UTC
"but I don't know how to delete all but 1" Again, this might be funnier with some code to show, but you could delete all and re-enter one of them. Cheers, Sören Créateur des bugs mobiles - let loose once, run everywhere. (hooked on the Perl Programming language)
Re: Removing duplicate values for a hash of arrays by Random_Walk (Prior) on Nov 21, 2013 at 21:52 UTC
This relies on having some separator, 'x', that is sure not to appear in your data. YMMV. `#!/usr/bin/perl use strict; use warnings; use Data::Dumper; my %hash = ( a => ['1', '2'], b => ['2', '3'], c => ['1', '2'] ); my %temp = map {(join 'x', @{$hash{$_}}) => $_} keys %hash; my %d_duped = map {$temp{$_}, [split 'x', $_]} keys %temp; print Dumper \%d_duped;` Cheers, R. Pereant, qui ante nos nostra dixerunt!
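To illustrate why the separator matters: if an element can itself contain 'x', two different arrays can join to the same string. One possible workaround, sketched here as an assumption rather than anything proposed in the thread, is to prefix each element with its length; `signature` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: build a key that cannot collide, because each
# element is prefixed with its length and a colon.
sub signature {
    my (@elements) = @_;
    return join '', map { length($_) . ':' . $_ } @elements;
}

my @two_elements = ('1', '2');
my @one_element  = ('1x2');

# Plain join: both arrays become the string "1x2".
my $joined_same = join('x', @two_elements) eq join('x', @one_element);
print $joined_same ? "collision\n" : "distinct\n";    # prints "collision"

# Length-prefixed: "1:11:2" versus "3:1x2".
my $signature_same = signature(@two_elements) eq signature(@one_element);
print $signature_same ? "collision\n" : "distinct\n";    # prints "distinct"
```

Unlike the join/split version, this key is only useful for comparison; the original arrays have to be kept alongside it rather than rebuilt with `split`.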