How do I find if an array has duplicate elements, if so discard it?

toddprof has asked for the wisdom of the Perl Monks concerning the following question: ⭐ (arrays)

Replies are listed 'Best First'.
Re: How do I find if an array has duplicate elements, if so discard it?⭐ by DamnDirtyApe (Curate) on Jul 13, 2002 at 22:12 UTC
`use strict ; use warnings ; use Data::Dumper ; sub has_dups { my $arr = shift ; my %counter ; foreach ( @$arr ) { return 1 if $counter{$_}++ ; } return 0 ; } my @A = (1, 2, 3, 4, 5) ; my @B = (6, 7, 7, 8, 9) ; my @C = (10, 11, 12, 13, 15) ; my @D = grep { !has_dups($_) } ( \@A, \@B, \@C ) ; print Dumper( \@D ) ;` [download]	[reply] [d/l]
Re: Answer: Clarify: How do I find if an array has duplicate elements, if so discard it? by ysth (Canon) on Nov 07, 2003 at 10:11 UTC
I took a look at this and wondered what on earth it was trying to do. Then I reread the question. This answer does try to provide a meaningful example of detecting an array with duplicate elements and discarding it. But I have a feeling the questioner meant to say "if so discard" them. If you are downvoting this for other than lack of interest, thanks for sending me a message or reply to let me know why so I can do better (or just stay quiet) next time.	[reply]
Re: How do I find if an array has duplicate elements, if so discard it?⭐ by gba (Initiate) on Jul 15, 2002 at 01:45 UTC
this is some code i applied to an array: @legit is an array of elements @uniq is the resulting array of unique elements. `foreach(@legit) { unless($b{$_}++) { push(@uniq,$_); } }` [download]	[reply] [d/l]
Re: How do I find if an array has duplicate elements, if so discard it?⭐ by maraist (Initiate) on Jul 14, 2002 at 15:14 UTC
I think, as other's have suggested that this is largely dependent on your data structures. e.g. Are we talking about small amounts of data run not so oftenly. Or are we talking about massive amounts of data that is rarely updated, or massive data that's updated regularly. Here's a chart: ----------- small data, unoften updated, order not important # want simplicity sub remove_redundant { my %hdata = map { ($_, 1 ) } @_; return keys %hdata; } Note that you might just want to save things in a hash to begin with (and use "keys %data" whenever you want it as an array). ------------ # small data, often update (more reads than writes) Keep in a hash and convert to a temp array when needed. %data{$val} = 1; for my $el ( keys %data ) { ... } You can even create an overloaded array object which is really just a hash. ------------ # for large data that's rarely updated push @data, $val if ! grep { $_ eq $val } @data; It's a linear search, but if we're talking megs of data here, this is MUCH better than building a HUGE intermediate hash, then having to garbage collect it afterwards. ----------- # for large data that's often updated. (more reads than writes) It's worth while building an overloaded class that stores the data as a hash.. Ideally just use a hash (except that that'll waste a lot of memory). It's better to waste this memory up front than to fragement your dynamic memory pool by constantly generating intermediate memory scratch pads. There is one final solution, and that is to maintain sorted data, then implement a c-function to efficiently perform an insertion function. You can either use a red-black tree or an insertion sort. You could get very creative, and it would obviously be of general utilization (since you're storing scalars). There might be CPAN modules for this already. The red-black tree is a good compromise between performance and memory use whereas the insertion sort will give you the best memory utilization. =-=-=-=-=-=-==- One possible mechanis for using a hash for the array is the following: our %hash_cache; sub hash_array_push { my ( $hash_name, @vals ) = @_; for my $val ( @vals ) { $hash_cache{$hash_name}{$val} = 1; } } sub hash_array_del { my ($hash_name) = @_; delete $hash_cache{$hash_name}; } sub hash_array_getref { my ( $hash_name) = @_; # note that the user can more efficiently # utilize an array-ref than an array. return $hash_cache{$hash_name}; } Yes there's an additional function call overhead for these funcs, but you'd have that with just about any solution aside from manually maintaining the hash. Alternatively you could use it as oo, but OO in perl adds a slightly greater amount of function-call overhead (which is probably offset by the lack of need to do the symbolic hash-table name lookup).	[reply]
Re: How do I find if an array has duplicate elements, if so discard it?⭐ by hossman (Prior) on May 30, 2003 at 00:22 UTC
A one-liner from "Effective Perl Programming"... `@uniq = sort keys %{ { map { $_, 1 } @list } };`	[reply] [d/l]
Re: How do I find if an array has duplicate elements, if so discard it?⭐ by syphilis (Archbishop) on Oct 28, 2019 at 12:22 UTC
I think the answers already given are fine for strings and IVs, but can fail where NVs (including NVs that contain certain integer values) are involved. In the following, I demonstrate the shortcomings of some of the (flawed) techniques provided by those answers. use strict; use warnings ; use Data::Dumper ; use List::MoreUtils; sub has_dups { my $arr = shift ; my %counter ; foreach ( @$arr ) { return 1 if $counter{$_}++ ; } return 0 ; } # @A contains pairs of duplicate values my @A = (1 << 60, 2 60, 1 << 63, 2 63) ; # @B contains two unique values my $approx = sqrt 2.0; my @B = ("$approx" + 0, sqrt 2.0) ; my @D = grep { !has_dups($_) } ( \@A, \@B ) ; print Dumper(\@D), "\n"; @D = grep {List::MoreUtils::uniq(@$_) == @$_} (\@A,\@B); print Dumper(\@D), "\n"; my @list = @A; my @uniq = sort keys %{ { map { $_, 1 } @list } }; print Dumper(\@uniq); __END__ Outputs: $VAR1 = [ [ '1152921504606846976', '1.15292150460684698e+18', '9223372036854775808', '9.22337203685477581e+18' ] ]; $VAR1 = [ [ '1152921504606846976', '1.15292150460684698e+18', '9223372036854775808', '9.22337203685477581e+18' ] ]; $VAR1 = [ '1.15292150460684698e+18', '1152921504606846976', '9.22337203685477581e+18', '9223372036854775808' ]; [download] On any perl whose nvtype is double, you'll find that the various routines used above will consider that 1.4142135623731000 (0x1.6a09e667f3be3) and sqrt 2.0 (0x1.6a09e667f3bcd) are duplicates of each other, though the hex representation demonstrates quite clearly that it's not so. (The same type of failure arises when nvtype is long double or __float128.) OTOH, when ivsize is 8, the same routines will determine that (eg) 1 << 60 and 2 ** 60 are different values, even though they are obviously equivalent. (Perl's whose nvtype is __float128 might not suffer this issue.) The uniqnum() implementation in the core module List::Util is similarly afflicted - both List::Util and List::MoreUtils pass their test suites by avoiding the testing of these cases. I believe the solutions given at How to find and remove duplicate elements from an array? are similarly flawed. Let me know if there's an answer in either of those Q&A threads that works flawlessly - it looks to me that there isn't, but I didn't test all of them. List::Util::PP::uniqnum(), which is part of List::Util::MaybeXS, is currently the most reliable detector of numeric duplicates that I found on CPAN. AFAIK it's only flaw is that (on perls whose nvsize == 8 && ivsize == 8) the IV 18446744073709551615 (0xffffffffffffffff) and the NV 2 ** 64 (0x1p+64) are deemed to be duplicates, even though they are exact representations of 2 different values. There's currently a pull request that will fix these and all other known issues for List::Util::uniqnum at https://github.com/Dual-Life/Scalar-List-Utils/pull/80. Cheers, Rob	[reply] [d/l]
Re: Clarify: How do I find if an array has duplicate elements, if so discard it?⭐ by BrowserUk (Patriarch) on Jul 14, 2002 at 03:18 UTC
Depending how your building your data structure, it might make more sense to only add an array to @D, if that array has no duplicates. It being easier to not add it than to later remove it.	[reply]
Re: How do I find if an array has duplicate elements, if so discard it? by rjimlad (Acolyte) on Jul 19, 2002 at 20:18 UTC
Err, okay, my answer, short and sweet: use a hash and array slices: `my @a=(1 .. 2); my @b=(2 .. 3); my %c; @c{@a,@b}=(@a,@b); warn "Duplicates exist!" if scalar @c{@a,@b} != (@a+@b);` [download] Of course, in this instance you don't want the last line - instead you'd probably want something like: `my @d=keys %c;` [download] ...which will be out-of-order, but guaranteed duplicate free.	[reply] [d/l] [select]
Re: How do I find if an array has duplicate elements, if so discard it? by snoopy (Curate) on Sep 27, 2005 at 23:02 UTC
`use YAML; use List::MoreUtils; my @A = (1, 2, 3, 4, 5) ; my @B = (6, 7, 7, 8, 9) ; my @C = (10, 11, 12, 13, 15) ; my @D = grep {List::MoreUtils::uniq(@$_) == @$_} (\@A, \@B, \@C); print YAML::Dump(@D);` [download]	[reply] [d/l]
Re: Clarify: How do I find if an array has duplicate elements, if so discard it? by toddprof (Initiate) on Jul 13, 2002 at 21:35 UTC
I meant: @A = (1, 2, 3, 4, 5) ; @B = (6, 7, 7, 8, 9) ; @C = (10, 11, 12, 13, 15) ; @D = @A + @B + @C ; # here is where if array has duplicate elements (since @B does) then you would want to discard @B: foreach $element (@D) { print $element, "\n" ; Results should be this since @B has duplicate elements: 1, 2, 3, 4, 5 10, 11, 12, 13, 15 Originally posted as a Categorized Answer.	[reply]
Re: How do I find if an array has duplicate elements, if so discard it? by Anonymous Monk on Jun 19, 2004 at 10:01 UTC
`duplicate(k, n) { int i for(i=o;i<n;i++) { if(a[i]==a[i+1]) { i++; } else i++; } }` [download]	[reply] [d/l]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: How do I find if an array has duplicate elements, if so discard it? by perladdict (Chaplain) on May 11, 2006 at 09:13 UTC
hi monk, i am beginar to perl,i tried to remove the duplicates from a one dimentional array.Here is my code `#!/usr/bin/perl -w @a=(1,2,3,4,5,5,6,3,7); undef %raw; @b=grep(!$raw{$_}++,@a); print "@b \n";` [download] Originally posted as a Categorized Answer.	[reply] [d/l]


P is for Practical
	PerlMonks