PerlMonks  

How do I find if an array has duplicate elements, if so discard it?

( #181532=categorized question )
Contributed by Zombie toddprof on Jul 13, 2002 at 21:33 UTC


Description:

@A = (1, 2, 3, 4, 5);
@B = (6, 7, 7, 8, 9);
@C = (10, 11, 12, 13, 15);
@D = ( @A, @B, @C );

Since @B has duplicates (7, 7), it should not appear in @D. In the end, @D should contain only the sets @A and @C.

Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by DamnDirtyApe

use strict;
use warnings;
use Data::Dumper;

# Returns 1 as soon as any value in the referenced array repeats, else 0.
sub has_dups {
    my $arr = shift;
    my %counter;
    foreach ( @$arr ) {
        return 1 if $counter{$_}++;
    }
    return 0;
}

my @A = (1, 2, 3, 4, 5);
my @B = (6, 7, 7, 8, 9);
my @C = (10, 11, 12, 13, 15);

# Keep only the array references whose arrays are duplicate-free.
my @D = grep { !has_dups($_) } ( \@A, \@B, \@C );
print Dumper( \@D );
Answer: Clarify: How do I find if an array has duplicate elements, if so discard it?
contributed by BrowserUk

Depending on how you're building your data structure, it might make more sense to add an array to @D only if that array has no duplicates. It is easier not to add it than to remove it later.
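
A minimal sketch of that approach, assuming the candidate arrays arrive one at a time. The helper name add_if_unique is hypothetical and not from the post:

```perl
use strict;
use warnings;

my @D;    # collects references to duplicate-free arrays only

# Hypothetical helper: append $candidate to @D unless it repeats a value.
sub add_if_unique {
    my ($candidate) = @_;
    my %seen;
    for my $item (@$candidate) {
        return 0 if $seen{$item}++;    # second sighting: reject the array
    }
    push @D, $candidate;
    return 1;
}

add_if_unique( [ 1, 2, 3, 4, 5 ] );
add_if_unique( [ 6, 7, 7, 8, 9 ] );         # rejected, contains 7 twice
add_if_unique( [ 10, 11, 12, 13, 15 ] );

print scalar(@D), " arrays kept\n";
```

The check happens before the push, so nothing ever has to be removed from @D afterwards.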

Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by Zombie gba

This is some code I applied to an array.
@legit is the input array of elements.
@uniq is the resulting array of unique elements.

foreach (@legit) {
    # push only the first time a value is seen
    unless ( $b{$_}++ ) {
        push( @uniq, $_ );
    }
}
Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by Zombie maraist

I think, as others have suggested, that this depends largely on your data structures.
E.g., are we talking about small amounts of data, processed infrequently?
Or are we talking about massive amounts of data that is rarely updated, or massive data that's updated regularly?

Here's a chart:
-----------

# small data, infrequently updated, order not important
# want simplicity
sub remove_redundant {
  my %hdata = map { ($_, 1 ) } @_;
  return keys %hdata;
}
Note that you might just want to save things in a hash to begin with (and use "keys %data" whenever you want it as an array).

------------
# small data, often updated (more reads than writes)
Keep in a hash and convert to a temp array when needed.

$data{$val} = 1;

for my $el ( keys %data ) { ... }

You can even create an overloaded array object which is really just a hash.

------------
# for large data that's rarely updated
push @data, $val if ! grep { $_ eq $val } @data;

It's a linear search, but if we're talking megs of data here, this is MUCH better than building a
HUGE intermediate hash, then having to garbage collect it afterwards.
-----------
# for large data that's often updated (more reads than writes)
It's worthwhile building an overloaded class that stores the data as a hash.
Ideally just use a hash (except that that'll waste a lot of memory).
It's better to waste this memory up front than to fragment your dynamic memory
pool by constantly generating intermediate memory scratch pads.

There is one final solution, and that is to maintain sorted data, then implement a
C function to perform the insertion efficiently. You can either use a red-black tree or
an insertion sort. You could get very creative, and it would obviously be generally
useful (since you're storing scalars). There might be CPAN modules for this already. The red-black tree is a good compromise
between performance and memory use, whereas the insertion sort will give you the best
memory utilization.
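
To make the sorted-insert idea concrete, here is a rough pure-Perl sketch rather than the C function suggested above. It assumes numeric data, and the name sorted_insert is hypothetical. The binary search finds the slot, and splice does the insertion-sort style shift:

```perl
use strict;
use warnings;

# Hypothetical helper: insert $val into the numerically sorted array @$data
# unless it is already present. Returns 1 if inserted, 0 if it was a duplicate.
sub sorted_insert {
    my ( $data, $val ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$data );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $data->[$mid] < $val ) { $lo = $mid + 1 }
        else                        { $hi = $mid }
    }
    # $lo is now the first index whose element is >= $val
    return 0 if $lo < @$data && $data->[$lo] == $val;
    splice @$data, $lo, 0, $val;
    return 1;
}

my @data;
sorted_insert( \@data, $_ ) for ( 7, 3, 9, 3, 1, 7 );
print "@data\n";
```

No intermediate hash is built, but each splice still has to shift the tail of the array, so inserts remain linear in the worst case.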
=-=-=-=-=-=-==-
One possible mechanism for using a hash as the array is the following:

our %hash_cache;
sub hash_array_push {
  my ( $hash_name, @vals ) = @_;
  for my $val ( @vals ) {
    $hash_cache{$hash_name}{$val} = 1;
  }
}

sub hash_array_del {
  my ($hash_name) = @_;
  delete $hash_cache{$hash_name};
}

sub hash_array_getref {
  my ( $hash_name) = @_;
  # note that this returns a reference to the inner hash, which
  # the caller can use more efficiently than a copied array.
  return $hash_cache{$hash_name};
}
Yes, there's an additional function-call overhead for these functions, but you'd have that with just
about any solution aside from manually maintaining the hash. Alternatively you could make it OO, but OO in Perl adds slightly more
function-call overhead (which is probably offset by not needing the symbolic
hash-table name lookup).
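
For comparison, the OO variant might look roughly like this. The class name UniqueSet and its methods add and elements are made up for illustration and are not from the post:

```perl
use strict;
use warnings;

package UniqueSet;    # hypothetical class name

# Each object carries its own hash, so no name lookup in a global cache.
sub new {
    my ($class) = @_;
    return bless { items => {} }, $class;
}

# Adding a value twice just overwrites the same hash key.
sub add {
    my ( $self, @vals ) = @_;
    $self->{items}{$_} = 1 for @vals;
    return $self;
}

# Returns the stored values in no particular order.
sub elements {
    my ($self) = @_;
    return keys %{ $self->{items} };
}

package main;

my $set = UniqueSet->new;
$set->add( 6, 7, 7, 8, 9 );
my @vals = sort { $a <=> $b } $set->elements;
print "@vals\n";
```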
Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by hossman

A one-liner from "Effective Perl Programming"...

@uniq = sort keys %{ { map { $_, 1 } @list } };
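
One thing to watch: sort with no comparison block compares as strings, so numeric data comes back in string order. A small sketch with made-up sample data, showing the same idea with a numeric comparator:

```perl
use strict;
use warnings;

my @list = ( 10, 9, 2, 10, 33, 2 );    # sample data, not from the post

# The unary plus makes it explicit that the inner braces are an
# anonymous hash rather than a block.
my @uniq_str = sort keys %{ +{ map { $_ => 1 } @list } };
my @uniq_num = sort { $a <=> $b } keys %{ +{ map { $_ => 1 } @list } };

print "@uniq_str\n";    # string order
print "@uniq_num\n";    # numeric order
```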
Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by rjimlad

Err, okay, my answer, short and sweet: use a hash and array slices:

my @a = (1 .. 2);
my @b = (2 .. 3);
my %c;
@c{@a, @b} = (@a, @b);
# a hash slice in scalar context yields its last element, so count the keys instead
warn "Duplicates exist!" if keys(%c) != (@a + @b);

Of course, in this instance you don't want the last line; instead you'd probably want something like:

my @d = keys %c;

...which will be out of order, but guaranteed duplicate-free.
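
If the original order matters, a common alternative is to filter with a "seen" hash instead of reading the keys back. A minimal sketch using the same sample arrays (the %seen name is just a convention):

```perl
use strict;
use warnings;

my @a = ( 1 .. 2 );
my @b = ( 2 .. 3 );

my %seen;
# Keep an element only the first time it is seen, so input order survives.
my @d = grep { !$seen{$_}++ } ( @a, @b );

print "@d\n";
```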
Answer: How do I find if an array has duplicate elements, if so discard it?
contributed by snoopy

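use YAML;
use List::MoreUtils;

my @A = (1, 2, 3, 4, 5);
my @B = (6, 7, 7, 8, 9);
my @C = (10, 11, 12, 13, 15);

# uniq in scalar context returns the number of distinct elements,
# so an array is kept only if that count equals its length.
my @D = grep { List::MoreUtils::uniq(@$_) == @$_ } ( \@A, \@B, \@C );
print YAML::Dump(@D);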
