PerlMonks  

How to get rid of dupes from an array of hashes

by bravenmd (Sexton)
on Jun 30, 2005 at 22:05 UTC (#471493=perlquestion)
bravenmd has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering if anyone had any insight into an elegant way to get rid of dupes from an array of hashes. I have an array containing more than 4000 hashes, each containing about 60 key-value pairs, and I want a list of the unique hashes from the original array. Is there a way to do this in a reasonable amount of time without writing a triple loop to look for dupes? Thanks in advance for everyone's help. I am really stuck on this. --Brandon Lerner

Re: How to get rid of dupes from an array of hashes
by waswas-fng (Curate) on Jun 30, 2005 at 22:50 UTC
    Post your structure.


    -Waswas
Re: How to get rid of dupes from an array of hashes
by Joost (Canon) on Jun 30, 2005 at 23:00 UTC
    Assuming the keys and values can be uniquely stringified, and you can find a character (or sequence of characters) that isn't in your data (I've taken "#" here), you only need to go through the dataset once...

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @array = (
        { a => 1, b => 2, c => 3 },
        { b => 1, d => 2, c => 3 },
        { a => 1, b => 2, c => 3 },
        { a => 2, b => 2, c => 3 },
        { a => 2, b => 1, c => 3 },
    );

    my %count;
    my @unique;

    # eats away at @array, use for() if you want to keep it
    while (my $entry = shift @array) {
        # need to sort by something because an equivalent hash might
        # return the key/value pairs in a different order
        my @sorted = map { $_, $entry->{$_} } sort keys %$entry;

        # create a unique string and see if we've seen it before
        next if $count{ join "#", @sorted }++;
        push @unique, $entry;
    }

    print Dumper(\@unique);

    I'm not so sure how efficient this is with your data, though. Try it out. :-)
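    If a value might contain the separator, or is itself a reference to nested data, a joined string can make two different hashes look equal: { a => 'x#y', b => 2 } and { a => 'x', y => 'b#2' } both join to "a#x#y#b#2". A minimal sketch of one way around that, assuming the hashes hold plain scalars or nested hashes/arrays: use a single-line Data::Dumper dump with Sortkeys as the signature. The sub name unique_hashes is hypothetical, and it loops with for() so @array is left intact. One caveat: values that compare equal but are stored differently (the number 1 versus the string "1") may dump differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: keeps the first occurrence of each distinct hash,
# in the original order, without modifying the input array.
sub unique_hashes {
    # Canonical one-line dumps: sorted keys, no indentation, no "$VAR1 ="
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    local $Data::Dumper::Terse    = 1;

    my %seen;
    my @unique;
    for my $h (@_) {
        push @unique, $h unless $seen{ Dumper($h) }++;
    }
    return @unique;
}

my @array = (
    { a => 1, b => 2, c => 3 },
    { b => 1, d => 2, c => 3 },
    { a => 1, b => 2, c => 3 },      # duplicate of the first entry
    { a => 'x#y', b => 2 },
    { a => 'x',   y => 'b#2' },      # same "#"-joined string as the previous one
    { a => 1, list => [ 1, 2 ] },    # nested data is compared by content
    { a => 1, list => [ 1, 2 ] },
);

my @unique = unique_hashes(@array);
print scalar(@unique), " unique of ", scalar(@array), "\n";   # prints "5 unique of 7"
```

    The dump is slower to build than a join, so with 4000 hashes of 60 keys it is worth timing both on your data.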

    By the way: if your hashes are really the same objects (i.e. references to the same structure), you can just compare the stringified references (i.e. $href eq $href2). But considering your question, you're probably not in that situation.
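    For the identical-reference case, a minimal sketch: dedupe on the address of each referenced hash. It uses refaddr from Scalar::Util instead of the stringified reference, which gives the same answer unless a class overloads stringification; the sample data is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical data: the same hash reference appears twice
my $shared = { a => 1, b => 2 };
my @array  = ($shared, { a => 1, b => 2 }, $shared);

# Dedupe by identity only: two distinct hashes with equal contents
# are both kept, because their addresses differ.
my %seen;
my @unique = grep { !$seen{ refaddr($_) }++ } @array;

print scalar(@unique), "\n";   # prints 2
```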

Re: How to get rid of dupes from an array of hashes
by injunjoel (Priest) on Jun 30, 2005 at 23:45 UTC
    Greetings all,
    This works for a small array. I'm not sure how well it will scale to 4000 elements; hopefully you have enough memory for it.
    #!/usr/bin/perl -w
    use strict;

    my @array = (
        { a => 1, b => 2, c => 3 },
        { a => 1, b => 2, c => 3 },
        { a => 1, b => 2, c => 3 },
        { a => 2, b => 2, c => 3 },
        { a => 2, b => 1, c => 3 },
    );

    my @unique = do {
        # key each hash by the joined string of its values; a later
        # duplicate overwrites an earlier one, so each string is kept once.
        # Note: this relies on values() returning the values in the same
        # order for every hash, which Perl does not guarantee.
        my %seen = map{join(":", values %{$_}), $_} @array;
        values %seen;
    };

    foreach (@array) {
        foreach my $key (sort keys %{$_}) {
            print "$key => $_->{$key}:";
        }
        print "\n";
    }

    print "\nunique\n";
    foreach (@unique) {
        foreach my $key (sort keys %{$_}) {
            print "$key => $_->{$key}:";
        }
        print "\n";
    }

    Output:

    a => 1:b => 2:c => 3:
    a => 1:b => 2:c => 3:
    a => 1:b => 2:c => 3:
    a => 2:b => 2:c => 3:
    a => 2:b => 1:c => 3:

    unique
    a => 2:b => 2:c => 3:
    a => 1:b => 2:c => 3:
    a => 2:b => 1:c => 3:

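    This code assumes that the hashes all have the same keys and differ only in their values. If this is not the case, change the line
    my %seen = map{join(":", values %{$_}), $_} @array;
    to one that builds the string from the sorted keys together with their values (joining the keys alone would treat hashes with the same keys but different values as dupes):
    my %seen = map { my $h = $_; join(":", map { $_, $h->{$_} } sort keys %$h), $h } @array;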


    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: How to get rid of dupes from an array of hashes
by davidrw (Prior) on Jun 30, 2005 at 23:49 UTC
    I immediately thought of a previous post of mine, Re: Tabular Data, Group By Functions. If this data were in a database, this would be a simple SQL statement (assuming the columns a, b and c from Joost's example):

    SELECT DISTINCT a, b, c FROM your_table;

    (Or you could use GROUP BY a, b, c; adding HAVING count(*) = 1 returns only the rows that occur exactly once, and HAVING count(*) > 1 only the duplicated ones.)

    Pros and cons are addressed in the aforementioned node and its replies. If this data isn't already coming from (or going to) a database and there isn't one available (though you don't need one with DBD::AnyData), it may not be worth it. It's just food for thought, especially since I don't know the larger context of your application, requirements and environment.
Re: How to get rid of dupes from an array of hashes
by sapnac (Beadle) on Jul 01, 2005 at 16:14 UTC


    If this is going to be done more than once, you are better off keeping a 4001st hash in the array containing only the unique ones. Every time you add an element, check it against this hash and add to it too, and voilà! Your task will be greatly reduced.
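    A minimal sketch of that idea, assuming records arrive one at a time: keep a %seen index of signatures next to the list of unique records, so each insert costs one signature and one hash lookup rather than a scan of the array. The names add_record and signature are hypothetical, and the signature reuses the sorted key/value join from Joost's reply, with the same assumption that "#" never occurs in the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @unique;   # the de-duplicated records
my %seen;     # signature => count for every record offered so far

# Canonical string for a flat hash; assumes "#" never occurs in the data
sub signature {
    my ($h) = @_;
    return join "#", map { $_, $h->{$_} } sort keys %$h;
}

# Stores the record only if an equal one is not already stored.
# Returns true if it was added.
sub add_record {
    my ($h) = @_;
    return 0 if $seen{ signature($h) }++;
    push @unique, $h;
    return 1;
}

add_record($_) for (
    { a => 1, b => 2 },
    { a => 1, b => 2 },
    { a => 2, b => 2 },
);

print scalar(@unique), "\n";   # prints 2
```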

    Just a thought! I really don't know if you can get the unique entries with less looping. But in my experience, hashes are fast and this will not take much time. Try timing it on the first 1000 entries.
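    One rough way to follow that advice, assuming Time::HiRes (core since Perl 5.8): time the de-dup on the first 1000 records, then on the full set, to see how it scales. The dedupe sub is a hypothetical stand-in for whichever approach you choose, and the data is synthetic (4000 hashes of 60 keys, 500 of them distinct), so your timings will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for whichever de-dup approach you pick
sub dedupe {
    my %seen;
    my @unique;
    for my $h (@_) {
        my $sig = join "#", map { $_, $h->{$_} } sort keys %$h;
        push @unique, $h unless $seen{$sig}++;
    }
    return @unique;
}

# Synthetic data: 4000 hashes of 60 keys, only 500 of them distinct
my @array = map {
    my $n = $_ % 500;
    +{ map { ("key$_" => $n * $_) } 1 .. 60 };
} 1 .. 4000;

for my $size (1000, scalar @array) {
    my @subset = @array[0 .. $size - 1];
    my $start  = time();
    my @unique = dedupe(@subset);
    printf "%5d records -> %4d unique in %.3f s\n",
        $size, scalar @unique, time() - $start;
}
```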

Node Type: perlquestion [id://471493]
Approved by Corion