Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re: To Hash or to Array--Uniqueness is the question.

by Ryszard (Priest)
on Dec 02, 2005 at 08:28 UTC ( #513532=note: print w/replies, xml ) Need Help??

in reply to To Hash or to Array--Uniqueness is the question.

Warning, untested code:
my %stathash; while (<FH>) { $stathash{$_}++; }
Has the extra advantage of counting the number of hits for each unique value.

You can then do some grooy stuff, like pulling out records which occur n times, records which appear in one set and not another (if you use two hashes, two datasets), or records which appear in both sets, (again,if you use two hashes, two datasets)

I regularly do this with sets of about 500k records to determine where my data integrity issues lie, its pretty damn fast.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://513532]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2016-10-23 16:28 GMT
Find Nodes?
    Voting Booth?
    How many different varieties (color, size, etc) of socks do you have in your sock drawer?

    Results (301 votes). Check out past polls.