PerlMonks  

Re: To Hash or to Array--Uniqueness is the question.

by Ryszard (Priest)
on Dec 02, 2005 at 08:28 UTC (#513532)


in reply to To Hash or to Array--Uniqueness is the question.

Warning, untested code:

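my %stathash;
while (<FH>) {
    chomp;             # strip the newline so a final unterminated line still matches
    $stathash{$_}++;   # count every occurrence of this record
}

This has the extra advantage of counting the number of hits for each unique value.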

You can then do some groovy stuff, like pulling out records which occur n times, records which appear in one set and not the other, or records which appear in both sets (the last two need two hashes, one per dataset).
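Also untested against real data, but here is a minimal sketch of those three tricks. The sample lists @set_a and @set_b and the helper count_records are made up for illustration; in practice each hash would be filled by a loop like the one above, one loop per file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for two files of records.
my @set_a = qw(apple banana apple cherry banana apple);
my @set_b = qw(banana date cherry cherry);

# Build a record => count hash, same idea as the while loop above.
# (count_records is an illustrative helper, not part of the original post.)
sub count_records {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

my $count_a = count_records(@set_a);
my $count_b = count_records(@set_b);

# Records which occur exactly $n times in set A.
my $n = 2;
my @n_times = sort grep { $count_a->{$_} == $n } keys %$count_a;

# Records which appear in A but not in B.
my @only_a = sort grep { !exists $count_b->{$_} } keys %$count_a;

# Records which appear in both sets.
my @in_both = sort grep { exists $count_b->{$_} } keys %$count_a;

print "occurs $n times in A: @n_times\n";   # banana
print "only in A: @only_a\n";               # apple
print "in both: @in_both\n";                # banana cherry
```

Each check is a single pass over the keys of one hash with a constant-time lookup in the other, so you never have to compare the two lists against each other element by element.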

I regularly do this with sets of about 500k records to determine where my data integrity issues lie, and it's pretty damn fast.

