PerlMonks  

Re: To Hash or to Array--Uniqueness is the question.

by Ryszard (Priest)
on Dec 02, 2005 at 08:28 UTC ( #513532=note )


in reply to To Hash or to Array--Uniqueness is the question.

Warning, untested code:
    my %stathash;
    while (<FH>) {
        chomp;
        $stathash{$_}++;
    }
This has the extra advantage of counting the number of hits for each unique value.

You can then do some groovy stuff, like pulling out records which occur n times. If you load two datasets into two hashes, you can also pull out records which appear in one set and not the other, or records which appear in both sets.
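Here is a rough sketch of those three lookups. The two arrays are made-up sample data standing in for two input files, and the names (@set_a, @set_b, %count_a, %count_b, $n) are just placeholders I picked for illustration, so adapt them to your own filehandles.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for two files of records.
my @set_a = qw(apple banana apple cherry banana apple);
my @set_b = qw(banana cherry date cherry);

# Build one count hash per dataset, the same way as the loop above.
my (%count_a, %count_b);
$count_a{$_}++ for @set_a;
$count_b{$_}++ for @set_b;

# Records that occur exactly $n times in the first dataset.
my $n = 2;
my @n_times = sort grep { $count_a{$_} == $n } keys %count_a;

# Records in the first dataset but not in the second.
my @only_a = sort grep { !exists $count_b{$_} } keys %count_a;

# Records that appear in both datasets.
my @both = sort grep { exists $count_b{$_} } keys %count_a;

print "occurs $n times in A: @n_times\n";
print "only in A: @only_a\n";
print "in both: @both\n";
```

Each lookup is a single pass over the keys of one hash with a constant-time check against the other, which is why this scales well to large record sets.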

I regularly do this with sets of about 500k records to determine where my data integrity issues lie, and it's pretty damn fast.
