(Guildenstern) Re: Re: Taming a memory hog

by Guildenstern (Deacon)
on Nov 12, 2003 at 16:44 UTC (#306554)

in reply to Re: Taming a memory hog
in thread Taming a memory hog

The data validation is not performed in the generation application. We've written a few companion scripts that read in the generated data, perform some integrity checks, and compare each record against every other record read in.

Each line of data holds three records, so the script splits the line and computes a SHA-1 digest for each record. Each digest is written to a file, and File::Sort is then used to sort that file. After that, it's a simple matter of reading the sorted file line by line and comparing each line against the previously read one: two identical neighbours mean a duplicate record.
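For anyone who wants to see the shape of that check, here is a minimal sketch, not the actual companion script. It assumes Digest::SHA for the SHA-1 step (the module we use isn't named above), a hypothetical "|" separator between the three records, and Perl's built-in sort standing in for File::Sort so the example fits in memory and runs without CPAN modules.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical record separator; the real data format is not shown here.
my $SEP = qr/\|/;

# Return the SHA-1 hex digest of every record on every input line.
sub record_digests {
    my @lines = @_;
    my @digests;
    for my $line (@lines) {
        chomp(my $copy = $line);
        push @digests, map { sha1_hex($_) } split $SEP, $copy;
    }
    return @digests;
}

# Sort the digests, then compare each one with its predecessor.
# Equal neighbours mean the same record appeared more than once.
# (In-memory sort is a stand-in for sorting the digest file on disk.)
sub duplicate_digests {
    my @sorted = sort @_;
    my (%seen, @dups);
    for my $i (1 .. $#sorted) {
        next unless $sorted[$i] eq $sorted[$i - 1];
        push @dups, $sorted[$i] unless $seen{ $sorted[$i] }++;
    }
    return @dups;
}

# Made-up sample data: "alpha" appears on both lines.
my @lines = ("alpha|beta|gamma\n", "delta|alpha|epsilon\n");
my @dups  = duplicate_digests(record_digests(@lines));
print scalar(@dups), " duplicated record(s)\n";
```

With the real data set the digests would go to a file and be sorted on disk, which is the whole point of the exercise: only one line of the sorted file plus its predecessor has to be held in memory at a time.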

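I chose to compute the SHA for each record because the SHA value is significantly smaller than the record. Identical records always produce identical SHA values, and while a collision between two different records is possible in theory, it is so unlikely that matching SHA values can be treated as matching records.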
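To put a rough number on the chance of two different records sharing a SHA-1 value, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. The record count of one billion is a made-up figure for illustration, not the size of our data set, and the estimate treats SHA-1 output as uniformly random over 160 bits.

```perl
use strict;
use warnings;

# Birthday-bound estimate: chance that at least two of $n distinct
# records share a digest of $bits bits. Only meaningful while the
# result is much smaller than 1.
sub collision_estimate {
    my ($n, $bits) = @_;
    return $n * ($n - 1) / 2 / 2**$bits;
}

# One billion records is a hypothetical figure, not taken from our data.
printf "%.1e\n", collision_estimate(1e9, 160);
```

Even at that scale the estimate comes out far below 1e-30, which is why the sorted-digest comparison is good enough for duplicate detection in practice.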

Negated character class uber alles!