
RE: Efficiency and Large Arrays

by reptile (Monk)
on Jul 23, 2000 at 02:31 UTC (#23938)

in reply to Efficiency and Large Arrays

1. Read in a file into a scalar
2. Split scalar into an array using split(/\n{2}/, $scalar)

That's rather wasteful: you hold the whole file in memory as a scalar and then copy it into an array. You can probably do it all at once, if you're slick, by iterating over the file line by line. You can probably get #3 and #4 in there too. For #5 (the phone numbers), keep a temporary hash outside the loop with the phone numbers in it: as you go along, if a number isn't in the hash yet, add it; if it is, drop the record. The biggest speed increase, I would imagine, would come from getting all of these points into a single loop over the file, and I believe it can be done.
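To make the temporary-hash idea concrete, here is a minimal sketch of it on its own. It reads one line at a time and uses a %seen hash to spot repeated phone numbers. The "PHONE: ..." line format and the sub name first_seen_phones are assumptions made up for illustration; the real file layout may well differ.

```perl
use strict;
use warnings;

# Hypothetical sample input; the "PHONE: ..." line format is an assumption.
my $data = join "\n",
    'PHONE: 555-0100',
    'PHONE: 555-0101',
    'PHONE: 555-0100',    # duplicate of the first line
    '';

# Return the phone numbers that occur for the first time, in file order.
sub first_seen_phones {
    my ($fh) = @_;
    my (%seen, @unique);
    while (my $line = <$fh>) {    # one line at a time, no slurp
        chomp $line;
        my ($phone) = $line =~ /^PHONE:\s*(\S+)/;
        next unless defined $phone;
        # $seen{$phone}++ is false only the first time a number turns up.
        push @unique, $phone unless $seen{$phone}++;
    }
    return @unique;
}

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @phones = first_seen_phones($fh);
close $fh;
print "$_\n" for @phones;
```

With a real file, only the open() call changes; the loop never holds more than one line plus the hash of numbers.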

It's a lot of code, and I don't want to embarrass myself by coming up with something right now, but here are some ideas:

Iterate over each line of the file, of course. Keep a variable handy for the current serial number (if any) and another for whether a record is open or not (in case there's line noise between records; probably not important). You'll also need your records hash, and a hash of phone numbers that you're just going to throw away at the end.

When you get a "SERIAL NUMBER (\d+)" line, put $1 in your current serial number variable. When you get a { line, set open to true; when you get a } line, set it to false. Anything else that arrives while open is true is stuff to shove into the hash. When you get a phone number, check to see if it already exists in your phone number hash; if it does, you can just delete() the current record out of the records hash (or do the check when you get the closing brace, so that any lines after the phone number don't re-create the record).
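Roughly, the loop described above could look like the sketch below. It is untested against the real data and rests on a guessed layout: a "SERIAL NUMBER n" line, then a { line, then field lines including a hypothetical "PHONE: ..." line, then a } line. It takes the closing-brace variant: lines are buffered while the record is open and only stored at the } if the phone number was not seen before. The sub name parse_records and the field names are invented for the example.

```perl
use strict;
use warnings;

# One pass over the file; returns a hash ref of serial => [field lines].
sub parse_records {
    my ($fh) = @_;
    my (%records, %seen_phone);
    my ($serial, $open, $duplicate, @lines);

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^SERIAL NUMBER (\d+)/) {
            $serial = $1;                     # remember the current serial
        }
        elsif ($line =~ /^\s*\{\s*$/) {
            ($open, $duplicate, @lines) = (1, 0);   # start a fresh record
        }
        elsif ($line =~ /^\s*\}\s*$/) {
            # Decide at the closing brace, so lines after the phone
            # number cannot re-create a dropped record.
            $records{$serial} = [@lines]
                if $open && defined $serial && !$duplicate;
            $open = 0;
            undef $serial;
        }
        elsif ($open) {
            if ($line =~ /^PHONE:\s*(\S+)/) {   # assumed phone line format
                $duplicate = 1 if $seen_phone{$1}++;
            }
            push @lines, $line;
        }
        # Anything outside the braces is treated as noise and skipped.
    }
    return \%records;
}

# Hypothetical sample data; record 2 repeats the phone number of record 1.
my $data = <<'END';
SERIAL NUMBER 1
{
NAME: Alice
PHONE: 555-0100
}
line noise between records
SERIAL NUMBER 2
{
NAME: Bob
PHONE: 555-0100
NOTE: same phone as record 1
}
SERIAL NUMBER 3
{
NAME: Carol
PHONE: 555-0101
}
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $records = parse_records($fh);
close $fh;
print join(', ', sort { $a <=> $b } keys %$records), "\n";
```

Buffering the lines in @lines costs one record's worth of memory but avoids having to delete() a half-built entry later.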

Oh, I forgot: when you get a SERIAL NUMBER line, you can check right there for a repeated number and figure out a new one. It doesn't matter much that you haven't seen all the records yet: if your new number is taken by a later record, that later record will be incremented too. However, if that's not the behavior you want, my entire suggestion goes out the window.
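A small sketch of that serial number check, assuming "figure out a new one" means "take the next free number above it" (that rule is a guess; pick whatever renumbering suits the data). The %used hash and the sub name claim_serial are made up for the example.

```perl
use strict;
use warnings;

my %used;    # serial numbers already handed out

# Return $wanted if it is still free, else the next free number above it.
sub claim_serial {
    my ($wanted) = @_;
    $wanted++ while $used{$wanted};
    $used{$wanted} = 1;
    return $wanted;
}

# Serials in the order they appear in a hypothetical file.
my @assigned = map { claim_serial($_) } (7, 7, 8, 10);
print "@assigned\n";    # prints "7 8 9 10"
```

The second 7 becomes 8, and the record that originally had 8 then gets bumped to 9, which is the knock-on behavior described above.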

I hope you could follow. I could send you some actual code but it would take me some time to write up and test, so /msg me or something if you want some code.

local $_ = "0A72656B636148206C72655020726568746F6E41207473754A"; while(s/..$//) { print chr(hex($&)) }
