Re^3: Process large text data in array

by sundialsvc4 (Abbot)
on Mar 11, 2015 at 02:11 UTC ( [id://1119584]=note )


in reply to Re^2: Process large text data in array
in thread Process large text data in array

I share the opinion that it is quite unnecessary to read a 38MB disk file into virtual memory in order to process it. In particular, if that file becomes, say, “10 times larger than it now is,” your current approach might begin to fail. It’s just as easy to pass a file handle around, and to let that be “your input,” as it is to handle a beefy array. Also consider, if necessary, defining a sub (perhaps a reference to an anonymous function ...) that can be used to filter each of the lines as they are read: the while loop simply goes to the next line when this function returns false.
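A minimal sketch of that idea, for illustration only. The names process_lines, $filter and $handler are hypothetical and not from the original code in this thread, and the in-memory file handle merely stands in for a real file so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from an open file handle one at a time
# and pass each line accepted by $filter on to $handler.
sub process_lines {
    my ($fh, $filter, $handler) = @_;
    my $kept = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $filter->($line);    # go to the next line when the filter returns false
        $handler->($line);
        $kept++;
    }
    return $kept;
}

# Demo input via an in-memory file handle (an assumption for the demo);
# in real use this would be: open my $fh, '<', $path or die ...
my $data = "alpha,1\n# comment\nbeta,2\n\ngamma,3\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
my $count = process_lines(
    $fh,
    sub { my ($l) = @_; return $l =~ /\S/ && $l !~ /^#/ },   # keep non-blank, non-comment lines
    sub { push @seen, $_[0] },
);
close $fh;

print "$count lines kept: @seen\n";
```

Only one line is held in memory at a time, so the same code should behave the same way whether the input is 38MB or many times that.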

We know that BrowserUK, in his daily $work, deals with enormous datasets in very high-performance situations.   If he says what he just did about your situation, then, frankly, I would take it as a very well-informed directive to “do it that way.”   :-)

Re^4: Process large text data in array
by hankcoder (Scribe) on Mar 11, 2015 at 04:22 UTC

    Thanks sundialsvc4 for the feedback. In my "untested" opinion, would it be a good approach to have a check in the file-reading sub: if the data size is > 30MB (for example), use the file-handle method; otherwise, for smaller sizes, read everything into memory? I'm just assuming the process would be much faster using an array when there is less data, compared to using direct file handling just to read a few lines of data. Correct me if I am wrong. And yes, I am concerned my approach would fail if the data grows several times larger.

    I still consider myself new to using reference vars, so feel free to give me more suggestions on what I should look into. I will slowly re-code older subs to use references as suggested.

    Thanks again for your feedback, guys.
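
For reference, the size-threshold idea raised in this reply could be sketched roughly as below. This is an illustration under assumptions, not a recommendation: for_each_line and $SLURP_LIMIT are hypothetical names, the 30MB figure is just the example from the post, and the temporary file exists only to make the sketch self-contained. Whether the slurping branch is actually faster is something to measure, for example with the core Benchmark module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $SLURP_LIMIT = 30 * 1024 * 1024;    # 30MB, the example figure from the post

# Hypothetical helper: call $handler on every line of $path and
# report which strategy was used ('slurp' or 'stream').
sub for_each_line {
    my ($path, $handler, $limit) = @_;
    $limit = $SLURP_LIMIT unless defined $limit;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $size = -s $fh;                 # file size in bytes
    my $mode;
    if ($size <= $limit) {
        $mode = 'slurp';               # small file: read all lines into an array
        my @lines = <$fh>;
        chomp @lines;
        $handler->($_) for @lines;
    }
    else {
        $mode = 'stream';              # large file: one line at a time
        while (my $line = <$fh>) {
            chomp $line;
            $handler->($line);
        }
    }
    close $fh;
    return $mode;
}

# Demo: a small temporary file, processed once with each strategy.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp;

my (@a, @b);
my $mode_small = for_each_line($path, sub { push @a, $_[0] });
my $mode_big   = for_each_line($path, sub { push @b, $_[0] }, 10);   # tiny limit forces streaming
print "$mode_small / $mode_big, same result: ", ("@a" eq "@b" ? "yes" : "no"), "\n";
```

Both branches hand the same lines to the handler, so the calling code does not need to know which one ran; the cost is two code paths to maintain instead of one.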

      Correct me if I am wrong.

      Try it yourself; then you can correct yourself.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
