
Re^2: Process large text data in array

by hankcoder (Scribe)
on Mar 10, 2015 at 15:11 UTC

in reply to Re: Process large text data in array
in thread Process large text data in array

BrowserUk Thanks for pointing it out. The process doesn't check only for the "active" value; there are more checks, and this was only a sample. I built the code into subs so it is easier for me to refer to and debug in future.

I prefer to use a separate sub call to get the file content instead of using

while( <TEMPFILE> ) { processLine( $_ ) }

in every part of the code that retrieves the file content. I'm taking note of your advice and will run more tests on each part. Thanks.
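To make the trade-off concrete, here is a minimal sketch of the line-by-line approach BrowserUk suggested, wrapped in a sub so it can still be called from one place. The sub name `process_file`, the filter callback, and the sample data are all hypothetical, not from the original posts; an in-memory filehandle stands in for `TEMPFILE`.

```perl
use strict;
use warnings;

# Hypothetical sketch: let a sub take an open filehandle as its input and
# process one line at a time, so the whole file never sits in memory.
sub process_file {
    my ($fh, $filter) = @_;
    my @kept;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @kept, $line if $filter->($line);
    }
    return @kept;
}

# Demo with an in-memory filehandle standing in for TEMPFILE.
my $data = "name1|active\nname2|disabled\nname3|active\n";
open my $fh, '<', \$data or die "open: $!";
my @active = process_file( $fh, sub { $_[0] =~ /\bactive\b/ } );
close $fh;
print "$_\n" for @active;
```

Calling `process_file` from every place that needs the data keeps the "one sub to refer to and debug" convenience without slurping the file into an array first.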

Re^3: Process large text data in array
by SuicideJunkie (Vicar) on Mar 10, 2015 at 15:34 UTC

    Swapping back and forth (and back and forth again) between the hash and a list is still inefficient.

    Use a hash reference instead, so it won't have to make multiple copies of your hash contents.
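A minimal sketch of what passing a hash reference looks like; the sub name `mark_active` and the sample fields are hypothetical, chosen only to show that the caller's hash is updated in place with no copying.

```perl
use strict;
use warnings;

# Sketch: pass a reference to one hash around instead of returning the
# hash as a list and rebuilding it in the caller.
sub mark_active {
    my ($rec) = @_;          # $rec is a hash reference; nothing is copied
    $rec->{status} = 'active';
    return;                  # caller's hash is already updated in place
}

my %user = ( name => 'hank', status => 'new' );
mark_active( \%user );
print "$user{status}\n";     # prints "active"
```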

Re^3: Process large text data in array
by sundialsvc4 (Abbot) on Mar 11, 2015 at 02:11 UTC

    I share the opinion that it is quite unnecessary to read a 38MB disk-file into virtual memory in order to process it. In particular, if that file becomes, say, “10 times larger than it now is,” your current approach might begin to fail. It’s just as easy to pass a file-handle around, and to let that be “your input,” as it is to handle a beefy array. Also consider, if necessary, defining a sub (perhaps, a reference to an anonymous function ...) that can be used to filter each of the lines as they are read: the while loop simply goes to the next line when this function returns False.
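One way to read that suggestion in code; the filter rule (skip blank and `#`-comment lines) and the sample data are made up for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# A reference to an anonymous sub acts as the line filter; the while
# loop skips any line the filter rejects.
my $filter = sub {
    my ($line) = @_;
    return $line =~ /\S/ && $line !~ /^#/;   # skip blanks and comments
};

my $data = "# header\n\nkeep me\n# comment\nalso keep\n";
open my $fh, '<', \$data or die "open: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $filter->($line);   # "goes to the next line" on false
    print "$line\n";
}
close $fh;
```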

    We know that BrowserUK, in his daily $work, deals with enormous datasets in very high-performance situations. If he says what he just did about your situation, then, frankly, I would take it as a very well-informed directive to “do it that way.” :-)

      Thanks sundialsvc4 for the feedback. In my "untested" opinion, would it be a good approach to have the file-reading sub check the data size: if it is larger than, say, 30MB, use the file-handle method, and otherwise read it all into memory? I'm assuming processing via an array would be much faster for smaller data than going through the file handle just to read a few lines. Correct me if I am wrong. And yes, I am concerned my approach would fail if the data grows several times larger.
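      As a sketch only of the untested idea above: Perl's `-s` file test returns a file's size in bytes, so the choice could be made before opening. The sub name `read_strategy` and the 30MB figure are just the example from the question, not a recommendation; only benchmarking would show whether the slurp path is actually faster for small files.

```perl
use strict;
use warnings;

# Untested-idea sketch: pick a reading strategy by file size.
my $THRESHOLD = 30 * 1024 * 1024;   # 30 MB, the example figure above

sub read_strategy {
    my ($path) = @_;
    my $size = -s $path;            # file size in bytes (undef if missing)
    return defined $size && $size > $THRESHOLD ? 'line-by-line' : 'slurp';
}
```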

      I still consider myself new to using reference variables, so feel free to give me more suggestions on what I should look into. I will slowly re-code the older subs to use references as suggested.

      Thanks again, everyone, for the feedback.

        Correct me if I am wrong.

        Try it yourself; then you can correct yourself.

