
Unless your data is sorted and the file's lines (or records) are fixed in length, which would let you seek and binary-search, your solution will never be faster than O(n). However, there's still a lot of room to improve the runtime even if the order of growth doesn't change.

One aspect to consider is how often you expect to see a match in the 16GB input file. If matching records are sparse, you can gain a lot by rejecting non-matches and short-circuiting the loop iteration as early as possible. Right now each line is split, $tab_delimited_array[3] is massaged, and the result is run through Unix_Date and Date_ConvTZ before you finally test whether $date_converted is the same as $extracted_YMD. Couldn't you instead massage $date_converted once, up front, into something that more closely approximates the raw format of the date as it appears in the 16GB file? That would let you reject unneeded lines much sooner.
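A minimal sketch of that early-rejection idea follows. It is not your script: the sub name matching_lines, the raw date format '2013-05-14', and the rule that field 3 begins with the date are all assumptions made for illustration. Because a time zone conversion can shift a timestamp across midnight, you may need to pass two adjacent raw dates and keep your full conversion as the final check on the few lines that survive.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose 4th tab-separated field
# starts with one of the precomputed raw date strings.
# ASSUMPTION: the raw dates are already in the file's own format.
sub matching_lines {
    my ($in, @raw_dates) = @_;
    my @hits;
    LINE: while (my $line = <$in>) {
        # Cheap rejection: a plain substring search, no split, no date math.
        my $seen = 0;
        for my $d (@raw_dates) {
            if (index($line, $d) >= 0) { $seen = 1; last }
        }
        next LINE unless $seen;

        # Only the survivors pay for the split and the field-level check.
        my @fields = split /\t/, $line;
        next LINE unless defined $fields[3];
        for my $d (@raw_dates) {
            if (index($fields[3], $d) == 0) {
                push @hits, $line;
                next LINE;
            }
        }
    }
    return @hits;
}
```

Called as matching_lines($fh, '2013-05-14'), this only splits lines that contain the date string somewhere, so on a file with sparse matches most lines cost a single index() call.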

Second, if it turns out that matches are frequent, you might be wasting time by printing too often. You could push $_ onto a cache array and print the whole array every 1000 iterations, for example, then do a final flush after the loop terminates. A chunk that size is small enough not to cause memory problems, while still reducing the time spent in IO calls.
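The batching could look something like the sketch below. The sub name print_batched and the $wanted callback are hypothetical stand-ins for your own loop and match test, and this version flushes after every $batch_size buffered matches rather than counting loop iterations. Perl's output handles are already buffered by default, so benchmark before assuming a large win.

```perl
use strict;
use warnings;

# Hypothetical helper: copies wanted lines from $in to $out,
# printing them in batches instead of one print call per line.
sub print_batched {
    my ($in, $out, $wanted, $batch_size) = @_;
    my @buffer;
    while (my $line = <$in>) {
        next unless $wanted->($line);
        push @buffer, $line;
        if (@buffer >= $batch_size) {
            print {$out} @buffer;    # one print call for the whole batch
            @buffer = ();
        }
    }
    print {$out} @buffer if @buffer; # final flush of the partial batch
    return;
}
```

A call such as print_batched($in_fh, \*STDOUT, sub { $_[0] =~ /\t2013-05-14/ }, 1000) keeps at most 1000 lines in memory at any time.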


Dave


In reply to Re: Optimise the script by davido
in thread Optimise the script by Anonymous Monk
