
Hello all. I am an EE, so while my question may seem basic to some of you, for me it is not. How can I efficiently search a 2 GB file without the risk of running out of memory or of too great a runtime?

I have the following dilemma:
I have 2 text files depicting electrical path data. Each one of them can amount to 2 GB. Let them be path_p and path_n.
I have another file, containing the paths which were in only one of these files. This file is dif_file.
Each file may contain two different types, MIN & MAX. I know how to discern one from the other.
These paths are separated by ^Path (path_number), i.e. each one of these lines starts a new path. The file has a header, but not a footer.
Based on the information in dif_file, I need to find the corresponding path (using data within it) in the 2 GB file.
This should be done with as little runtime as possible from an algorithmic point of view, as I don't plan on writing the code in C ;-), and it must not run out of memory.
My initial concept was this:

1. Get a list for path_n from dif_file
2. Get a list for path_p from dif_file
3. Open path_n and read it line by line, from one ^Path to the next ^Path. For each path, generate its key and check whether it is in the list: if it is, keep the path in a hash, otherwise discard it. Then generate the result list from the hash.
4. Repeat step 3 for path_p
Does anyone have a better concept?
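
To make the concept concrete, here is a minimal sketch of step 3. It is written in Python only for brevity; the same structure maps onto a Perl hash and a line-by-line read. It assumes every path starts with a line matching ^Path and that the header is everything before the first such line. The names iter_paths, find_wanted and make_key are made up for illustration, and make_key stands in for whatever rule generates the key from the data within a path. Only the current path and the matching paths are held in memory, never the whole 2 GB file.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Set

# Assumption: each path begins with a line that matches ^Path
PATH_START = re.compile(r"^Path\b")


def iter_paths(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one path block at a time; only the current block is in memory."""
    block: List[str] = []
    in_path = False  # stays False while the file header is being skipped
    for line in lines:
        if PATH_START.match(line):
            if in_path:
                yield block  # the previous path ends where the next begins
            block = [line]
            in_path = True
        elif in_path:
            block.append(line)
    if in_path:
        yield block  # no footer, so the last path ends at end of file


def find_wanted(
    lines: Iterable[str],
    wanted_keys: Set[str],
    make_key: Callable[[List[str]], str],  # hypothetical key rule
) -> Dict[str, List[str]]:
    """Keep only the path blocks whose key is in wanted_keys."""
    found: Dict[str, List[str]] = {}
    for block in iter_paths(lines):
        key = make_key(block)
        if key in wanted_keys:
            found[key] = block
            if len(found) == len(wanted_keys):
                break  # everything located, stop reading early
    return found
```

Usage would be along the lines of opening path_n, passing the file handle as lines together with the set of keys taken from dif_file, and then repeating the call for path_p. Memory use then depends on the size of dif_file and the matching paths rather than on the 2 GB input.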

Edit by tye: Preserve formatting

In reply to huge file->hash by ISAI student
