Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

A lot depends on just how big the two files are.   If they are of a size that would comfortably(!) fit in memory, a hash would do nicely.   If they are larger, consider either sorting the two files (which will reduce the problem to a simple “merge”), or perhaps use an [Sqlite?] database file.   If you do the latter, your problem becomes an INNER JOIN.

“Two identically sorted files” is the old-school technique ... that’s literally what they were doing with all those tape drives, in the days of yore ... but it is a good one, especially if one or both of the files are already sorted and can stay that way.   The entire operation can be peformed using one sequential pass through both files, no matter how large they are.   The price-paid is the cost of sorting.   (That cost is amortized if the file, known to be sorted and kept sorted, can then be reused in the future.   A sequential file can be sequentially-updated by a sorted transaction file that is applied to it by appropriate code, and this also occurs in one sequential pass.)

If you use a hash, the operative word is “comfortably.”   If the hash is so large that the operating-system starts paging, a hash can perform exceptionally badly because it makes fairly-random references to memory addresses.   Hashes exhibit the opposite of the “locality of reference” behavior upon which efficient virtual-memory depends.

If you use SQLite, then once again you are paying a stiff file-copying price ... unless you can use the file multiple times in multiple runs ... say, using it as your master-file instead of your present flat-file #1.

In reply to Re: Best way to search file by sundialsvc4
in thread Best way to search file by insta.gator

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2023-10-01 02:05 GMT
Find Nodes?
    Voting Booth?

    No recent polls found