PerlMonks  

It might be useful for you to look at Finite-State Machine (FSM) algorithms as a source of ideas for generalized solutions to these problems.   The essential notion is that, in the simple-minded case of two files, File-A and File-B (setting aside the “continuation-line problem” for just a moment), you can at any point in time break the problem down into “states”:

  1. Initial state:   neither file has been read yet.
  2. You are now at the end-of-file of both files. (Final state.)
  3. You are at the end-of-file of File-A but not File-B.
  4. You are at the end-of-file of File-B but not File-A.
  5. You have current-records from both files at the same time and ... (additional states are defined based on some comparison between the two records)

The FSM starts in its initial-state and runs until it reaches its final-state.   In each iteration, first it reacts to whatever state it is in, then it determines what new-state it is now in.
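
To make the loop concrete, here is a minimal sketch in Python (not Perl, purely for illustration). The names `State`, `next_state`, `read_record` and `run_fsm` are my own inventions, and I am assuming the simplest possible “reaction”: just record which state we were in along with the current records. A real program would do its comparison work where the sketch appends to `out`.

```python
import io
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()   # neither file has been read yet
    BOTH = auto()      # a current record is available from each file
    A_DONE = auto()    # end-of-file on File-A but not File-B
    B_DONE = auto()    # end-of-file on File-B but not File-A
    FINAL = auto()     # end-of-file on both files

def next_state(rec_a, rec_b):
    """Classify the situation from the two current records (None means EOF)."""
    if rec_a is None and rec_b is None:
        return State.FINAL
    if rec_a is None:
        return State.A_DONE
    if rec_b is None:
        return State.B_DONE
    return State.BOTH

def read_record(fh):
    """Return the next line without its newline, or None at end-of-file."""
    line = fh.readline()
    return line.rstrip("\n") if line else None

def run_fsm(file_a, file_b):
    """Walk two files in step; return (state name, record A, record B) tuples."""
    out = []
    state = State.INITIAL
    rec_a = rec_b = None
    while state is not State.FINAL:
        # First, react to whatever state we are in.
        if state is not State.INITIAL:
            out.append((state.name, rec_a, rec_b))
        # Then advance the files that still have data and pick the new state.
        if state is not State.A_DONE:
            rec_a = read_record(file_a)
        if state is not State.B_DONE:
            rec_b = read_record(file_b)
        state = next_state(rec_a, rec_b)
    return out

if __name__ == "__main__":
    a = io.StringIO("a1\na2\na3\n")
    b = io.StringIO("b1\n")
    for step in run_fsm(a, b):
        print(step)
```

The point of the shape is that the end-of-file cases stop being special cases sprinkled through the loop body: each one is just another state with its own reaction.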

This general line of reasoning can be applied to your problem, as soon as you deal with the slight “twist” ... that you must “peek ahead” one record to determine if you actually have read the complete intended string.   Fortunately, in the case of random-access disk files, that is trivially done:   simply note your current-position in the file, attempt to read the next line, then, if you succeed, see if it is a continuation.   If so, append it and repeat.   If not, reposition the file to the previously-saved position before returning what you have.   All of this logic, which actually is irrelevant to the FSM, can be packaged into a short subroutine.
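
A rough sketch of that subroutine, again in Python for illustration (Perl's built-in `tell` and `seek` play the same role). Since the rule that marks a continuation is not stated here, I am assuming a hypothetical one: a line that begins with a space or tab continues the previous line. The names `is_continuation` and `read_logical_record` are likewise made up, and joining the pieces with a single space is an arbitrary choice.

```python
import io

def is_continuation(line):
    """Hypothetical rule: a leading space or tab marks a continuation line."""
    return line[:1] in (" ", "\t")

def read_logical_record(fh, continues=is_continuation):
    """Return the next complete record (continuations joined), or None at EOF."""
    line = fh.readline()
    if not line:
        return None
    record = line.rstrip("\n")
    while True:
        pos = fh.tell()          # note the current position in the file
        nxt = fh.readline()      # attempt to read the next line
        if nxt and continues(nxt):
            record += " " + nxt.strip()   # append it and repeat
            continue
        fh.seek(pos)             # not a continuation (or EOF): go back
        return record

if __name__ == "__main__":
    fh = io.StringIO("alpha\n  beta\n\tgamma\ndelta\n")
    while (rec := read_logical_record(fh)) is not None:
        print(rec)
```

Swapping this in for the plain line-reader leaves the FSM itself untouched, since all it ever sees is “a record” or “end-of-file.”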

Always bear in mind that, whatever problem you are now facing, it almost certainly is not “new.”   FORTRAN had to deal with the strictures of the 80-column punched card in the 1950s ... and it did so in the same way:   with the “continuation line.”   Although Perl did not appear until 1987, the tricks of this trade have been with us for a much longer time.   (Gaaack!   Suddenly I feel old!!)


In reply to Re: Reading concurrently two files with different number of lines by sundialsvc4
in thread Reading concurrently two files with different number of lines by frogsausage
