http://www.perlmonks.org?node_id=1028019


in reply to Reading concurrently two files with different number of lines

It might be useful for you to look at the concept of Finite-State Machine (FSM) algorithms as a source of ideas for generalized solutions to these problems.   The essential notion is that, in the simple-minded case of two files, File-A and File-B (and setting aside the “continuation-line problem” for just a moment), you can at any point in time break the problem down into “states”:

  1. Initial state:   neither file has been read yet.
  2. You are now at the end-of-file of both files. (Final state.)
  3. You are at the end-of-file of File-A but not File-B.
  4. You are at the end-of-file of File-B but not File-A.
  5. You have current-records from both files at the same time and ... (additional states are defined based on some comparison between the two records)

The FSM starts in its initial-state and runs until it reaches its final-state.   In each iteration, first it reacts to whatever state it is in, then it determines what new-state it is now in.
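To make the loop concrete, here is a minimal sketch of such a machine. It is written in Python purely for illustration, and every name in it (`State`, `next_state`, `read_record`, `run`) is hypothetical. It collapses the "additional states" from item 5 into a single `BOTH` state and only logs what it sees; a real solution would compare the two records there.

```python
import io
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()   # neither file has been read yet
    BOTH = auto()      # a current record from each file
    A_ONLY = auto()    # File-B is at end-of-file, File-A is not
    B_ONLY = auto()    # File-A is at end-of-file, File-B is not
    FINAL = auto()     # both files are at end-of-file


def next_state(rec_a, rec_b):
    """Pick the state implied by the current records (None means EOF)."""
    if rec_a is None and rec_b is None:
        return State.FINAL
    if rec_a is None:
        return State.B_ONLY
    if rec_b is None:
        return State.A_ONLY
    return State.BOTH


def read_record(fh):
    """Return the next line without its newline, or None at end-of-file."""
    line = fh.readline()
    return line.rstrip("\n") if line else None


def run(file_a, file_b):
    """Drive the machine from INITIAL to FINAL, logging one event per step."""
    events = []
    state = State.INITIAL
    rec_a = rec_b = None
    while state is not State.FINAL:
        # First, react to whatever state we are in.
        if state is State.BOTH:
            events.append(("both", rec_a, rec_b))
        elif state is State.A_ONLY:
            events.append(("a_only", rec_a))
        elif state is State.B_ONLY:
            events.append(("b_only", rec_b))
        # Then advance the files that still have data and pick the new state.
        if state in (State.INITIAL, State.BOTH, State.A_ONLY):
            rec_a = read_record(file_a)
        if state in (State.INITIAL, State.BOTH, State.B_ONLY):
            rec_b = read_record(file_b)
        state = next_state(rec_a, rec_b)
    return events


# In-memory stand-ins for File-A (three lines) and File-B (one line).
demo = run(io.StringIO("a1\na2\na3\n"), io.StringIO("b1\n"))
print(demo)
```

Because the "react" step and the "choose the next state" step are kept apart, handling for files of different lengths lives entirely in the state table rather than in nested end-of-file checks.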

This general line of reasoning can be applied to your problem, as soon as you deal with the slight “twist” ... that you must “peek ahead” one record to determine if you actually have read the complete intended string.   Fortunately, in the case of random-access disk files, that is trivially done:   simply note your current-position in the file, attempt to read the next line, then, if you succeed, see if it is a continuation.   If so, append it and repeat.   If not, reposition the file to the previously-saved position before returning what you have.   All of this logic, which actually is irrelevant to the FSM, can be packaged into a short subroutine.
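The peek-ahead subroutine described above might look roughly like this. It is a sketch in Python for illustration only (Perl's built-in `tell` and `seek` support the same pattern). It assumes that a continuation line is marked by a leading `+` and that continuation text is joined with a single space; both are assumptions, as are the names `is_continuation` and `read_logical_record`.

```python
import io


def is_continuation(line):
    # Assumption: a continuation line begins with "+".
    return line.startswith("+")


def read_logical_record(fh):
    """Return one complete record with continuations appended, or None at EOF."""
    line = fh.readline()
    if not line:
        return None
    record = line.rstrip("\n")
    while True:
        position = fh.tell()       # note the current position in the file
        peeked = fh.readline()     # attempt to read the next line
        if peeked and is_continuation(peeked):
            # Drop the marker, append the rest, and look ahead again.
            record += " " + peeked.rstrip("\n")[1:].strip()
            continue
        fh.seek(position)          # not a continuation: put it back
        return record


# In-memory stand-in for a file with continuation lines.
sample = io.StringIO("R1 a b\n+ c d\n+ e\nR2 x\nR3 y\n+ z\n")
while (rec := read_logical_record(sample)) is not None:
    print(rec)
```

The FSM then calls `read_logical_record` wherever it would otherwise read a single line, so it never sees a partial record.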

Always bear in mind that, whatever problem you are now facing, it almost certainly is not “new.”   FORTRAN had to deal with the strictures of the 80-column punched card in the 1950s ... and it did so in the same way:   with the “continuation line.”   Although Perl did not appear until 1987, the tricks of this trade have been with us for a much longer time.   (Gaaack!   Suddenly I feel old!!)

Re^2: Reading concurrently two files with different number of lines
by frogsausage (Sexton) on Apr 11, 2013 at 09:03 UTC

    Thank you for the link to Finite-State Machines. I have sketched out a couple of scenarios I could use to resolve my case, each of course having pros and cons.

    In terms of code complexity, it would be much simpler to read the lines and append the continuations first, then compute later.

    As for Fortran, the file format I am dealing with was output by some Fortran code when it was created. However, even though the file can now go beyond 80 characters per line, it seems the + was there with that limit in mind and has been kept for legacy purposes.

    I am now going to implement that, taking into account all your feedback, and I will definitely come back to let you know how it goes! Hopefully it will be for something positive :)

    P.S.: you are not old; some people are just younger than you, and most probably lots are older than you! However, I am of the first kind: I have never read Fortran.