It might be useful for you to look at Finite-State Machine (FSM) algorithms as a source of ideas for generalized solutions to these problems. The essential notion is that, in the simple-minded case of two files, File-A and File-B (and setting aside the “continuation-line problem” for just a moment), you can at any point in time break the problem down into “states”:
- Initial state: neither file has been read yet.
- You are now at the end-of-file of both files. (Final state.)
- You are at the end-of-file of File-A but not File-B.
- You are at the end-of-file of File-B but not File-A.
- You have current-records from both files at the same time and ... (additional states are defined based on some comparison between the two records)
The FSM starts in its initial-state and runs until it reaches its final-state. In each iteration, first it reacts to whatever state it is in, then it determines what new-state it is now in.
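To make the react-then-reclassify loop concrete, here is a minimal sketch. It is not the only way to do this. It assumes both files are already sorted and that the "comparison between the two records" is a plain string comparison; the state names and the helpers `next_record`, `classify` and `compare_files` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next record, or undef at end-of-file.
sub next_record {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Decide which state we are in, given the two current records.
# Assumption: records compare as plain strings and both files are sorted.
sub classify {
    my ($rec_a, $rec_b) = @_;
    return 'BOTH_EOF' if !defined $rec_a && !defined $rec_b;
    return 'A_EOF'    if !defined $rec_a;
    return 'B_EOF'    if !defined $rec_b;
    my $cmp = $rec_a cmp $rec_b;
    return $cmp < 0 ? 'A_LESS' : $cmp > 0 ? 'B_LESS' : 'EQUAL';
}

sub compare_files {
    my ($fh_a, $fh_b) = @_;
    my @report;
    my ($rec_a, $rec_b);
    my $state = 'INITIAL';

    while ($state ne 'BOTH_EOF') {
        # Step 1: react to the state we are in.
        if ($state eq 'INITIAL') {
            $rec_a = next_record($fh_a);
            $rec_b = next_record($fh_b);
        }
        elsif ($state eq 'A_EOF' || $state eq 'B_LESS') {
            push @report, "only in B: $rec_b";
            $rec_b = next_record($fh_b);
        }
        elsif ($state eq 'B_EOF' || $state eq 'A_LESS') {
            push @report, "only in A: $rec_a";
            $rec_a = next_record($fh_a);
        }
        else {    # EQUAL
            push @report, "in both: $rec_a";
            $rec_a = next_record($fh_a);
            $rec_b = next_record($fh_b);
        }

        # Step 2: determine what new state we are now in.
        $state = classify($rec_a, $rec_b);
    }
    return @report;
}
```

Note that the loop body never cares how a record was obtained. That is why the continuation-line handling can live entirely inside whatever routine plays the role of `next_record`.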
This general line of reasoning can be applied to your problem, as soon as you deal with the slight “twist” ... that you must “peek ahead” one record to determine if you actually have read the complete intended string. Fortunately, in the case of random-access disk files, that is trivially done: simply note your current-position in the file, attempt to read the next line, then, if you succeed, see if it is a continuation. If so, append it and repeat. If not, reposition the file to the previously-saved position before returning what you have. All of this logic, which actually is irrelevant to the FSM, can be packaged into a short subroutine.
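The peek-ahead subroutine described above might look something like the following sketch. The thread's actual continuation convention is not restated here, so this assumes (purely for illustration) that a continuation line starts with a space or tab, and that continued text is joined with a single space. The name `read_logical_line` is likewise hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Return one complete logical line (the line plus any continuations),
# or undef at end-of-file.
# Assumption: a continuation line begins with a space or tab.
sub read_logical_line {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;

    while (1) {
        my $pos  = tell $fh;          # note the current position
        my $next = <$fh>;             # attempt to read the next line
        last unless defined $next;    # nothing left to peek at
        chomp $next;

        if ($next =~ /^[ \t]+(\S.*)$/) {
            $line .= " $1";           # a continuation: append and repeat
        }
        else {
            # Not a continuation: put the file back where it was.
            seek $fh, $pos, SEEK_SET or die "seek failed: $!";
            last;
        }
    }
    return $line;
}
```

A caller would simply loop with `while (defined(my $rec = read_logical_line($fh))) { ... }`. As written this relies on `tell` and `seek`, so it suits ordinary disk files and will not work on input that cannot be repositioned, such as a pipe.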
Always bear in mind that, whatever problem you are now facing, it almost certainly is not “new.” FORTRAN had to deal with the strictures of the 80-column punched card in the 1950’s ... and it did so in the same way: with the “continuation line.” Although Perl did not appear until 1987, the tricks of this trade have been with us for a much longer time. (Gaaack! Suddenly I feel old!!)