PerlMonks  

Re: Text File Parsing

by sundialsvc4 (Abbot)
on Nov 24, 2010 at 19:21 UTC


in reply to Text File Parsing

Here is a simple run-down of a suitable approach:

  1. Carefully read all of the perldocs that have just been pointed out to you. Also get to know CPAN and the modules listed there. Your goal should be to do as little original work as possible. Actum Ne Agas: “Do Not Do A Thing Already Done.” Those pointers were not an “RTFM” brush-off.
  2. Your code will read the file line by line, using a regular expression (or split) to divide each line into two parts. The left part is the keyword; the right part is the value.
  3. As you read each line, you will accumulate the (keyword, value) pairs. A hash is the most logical way to do this.
  4. Although all the lines look alike, one kind of line identifies “the start of something new.” When you see it, write an output record for the values accumulated so far, then discard them (so that you do not hang on to “stale data”) before you start capturing the new record.
  5. When the file-reading loop ends, if there are any accumulated values, a final output record needs to be written for them as well. (Repeated tasks such as “writing the output record” are a logical place to use a sub.)
  6. When writing programs like these, I like to be defensive. The file-reading program ought to be the one that detects that “this input file is bogus,” if it is, since it is clearly in the best position to do so. (So, “if the program ran successfully, the contents of the file are more-or-less good.”)
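
Steps 2 through 6 might be sketched roughly as follows. This is only an illustration under stated assumptions, not the original poster's file format: it assumes each line looks like "keyword: value" and that a hypothetical keyword, name, marks the start of a new record. The sub name parse_records and the choice to take a list of lines are likewise made up for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: every line looks like "keyword: value", and a line whose
# keyword is "name" starts a new record. Both are hypothetical.
my $START_KEY = 'name';

# Takes a list of lines; returns one hash reference per record.
sub parse_records {
    my @lines = @_;
    my @records;
    my %current;

    # The repeated task from step 5: emit what has been accumulated,
    # then discard it so no stale data leaks into the next record.
    my $flush = sub {
        return unless %current;
        push @records, {%current};
        %current = ();
    };

    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        chomp $line;
        next if $line =~ /^\s*$/;    # ignore blank lines

        # Step 2: divide the line into keyword (left) and value (right).
        # Step 6: a line that does not fit the pattern is fatal.
        my ($key, $value) = $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/
            or die "Line $line_no is malformed: '$line'\n";

        # Step 4: the start-of-record keyword closes the previous record.
        $flush->() if $key eq $START_KEY;

        # Step 6: more defensive checks.
        die "Line $line_no: '$key' appears before any '$START_KEY' line\n"
            if $key ne $START_KEY && !%current;
        die "Line $line_no: duplicate keyword '$key' in one record\n"
            if exists $current{$key};

        # Step 3: accumulate the pair in a hash.
        $current{$key} = $value;
    }

    # Step 5: write out whatever is left when the loop ends.
    $flush->();
    return @records;
}

my @records = parse_records(
    "name: Alice\n", "nickname: Al\n",
    "name: Bob\n",   "nickname: Bobby\n",
);
printf "%s is known as %s\n", $_->{name}, $_->{nickname} for @records;
```

To read a real file, the same loop could run over a filehandle opened with the three-argument open, so that only one line is held in memory at a time; the list of lines is used here only to keep the sketch self-contained.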

Replies are listed 'Best First'.
Re^2: Text File Parsing
by mjscott2702 (Pilgrim) on Nov 25, 2010 at 10:25 UTC
    Regarding points 4 and 5 in sundialsvc4's response: if an "end-of-record" identifier can be isolated (nickname in this case), there is no need for a separate write of the accumulated values at end-of-file, nor for special logic on the first "start of record" line (where the accumulated values would be empty).
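
    A minimal sketch of that variant, assuming the same hypothetical "keyword: value" line format and assuming that nickname really is always the last line of a record. The sub name parse_by_end_marker is made up for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: "nickname" is always the last line of each record.
my $END_KEY = 'nickname';

# Takes a list of lines; returns one hash reference per record.
sub parse_by_end_marker {
    my @lines = @_;
    my (@records, %current);

    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next unless length $line;    # ignore blank lines

        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        die "Malformed line: '$line'\n" unless defined $value;

        $current{$key} = $value;

        # The end-of-record keyword completes the record immediately,
        # so no special case is needed for the first or the last record.
        if ($key eq $END_KEY) {
            push @records, {%current};
            %current = ();
        }
    }

    # Anything left over means the file stopped in the middle of a record.
    die "Input ended in the middle of a record\n" if %current;
    return @records;
}

my @records = parse_by_end_marker(
    "name: Alice\n", "nickname: Al\n",
    "name: Bob\n",   "nickname: Bobby\n",
);
print scalar(@records), " complete records\n";
```

    The trade-off is that this version depends on every record ending with the marker; the leftover check at the end turns a truncated file into an error instead of a silently dropped record.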
