PerlMonks  

Text File Parsing

by dr014578 (Initiate)
on Nov 24, 2010 at 17:00 UTC (#873505)
dr014578 has asked for the wisdom of the Perl Monks concerning the following question:

Thank you to everyone for your information/guidance. I was able to get the data successfully into a hash and got the Text::Table output working as well. Now, to complicate it a bit more: within the same column, after the nickname there is a list of disks associated with each host, and there can be multiple disks for each one (I updated the input file below to show an example). Sorry I didn't include this originally, but I'm taking baby steps. Is it possible to do this in one hash? Should I put the disk info into an array somehow and create a reference to that array in the hash, or should I try to split this into two hashes and possibly join them by a unique key? Thanks again.

I'm looking to parse a single-column text file and split the repeating data in that file into record/table format for output. Below is a sample of the input file and what I'd like to see for the output. I'm not looking for sample code, just some direction to which perldoc I should be reading to get started. Thank you.

<input file>
portID=0
portName=1A
domainID=0
hostMode=Standard
nickname=host1
disk=00:81
disk=00:79
disk=01:34
portID=0
portName=1A
domainID=1
hostMode=AIX
nickname=host2
portID=0
portName=1A
domainID=10
hostMode=HP
nickname=host3


<output>
PortID PortName DomainID HostMode Nickname
0 1A 0 Standard host1
0 1A 1 AIX host2
0 1A 10 HP host3
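For the updated input with repeated disk= lines, one way to keep everything in a single structure is a hash per record with the disks collected under an array reference. A minimal sketch, assuming portID always starts a record (field and variable names here are illustrative, not from the thread):

```perl
use strict;
use warnings;

# Sketch: each record is a hash reference; repeated "disk" keys are pushed
# onto one array reference, so a single structure holds both the scalar
# fields and the variable-length disk list.
sub parse_records {
    my @lines = @_;
    my (@records, $rec);
    for my $line (@lines) {
        my ($key, $value) = split /=/, $line, 2;
        push @records, $rec = {} if $key eq 'portID';   # new record starts here
        if ($key eq 'disk') {
            push @{ $rec->{disk} }, $value;             # key may repeat
        } else {
            $rec->{$key} = $value;
        }
    }
    return @records;
}

my @recs = parse_records(
    'portID=0', 'portName=1A', 'domainID=0', 'hostMode=Standard',
    'nickname=host1', 'disk=00:81', 'disk=00:79',
    'portID=0', 'portName=1A', 'domainID=1', 'hostMode=AIX',
    'nickname=host2',
);
print "$_->{nickname}: @{ $_->{disk} // [] }\n" for @recs;
```

A record with no disk= lines simply has no disk key, which the `// []` default handles when printing.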


Re: Text File Parsing
by umasuresh (Hermit) on Nov 24, 2010 at 17:06 UTC
    I recommend Data Munging with Perl; read Chapter 2, which is available to download. This will help you get started. Good luck!
Re: Text File Parsing
by Anonymous Monk on Nov 24, 2010 at 17:36 UTC
Re: Text File Parsing
by biohisham (Priest) on Nov 24, 2010 at 18:12 UTC
    Data::Dumper to visualize your data structures; Tutorials -> References, particularly the topics on hashes of arrays (HoAs); the split function; and Text::Table to format your textual data are some of the things you can look into to get this job done...

    The general steps involved are as follows:

    1. Loop through your data file.
    2. Split each line into a key and a value.
    3. Push the values into the HoA, referenced by the key.
    4. Loop through your HoA to print the lines, or use Text::Table to align the table columns neatly.
    You may also need to deal with missing values and make sure that all your records are five attributes long...
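The steps above can be sketched as follows, assuming the original (disk-free) input; the variable names are mine, not from the thread, and Text::Table could replace the printf calls for fancier alignment:

```perl
use strict;
use warnings;

# Hash of arrays: each attribute name maps to a column of values.
my @lines = (
    'portID=0', 'portName=1A', 'domainID=0',  'hostMode=Standard', 'nickname=host1',
    'portID=0', 'portName=1A', 'domainID=1',  'hostMode=AIX',      'nickname=host2',
    'portID=0', 'portName=1A', 'domainID=10', 'hostMode=HP',       'nickname=host3',
);

my %hoa;
for my $line (@lines) {
    my ($key, $value) = split /=/, $line, 2;   # step 2: key / value
    push @{ $hoa{$key} }, $value;              # step 3: push into the HoA
}

my @fields = qw(portID portName domainID hostMode nickname);
printf "%-8s %-10s %-10s %-10s %s\n", @fields;          # header row
for my $i (0 .. $#{ $hoa{portID} }) {                   # step 4: print rows
    printf "%-8s %-10s %-10s %-10s %s\n", map { $hoa{$_}[$i] } @fields;
}
```

Note that this layout relies on every record supplying all five attributes; a missing value would shift a column, which is why the reply warns about record length.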



    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
Re: Text File Parsing
by sundialsvc4 (Monsignor) on Nov 24, 2010 at 19:21 UTC

    Here is a simple run-down of a suitable approach:

    1. Carefully read all of the perldocs that have just been pushed at you. Also get to know the CPAN library and the various modules that are listed there. Your goal should be to have to do as little original work as possible. Actum Ne Agas: “Do Not Do A Thing Already Done.” You did not get an “RTFM” brush-off response.
    2. Your code will read the file line by line, using a regular expression (or split) to divide the line into two parts. The left part is the keyword; the right part is the value.
    3. As you read each line, you will accumulate the (keyword, value) pairs. A hash is the most logical way to do this.
    4. Although all the lines seem to look alike, there is one kind of record that identifies “the start of something new,” such that an output record needs to be written (and previously accumulated values discarded, so that you do not hang on to “stale data”) before starting to capture the new record.
    5. When the file-reading loop ends, if there are any accumulated values, a final output record needs to be written for them as well. (Repeated tasks such as “writing the output record” are a logical place to use a sub.)
    6. When writing programs like these, I like to be defensive. I like to think that the file-reading program ought to be the one that detects that “this input file is bogus,” if it is, since this program is clearly in the best position to do so. (So, “if the program ran successfully, the contents of the file are more-or-less good.”)
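The accumulate-and-flush approach above might look like this in outline, with portID taken as the line that marks "the start of something new" and a sub doing the writing so the end-of-file case can reuse it (names and the error message are illustrative):

```perl
use strict;
use warnings;

my @input = (
    'portID=0', 'portName=1A', 'domainID=0', 'hostMode=Standard', 'nickname=host1',
    'portID=0', 'portName=1A', 'domainID=1', 'hostMode=AIX',      'nickname=host2',
);

my (%rec, @rows);
sub flush_record {
    return unless %rec;          # nothing accumulated before the first record
    push @rows, { %rec };        # "write" a copy of the finished record
    %rec = ();                   # discard stale data
}

for my $line (@input) {
    my ($key, $value) = split /=/, $line, 2;
    die "bogus input line: $line\n" unless defined $value;  # defensive check
    flush_record() if $key eq 'portID';   # start of a new record
    $rec{$key} = $value;
}
flush_record();                  # final record at end of input
print scalar(@rows), " records parsed\n";
```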

      Regarding points 4 and 5 in sundialsvc4's response: if an "end-of-record" identifier can be isolated (nickname, in this case), then there is no need for a separate write of accumulated values at end-of-file, nor for logic to handle the first "start of record" line (where the accumulated values would be empty).
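A sketch of that end-of-record variant, assuming the original (disk-free) input where nickname is always the last line of each record (names illustrative):

```perl
use strict;
use warnings;

my @input = (
    'portID=0', 'portName=1A', 'domainID=0', 'hostMode=Standard', 'nickname=host1',
    'portID=0', 'portName=1A', 'domainID=1', 'hostMode=AIX',      'nickname=host2',
);

my (%rec, @rows);
for my $line (@input) {
    my ($key, $value) = split /=/, $line, 2;
    $rec{$key} = $value;
    if ($key eq 'nickname') {    # last field of every record: write it out
        push @rows, { %rec };
        %rec = ();
    }
}
# No end-of-file flush needed: %rec is already empty here.
print scalar(@rows), " records parsed\n";
```

Note that the updated input (with disk= lines after nickname) breaks this assumption, so there the start-of-record trigger is the safer choice.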

Node Type: perlquestion [id://873505]
Approved by biohisham