PerlMonks  

Re: Parsing Text from a File to HTML Table

by marinersk (Chaplain)
on Oct 27, 2013 at 15:03 UTC (#1059912)


in reply to Parsing Text from a File to HTML Table

As already noted, splitting on whitespace is a faulty assumption in your algorithm, because company names can contain whitespace.
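To see the problem concretely, here is a minimal sketch that splits one of the sample lines from this thread on whitespace. The variable names are mine, not from the original script.

```perl
use strict;
use warnings;

# Sample line in the format discussed in this thread.
my $line = 'GOOD Acme Toy Company 2010-01-01 2011-12-31';

# Naive approach: split on runs of whitespace.
my @fields = split /\s+/, $line;

# Six fields come back instead of the expected four, because the
# company name "Acme Toy Company" is broken into three pieces.
printf "%d fields: %s\n", scalar(@fields), join(' | ', @fields);
```

A single-word company name such as XYZZY would happen to split into four fields, which is why this kind of bug can hide until a multi-word name shows up.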

This, in my experience, is a common error for someone parsing a log for the first time, so don't feel bad.  :-) I prefer to parse logs based on predictable components. The wilder the potential format, the more complicated the code gets, but for a relatively simple format like the one you are describing, I think it's fairly straightforward (assuming you have a basic understanding of Regular Expressions).

You have to craft your Regular Expression to match the data you are expecting. A technique I have become fond of is the use of an if statement, which provides the additional feature of filtering out lines that don't match my preconceived format. I often capture those out to another file for occasional review to see if the parsing routine needs to compensate for previously unknown formats or conditions. I won't do that in this example so we can save space.
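For anyone curious what the reject-capturing idea might look like, here is a rough sketch under my own assumptions. The parse_line helper and the @rejects array are hypothetical names for illustration; a real script would probably print the rejects to a second file rather than hold them in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (status, company, start, end) for a
# line in the expected format, or an empty list if it does not match.
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^(\w+)\s+(.+)\s+(\d{4}-\d{2}-\d{2})\s+(\d{4}-\d{2}-\d{2})\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

# Two well-formed lines and one that is missing its end date.
my @lines = (
    'GOOD Acme Toy Company 2010-01-01 2011-12-31',
    'BAD XYZZY 1972-01-01',
    'UGLY Enron 2001-10-01 2011-09-11',
);

my (@parsed, @rejects);
for my $line (@lines) {
    my @rec = parse_line($line);
    if (@rec) {
        push @parsed, \@rec;      # matched: keep the four fields
    }
    else {
        push @rejects, $line;     # unmatched: set aside for later review
    }
}

print "Parsed:   ", scalar(@parsed),  "\n";
print "Rejected: ", scalar(@rejects), "\n";
print "  $_\n" for @rejects;
```

Reviewing the rejects from time to time is what tells you whether the pattern needs to grow to cover a format you had not anticipated.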

C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>type test1.log
GOOD Acme Toy Company 2010-01-01 2011-12-31
BAD XYZZY 1972-01-01 1972-06-18
UGLY Enron 2001-10-01 2011-09-11

C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>parselog.pl test1.log

Status  Company Name      Start Date  End Date
GOOD    Acme Toy Company  2010-01-01  2011-12-31
BAD     XYZZY             1972-01-01  1972-06-18
UGLY    Enron             2001-10-01  2011-09-11

#!/usr/bin/perl
use strict;
use warnings;

# ---------------------------------------------------------------
# Parse log with following format:
#     Status Company Name Start Date End Date
#
# Assumptions:  Status contains no whitespace
#               Dates are in YYYY-MM-DD format
#               Company names have nothing that looks like a date
# ---------------------------------------------------------------

foreach my $inpfnm (@ARGV) {
    if (!open INPFIL, '<', $inpfnm) {
        print "ERROR: Cannot open input file '$inpfnm'\n";
    }
    else {
        print "<HTML>\n";
        print "<BODY>\n";
        print "<TABLE BORDER>\n";
        print " <TR>\n";
        print " <TH>Status</TH>\n";
        print " <TH>Company Name</TH>\n";
        print " <TH>Start Date</TH>\n";
        print " <TH>End Date</TH>\n";
        print " </TR>\n";

        while (my $inpbuf = <INPFIL>) {
            chomp $inpbuf;

            # Lines that do not match the expected format are skipped.
            if ($inpbuf =~ /^(\w+)\s+(.+)\s+(\d{4}\-\d{2}\-\d{2})\s+(\d{4}\-\d{2}\-\d{2})\s*$/) {
                my $inpsts = $1;
                my $inpnam = $2;
                my $stadat = $3;
                my $enddat = $4;

                print " <TR>\n";
                print " <TD>$inpsts</TD>\n";
                print " <TD>$inpnam</TD>\n";
                print " <TD>$stadat</TD>\n";
                print " <TD>$enddat</TD>\n";
                print " </TR>\n";
            }
        }
        close INPFIL;

        print "</TABLE>\n";
        print "</BODY>\n";
        print "</HTML>\n";
    }
}
exit;

__END__

