PerlMonks  

Re: Parsing Text from a File to HTML Table

by marinersk (Curate)
on Oct 27, 2013 at 15:03 UTC (#1059912)


in reply to Parsing Text from a File to HTML Table

As already noted, splitting on whitespace is a faulty approach in your algorithm if company names can contain whitespace.
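To see the problem concretely, here is a minimal sketch using one line in the log format under discussion. The variable names are mine and purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One log line; the company name itself contains two spaces.
my $line = 'GOOD Acme Toy Company 2010-01-01 2011-12-31';

# Naive approach: split on runs of whitespace.
my @fields = split /\s+/, $line;

# We wanted 4 fields (status, name, start date, end date),
# but the company name has been broken into three pieces.
printf "Got %d fields: %s\n", scalar(@fields), join(' | ', @fields);
# prints: Got 6 fields: GOOD | Acme | Toy | Company | 2010-01-01 | 2011-12-31
```

Any fixed column index for the dates breaks as soon as a company name has a different number of words.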

In my experience, this is a common mistake for someone parsing a log for the first time, so don't feel bad.  :-) I prefer to parse logs based on predictable components. The wilder the potential format, the more complicated the code gets, but for a relatively simple format like the one you describe, I think it's fairly straightforward (assuming you have a basic understanding of regular expressions). Here the predictable components are the single-word status at the start of the line and the two YYYY-MM-DD dates at the end; whatever lies between them is the company name.

You have to craft your regular expression to match the data you are expecting. A technique I have become fond of is wrapping the match in an if statement, which has the added benefit of filtering out lines that don't match my preconceived format. I often write those rejected lines to another file for occasional review, to see whether the parsing routine needs to handle previously unknown formats or conditions. I won't do that in this example, to save space.
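For readers who want to see the reject-capture idea anyway, here is a minimal sketch under my own assumptions. The helper name parse_line, the sample lines, and the in-memory @rejects list (standing in for a separate reject file) are all hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (status, name, start, end) on a match,
# or an empty list when the line does not fit the expected format.
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^(\w+)\s+(.+)\s+(\d{4}-\d{2}-\d{2})\s+(\d{4}-\d{2}-\d{2})\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}

# Illustrative input: one well-formed line and one that should be rejected.
my @lines = (
    'GOOD Acme Toy Company 2010-01-01 2011-12-31',
    'this line is not in the expected format',
);

my (@parsed, @rejects);
foreach my $line (@lines) {
    if (my @fields = parse_line($line)) {
        push @parsed, \@fields;
    }
    else {
        # A real script would write these to a separate file for later review.
        push @rejects, $line;
    }
}

print "Parsed:   ", join(' | ', @$_), "\n" for @parsed;
print "Rejected: $_\n" for @rejects;
```

Keeping the rejects instead of silently dropping them is what makes it possible to notice when the log format drifts from what the pattern expects.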

    C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>type test1.log
    GOOD Acme Toy Company 2010-01-01 2011-12-31
    BAD XYZZY 1972-01-01 1972-06-18
    UGLY Enron 2001-10-01 2011-09-11

    C:\Steve\Dev\PerlMonks\P-2013-10-27@0838-Log-Parse>parselog.pl test1.log

Status  Company Name      Start Date  End Date
GOOD    Acme Toy Company  2010-01-01  2011-12-31
BAD     XYZZY             1972-01-01  1972-06-18
UGLY    Enron             2001-10-01  2011-09-11

    #!/usr/bin/perl

    use strict;
    use warnings;

    # ---------------------------------------------------------------
    # Parse log with following format:
    #     Status Company Name Start Date End Date
    #
    # Assumptions: Status contains no whitespace
    #              Dates are in YYYY-MM-DD format
    #              Company names have nothing that looks like a date
    # ---------------------------------------------------------------

    foreach my $inpfnm (@ARGV)
    {
        if (!open INPFIL, '<', $inpfnm)
        {
            print "ERROR: Cannot open input file '$inpfnm'\n";
        }
        else
        {
            print "<HTML>\n";
            print "<BODY>\n";
            print "<TABLE BORDER>\n";
            print " <TR>\n";
            print " <TH>Status</TH>\n";
            print " <TH>Company Name</TH>\n";
            print " <TH>Start Date</TH>\n";
            print " <TH>End Date</TH>\n";
            print " </TR>\n";

            while (my $inpbuf = <INPFIL>)
            {
                chomp $inpbuf;
                if ($inpbuf =~ /^(\w+)\s+(.+)\s+(\d{4}\-\d{2}\-\d{2})\s+(\d{4}\-\d{2}\-\d{2})\s*$/)
                {
                    my $inpsts = $1;
                    my $inpnam = $2;
                    my $stadat = $3;
                    my $enddat = $4;

                    print " <TR>\n";
                    print " <TD>$inpsts</TD>\n";
                    print " <TD>$inpnam</TD>\n";
                    print " <TD>$stadat</TD>\n";
                    print " <TD>$enddat</TD>\n";
                    print " </TR>\n";
                }
            }
            close INPFIL;

            print "</TABLE>\n";
            print "</BODY>\n";
            print "</HTML>\n";
        }
    }

    exit;

    __END__

