PerlMonks
Parse Loops with flat text files. (code)
by deprecated (Priest) on May 13, 2001 at 18:31 UTC ( [id://80040]=perlquestion )
deprecated has asked for the wisdom of the Perl Monks concerning the following question:
I've had the pleasure of hacking through three types of flat text file recently. They are:
the entire pile of RFCs from http://www.faqs.org
3000 alcoholic beverage recipes (from somewhere I, mysteriously, cannot remember)

So with these three programs (in about 2 weeks) I have had to use some sort of start parsing - parse - stop parsing loop three times. I've even pondered writing a small module to do it for me (not for CPAN, probably would post it here, but just something to keep in my homedir to ease future scripts). This is something that has undoubtedly been done zillions of times. After all, what is Perl but a <!- pathologically eclectic RUBBISH lister!!!!!! ->parser?

So whilst working on making my code readable, I stumbled upon (see Using arrays of qr!! to simplify larger RE's for readability (code) and Optimization for readability and speed (code)) the idea of keeping the patterns in arrays of qr!! and iterating through them when matching text. This allows some flexibility (i.e., multiple "start" and "finish" conditions), and it is also pretty clear to read (as it reduces the size of the individual regular expressions). But looking over the code, I don't get a good "satisfied" feeling re-using it. So, here it is, and I'd like to know what others would do instead:
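The listing that belongs here is not reproduced. As a stand-in, here is a minimal sketch of the start parsing - parse - stop parsing loop described above, driven by arrays of compiled regexes. It is not the original code: the patterns, the sample data, and the names matches_any and extract_sections are all assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical start/stop patterns; real ones depend on the file format.
my @start = ( qr/^BEGIN RECIPE/i, qr/^-{5,}\s*start/i );
my @stop  = ( qr/^END RECIPE/i,   qr/^-{5,}\s*end/i );

# Return true if $line matches any regex in the list.
sub matches_any {
    my ( $line, @patterns ) = @_;
    for my $re (@patterns) {
        return 1 if $line =~ $re;
    }
    return 0;
}

# Collect the lines found between a start marker and a stop marker.
# Returns one array ref per section; the marker lines are excluded.
# A section still open at end of input is dropped.
sub extract_sections {
    my ( $lines, $start, $stop ) = @_;
    my ( @sections, $current );
    for my $line (@$lines) {
        if ( !defined $current ) {
            # Outside a section: look for any start condition.
            $current = [] if matches_any( $line, @$start );
        }
        elsif ( matches_any( $line, @$stop ) ) {
            # Inside a section and a stop condition matched: close it.
            push @sections, $current;
            undef $current;
        }
        else {
            # Inside a section: keep the line.
            push @$current, $line;
        }
    }
    return @sections;
}

# Made-up sample input.
my @lines = (
    "junk header",
    "BEGIN RECIPE",
    "2 oz gin",
    "1 oz vermouth",
    "END RECIPE",
    "more junk",
);

my @sections = extract_sections( \@lines, \@start, \@stop );
print scalar(@sections), " section(s), ",
    scalar( @{ $sections[0] } ), " line(s) in the first\n";
```

Keeping the start and stop conditions in separate arrays means a new condition is one more qr entry rather than a longer alternation, which is the readability point made above. For a single start and a single stop pattern, Perl's range ("flip-flop") operator may be a shorter alternative.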
I'm familiar with HTML::TokeParser and HTML::Parser, but since I do this a lot on non-HTML files, I'd like to extract the good parts with my loop and use the parsing modules to parse the stuff I want to parse (rather than the gristle).
thanks