PerlMonks  

Re: Spliting a delimited string into variables

by sundialsvc4 (Abbot)
on Apr 07, 2011 at 17:36 UTC (#898132)


in reply to Spliting a delimited string into variables

Personally, I am a very big fan of using high-level tools such as HTML::Parser to do as much “heavy lifting” as possible.

My rationale is that any HTML document has a known structure (even if, in actual practice, it is not obliged to adhere to it strictly), and that “anytime you are dealing with a complex document having any known structure, the best way to deal with it is to use a parser.”

There are many, many good parsing engines in Perl.   One that I recently had the privilege of beating to a bloody pulp (wink... it proceeded to do everything I asked it to, and more!!) was Parse::RecDescent.   (I am still in awe of its author!)   But in this case, the source-language is “simply HTML,” and HTML-specific tools abound.

All parsing engines are, so to speak, “engines that are really, really good at character-twiddling and which know the lay of the land.”   You rely upon them to go about their business and to call your code at strategic points, and to return data structures to you at those times.

This is, IMHO, a much stronger strategy than “regex hell,” which often yields solutions that work fine in initial test-cases but then require constant twiddling and head-banging.   Let the CPAN-authors do as much banging on your behalf as possible.   It will not, of course, eliminate the considerable amount of work that still remains to be done, but it might well make that work vastly easier.
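To make the “call your code at strategic points” idea concrete, here is a minimal sketch using the version-3 handler interface of HTML::Parser. The sample HTML, the choice of `<td>` cells as the data of interest, and the variable names are all assumptions made for illustration, since the original poster's markup is not shown here. Treat it as a starting point, not as the one right way to do it.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input: the real document from the original question is
# not reproduced in this thread, so this string is only a stand-in.
my $html = '<table><tr><td>alpha</td><td>beta &amp; gamma</td></tr></table>';

my @cells;        # text collected from each <td> element
my $in_td = 0;    # true while the parser is inside a <td>

my $parser = HTML::Parser->new(
    api_version => 3,

    # Called for every start tag; begin a new cell when a <td> opens.
    start_h => [
        sub {
            my ($tag) = @_;
            if ($tag eq 'td') {
                $in_td = 1;
                push @cells, '';
            }
        },
        'tagname',
    ],

    # Called for text; 'dtext' delivers it with entities decoded.
    # Text may arrive in several pieces, so append instead of assigning.
    text_h => [
        sub {
            my ($text) = @_;
            $cells[-1] .= $text if $in_td;
        },
        'dtext',
    ],

    # Called for every end tag; stop collecting when the </td> closes.
    end_h => [
        sub {
            my ($tag) = @_;
            $in_td = 0 if $tag eq 'td';
        },
        'tagname',
    ],
);

$parser->parse($html);
$parser->eof;    # signal end of input so any buffered text is flushed

print join('|', @cells), "\n";
```

The parser does the character-twiddling and the three small handlers only decide what to keep. If the page layout changes, it is usually the tag test in the handlers that needs adjusting, not a pile of regexes.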

HTH ...


Re^2: Spliting a delimited string into variables
by pissflaps (Initiate) on Apr 07, 2011 at 19:55 UTC

    Thank you for such an informative response! I'll be sure to look more into Parse::RecDescent, but for now it may be too daunting for a novice to pick up. Is there an example using HTML::Parser you could describe for use in this situation? I'm unfamiliar with basically any module outside of CGI. :(

Re^2: Spliting a delimited string into variables
by Popcorn Dave (Abbot) on Apr 07, 2011 at 20:03 UTC
    I've got to second the vote for HTML::Parser or similar parsing engines.

    A long time ago, before RSS feeds, I wrote a program to parse various newspaper websites and did the regexes by hand. I had 24 different rules for 90+ papers. When I rewrote it, I got it down to 9 rules, mainly based on web page design, since I used a parsing engine.

    You're going to save yourself a ton of work, since with hand-written regexes you're going to have to rewrite them each time the data changes.


    To disagree, one doesn't have to be disagreeable - Barry Goldwater
