PerlMonks  

Re: A grammar for HTML matching

by brainpan (Monk)
on Nov 01, 2000 at 14:34 UTC (#39395)


in reply to A grammar for HTML matching

So far it sounds like people aren't interested in this idea. FWIW, I would make good use of something like this. As it stands I have (hangs his head) a few shell scripts doing something similar for me (written before I knew that 'Perl Syntax' wasn't an oxymoron; I had been led to believe that random ASCII strings were invariably valid Perl code). Three days ago one of the sites changed its design (they seem to have intentionally broken the strict hierarchy I was counting on), and I have yet to get around to figuring out the new layout. Having some standardized syntax for "find FOO, then parse out everything until BAZ" would solve this and, if done right, would even survive all but the most severe site redesigns.
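To make the "find FOO, then parse out everything until BAZ" idea concrete, here is a rough sketch in Python rather than Perl. It is not the grammar proposed in the parent node; the function name `extract_between` and the sample page are made up for illustration, and the markers are treated as literal strings.

```python
import re

def extract_between(text, start, end):
    """Return whatever sits between the first `start` and the next `end`.

    Both markers are matched literally (hypothetical helper, not part of
    any existing module). Returns None when either marker is missing.
    """
    pattern = re.compile(
        re.escape(start) + r"(.*?)" + re.escape(end),
        re.DOTALL,  # let the captured span run across line breaks
    )
    match = pattern.search(text)
    return match.group(1) if match else None

# Invented sample page: the markers survive even if the tags around them move.
page = (
    "<html><div class='nav'>menu</div>"
    "<!-- FOO -->headline text<!-- BAZ -->"
    "<p>footer</p></html>"
)

print(extract_between(page, "<!-- FOO -->", "<!-- BAZ -->"))
```

Because the lookup depends only on the two landmarks and not on the nesting of tags around them, a redesign that reshuffles the hierarchy but keeps the landmarks would not break it.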

While I see the objections to departing from HTML::Parser, I agree with mcelrath that using a regex to skip past 85K to reach the 4K of text that you actually want would be a Good Thing (in many cases a simple grep is all that's needed). If HTML::Parser ends up being part of the solution (if only after the desired portion of the document is reached), then so be it.
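A sketch of that two-stage approach, again in Python: a regex jumps to a landmark, and only the remainder is handed to a real parser. Python's standard `html.parser.HTMLParser` stands in for HTML::Parser here purely as an assumption for illustration; the `TextCollector` class, the `text_after` function, and the sample page are all hypothetical.

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the non-blank text chunks the parser sees (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)

def text_after(html, landmark):
    """Skip to the first regex match of `landmark`, then parse only what follows."""
    match = re.search(landmark, html)
    if match is None:
        return []
    parser = TextCollector()
    parser.feed(html[match.end():])  # the bulk before the landmark is never parsed
    parser.close()
    return parser.chunks

# Invented page: lots of filler, then the small part that is actually wanted.
page = (
    "<html><head><title>Big page</title></head><body>"
    + "<p>filler</p>" * 1000
    + "<h2 id='news'>News</h2><p>First item</p><p>Second item</p>"
    + "</body></html>"
)

print(text_after(page, r"<h2 id='news'>"))
```

The parser still does the careful work on the interesting portion; the regex only decides where that portion begins.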

Having voiced his support for mcelrath's idea, brainpan steps down from his soapbox.
