You could probably automate the C&P from your favorite browser (under Win32 at least) or use one of the console browsers (Lynx etc.) under *nix.

The question is, what would you have achieved. Not only would you have used a parser (the one builtin to the browser), but you would have also used its rendering engine, spawned a new process and gone through some form of IPC whether its a pipe or clipboard. And you would still need to apply a regex to the result.

If your going to use a parser, then you mught as well use one of the many available to you via CPAN and avoid all that additional overhead:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to Re: Re: Scraping HTML: orthodoxy and reality by BrowserUk
in thread Scraping HTML: orthodoxy and reality by grinder

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.