PerlMonks  

Re^3: Browser automation to copy webpage to text

by Athanasius (Archbishop)
on Oct 21, 2015 at 07:57 UTC ( [id://1145524]=note )


in reply to Re^2: Browser automation to copy webpage to text
in thread Browser automation to copy webpage to text

For a Perl solution, you can try PDF::FromHTML — if you can get it to install. :-(
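If the module does install, a minimal sketch along the lines of its documented synopsis might look like the following. The sub name `html_file_to_pdf` and the file names in the usage comment are only illustrative assumptions, and the page is assumed to have been saved locally as HTML first. I haven't verified how well it copes with complex real-world pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a locally saved HTML file to PDF with PDF::FromHTML.
# html_file_to_pdf is a hypothetical helper name, not part of the module.
sub html_file_to_pdf {
    my ($html_path, $pdf_path) = @_;

    # Load the module at run time so a failed install gives a clear message.
    eval { require PDF::FromHTML; 1 }
        or die "PDF::FromHTML is not available: $@";

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file($html_path);    # read and parse the saved page
    $pdf->convert;                  # render with the module's default settings
    $pdf->write_file($pdf_path);    # write the resulting PDF to disk

    return $pdf_path;
}

# Assumed usage: perl html2pdf.pl page.html page.pdf
html_file_to_pdf(@ARGV) if @ARGV == 2;
```

Run without arguments, the script does nothing, so the sub can also be pasted into a larger program.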

For automated, non-Perl solutions, you can look at something like HTMLDOC (free, but you have to build it from source), or Doxillion Document Converter (not free).

But you’ll probably get the best results by manually saving (or “printing”) the page to PDF format in your browser. For example, in Google Chrome, select Print..., then under Destination click the Change button and select Save as PDF. In Firefox, install the “Save as PDF” add-on, which places a “Save as PDF by pdfcrown.com” button on the address bar.

You may be able to automate this browser-based approach from Perl via a module such as WWW::Mechanize::Firefox; but that’s way outside my experience.

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
