PerlMonks  

Re: I want to save web pages as text rather than as HTML.

by jcb (Deacon)
on Sep 06, 2019 at 23:01 UTC (#11105758)


in reply to I want to save web pages as text rather than as HTML.

Modern Mozilla browsers no longer save the original page source; they serialize the current DOM tree instead. If the information you seek is not in the page source but does appear in the saved file, then it is being added to the page by JavaScript. You will need to use the Web Developer tools (Network tab) in Firefox to find the request that loads that data and figure out how to replicate that request and parse the response (probably JSON) in your Perl code.

Finding the request you need to make is the hard part. Making the request with LWP::UserAgent and parsing the response with JSON should be easy.
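
As a rough illustration of that second step, here is a minimal sketch. It assumes you have already found the request URL in the Network tab and that the response is a JSON object with an "items" array whose entries have a "title" field; that layout and the sub names (extract_titles, fetch_body) are made up for the example, so adjust them to what the real response contains. JSON::PP is used because it ships with Perl; the JSON module exports the same decode_json.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON::PP qw(decode_json);    # core module; JSON exports the same function

# Pull the titles out of a JSON response body.
# ASSUMPTION: the { "items": [ { "title": ... } ] } layout is hypothetical;
# adapt it to whatever the real response looks like in the Network tab.
sub extract_titles {
    my ($json_bytes) = @_;
    my $data = decode_json($json_bytes);    # expects UTF-8 encoded bytes
    return map { $_->{title} } @{ $data->{items} || [] };
}

# Repeat the request the browser made and return the raw response body.
sub fetch_body {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $res = $ua->get( $url, 'Accept' => 'application/json' );
    die 'Request failed: ' . $res->status_line . "\n" unless $res->is_success;

    # Undo any gzip/deflate encoding, but keep raw bytes for decode_json.
    return $res->decoded_content( charset => 'none' );
}

# Usage: perl fetch_titles.pl URL    (URL copied from the Network tab)
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_titles( fetch_body( $ARGV[0] ) );
}
```

If the site only answers when certain headers or cookies are present, copy them from the request shown in the Network tab and pass them as extra name/value pairs to get().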

Replies are listed 'Best First'.
Re^2: I want to save web pages as text rather than as HTML.
by anautismobserver (Sexton) on Sep 12, 2019 at 02:33 UTC

    << You will need to use the Web Developer tools (Network tab) in Firefox to find the request that loads that data and figure out how to replicate that request and parse the response (probably JSON) in your Perl code. >>

    Can you give me guidance regarding how to go about this? Or link to somewhere that explains it for novices like me?

    Thanks.
