PerlMonks
Hello anautismobserver, and welcome to the monastery and to the wonderful world of Perl!
Perl is powerful enough to achieve this with a one-liner (note the double quotes, which are required on Windows):

    perl -MHTML::TreeBuilder -e "print HTML::TreeBuilder->new_from_url('http://perl.org')->as_text"

The above combines two steps: fetching the raw HTML content from the URL (using LWP::UserAgent under the hood) and formatting the output as text.

Web scraping is a dark art and can be achieved in many distinct ways. You can follow some links in my bibliotheca: web scraping, or visit previous threads like Re: How can I download HTML and save it as txt?

Since you presented yourself as a beginner, please note that the -M switch imports a module, as described in perlrun, and that the chaining of methods ( ->new_from_url(..)->as_text ) is just a shortcut that avoids an unnecessary variable declaration.

PS: you can also use other modules for the web-scraping part, as suggested by Task::Kensho, which is a fairly good collection of modules from CPAN. Other modules worth trying include Mojo::DOM and Web::Scraper, as suggested in The State of Web spidering in Perl.

L*
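For clarity, the one-liner can be unrolled into a small script. This is just a sketch: it parses a literal HTML string with new_from_content instead of fetching a URL (so it runs offline); swap in new_from_url to reproduce the one-liner's behavior.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# A hypothetical in-memory page standing in for the fetched HTML
my $html = '<html><body><h1>Hello</h1><p>Plain text, please.</p></body></html>';

# Build the parse tree from the string; new_from_url('http://perl.org')
# would fetch the page first and then parse it the same way
my $tree = HTML::TreeBuilder->new_from_content($html);

# as_text walks the tree and returns only the text content, tags stripped
print $tree->as_text, "\n";

# HTML::TreeBuilder trees contain circular references, so free them explicitly
$tree->delete;
```

This makes the two steps of the one-liner visible: building the tree, then rendering it as text.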
There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.

In reply to Re: I want to save web pages as text rather than as HTML. -- oneliner
by Discipulus