
Re: HTML <=> Text convertion

by Aragorn (Curate)
on Dec 10, 2003 at 11:32 UTC (#313690)


in reply to HTML <=> Text convertion

You can use an external text-browser like lynx to do the hard work for you. Open a pipe to lynx -dump <url> and read the resulting text-rendered page.
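A minimal sketch of the pipe approach, assuming lynx is installed and on the PATH. The helper name dump_page and the example URL are made up for illustration; the command is passed in as a list so another text browser can be swapped in.

```perl
use strict;
use warnings;

# Run a text-mode browser in dump mode and return the rendered text.
# $cmd is an array ref holding the command and its options,
# e.g. ['lynx', '-dump']. (dump_page is a hypothetical helper name.)
sub dump_page {
    my ($cmd, $url) = @_;

    # The list form of a pipe open bypasses the shell, so odd
    # characters in the URL are not interpreted as shell syntax.
    open(my $pipe, '-|', @$cmd, $url)
        or die "Cannot run $cmd->[0]: $!";

    local $/;                 # slurp mode: read all output at once
    my $text = <$pipe>;

    close($pipe)
        or die "$cmd->[0] failed: " . ($! ? $! : "exit status $?");
    return $text;
}

# Example call (placeholder URL):
# print dump_page(['lynx', '-dump'], 'http://www.example.com/');
```

Checking the return value of close matters here, since that is where a non-zero exit status from the browser shows up.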

Arjen


Re: Re: HTML <=> Text convertion
by TVSET (Chaplain) on Dec 10, 2003 at 11:51 UTC
      Well, if it's your last resort, you are wasting a huge amount of your time and effort. Lazy Programmers -- and you do aspire to be one -- always use the quickest solution first.

      I prefer w3m -dump over lynx for generating plain text from HTML. It handles tables properly. It runs CGI locally for testing HTML output.

      If you want text you can reformat easily, use the -cols option to set the width of the dumped output. The -dump option does the actual markup stripping.
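      For what it's worth, the w3m variant might look something like this. It is only a sketch: the sub names and the default width of 80 are assumptions made for the example, and it needs w3m on the PATH.

```perl
use strict;
use warnings;

# Build the w3m argument list for a plain-text dump at a given width.
# (w3m_dump_cmd and w3m_dump are hypothetical helper names.)
sub w3m_dump_cmd {
    my ($url, $cols) = @_;
    $cols = 80 unless defined $cols;   # assumed default for this sketch
    return ('w3m', '-dump', '-cols', $cols, $url);
}

# Run w3m and return the rendered page as a list of lines.
sub w3m_dump {
    my ($url, $cols) = @_;

    # List-form pipe open: no shell is involved.
    open(my $pipe, '-|', w3m_dump_cmd($url, $cols))
        or die "Cannot run w3m: $!";
    my @lines = <$pipe>;
    close($pipe)
        or die "w3m failed: " . ($! ? $! : "exit status $?");
    return @lines;
}

# Example call (placeholder URL), wrapping the output at 60 columns:
# print w3m_dump('http://www.example.com/', 60);
```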

      --
      bowling trophy thieves, die!

        There was a reason I wanted to do it the "Perl way". I am not the only root on the system, but I am pretty much the only one doing Perl there. Therefore, nothing Perl-related changes on that machine without my knowledge. Lynx/links/w3m, though, can be removed or upgraded without my noticing. Easy to fix, I know, but a good enough reason for me to look for another solution. :)
