http://www.perlmonks.org?node_id=578401


in reply to Re: Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')
in thread Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')


Re^3: Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')
by blazar (Canon) on Oct 15, 2006 at 17:36 UTC
    Gosh! You didn't even take a look at what lynx -dump produces, did you?

    He didn't claim it would produce the same output, nor even a comparable one. He just pointed out that it has a method for outputting plain text, which it does. Indeed, I think it more or less amounts to calling as_text() on the whole parse tree of the wanted page. Lynx and its variants are full-fledged browsers, so it is natural that they go beyond the capabilities of a simple parser and aim at presentation-friendly output. You may hack or roll your own by inserting horizontal and vertical whitespace suitably around individual elements before printing them with as_text(). Doing this properly is going to be quite a lot of work, but maybe just inserting a newline after every single element will already make everything clearer. Oh, and at the very least take care of paragraphs and breaks. If you also want line wrapping, that's a whole other story. (A call for Text::Wrap, most probably.)
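
    As a rough illustration of the "roll your own" idea, here is a minimal sketch. It assumes the parser in question is HTML::TreeBuilder (as_text() is an HTML::Element method); the helper name html_to_text(), the list of tags treated as blocks and the default width of 72 columns are all made up for the example. It comes nowhere near what lynx -dump does (no link list, no tables, no list bullets).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Text::Wrap qw(wrap);

# Hypothetical helper, not part of any module: parse HTML, add newlines
# around a few elements, flatten with as_text(), then wrap each line.
sub html_to_text {
    my ($html, $columns) = @_;
    local $Text::Wrap::columns = $columns || 72;    # assumed default width

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Breaks: replace each <br> with a literal newline.
    $_->replace_with("\n") for $tree->look_down(_tag => 'br');

    # Paragraph-like elements: add a blank line after each one.
    # The tag list is an arbitrary choice for this sketch.
    $_->postinsert("\n\n")
        for $tree->look_down(_tag => qr/^(?:p|div|h[1-6]|li)$/);

    my $text = $tree->as_text;
    $tree->delete;    # free the parse tree

    # Split into paragraphs, trim and wrap every non-empty line.
    my @paragraphs;
    for my $para (grep { /\S/ } split /\n{2,}/, $text) {
        my @lines;
        for my $line (split /\n/, $para) {
            $line =~ s/^\s+|\s+$//g;
            push @lines, wrap('', '', $line) if length $line;
        }
        push @paragraphs, join "\n", @lines;
    }
    return join("\n\n", @paragraphs) . "\n";
}

print html_to_text('<h1>Title</h1><p>Some <b>bold</b> text.<br>Next line.</p>');
```

    Wrapping line by line, rather than handing the whole string to wrap() at once, keeps the paragraph and break newlines under the script's own control.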

    OTOH, did you look at the outcome of your post (as is recommended)?!? It screwed up the whole view for this thread. Use <code> tags around the stuff you pasted, even though it's not strictly code. At least that gives you smart line wrapping...

    Update: the post has been fixed, hence the above comment does not apply any more.

    Ciao
