Re: How to extract untouched content of html tag with HTML::Parser

by roboticus (Chancellor)
on Nov 28, 2010 at 16:09 UTC


in reply to How to extract untouched content of html tag with HTML::Parser

Lana:

I've not used it in a while, but as I read the documentation, I'd suggest passing "text" rather than "dtext" to the handler specification so it can print the original text rather than the decoded text.

...roboticus
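To make the "text" versus "dtext" distinction concrete, here is a minimal sketch, assuming HTML::Parser is installed and using its version 3 handler API. The sample HTML string and the variable names are made up for illustration; both argspecs are requested in one text handler so the raw and entity-decoded forms can be compared side by side.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input containing character entities.
my $html = '<p>Fish &amp; chips &lt;fresh&gt;</p>';

my (@raw, @decoded);

my $p = HTML::Parser->new(
    api_version => 3,
    # 'text' is the source text exactly as written;
    # 'dtext' is the same text with entities decoded.
    text_h => [
        sub {
            my ($text, $dtext) = @_;
            push @raw,     $text;
            push @decoded, $dtext;
        },
        'text, dtext',
    ],
);

$p->parse($html);
$p->eof;

# Text may arrive in more than one chunk, so join the pieces.
my $raw_text     = join '', @raw;
my $decoded_text = join '', @decoded;

print "text:  $raw_text\n";
print "dtext: $decoded_text\n";
```

Note that a text handler only ever sees text events. Start tags, end tags, and comments inside the element are reported to other handlers, so switching the argspec alone does not bring the markup back.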

Re^2: How to extract untouched content of html tag with HTML::Parser
by Lana (Beadle) on Nov 28, 2010 at 16:11 UTC
    I wish it was that simple :) But it isn't :(
      It is that easy. You have a logic error. Your start handler, which you call start_handler, does no printing. Your text handler does printing, but as documented, the text handler handles text, not start tags. Also, your end handler does no printing.
        OMG!!! I can't believe I was that blind! Thank you very much! :))
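The fix described above can be sketched roughly as follows. This is not Lana's original script (which is not shown in this thread) but an illustrative reconstruction: the target tag name, the sample HTML, and the variables $target, $depth, and $content are all assumptions. The idea is that the start, end, and default handlers each receive the "text" argspec and append it while the parser is inside the target element.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input and target element, chosen for illustration only.
my $html   = '<body><div id="a">Fish &amp; <b>chips</b><!-- note --></div><p>after</p></body>';
my $target = 'div';

my $depth   = 0;     # how many target elements we are currently inside
my $content = '';    # untouched source found inside the target element

my $p = HTML::Parser->new(
    api_version => 3,

    # Start tags: keep nested tags verbatim, then note entry into the target.
    start_h => [
        sub {
            my ($tagname, $text) = @_;
            $content .= $text if $depth;
            $depth++ if $tagname eq $target;
        },
        'tagname, text',
    ],

    # End tags: note exit from the target first, so the outermost
    # closing tag is not included in the captured content.
    end_h => [
        sub {
            my ($tagname, $text) = @_;
            $depth-- if $tagname eq $target && $depth;
            $content .= $text if $depth;
        },
        'tagname, text',
    ],

    # Everything else (text, comments, declarations) in its raw form.
    default_h => [ sub { $content .= $_[0] if $depth }, 'text' ],
);

$p->parse($html);
$p->eof;

print "$content\n";
```

Because every handler appends the raw "text" value rather than "dtext", entities such as &amp; stay as they appear in the source, and nested tags and comments are preserved.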

      OK, then, did you look at the hstrip example in the distribution? The documentation (at the end of the EXAMPLES section) indicates that you can modify it to do what you want:

      More examples are found in the eg/ directory of the HTML-Parser distribution: the program hrefsub shows how you can edit all links found in a document; the program htextsub shows how to edit the text only; the program hstrip shows how you can strip out certain tags/elements and/or attributes; and the program htext shows how to obtain the plain text, but not any script/style content.

      ...roboticus

        Yes, I examined all the examples and played with them a lot, but I still can't get what I need. I can't understand why using 'text' instead of 'dtext' produces the same result: plain text, rather than the untouched content of that HTML tag...
