PerlMonks  

Re^2: How to extract untouched content of html tag with HTML::Parser

by Lana (Beadle)
on Nov 28, 2010 at 16:11 UTC ( [id://874109] )


in reply to Re: How to extract untouched content of html tag with HTML::Parser
in thread How to extract untouched content of html tag with HTML::Parser

I wish it was that simple :) But it isn't :(

Re^3: How to extract untouched content of html tag with HTML::Parser
by Anonymous Monk on Nov 28, 2010 at 17:26 UTC
    It is that easy. You have a logic error. Your start handler, which you call start_handler, does no printing. Your text handler does print but, as documented, the text handler handles text, not start tags. Your end handler does no printing either.
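    To make the point above concrete, here is a minimal sketch, not the original poster's code. It assumes the goal is the untouched inner markup of a `div` element; the sample HTML, the `$out` buffer, and the `$depth` counter are all hypothetical names chosen for illustration. The start, text, and end handlers each append the original source text (the `text` argspec) while the parser is inside the target element.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input; the real document and target tag are assumptions.
my $html  = '<p>Hi <div class="keep">a &amp; <b>b</b></div> bye</p>';
my $out   = '';   # collects the untouched inner markup
my $depth = 0;    # > 0 while we are inside the target <div>

my $p = HTML::Parser->new(
    api_version => 3,
    # Start tags: 'text' is the tag exactly as it appeared in the source.
    start_h => [ sub {
        my ($tagname, $text) = @_;
        if ($depth) {
            $depth++ if $tagname eq 'div';   # track nested divs
            $out .= $text;
        }
        elsif ($tagname eq 'div') {
            $depth = 1;                      # entering the target element
        }
    }, 'tagname, text' ],
    # End tags: copy them too, except the one that closes the target.
    end_h => [ sub {
        my ($tagname, $text) = @_;
        return unless $depth;
        $depth-- if $tagname eq 'div';
        $out .= $text if $depth;
    }, 'tagname, text' ],
    # Text: 'text' is raw (entities left alone); 'dtext' would decode them.
    text_h => [ sub { $out .= $_[0] if $depth }, 'text' ],
);

$p->parse($html);
$p->eof;

print "$out\n";
```

    Comments and declarations inside the element are not copied by this sketch; they would need their own handlers (or a default handler) if they matter.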
      OMG!!! I can't believe I was that blind! Thank you very much! :))
        I can believe it; it happens to me every day, usually in between naps and coffee breaks.
Re^3: How to extract untouched content of html tag with HTML::Parser
by roboticus (Chancellor) on Nov 28, 2010 at 16:40 UTC

    OK, then, did you look at the hstrip example in the distribution? The documentation (at the end of the EXAMPLES section) indicates that you can modify it to do what you want:

    More examples are found in the eg/ directory of the HTML-Parser distribution: the program hrefsub shows how you can edit all links found in a document; the program htextsub shows how to edit the text only; the program hstrip shows how you can strip out certain tags/elements and/or attributes; and the program htext shows how to obtain the plain text, but not any script/style content.

    ...roboticus

      Yes, I examined all the examples and played with them a lot, but I still can't get what I need. I can't understand why using 'text' instead of 'dtext' produces the same result: plain text, rather than the untouched content of that HTML tag...
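      As a small illustration of the 'text' versus 'dtext' question, the sketch below (with a hypothetical sample string and variable names) passes both argspecs to a single text handler. Under these assumptions the two differ only in entity decoding, and neither contains tags, because tags go to the start and end handlers instead.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical sample input, chosen to contain both an entity and a tag.
my $html = 'x &lt; y <b>z</b>';
my ($raw, $decoded) = ('', '');

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,   # deliver each run of text in one piece
    # 'text' is the source as written; 'dtext' has entities decoded.
    text_h => [ sub {
        my ($text, $dtext) = @_;
        $raw     .= $text;
        $decoded .= $dtext;
    }, 'text, dtext' ],
);

$p->parse($html);
$p->eof;

print "text:  $raw\n";
print "dtext: $decoded\n";
```

      Neither string contains the `<b>` tag, which matches the behaviour described in the question: a text handler alone yields plain text whichever argspec is used.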
