PerlMonks  

How to use HTML::Parser to encode text with HTML entities?

by locust (Sexton)
on Dec 01, 2010 at 17:49 UTC ( [id://874701]=perlquestion )

locust has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks!

I have a string that contains an HTML file. What I'd like to do is first decode any HTML entities contained in the text (only!) and then encode the text with entities that I can specify. What I want returned is the entire string, in the same order that it was in, with just the text encoded with HTML entities.

I have assumed that using a combination of HTML::Parser and HTML::Entities is the best way to achieve my goal, but if you have a better way, then let me hear it.

Anyhow, anyone know how to do this? I don't have much experience with HTML::Parser, and the documentation is not really clear to me on how to do this.

Thanks

Update

I used the HTML::TokeParser::Simple module and HTML::Entities to get the solution:

    use HTML::Entities;
    use HTML::TokeParser::Simple;

    my $html = <some file>;  # this is shorthand for example..assume the File has been opened in slurp mode
    my $parsed = parseHTML($html);

    sub parseHTML {
        my $html = shift;
        my $parsed;
        my $p = HTML::TokeParser::Simple->new(\$html);
        while ( my $token = $p->get_token ) {
            # This prints all text in an HTML doc (i.e., it strips the HTML)
            if ($token->is_text) {
                my $text = $token->as_is;
                encode_entities($text, '",' );
                $parsed .= $text;
            }
            else {
                $parsed .= $token->as_is;
            }
        }
        return $parsed;
    }

Thanks!
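
For comparison, here is a minimal sketch of the same idea using HTML::Parser directly, as the original question asked. It is only one possible approach, not the solution posted above. The sub name reencode_text and the sample input are made up for illustration. It assumes the set of characters to encode includes <, > and &, since decoding first and then leaving those unencoded could turn text into markup. Text inside script and style elements is passed through untouched.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: decode entities in text nodes only, then
# re-encode them with a caller-supplied set of "unsafe" characters.
# Tags, comments, declarations etc. are copied through verbatim.
sub reencode_text {
    my ($html, $unsafe) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Text events: 'text' is the raw source text, 'is_cdata' is true
        # for the content of elements such as <script> and <style>.
        text_h => [
            sub {
                my ($text, $is_cdata) = @_;
                if ($is_cdata) {
                    $out .= $text;    # leave script/style content alone
                    return;
                }
                my $decoded = decode_entities($text);
                $out .= encode_entities($decoded, $unsafe);
            },
            'text, is_cdata',
        ],
        # Everything that is not text is appended exactly as it appeared.
        default_h => [ sub { $out .= shift }, 'text' ],
    );

    # Deliver each run of text as one piece so an entity is never split.
    $p->unbroken_text(1);

    $p->parse($html);
    $p->eof;
    return $out;
}

my $in = '<p title="a&amp;b">Tom &amp; "Jerry" &lt;3</p>';
print reencode_text($in, '<>&"'), "\n";
```

Because non-text events go through the default handler unchanged, attribute values such as title="a&amp;b" keep their original encoding; only the text between tags is rewritten.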

Re: How to use HTML::Parser to encode text with HTML entities?
by Your Mother (Archbishop) on Dec 01, 2010 at 18:09 UTC

        It wasn't given here as the way to do stripping but as an approach to simple parsing with custom tags toward any end.
