PerlMonks  

How to strip HTML using latest module

by f0dder (Novice)
on Aug 23, 2001 at 02:37 UTC ( #107175=perlquestion )
f0dder has asked for the wisdom of the Perl Monks concerning the following question:

I've been using HTML::FormatText to strip HTML tags. The web pages themselves are relatively simple, but if possible I would like to use the latest modules in case 'simple' becomes 'complex'. Going through the documentation, it appears HTML::Parser may be what I want.
http://aspn.activestate.com/ASPN/Reference/Products/ActivePerl/site/lib/HTML/Filter.html

But going through the HTML::Parser documentation and examples, most of it makes no sense to me. Can someone post a small example that starts by fetching a $webpage (i.e. using LWP::Simple) and finishes by printing the stripped $webpage?

Re: How to strip HTML using latest module
by f0dder (Novice) on Aug 23, 2001 at 03:30 UTC
    I found an example on the net that works. (I trimmed the source to shorten the response.)
    However, can someone post an example that skips the intermediate step of reading from a file? Instead of $parser->parse_file($file), I would like to call something like $parser->parse_text($html), where $html comes from an LWP::Simple get call. (I made up the name parse_text.)
    Is there a real equivalent of parse_text, or some other alternative?

    #!/usr/bin/perl -w
    package Example;
    use strict;

    require HTML::Parser;

    # Subclass HTML::Parser so the text() callback below receives each text chunk.
    @Example::ISA = qw(HTML::Parser);
    my $parser = Example->new;
    $parser->parse_file('index.html');

    print $parser->{TEXT};

    # Called by the parser for every run of text between tags.
    sub text
    {
        my ($self,$text) = @_;
        $self->{TEXT} .= $text;
    }
Re: How to strip HTML using latest module
by OeufMayo (Curate) on Aug 23, 2001 at 10:06 UTC

    Here's a version using the HTML::Parser v.2 interface:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple qw(get);
    use HTML::Parser;

    my $parser = Example->new();
    my $html = get("http://www.perlmonks.org")
        or die "Cannot fetch the HTML\n";
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed

    package Example;
    use base qw(HTML::Parser);

    # Version 2 style callback: invoked for each chunk of text between tags.
    sub text {
        my ($self,$text) = @_;
        print $text;
    }

    And here's the same script, but using the HTML::Parser version 3 interface. This one is easier to use because you generally don't have to create a new package to parse the HTML (though you can, if you really want to!).

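    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple qw(get);
    use HTML::Parser;

    my $html = get("http://www.perlmonks.org")
        or die "Cannot fetch the HTML\n";

    # text_h handler: 'dtext' passes each text chunk with entities decoded.
    my $parser = HTML::Parser->new(
        text_h => [ sub { print shift }, 'dtext' ]
    );
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed

    --
    my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});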
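
    Building on the version 3 interface shown above, here is a minimal sketch that collects the stripped text into a string instead of printing it, and also skips the contents of <script> and <style> elements. The helper name strip_html and the sample HTML are assumptions made for illustration and are not from the thread. The sketch assumes an HTML::Parser 3.x release recent enough to provide ignore_elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the thread): return the text content of an
# HTML string, skipping the bodies of <script> and <style> elements.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands the handler text with entities such as &amp; decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Suppress all events between these start tags and their end tags.
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still buffering

    return $text;
}

# Sample input invented for this sketch.
my $html = '<html><head><style>p { color: red }</style></head>'
         . '<body><p>Fish &amp; chips</p><script>var x = 1;</script></body></html>';
print strip_html($html), "\n";
```

    To strip a live page, $html could instead come from LWP::Simple's get, as in the scripts above. Returning a string rather than printing from inside the handler makes the result easier to post-process, for example to collapse runs of whitespace.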
      Sweet!!! Thank you. I tried both examples and they work. I now feel so giddy. I also just learned how to turn on autocomplete in the NT cmd shell, which gives bash-like Tab completion in both NT and W2K.

      In HKEY_CURRENT_USER\Software\Microsoft\Command Processor, change CompletionChar to 9 (the Tab character).
