
How to strip HTML using latest module

by f0dder (Novice)
on Aug 23, 2001 at 02:37 UTC (#107175=perlquestion)
f0dder has asked for the wisdom of the Perl Monks concerning the following question:

I've been using HTML::FormatText to strip HTML tags. The web pages themselves are relatively simple, but if possible I would like to use the latest widgets just in case 'simple' becomes 'complex'. Going through the help documents, it appears HTML::Parser may be what I want.

But going through the HTML::Parser documentation and examples, most of it makes no sense to me. Can someone post a small example that starts by fetching a $webpage (i.e. using LWP::Simple) and finishes by printing the stripped $webpage?

Replies are listed 'Best First'.
Re: How to strip HTML using latest module
by OeufMayo (Curate) on Aug 23, 2001 at 10:06 UTC

    Here's a version using the HTML::Parser v.2 interface:

        #!/usr/bin/perl -w
        use strict;
        use LWP::Simple qw(get);
        use HTML::Parser;

        my $parser = Example->new();
        my $html = get("") or die "Cannot fetch the HTML\n";
        $parser->parse($html);
        $parser->eof;    # signal end of document so buffered text is flushed

        package Example;
        use base qw(HTML::Parser);

        # Called by the parser for each chunk of plain text
        sub text {
            my ($self, $text) = @_;
            print $text;
        }

    And here's the same script, but using the HTML::Parser version 3 interface. This one is easier to use because you generally don't have to make a new package to parse the html (though you can, if you really want to!).

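        #!/usr/bin/perl -w
        use strict;
        use LWP::Simple qw(get);
        use HTML::Parser;

        my $html = get("") or die "Cannot fetch the HTML\n";
        my $parser = HTML::Parser->new(
            # 'dtext' passes each text chunk with HTML entities decoded
            text_h => [ sub { print shift }, 'dtext' ],
        );
        $parser->parse($html);
        $parser->eof;    # signal end of document so buffered text is flushed

    my $OeufMayo = new PerlMonger::Paris({http => ''});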
      Sweet!!! Thank you. I tried both examples and they work. I now feel so giddy. I also just learned how to turn on autocomplete in the NT cmd shell. This gives bash-like autocompletion in both NT and W2k.

      In HKEY_CURRENT_USER\Software\Microsoft\Command Processor, set CompletionChar to 9 (the Tab character).
Re: How to strip HTML using latest module
by f0dder (Novice) on Aug 23, 2001 at 03:30 UTC
    I found an example on the net that works.
    However, can someone post an example that skips the intermediate step of reading from a file? That is, instead of $parser->parse_file($file), something like $parser->parse_text($html), where $html comes from an LWP::Simple get call. (parse_text is a name I made up.)
    Are there any alternatives to using parse_text?

        #!/usr/bin/perl -w
        package Example;
        use strict;

        require HTML::Parser;

        @Example::ISA = qw(HTML::Parser);
        my $parser = Example->new;
        $parser->parse_file('index.html');

        print $parser->{TEXT};

        sub text
        {
            my ($self,$text) = @_;
            $self->{TEXT} .= $text;
        }
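
For the in-memory case asked about here, a minimal sketch: HTML::Parser has no parse_text method, but its parse() method accepts a string directly, and eof() tells the parser the document is complete. The helper name strip_html and the sample markup are assumptions made for illustration; in real use $html would come from LWP::Simple's get().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# strip_html is a hypothetical helper name, not part of HTML::Parser.
# It returns the text content of an HTML string held in memory.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands the callback text with HTML entities decoded
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    $parser->parse($html);   # parse a string instead of a file
    $parser->eof;            # flush any text the parser is still buffering

    return $text;
}

# Sample markup standing in for: my $html = get($url);
my $html = '<html><body><h1>Title</h1><p>Fish &amp; chips</p></body></html>';
print strip_html($html), "\n";
```

Collecting the text into a variable rather than printing it inside the callback keeps the stripped result available for further processing, much as the $self->{TEXT} accumulator does in the subclass version.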
