Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re: Re: Re: Stopping HTML::Parser removing all tags in body

by andreychek (Parson)
on Jun 27, 2001 at 21:03 UTC ( #91986=note: print w/ replies, xml ) Need Help??


in reply to Re: Re: Stopping HTML::Parser removing all tags in body
in thread Stopping HTML::Parser removing all tags in body

I think what might really help is if you were able to post the actual Perl code you have which makes use of HTML::Parser. It's often difficult to determine the problem when the only information we have is one of the symptoms. However, if you post your code, we'll use our special Monk stethascopes to diagnose your problem :-) Thanks, -Eric


Comment on Re: Re: Re: Stopping HTML::Parser removing all tags in body
Re: Stopping HTML::Parser removing all tags in body
by Anonymous Monk on Jun 27, 2001 at 22:08 UTC
    In my rush for avatar-hood I forget to include the code!! :-D forgive my impulsiveness... cut away doctors I've modifed another program to do the stuff I want...
    #!/usr/bin/perl -w use strict; package GetUrls; use vars qw(@ISA); @ISA = qw(HTML::Parser); require HTML::Parser; use strict; my $parser = new GetUrls; $parser->parse_file('C:\eechoice\pmp-forms-english.htm'); for (keys %{$parser->{URLS}}) { print "$_ $parser->{URLS}{$_}\n"; } sub start { my($self,$tag,$attr,$attrseq,$orig) = @_; ################### TAGS TO KEEP ######################### if ( $tag eq 'li') { $self->{got_li}++; print "<$tag>"; } if ( $tag eq 'table') { $self->{got_table}++; print "<$tag>"; } if ( $tag eq 'tr') { $self->{got_tr}++; print "<$tag>"; } if ( $tag eq 'td') { $self->{got_td}++; print "<$tag>"; } if ( $tag eq 'th') { $self->{got_th}++; print "<$tag>"; } if ( $tag eq 'span') { $self->{got_span}++; print "<$tag>"; } if ( $tag eq 'ul') { $self->{got_ul}++; print "<$tag>"; } #################### TAGS TO STRIP ######################## if ( $tag eq 'a') { if ($self->{cur_url} = $attr->{href}) { $self->{got_href}++; } } if ( $tag eq 'p') { $self->{got_p}++; } if ( $tag eq 'body') { $self->{got_body}++; } if ( $tag eq 'meta') { $self->{got_meta}++; } if ( $tag eq 'html') { $self->{got_html}++; } if ( $tag eq 'head') { $self->{got_head}++; } if ( $tag eq 'title') { $self->{got_title}++; } if ( $tag eq 'br') { $self->{got_br}++; print "\n"} if ( $tag eq 'font') { $self->{got_font}++; } if ( $tag eq 'form') { $self->{got_form}++; } if ( $tag eq 'input') { $self->{got_input}++; } } sub end { my ($self,$tag) = @_; ################### TAGS TO KEEP ######################### if ($tag eq 'li' && $self->{got_li} ) {$self->{got_li}--; print "</$ +tag>";} if ($tag eq 'table' && $self->{got_table} ) {$self->{got_table}--; p +rint "</$tag>";} if ($tag eq 'tr' && $self->{got_tr} ) {$self->{got_tr}--; print "</$ +tag>";} if ($tag eq 'td' && $self->{got_td} ) {$self->{got_td}--; print "</$ +tag>";} if ($tag eq 'th' && $self->{got_th} ) {$self->{got_th}--; print "</$ +tag>";} if ($tag eq 'span' && $self->{got_span} ) {$self->{got_span}--; prin +t "</$tag>";} if ($tag eq 'ul' && $self->{got_ul} ) {$self->{got_ul}--; print "</$ +tag>";} ################### TAGS TO STRIP ######################## if ($tag eq 'a' && $self->{got_href} ) {$self->{got_href}--; } if ($tag eq 'body' && $self->{got_body} ) {$self->{got_body}--; } if ($tag eq 'p' && $self->{got_p} ) {$self->{got_p}--; } if ($tag eq 'meta' && $self->{got_meta} ) {$self->{got_meta}--; } if ($tag eq 'html' && $self->{got_html} ) {$self->{got_html}--; } if ($tag eq 'head' && $self->{got_head} ) {$self->{got_head}--; } if ($tag eq 'title' && $self->{got_title} ) {$self->{got_title}--; } if ($tag eq 'br' && $self->{got_br} ) {$self->{got_br}--; } if ($tag eq 'font' && $self->{got_font} ) {$self->{got_font}--; } if ($tag eq 'form' && $self->{got_form} ) {$self->{got_form}--; } if ($tag eq 'input' && $self->{got_input} ) {$self->{got_input}--; } + } sub text { my ($self,$text ) = @_; ######################### KEEP ALL TEXT ######################## if ($self->{got_href} ) { print $text; # $self->{URLS}{$self->{cur_url}} .= $text; } # if ($self->{got_body} ) { # print $text; # } if ($self->{got_li} ) { print $text; } if ($self->{got_p} ) { print $text; } if ($self->{got_span} ) { print $text; } if ($self->{got_ul} ) { print $text; } # if ($self->{got_html} ) { # print $text; # } if ($self->{got_title} ) { print "<TITLE>$text</TITLE>\n"; } if ($self->{got_font} ) { print "$text"; } ###################### EXCEPT ########################## # meta, head, br, form, input }
Re: Stopping HTML::Parser removing all tags in body
by NodeReaper (Curate) on Jun 27, 2001 at 22:15 UTC
Re: Stopping HTML::Parser removing all tags in body
by NodeReaper (Curate) on Jun 27, 2001 at 22:16 UTC
Re: Stopping HTML::Parser removing all tags in body
by NodeReaper (Curate) on Jun 27, 2001 at 22:22 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://91986]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (6)
As of 2014-08-30 09:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (291 votes), past polls