PerlMonks  

Re: Extracting Text After <pre> tag in HTML

by gellyfish (Monsignor)
on Sep 22, 2006 at 21:25 UTC ( #574437=note )


in reply to Extracting Text After <pre> tag in HTML

Just for the sake of completeness here's how you might do it with HTML::Parser:

use strict;
use warnings;
use HTML::Parser;

my $VAR1 = '<html><title>GAL7</title>
<body bgcolor=white>
<h2 align=center>GAL7</h2><hr>
<form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="application/x-www-form-urlencoded">
<input type="submit" name="action" value="Get mapped sites" /><input type="submit" name="action" value="Get putative sites" /><input type="submit" name="action" value="Get intergenic region" /><br /><input type="submit" name="action" value="Retrieve sequence" />Start<-ATG <input type="text" name="start" value="-450" size="5" maxlength="5" />ATG->End <input type="text" name="end" value="50" size="5" maxlength="5" /><div></div></form><hr>
<pre>
>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
</pre>Some other stuff</body></html>';

# Start-tag handler: once <pre> opens, start collecting text and
# watch for the matching end tag.
sub default_start {
    my ($self, $tagname) = @_;
    if ( $tagname eq 'pre' ) {
        $self->handler(text => \&get_text, "self,dtext");
        $self->handler(end  => \&end_text, "self,tagname");
    }
}

# Text handler: accumulate the decoded text on the parser object.
sub get_text {
    my ($self, $text) = @_;
    if ( not exists $self->{_text} ) {
        $self->{_text} = $text;
    }
    else {
        $self->{_text} .= $text;
    }
}

# End-tag handler: at </pre>, remove all handlers so that nothing
# further is collected.
sub end_text {
    my ($self, $tagname) = @_;
    if ( $tagname eq 'pre' ) {
        $self->handler(text  => '');
        $self->handler(start => '');
        $self->handler(end   => '');
    }
}

my $parser = HTML::Parser->new(start_h => [\&default_start, 'self,tagname']);
$parser->parse($VAR1);
$parser->eof;    # signal end of document
print $parser->{_text};

This may have an advantage over other parsers when you are dealing with large documents, because it doesn't build a parsed representation of the whole document before handing the events to you.
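
To make the large-document point a bit more concrete, here is a rough sketch of feeding the parser in chunks and ceasing to read once the closing </pre> has been seen. The helper name first_pre_text, the 4096-byte chunk size and the in-memory filehandle in the usage lines are my own assumptions for illustration, not part of the code above; it uses closures rather than swapping handlers, purely as a matter of taste.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns the decoded text of the first <pre>
# element, reading the document from a filehandle a chunk at a time.
sub first_pre_text {
    my ($fh) = @_;
    my $text   = '';
    my $in_pre = 0;    # true while we are between <pre> and </pre>
    my $done   = 0;    # true once the first </pre> has been seen

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $in_pre = 1 if !$done && $_[0] eq 'pre' }, 'tagname' ],
        text_h  => [ sub { $text .= $_[0] if $in_pre }, 'dtext' ],
        end_h   => [ sub {
                if ( $in_pre && $_[0] eq 'pre' ) {
                    $in_pre = 0;
                    $done   = 1;
                }
            }, 'tagname' ],
    );

    # Only the current chunk is held in memory; reading stops as soon
    # as the first <pre> element has been closed.
    while ( !$done && read( $fh, my $chunk, 4096 ) ) {
        $parser->parse($chunk);
    }
    $parser->eof;    # flush anything still buffered
    return $text;
}

# Usage with an in-memory filehandle; a real file or socket handle
# would be read the same way.
my $html = "<html><body><hr><pre>\n>YBR018C GAL7 275433 275933\n"
         . "TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA\n"
         . "</pre>Some other stuff</body></html>";
open my $fh, '<', \$html or die "Cannot open in-memory handle: $!";
print first_pre_text($fh);
close $fh;
```

The text handler may be called several times for one run of text when the input arrives in pieces, which is why the sketch appends rather than assigns.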

/J\


Re^2: Extracting Text After <pre> tag in HTML
by Anonymous Monk on Mar 30, 2007 at 08:41 UTC
    How can this be made to work for a PRE tag within a text area?
