Anonymous Monk has asked for the
wisdom of the Perl Monks concerning the following question:
Hey, can anyone tell me how I can parse HTML content to text? I am using the following code, but it converts the whole webpage to text. I want just the results as text. How can I do that?
my $html6 = $browser->content;
my $Format2 = HTML::FormatText->new(leftmargin => 3, rightmargin => 50);
my $TreeBuilder2 = HTML::TreeBuilder->new();
$TreeBuilder2->parse_content($html6);    # build the tree from the fetched page
my $parsed3 = $Format2->format($TreeBuilder2);
I don't understand what difference there is between "the whole webpage to text" and "the results as text". Can you provide some data samples to show what sort of difference you're talking about?
Also, how about showing us a runnable code snippet that actually uses some sample data and produces some output? Then explain how that output is different from the output you actually want. That will make it easier to help you.
Hi Anonymous Monk,
I'm assuming that, since you used the module HTML::FormatText, you want your output as plain text rather than the whole HTML page with all its tags. If that is what you want, you can do it like so:
use HTML::TreeBuilder 5 -weak;
use HTML::FormatText;
my $tree = HTML::TreeBuilder->new_from_url("http://www.google.com");
my $format = HTML::FormatText->new(leftmargin => 3, rightmargin => 50);
print $format->format($tree);    # whole page rendered as plain text
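If "just the results" means one particular part of the page, one possible approach is to find that element in the tree first and hand only that subtree to the formatter. The following is a minimal sketch under stated assumptions, not a definitive answer: the sample HTML, the `id="results"` container, and the `section_as_text` helper are all hypothetical, since the original post does not show what the real page looks like. You would need to adjust the look_down criteria to match your page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample page; in the original question the HTML would
# come from $browser->content instead.
my $html = <<'HTML';
<html><head><title>Search</title></head>
<body>
<div id="nav">Home | About | Login</div>
<div id="results">
<p>First result</p>
<p>Second result</p>
</div>
<div id="footer">Copyright notice</div>
</body></html>
HTML

# Hypothetical helper: format only the first element that matches the
# given look_down() criteria, instead of the whole tree.
sub section_as_text {
    my ($content, @criteria) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my $node = $tree->look_down(@criteria);    # first match in scalar context
    my $text = '';
    if ($node) {
        my $formatter = HTML::FormatText->new(leftmargin => 3, rightmargin => 50);
        $text = $formatter->format($node);
    }
    $tree->delete;    # free the tree explicitly (no -weak import here)
    return $text;
}

# Assumption: the results live in <div id="results">.
my $results_text = section_as_text($html, _tag => 'div', id => 'results');
print $results_text;
```

With the sample page above, the navigation and footer text should be left out, because only the matched div is passed to format. If nothing matches, the helper returns an empty string.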