Re: HTML stripper in WWW::Mechanize doesn't seem to work

in reply to HTML stripper in WWW::Mechanize doesn't seem to work

Based on the comment in your code ("The HTML is stripped off the contents and the text is stored in an array of strings") you're assigning the content incorrectly. Note that I don't have WWW::Mechanize installed so can't double check the docs for that.

The @stripped_html is an array, just like you need. But $stripped_html[$x] is only one element in that array, which means that it's really a scalar¹. Since the content sub returns an array, you're trying to assign an array to a scalar, and you'll end up with the number of things in the array.

You'll need to change your code a bit.

# Note that the $x isn't needed with this approach,
# so I took it out.

 my @stripped_html;
 @stripped_html = $webcrawler->content( format => "text" );

 # You can print the array directly, like this:
 print @stripped_html;

 # Or put it in a loop to specify what you want between
 # the array elements:
 for my $item (@stripped_html) {
    print "$item\n";
 }

As is, this code prints out the HTML contents twice, just so you can see the different ways to print an array. That wasn't your question, so I'll stop blathering on about it now.

¹ Yes, it could be another array or a hash or whatever, I'm talking simplest case scenario here.
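To make the footnote concrete, here is a minimal sketch with made-up data (the names `@words` and `@slots` are purely illustrative, not from the original code). It shows what a single array element ends up holding when you assign an array to it directly versus when you store a reference to the array.

```perl
use strict;
use warnings;

# Illustrative data only; not taken from the original question.
my @words = ('one', 'two', 'three');
my @slots;

# Assigning an array to a single element imposes scalar context,
# so the element receives the number of items in the array.
$slots[0] = @words;

# An anonymous array reference keeps a copy of the whole list
# inside one element.
$slots[1] = [@words];

print "Count: $slots[0]\n";
print "Items: @{ $slots[1] }\n";
```

The first element holds only the count; the second holds a reference that has to be dereferenced with `@{ ... }` to get the items back.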


Re^2: HTML stripper in WWW::Mechanize doesn't seem to work by polettix (Vicar) on Aug 01, 2005 at 00:32 UTC
"can't double check the docs for that"

No need to install anything in general. If you need docs for a module, you'd be able to find them on http://search.cpan.org (e.g. WWW::Mechanize). If you need core docs, you can check http://perldoc.perl.org (e.g. map or perlxstut).

Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf

Don't fool yourself.
Re^2: HTML stripper in WWW::Mechanize doesn't seem to work by lampros21_7 (Scribe) on Aug 01, 2005 at 01:21 UTC
Right, apologies for this, but I've confused you on one thing. I want one set of stripped HTML to be assigned to one element of the array. So, if www.google.com was my initial website, all its contents would be stored in $stripped_html[0]. Then, by doing $x = $x + 1; I would move to the next element of the array and assign the next URL's contents to it. Thanks
Re^3: HTML stripper in WWW::Mechanize doesn't seem to work by Nkuvu (Priest) on Aug 01, 2005 at 02:19 UTC
You can do this, but you'll have to do something like a `join` first. Consider the simpler example:

 my @mango = ('one', 'two', 'three', 'penguin');
 my $result = @mango;
 print "Result is $result\n"; # prints 4
 $result = join ' ', @mango;
 print "Result is $result\n"; # prints "one two three penguin"

If the content subroutine returns an array and you assign it in scalar context, you get the count of the things in the array. For your particular code you'll want something like:

 $stripped_html[$x] = join ' ', $webcrawler->content( format => "text" );
Re^4: HTML stripper in WWW::Mechanize doesn't seem to work by sk (Curate) on Aug 01, 2005 at 03:49 UTC
I don't think `content()` returns an array. I checked the source code and this is what I see:

 sub content {
     my $self = shift;
     my $content = $self->{content};
     return $content unless $self->is_html;
     ### More stuff there...
     .....
     .....
     return $content;
 }

Looks like it is just a scalar. I don't see any reference to an array either, so I guess it is a scalar. So he should be able to store it in one element of his array.
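A minimal sketch of what that could look like, assuming `content()` really does return a single scalar. `FakeCrawler`, its canned pages, and the example.com URLs are hypothetical stand-ins so the snippet runs without a network or WWW::Mechanize installed. With the real module you would presumably call `get` and `content( format => "text" )` on the WWW::Mechanize object in the same places.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Mechanize so the sketch runs offline.
package FakeCrawler;

# Canned page text keyed by URL (made up for illustration).
my %pages = (
    'http://example.com/a' => 'text of page a',
    'http://example.com/b' => 'text of page b',
);

sub new { my ($class) = @_; return bless { content => '' }, $class }

# Pretend to fetch a page by looking up its canned text.
sub get {
    my ($self, $url) = @_;
    $self->{content} = $pages{$url} // '';
    return $self;
}

# Returns one scalar, as the quoted source suggests; arguments are ignored here.
sub content { my ($self) = @_; return $self->{content} }

package main;

my @urls       = ('http://example.com/a', 'http://example.com/b');
my $webcrawler = FakeCrawler->new;
my @stripped_html;

for my $url (@urls) {
    $webcrawler->get($url);
    # One scalar per page, so each page fills exactly one element.
    push @stripped_html, $webcrawler->content( format => 'text' );
}

print scalar(@stripped_html), " pages stored\n";
print "$_: $stripped_html[$_]\n" for 0 .. $#stripped_html;
```

Using `push` avoids tracking the index by hand. An explicit `$stripped_html[$x] = ...; $x++;` would store the same thing.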
Re^5: HTML stripper in WWW::Mechanize doesn't seem to work by Nkuvu (Priest) on Aug 01, 2005 at 04:20 UTC
