http://www.perlmonks.org?node_id=479720


in reply to HTML stripper in WWW::Mechanize doesn't seem to work

Based on the comment in your code ("The HTML is stripped off the contents and the text is stored in an array of strings") you're assigning the content incorrectly. Note that I don't have WWW::Mechanize installed, so I can't double-check the docs for that.

The @stripped_html is an array, just like you need. But $stripped_html[$x] is only one element in that array, which means that it's really a scalar [1]. Since the content sub returns an array, you're trying to assign an array to a scalar, and you'll end up with the number of things in the array.

You'll need to change your code a bit.

    # Note that the $x isn't needed with this approach,
    # so I took it out.
    my @stripped_html;
    @stripped_html = $webcrawler->content( format => "text" );

    # You can print the array directly, like this:
    print @stripped_html;

    # Or put it in a loop to specify what you want between
    # the array elements:
    for my $item (@stripped_html) {
        print "$item\n";
    }

As is, this code prints out the contents twice, just so you can see the different ways to print an array. That wasn't your question, so I'll stop blathering on about it now.

[1] Yes, it could hold a reference to another array or a hash or whatever; I'm talking about the simplest case here.
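
To make the scalar-versus-list point concrete, here is a minimal sketch. The get_lines sub is a made-up stand-in for any sub that returns an array; it is not part of WWW::Mechanize, and the strings are placeholder data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub that returns an array.
sub get_lines {
    my @lines = ('first line', 'second line', 'third line');
    return @lines;
}

my @stripped_html;

# Assigning to a single element imposes scalar context,
# so the element receives the number of items, not the text.
$stripped_html[0] = get_lines();
print "Element holds: $stripped_html[0]\n";    # prints "Element holds: 3"

# Assigning to the whole array imposes list context,
# so the array receives the items themselves.
@stripped_html = get_lines();
print "Array holds: ", scalar(@stripped_html), " items\n";    # prints "Array holds: 3 items"
```

This only applies if the sub really does return an array; a sub that returns a single scalar behaves the same in both contexts.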

Re^2: HTML stripper in WWW::Mechanize doesn't seem to work
by polettix (Vicar) on Aug 01, 2005 at 00:32 UTC
Re^2: HTML stripper in WWW::Mechanize doesn't seem to work
by lampros21_7 (Scribe) on Aug 01, 2005 at 01:21 UTC
    Right, apologies for this, but I've confused you on one thing. I want one set of stripped HTML to be assigned to one element of the array. So, if www.google.com was my initial website, all its contents would be stored in $stripped_html[0]; then by doing $x = $x + 1; I would move to the next element of the array and assign the next URL's contents to it. Thanks
      You can do this, but you'll have to do something like a join first.

      Consider the simpler example:

          my @mango = ('one', 'two', 'three', 'penguin');

          my $result = @mango;
          print "Result is $result\n";    # prints 4

          $result = join ' ', @mango;
          print "Result is $result\n";    # prints "one two three penguin"

      If the content subroutine returns an array and you assign it in scalar context, you get the count of the things in the array. For your particular code you'll want something like:

          $stripped_html[$x] = join ' ', $webcrawler->content( format => "text" );
        I don't think content() returns an array. I checked the source code and this is what I see:

            sub content {
                my $self = shift;
                my $content = $self->{content};
                return $content unless $self->is_html;
                ### More stuff there...
                .....
                .....
                return $content;
            }

        Looks like it is just a scalar. There is no sign of an array or an array reference either, so he should be able to store it directly in one element of his array.
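
If content() does return a plain scalar, as the source above suggests, storing one page per array element could look roughly like the sketch below. The fetch_text sub and the URL list are assumptions made up for illustration; in the real crawler that call would be the get and content( format => "text" ) calls on $webcrawler.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetching a page and calling
# $webcrawler->content( format => "text" ); it returns one scalar string.
sub fetch_text {
    my ($url) = @_;
    return "text of $url";
}

# Example URLs only; the real crawler would supply its own list.
my @urls = ('http://www.google.com', 'http://www.perlmonks.org');
my @stripped_html;

for my $url (@urls) {
    # push appends one element per URL, so no separate $x counter is needed.
    push @stripped_html, fetch_text($url);
}

# Each element now holds the text for exactly one URL.
for my $x (0 .. $#stripped_html) {
    print $x, ": ", $stripped_html[$x], "\n";
}
```

Using push instead of a manual index avoids off-by-one mistakes; writing $stripped_html[$x] = fetch_text($url); $x = $x + 1; would fill the array the same way.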