http://www.perlmonks.org?node_id=1011735

mdro79 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am trying to improve my understanding of WWW::Mechanize. I have built a simple website of a few pages to practice traversing with WWW::Mechanize and reading HTML tags, attributes and content with WWW::Mechanize::TreeBuilder.

The website I built is quite simple for now: a top-level index.html containing a single table. The table rows hold a few cells with text and links. I am trying to read the links, follow each one to the next page, gather some data, print it, then come back and continue with the next row of the table.

Ultimately I would like to traverse a large table and decide, row by row, whether to store data from that row and follow its link to another page, or to skip the row because it doesn't meet my criteria and move on to the next one with no action taken.
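A row-by-row pass of this kind might look roughly like the sketch below. It is only an illustration under assumptions: it parses a hypothetical stand-in for index.html with HTML::TreeBuilder (the parser WWW::Mechanize::TreeBuilder uses by default) instead of fetching a page, and the "keep"/"skip" criterion in the second cell is invented. The point it shows is that each row's decision is made while the tree is alive, and only plain strings are kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # the parser WWW::Mechanize::TreeBuilder builds on

# Hypothetical stand-in for index.html: one table row per entry.
my $html = <<'HTML';
<html><body><table>
<tr><td>alpha</td><td>keep</td><td><a class="links" href="s1.html">one</a></td></tr>
<tr><td>beta</td><td>skip</td><td><a class="links" href="s2.html">two</a></td></tr>
<tr><td>gamma</td><td>keep</td><td><a class="links" href="s3.html">three</a></td></tr>
</table></body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @todo;    # plain strings only, no references into the tree
for my $row ($tree->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_trimmed_text } $row->look_down(_tag => 'td');

    # Hypothetical criterion: act only on rows whose second cell is "keep".
    next unless defined $cells[1] && $cells[1] eq 'keep';

    my $link = $row->look_down(_tag => 'a', class => 'links') or next;
    push @todo, { name => $cells[0], href => $link->attr('href') };
}

# The tree can now be thrown away; @todo holds its own copies.
$tree->delete;

print "$_->{name} -> $_->{href}\n" for @todo;
```

With the sample markup, the "beta" row is skipped and only the hrefs for "alpha" and "gamma" are kept for later visits.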

I am starting with a simple test skeleton: my index.html page, with rows and links leading to a few other pages (s1.html, s2.html, s3.html).

I run into problems when I leave the current page while looping through the list of links. I would like to leave, gather/print some data, come back, and continue the loop with the next link. What actually happens is that my program crashes at this point, complaining of uninitialized values in /path/to/HTML/Element.pm. With all that said, here is the code I am having problems with. If I can get my page-following and retreating logic nailed down properly, that will be a big step for me.

```perl
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

my $mech = ...
my @list = $mech->look_down(_tag => "a", class => "links");

foreach (@list) {
    # see if I want to skip this row, or save/print some
    # data and follow link to next page

    # printing data works fine

    # following a link breaks the loop
    $mech->get($new_url);    # finds the page no problem

    # do stuff on page, then go back
    $mech->back();           # complaint is of uninitialized "tag", from the look_down
                             # call I assume?
}
```
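The warning can probably be reproduced without any network access. This minimal sketch assumes that WWW::Mechanize::TreeBuilder calls delete() on the previous page's tree whenever a new page is loaded; HTML::Element documents delete() as destroying the element and all of its descendants, which would include every element still sitting in @list. The one-row table is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<table><tr><td><a class="links" href="s1.html">one</a></td></tr></table>');

my @list = $tree->look_down(_tag => 'a', class => 'links');
my $href = $list[0]->attr('href');    # a plain string copied out of the tree

# Stand-in for what presumably happens to the old tree on get()/back().
$tree->delete;

# delete() destroys every node of the tree, including the ones still held
# in @list, so the element no longer knows its own tag or attributes.
print defined $list[0]->tag ? "element still intact\n" : "element was wiped\n";
print "copied href survives: $href\n";
```

If that assumption holds, the script reports the element as wiped while the copied href string is unaffected.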

I suspect the program runs the main loop OK, but when it leaves the current page, something happens to @list. I don't know what, but leaving the page with $mech->get() seems to break my program.
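One way around this, sketched under the assumption that each page load replaces (and deletes) the previous tree, is to copy everything needed out of the elements into plain strings before leaving the page, then navigate using only those strings. The sketch builds a throwaway local copy of a site shaped like the one described (hypothetical content, loaded via file:// URLs), applies the role as shown in WWW::Mechanize::TreeBuilder's synopsis, and visits each link with get() without calling back() at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use URI;
use URI::file;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

# Throwaway local site (hypothetical content: index.html plus s1.html .. s3.html).
my $dir  = tempdir(CLEANUP => 1);
my $rows = '';
$rows .= qq{<tr><td>row $_</td><td><a class="links" href="s$_.html">s$_</a></td></tr>\n}
    for 1 .. 3;
my %pages = ('index.html' => "<html><body><table>\n$rows</table></body></html>\n");
$pages{"s$_.html"} = "<html><head><title>Page $_</title></head><body><p>detail $_</p></body></html>\n"
    for 1 .. 3;

for my $name (sort keys %pages) {
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $pages{$name};
    close $fh or die "Cannot close $path: $!";
}

my $mech = WWW::Mechanize->new(autocheck => 1);
WWW::Mechanize::TreeBuilder->meta->apply($mech);    # as in the module's synopsis

$mech->get(URI::file->new_abs(File::Spec->catfile($dir, 'index.html')));

# Step 1: while still on index.html, copy what is needed into plain strings.
# Relative hrefs are resolved against the current page's URI.
my @urls = map { URI->new_abs($_->attr('href'), $mech->uri)->as_string }
    $mech->look_down(_tag => 'a', class => 'links');

# Step 2: only now navigate away. Each get() gives $mech a new tree, but the
# loop walks @urls rather than elements of the old tree, and no back() is
# needed because the next URL is already known.
my @titles;
for my $url (@urls) {
    $mech->get($url);
    push @titles, $mech->title;
    print "$url: $titles[-1]\n";
}
```

Skipping back() is a design choice here. If the index page really has to be revisited, calling look_down again after returning should give fresh elements from the newly built tree, rather than reusing ones collected before leaving.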