PerlMonks  

Scraper outputting error with -w

by parkprimus (Sexton)
on Apr 11, 2011 at 13:41 UTC (id://898705)

parkprimus has asked for the wisdom of the Perl Monks concerning the following question:

I am creating a scraper that gets the URLs from an index page and writes them to a dat file, and then a second scraper that reads the dat file in a while loop and scrapes data from each URL. I have two versions of the code: one that works and one that doesn't. In the one that works, the URL to scrape is hard-coded. In the one that doesn't, the URL is built from a variable that is read from the dat file. If anyone can help, it would be much appreciated.
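A minimal sketch of the reading side described above, assuming the index scraper writes one device page name per line (the file name devices.dat, the sample names, and the base URL are hypothetical). It reads the lines in a while loop, strips each trailing newline with chomp (otherwise the newline ends up inside the URL), and builds each URL with double quotes so the variable is interpolated. An in-memory filehandle stands in for the real file so the sketch runs as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the contents of the hypothetical devices.dat file,
# one device page name per line (the last line is blank).
my $dat = "switch1.html\nswitch2.html\n\n";

# Opening a reference to a scalar gives an in-memory filehandle.
# In real use this would be: open my $fh, '<', 'devices.dat' or die ...
open my $fh, '<', \$dat or die "Cannot open data: $!";

my @urls;
while (my $line = <$fh>) {
    chomp $line;                  # drop the trailing newline
    next unless length $line;     # skip blank lines
    # Double quotes interpolate $line; single quotes would not.
    push @urls, "http://example.com/$line";
}
close $fh;

print "$_\n" for @urls;
```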
Works:
#!/usr/bin/perl
use LWP::Simple;
use HTML::TokeParser;
use URI::Escape;

&sw;

sub sw {
    $content = get('http://example.com/test.html');
    ....truncated
Doesn't work:
#!/usr/bin/perl -w
use LWP::Simple;
use HTML::TokeParser;
use URI::Escape;
use DBI;
use DBD::mysql;

$database = "Assets2";
$host     = "localhost";
$port     = "3306";
$user     = "assets";
$password = "xxx";
$dsn      = "DBI:mysql:$database:$host:$port";
$connect  = DBI->connect($dsn, $user, $password);

&get_index;

sub get_index {
    $query1 = "Select * from devices";
    $query_handle1 = $connect->prepare($query1);
    $query_handle1->execute;
    $query_handle1->bind_columns(\$id, \$device);
    while ($query_handle1->fetch) {
        print "id: $id\tname: $device\n";
        &switch_properties;
    }
}

sub switch_properties {
    $content = get('http://example.com/$device');
    ...truncated
Error:
Use of uninitialized value in substr at /usr/lib/perl5/HTML/PullParser.pm line 80.
Use of uninitialized value in length at /usr/lib/perl5/HTML/PullParser.pm line 83.
Error with debugger:
Use of uninitialized value in substr at /usr/lib/perl5/HTML/PullParser.pm line 80.
 at /usr/lib/perl5/HTML/PullParser.pm line 80
    HTML::PullParser::get_token('HTML::TokeParser=HASH(0x9ce9fd0)') called at /usr/lib/perl5/HTML/TokeParser.pm line 52
    HTML::TokeParser::get_tag('HTML::TokeParser=HASH(0x9ce9fd0)', 'tr') called at switchmapHTMLTEST1.pl line 51
    main::switch_properties called at switchmapHTMLTEST1.pl line 28
    main::get_index called at switchmapHTMLTEST1.pl line 19
Use of uninitialized value in length at /usr/lib/perl5/HTML/PullParser.pm line 83.
 at /usr/lib/perl5/HTML/PullParser.pm line 83
    HTML::PullParser::get_token('HTML::TokeParser=HASH(0x9ce9fd0)') called at /usr/lib/perl5/HTML/TokeParser.pm line 52
    HTML::TokeParser::get_tag('HTML::TokeParser=HASH(0x9ce9fd0)', 'tr') called at switchmapHTMLTEST1.pl line 51
    main::switch_properties called at switchmapHTMLTEST1.pl line 28
    main::get_index called at switchmapHTMLTEST1.pl line 19

Re: Scraper outputting error with -w
by Corion (Patriarch) on Apr 11, 2011 at 13:48 UTC

    It would seem to me that the difference is that

    $content = get('http://example.com/$device');

    fails (or rather, returns undef) while

    $content = get('http://example.com/test.html');

    works. Consider looking more closely at what you actually get back, at what is in $device, and at whether the request works from elsewhere.

      I definitely considered that; I only turn to the Perl Monks when all else fails. The value of $device is as expected; I added a print statement to verify it.

        You don't show us at all what path your data takes from get(...) until it gets to HTML::PullParser, where you get a warning (or two).

        What steps have you taken to confirm that you get the data you expect, and that you pass on the proper data down to HTML::PullParser in both cases?

        Please help us to help you better by posting a self-contained small program that exhibits the same failure. That way, we can more easily replicate your situation instead of taking guesses.
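        In that spirit, a small self-contained program that needs neither the network nor the database. It assumes the failing case is get() returning undef (LWP::Simple's get() does that when a request fails) and replaces the real call with a hypothetical stub, fetch_failed. Using that undef as a string under warnings produces the same "Use of uninitialized value" warning seen in the question, and a defined check stops before a parser ever receives it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical stand-in for get(): LWP::Simple's get() returns undef
# when the request fails, so this stub simply returns undef.
sub fetch_failed { return; }

my $url     = 'http://example.com/test.html';
my $content = fetch_failed($url);

# Treating undef as a string under warnings produces the same kind of
# "Use of uninitialized value" warning that HTML::PullParser reported.
my $head = substr($content, 0, 10);

# Guard: check the fetch result before it ever reaches a parser.
if (!defined $content) {
    print "No content for $url, skipping parse\n";
}

print scalar(@warnings), " warning(s) captured\n";
print @warnings;
```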

Re: Scraper outputting error with -w
by chromatic (Archbishop) on Apr 11, 2011 at 18:28 UTC
    $content = get('http://example.com/$device');

    If you want to interpolate the contents of $device, you need double quotes.
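    A minimal illustration of the difference, assuming a hypothetical value for $device: single quotes keep the text $device literally, while double quotes (or plain concatenation) insert the variable's value. With the single-quoted version, the script requests a page literally named $device, so get() presumably receives an error response and returns undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $device = 'switch1.html';    # hypothetical value read from the database

my $single = 'http://example.com/$device';   # no interpolation: literal $device
my $double = "http://example.com/$device";   # interpolated
my $concat = 'http://example.com/' . $device; # concatenation also works

print "$single\n";   # http://example.com/$device
print "$double\n";   # http://example.com/switch1.html
print "$concat\n";   # http://example.com/switch1.html
```

    If device names can contain spaces or other reserved characters, the uri_escape function from URI::Escape (already loaded in the original script) can be applied to that part of the path.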

Re: Scraper outputting error with -w
by InfiniteSilence (Curate) on Apr 11, 2011 at 17:44 UTC

    If you can, get out of the habit of relying on global variables inside functions like that. It is a surefire way to get confused about what is happening in your code.
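    A sketch of the same structure without globals, under use strict: each sub declares its own lexicals and receives $device as an argument. To keep it runnable without a database, a plain list of hypothetical device names stands in for the DBI rows (with DBI, a loop such as while (my ($id, $device) = $sth->fetchrow_array) { ... } would play the same role), and switch_properties only builds and returns the URL instead of fetching and parsing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each sub receives what it needs as arguments instead of reading globals.
sub switch_properties {
    my ($device) = @_;
    my $url = "http://example.com/$device";
    # In the real script: my $content = get($url); then parse $content.
    return $url;
}

sub get_index {
    my (@devices) = @_;    # stands in for the rows fetched via DBI
    my @urls;
    for my $device (@devices) {
        print "name: $device\n";
        push @urls, switch_properties($device);
    }
    return @urls;
}

my @urls = get_index('switch1.html', 'switch2.html');
```

    Because $device is passed explicitly, a typo such as $devices becomes a compile-time error under strict instead of a silently empty global.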

    Start a debugger session on your script (perl -d switchmapHTMLTEST1.pl) and immediately enter:

    b switch_properties
    r
    When you get into the switch_properties code, step forward (n) until you reach the offending get line, and then enter:
    p $device
    What do you see?
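    A similar check can also be made inside the script itself. A minimal sketch using the core Data::Dumper module with $Data::Dumper::Useqq set, which prints strings in double-quoted form with escapes, so a stray newline or a literal $device left in the URL stands out. The sample values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;   # show escapes such as \n explicitly
$Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix

my $device = "switch1.html\n";               # hypothetical value with a stray newline
my $url    = 'http://example.com/$device';   # the single-quoted URL from the question

my $dump = Dumper($device, $url);
print $dump;
```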
