PerlMonks  

Scrapper outputing error -w

by parkprimus (Sexton)
on Apr 11, 2011 at 13:41 UTC ( [id://898705]=perlquestion )

parkprimus has asked for the wisdom of the Perl Monks concerning the following question:

I am creating a scraper that gets the URL from an index and writes it to a dat file, and then another scraper that reads the dat file in a while loop and scrapes data from each URL. I have two versions of the code: one that works and one that doesn't. In the working version, I have hard-coded the content to scrape. In the failing version, the content is defined by a variable that is read from the dat file. If anyone can help, it would be much appreciated.
Works:
#!/usr/bin/perl
use LWP::Simple;
use HTML::TokeParser;
use URI::Escape;

&sw;

sub sw {
    $content = get('http://example.com/test.html');
    ....truncated
Doesn't work:
#!/usr/bin/perl -w
use LWP::Simple;
use HTML::TokeParser;
use URI::Escape;
use DBI;
use DBD::mysql;

$database = "Assets2";
$host = "localhost";
$port = "3306";
$user= "assets";
$password= "xxx";
$dsn = "DBI:mysql:$database:$host:$port";
$connect = DBI->connect($dsn,$user, $password);

&get_index;

sub get_index {
    $query1 = "Select * from devices";
    $query_handle1 = $connect->prepare($query1);
    $query_handle1->execute;
    $query_handle1->bind_columns(\$id,\$device);
    while ($query_handle1->fetch) {
        print "id: $id\tname: $device\n";
        &switch_properties;
    }
}

sub switch_properties {
    $content = get('http://example.com/$device');
    ...truncated
Error:
Use of uninitialized value in substr at /usr/lib/perl5/HTML/PullParser.pm line 80.
Use of uninitialized value in length at /usr/lib/perl5/HTML/PullParser.pm line 83.
Error with debugger:
Use of uninitialized value in substr at /usr/lib/perl5/HTML/PullParser.pm line 80.
 at /usr/lib/perl5/HTML/PullParser.pm line 80
    HTML::PullParser::get_token('HTML::TokeParser=HASH(0x9ce9fd0)') called at /usr/lib/perl5/HTML/TokeParser.pm line 52
    HTML::TokeParser::get_tag('HTML::TokeParser=HASH(0x9ce9fd0)', 'tr') called at switchmapHTMLTEST1.pl line 51
    main::switch_properties called at switchmapHTMLTEST1.pl line 28
    main::get_index called at switchmapHTMLTEST1.pl line 19
Use of uninitialized value in length at /usr/lib/perl5/HTML/PullParser.pm line 83.
 at /usr/lib/perl5/HTML/PullParser.pm line 83
    HTML::PullParser::get_token('HTML::TokeParser=HASH(0x9ce9fd0)') called at /usr/lib/perl5/HTML/TokeParser.pm line 52
    HTML::TokeParser::get_tag('HTML::TokeParser=HASH(0x9ce9fd0)', 'tr') called at switchmapHTMLTEST1.pl line 51
    main::switch_properties called at switchmapHTMLTEST1.pl line 28
    main::get_index called at switchmapHTMLTEST1.pl line 19

Replies are listed 'Best First'.
Re: Scrapper outputing error -w
by Corion (Patriarch) on Apr 11, 2011 at 13:48 UTC

    It would seem to me that the difference is that

    $content = get('http://example.com/$device');

    fails (or rather, returns undef) while

    $content = get('http://example.com/test.html');

    works. Maybe consider looking closer into what you actually get, and what is in $device and whether the request works from elsewhere.

      I definitely considered that, and I only turn to the PerlMonks when all else fails. The output of $device is as expected; I threw in a print statement to verify.

        You don't show us at all what path your data takes from get(...) until it gets to HTML::PullParser, where you get a warning (or two).

        What steps have you taken to confirm that you get the data you expect, and that you pass on the proper data down to HTML::PullParser in both cases?

        Please help us to help you better by posting a self-contained small program that exhibits the same failure. That way, we can more easily replicate your situation instead of taking guesses.
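One way to confirm what get(...) hands on to the parser is to check the result for undef before using it. The following is a minimal sketch, not the poster's code: fetch_content is a hypothetical helper, and the optional fetcher argument is an assumption added so the example runs without network access. By default it falls back to LWP::Simple's get(), which returns undef when the request fails.

```perl
use strict;
use warnings;

# fetch_content is a hypothetical helper, not part of LWP::Simple.
# It takes a URL and an optional fetcher code reference. Without a
# fetcher it loads LWP::Simple and uses its get(), which returns
# undef on failure.
sub fetch_content {
    my ($url, $fetcher) = @_;
    $fetcher ||= do { require LWP::Simple; \&LWP::Simple::get };

    my $content = $fetcher->($url);
    die "Could not fetch '$url'\n" unless defined $content;
    return $content;
}

# Stub fetchers stand in for real HTTP requests.
my $ok = fetch_content('http://example.com/test.html',
                       sub { return '<tr><td>x</td></tr>' });
print "fetched: $ok\n";

# A fetcher that fails, as get() does for a URL that cannot be retrieved.
my $content = eval {
    fetch_content('http://example.com/$device', sub { return undef });
};
my $err = $@;
print "error: $err" unless defined $content;
```

Failing early like this points at the request itself rather than at a warning several calls deeper inside HTML::PullParser.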

Re: Scrapper outputing error -w
by chromatic (Archbishop) on Apr 11, 2011 at 18:28 UTC
    $content = get('http://example.com/$device');

    If you want to interpolate the contents of $device, you need double quotes.
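A small illustration of the difference, assuming a hypothetical device name in place of the value read from the database:

```perl
use strict;
use warnings;

# Hypothetical device name, standing in for a value fetched from the database.
my $device = 'switch1.html';

# Single quotes: no interpolation, so the URL contains the literal text '$device'.
my $single = 'http://example.com/$device';

# Double quotes: $device is replaced by its current value.
my $double = "http://example.com/$device";

print "single: $single\n";   # single: http://example.com/$device
print "double: $double\n";   # double: http://example.com/switch1.html
```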

Re: Scrapper outputing error -w
by InfiniteSilence (Curate) on Apr 11, 2011 at 17:44 UTC

    If you can, get out of the habit of relying on global variables in functions like that. It is a surefire way to get confused about what is happening in your code.
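As a rough sketch of the alternative to globals, a subroutine can take the values it needs as arguments. The names device_url and @devices and the device file names below are hypothetical and only stand in for the database loop in the question.

```perl
use strict;
use warnings;

# Build the URL from explicit arguments instead of reading a global $device.
# device_url is a hypothetical helper used for illustration only.
sub device_url {
    my ($base_url, $device) = @_;
    die "device name is required\n" unless defined $device && length $device;
    return "$base_url/$device";
}

# Stand-in for the rows returned by the database loop in the question.
my @devices = ('switch1.html', 'switch2.html');

for my $device (@devices) {
    my $url = device_url('http://example.com', $device);
    print "$url\n";
}
```

With 'use strict' and lexical variables, a misspelled or missing variable is reported at compile time instead of silently becoming an empty global.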

    In the debugger enter a session on your script and immediately do this:

    b switch_properties
    r

    When you get into the switch properties code, step down until you hit the offending 'get' line and then do this:

    p $device

    What do you see?

    Celebrate Intellectual Diversity

Node Type: perlquestion [id://898705]
Approved by marto