
Can't get WWW::Mechanize to work on a web site

by smackdab (Pilgrim)
on Mar 01, 2004 at 05:17 UTC (#332780, perlquestion)
smackdab has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get the lyrics for a song, without success:
use WWW::Mechanize;
use URI::URL;
use strict;
use warnings;

my $artist = 'The Beatles';
my $title  = 'Hey Jude';
my $mech   = WWW::Mechanize->new();

my $search = join("+", split(/ /, $artist)) . "+" . join("+", split(/ +/, $title));
print "search=$search\n";

$mech->get("$search");
$mech->success() or die "Can't get the search page\n";
#print $mech->content();

#$mech->follow_link(url_regex=>qr/$title/i);
$mech->follow_link(url_regex=>qr/hey jude/i);
#$mech->follow_link(text=>"Hey Jude");
$mech->success() or die "Can't find song page\n";

print $mech->content();
I get the search results, but I can't figure out how to follow the "song link". Any help is appreciated!

-- update: changed title

Replies are listed 'Best First'.
Re: suck on www::mechanize question
by leira (Monk) on Mar 01, 2004 at 06:24 UTC
    It doesn't look like you were far off in your script, and I think the "Connection refused" error that Roger mentioned might be your problem. You should make sure that doing those actions in a normal browser does what you expect it to do, before you start blaming your script for unexpected results.

    You could try using HTTP::Recorder to record your WWW::Mechanize script. I tried it, and it generated this script:

    $mech->get("");
    $mech->follow_link(text => "Hey Jude (lennon/mccartney)", n => "1");

    The "Hey Jude" link (without authors) produced a 500 (connection refused) error, so I chose another one for the example.


      Thanks for trying it!
      The connection refused error might be the problem, but I can get it to work in the browser, just not with WWW::Mechanize. I'll look into HTTP::Recorder and see what it does...

      Did you get it to work?

      I tried switching the artist/song to others, with no change. I also tried ->follow_link with the artist and song as part of the search string, but it made no difference.

      The original $mech->get() has artist and song, if you know of any artist/song that works, maybe that would give me a hint...(or maybe not...)
        OK, further investigation suggests that it's not one bad link, but that the server just sometimes returns a 500 error. Other times it succeeds. If I run my script several times, it will sometimes succeed and sometimes fail -- and I get the same results if I try to follow the link several times in my browser.

        I was able to get around it like this:

        my $maxtries = 10;
        my $i = 1;
        while ($i <= $maxtries) {
            $mech->follow_link(text => "Hey Jude (lennon/mccartney)", n => "1");
            last if $mech->success;
            $mech->back();
            $i++;
        }
        $mech->success() or die "Can't find song page\n";

        Of course, you can set $maxtries to whatever you think is prudent, and you can put in a sleep() in the loop if you think that might help.
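
The retry-with-sleep idea can be pulled out into a small helper. This is only a sketch: `retry`, `$flaky`, and the delay argument are hypothetical names made up for illustration, not part of WWW::Mechanize, and the flaky server is simulated so the snippet runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): call $action up to
# $max_tries times, sleeping $delay seconds between failed attempts.
# Returns the number of the attempt that succeeded, or 0 if all failed.
sub retry {
    my ($action, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        return $try if $action->();
        sleep $delay if $delay && $try < $max_tries;
    }
    return 0;
}

# Stand-in for the flaky server: fails twice, then succeeds.
my $calls = 0;
my $flaky = sub { return ++$calls >= 3 };

my $attempt = retry($flaky, 10, 0);
print $attempt ? "succeeded on try $attempt\n" : "gave up\n";

# With a real $mech object, the action might look like this (assuming
# autocheck is off, so a failed request does not die):
#   sub {
#       $mech->follow_link(text => "Hey Jude (lennon/mccartney)", n => 1);
#       return 1 if $mech->success;
#       $mech->back();
#       return 0;
#   }
```

Passing a non-zero delay (say, 2) pauses between attempts, which may be kinder to a server that is already struggling.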


Re: suck on www::mechanize question
by Roger (Parson) on Mar 01, 2004 at 05:35 UTC
    When I click on the title of the song in the search results page, I got the following error message:

    Error InterScan HTTP Version 3.8-Build_1080 $Date: 01/31/2003 16:12:0037$ Connecting to Connection refused

    Maybe that explains why your robot can't follow the link?

      Strange, this part works manually for me (though overall it still doesn't grab the lyrics)...

      So you got the search results to be correct then?
        My company has a firewall running; maybe that has something to do with the forbidden access. I will try again later from home, on my own ISP, and see if I get the same error. I am guessing that you might be looking for a regex to extract the song lyrics? If so, could you post some HTML on your notepad and state which part you want extracted?
