PerlMonks  

Re^2: 2 questions for Corion regarding href with Mechanize::Firefox

by help_3452 (Initiate)
on Jul 15, 2013 at 22:46 UTC (#1044467)


in reply to Re: 2 questions for Corion regarding href with Mechanize::Firefox
in thread 2 questions for Corion regarding href with Mechanize::Firefox

Error Code :
No link found matching '//a(@href = "https://www43.friendsprovident.com/CAA/jsp/register_customer.jsp;jsessionid=0001xxxxxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxx?targeturl=https%3A%2F%2Fwww88.friendsprovident.com%2Fmembersite%2Factivate%2FhaveNoCRN.jhtml" or @src="https://www43.friendsprovident.com/CAA/jsp/register_customer.jsp;jsessionid=0001xxxxxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxx?targeturl=https%3A%2F%2Fwww88.friendsprovident.com%2Fmembersite%2Factivate%2FhaveNoCRN.jhtml")'

Actual HTML:

<a id="register" tabindex="106" href="/CAA/jsp/register_customer.jsp;jsessionid=0001xxxxxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxx?targeturl=https%3A%2F%2Fwww88.friendsprovident.com%2Fmembersite%2Factivate%2FhaveNoCRN.jhtml"> Register</a>


Found @ : https://www43.friendsprovident.com/CAA/jsp/login.jsp;jsessionid=0001xxxxxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxx?targeturl=https%3A%2F%2Fwww88.friendsprovident.com%2Fmembersite%2Flogin%2FMSLogin.jsp%3Fsite_id%3Dmembersite%26finaltargetURL%3Dhttps%3A%2F%2Fwww88.friendsprovident.com%2Fmembersite%2Findex.jhtml%26realmid%3D3DMS-UID

(Probably best found by navigating to https://www88.friendsprovident.com/membersite/ and then clicking login.)
The Register link on the login page is what I'm trying to demo $mech->follow_link with.

Using Code:
@arr = $mech->find_all_links; $link_obj = $arr[4];

Further on I have
eval { $mech->follow_link( url => $link_obj->url, tag => $link_obj->tag ) };

These two methods don't work together consistently.

To respond to all of you in turn.

moritz: OK. I do need to have the ability to navigate/edit JavaScript. That part I have covered.
micheal: Yes, I've looked at the documentation. Having now gone through pretty much all the code for WWW::Mechanize::Firefox, it is clear to me. The code I've quoted is reasonable and logical. The documentation only needs one extra line to point out that follow_link will not necessarily work with the links found by find_all_links. Or, if I understand the code correctly, follow_link searches the HTML while find_all_links refers to the DOM, which is not documented.
anonymous monk: This is really aggressive! I'm just asking a question.
1. Mechanize::Firefox doesn't exist. Answer: you are being pedantic.
2. Terms of service. I don't want to scrape either website. In fact, I don't want to scrape any website, certainly not any that I don't have permission to. Both websites cited here are only mentioned so I can show the problem.
3. API/DB access. That is just aggressive and unhelpful.
4. The documentation could be a little clearer in the follow_link section.
5. Submit a patch. Sure, no problem. This is my fifth day learning Perl, so once I have the skill I would be very happy to help improve the code.
6. Say something meaningful. WWW::Mechanize::Firefox is a super piece of work; it is very slick. I'm only quibbling about a small element, because I would like others to use this cool software and would not like them to get confused as I did.

The bottom line is that find_all_links returns objects which are then not usable with follow_link. This is not what I would expect.
Rolf: Cheers but no that is not the problem. It is not a question of the data being unavailable. I can access the data.


Re^3: 2 questions for Corion regarding href with Mechanize::Firefox
by Corion (Pope) on Jul 16, 2013 at 07:43 UTC

    Thanks for posting actual code and data.

    I've never used ->find_all_links in conjunction with ->follow_link. As ->find_all_links needs to return absolute links, and as there is no easy way for ->follow_link to determine URL equivalence for arbitrary href attributes, the problem is basically unsolvable that way.
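
The mismatch described above can be illustrated without a browser. The following is only a sketch in core Perl, not the module's actual code: the `absolutize` helper is hypothetical and handles root-relative hrefs only, the session id is left out of the URLs, and `/CAA/jsp/help.jsp` is a made-up second link. It shows that a literal comparison of the `href` attribute against the absolute URL finds nothing, while a comparison after resolving each href does.

```perl
use strict;
use warnings;

# URLs modelled on the post, with the session id and query string left out.
my $page_url = 'https://www43.friendsprovident.com/CAA/jsp/login.jsp';
my $wanted   = 'https://www43.friendsprovident.com/CAA/jsp/register_customer.jsp';

# href attributes as written in the HTML (the first one is made up).
my @hrefs = ('/CAA/jsp/help.jsp', '/CAA/jsp/register_customer.jsp');

# Hypothetical, simplified resolver: handles only absolute URLs and
# root-relative hrefs ("/path"). A real program would use the URI module.
sub absolutize {
    my ($href, $base) = @_;
    return $href if $href =~ m{^[a-z][a-z0-9+.-]*:}i;    # already absolute
    my ($origin) = $base =~ m{^([a-z][a-z0-9+.-]*://[^/]+)}i
        or die "Cannot parse base URL '$base'\n";
    return $origin . $href if $href =~ m{^/};
    die "Only root-relative hrefs are handled in this sketch\n";
}

# Literal attribute comparison, like the XPath test @href = "..." in the error.
my @literal = grep { $_ eq $wanted } @hrefs;

# Comparison after resolving each href against the page URL.
my @resolved = grep { absolutize($_, $page_url) eq $wanted } @hrefs;

print "literal matches:  ", scalar(@literal),  "\n";    # prints 0
print "resolved matches: ", scalar(@resolved), "\n";    # prints 1
```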

    As a workaround, I would look at ->find_link_dom , which returns DOM objects instead of converting things to strings.
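
A minimal sketch of that workaround, assuming a running Firefox with the mozrepl extension and that the link text contains "Register" as in the HTML quoted earlier. The login page URL is taken from the command line because it carries a session id. It has not been run against the site in question, so treat it as a starting point only.

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

# Assumption: Firefox is running with the mozrepl extension enabled.
my $mech = WWW::Mechanize::Firefox->new();

# The login page URL contains a session id, so pass it in as an argument.
my $login_url = shift @ARGV or die "Usage: $0 LOGIN_PAGE_URL\n";
$mech->get($login_url);

# Ask for the DOM node itself instead of a WWW::Mechanize::Link object.
my $node = $mech->find_link_dom( text_contains => 'Register' );
die "No link containing 'Register' found\n" unless $node;

print "Clicking link with href: $node->{href}\n";

# Click the node directly; no URL string has to be matched again.
$mech->click({ dom => $node });

print "Now at: ", $mech->uri, "\n";
```

Because the node is handed straight to `click`, the relative `href` in the page never has to be compared with the absolute URL that `find_all_links` reports.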



      Thank you so much for responding. Your code is dense and I wasn't sure I understood it correctly.

      Really appreciate you taking the time to respond.

      If I come up with a good solution in the near future, I will share it.
