PerlMonks  

Re^2: Script for a URL that constantly changes

by semrich (Initiate)
on Oct 20, 2011 at 00:00 UTC


in reply to Re: Script for a URL that constantly changes
in thread Script for a URL that constantly changes

This is what I have so far:

use strict;
use warnings;
$|++;
use Test::WWW::Mechanize;
use WWW::Mechanize::Sleepy;
use File::Basename;
use WWW::Mechanize;
use WWW::Mechanize::Image;
use Storable;
use HTTP::Cookies;
use HTML::SimpleParse;

my @sku = ('NUMBER');

for my $sku (@sku) {
    # sleep between 1 and 3 seconds between requests
    my $mech = WWW::Mechanize::Sleepy->new( sleep => '1..3' );

    # search page listing everything that matches the SKU
    my $URL = "https://www.vwrsp.com/psearch/ControllerServlet.do?D=$sku&spage=header&CurSel=Ntt&Nty=1&Ntx=mode%2bmatchpartialmax&cntry=us&Ntk=All&N=0&Ntt=$sku";
    $mech->get($URL);
    $mech->success or die "search request failed: ", $mech->response->status_line;

    my $url1 = $mech->uri();
    print "$url1\n";

    # follow each listing to its own product page
    my @links = $mech->find_all_links( url_regex => qr/\/catalog\/product/i );
    for my $link (@links) {
        $mech->get( $link->url() );
        $mech->success or die "product request failed: ", $mech->response->status_line;

        my $pike = "|";
        open( my $price_fh, '<', "FILE NAME" ) or die "can't open price.txt: $!\n";

        my $some_html = $mech->content();
        # parse the page HTML (the original passed the $mech object by mistake)
        my $p = HTML::SimpleParse->new($some_html);
        $p->output;    # print the parsed HTML rather than the object reference
    }
}

Here is what happens on the site: you go to the URL listed above and get a list of items matching the SKU fed to it. Each listing then has its own page, which holds all the information shown on the initial list page. The URL of that second page contains a catalog number that changes depending on which listing you clicked, which is why I am not building that URL directly. I need some way to capture the catalog number each time the SKU changes, so that I can pull the correct information from the second page. I am not sure that made things any clearer; it is complicated to explain.
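One way to capture the catalog number is to pull it out of each link URL that find_all_links returns and store it per SKU. The sketch below only shows the idea and makes assumptions: the parameter name catalog_number, the sample URLs, and the helper catalog_number_from_url are all hypothetical, since the post does not show what the real product URLs look like. The pattern would need adjusting to match the actual links.

```perl
use strict;
use warnings;

# Hypothetical helper: return the catalog number found in a product URL,
# or undef if there is none. ASSUMPTION: the number is carried in a query
# parameter named "catalog_number"; the real site may use another name.
sub catalog_number_from_url {
    my ($url) = @_;
    return $url =~ /[?&;]catalog_number=([^&;#]+)/ ? $1 : undef;
}

# Made-up sample data standing in for the URLs that
# $mech->find_all_links( url_regex => qr/\/catalog\/product/i ) would return.
my %sample_links = (
    'SKU-A' => [
        '/catalog/product/index.cgi?catalog_number=11111-222',
        '/catalog/product/index.cgi?catalog_number=33333-444&lang=en',
    ],
    'SKU-B' => [
        '/catalog/product/index.cgi?lang=en',    # no catalog number here
    ],
);

# Map each SKU to the catalog numbers found on its result list.
my %catalog_for;
for my $sku ( sort keys %sample_links ) {
    for my $url ( @{ $sample_links{$sku} } ) {
        my $number = catalog_number_from_url($url);
        next unless defined $number;    # skip links without a number
        push @{ $catalog_for{$sku} }, $number;
    }
}

for my $sku ( sort keys %catalog_for ) {
    print "$sku: @{ $catalog_for{$sku} }\n";
}
```

In the script above, the same helper could be called on $link->url() inside the inner loop, so the catalog number is recorded just before the product page is fetched.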

