PerlMonks  

Rotating IP Addresses for Scraping

by jandrewc (Initiate)
on May 18, 2011 at 03:40 UTC (#905411=perlquestion)
jandrewc has asked for the wisdom of the Perl Monks concerning the following question:

Hey, I'm sorry, I'm a super-newbie. I'm trying to write a scraper for some data I'd like to include in my dataset for an economics project. To do this I need to change my IP address every once in a while (which I'm not really sure how to do). So far, when I run the code it reports: "Can't locate Net/IP.pm in @INC (@INC contains: C:/Perl64/site/lib C:/Perl64/lib.) at..."

The following is my code.

##########################################################
#use WWW::Mechanize; #CRAIGS
use LWP::Simple; #use this until "mechanize" works properly
use Net::IP;
print "Can you see this";
#$dir = "J:\Halibalu"; #put your directory here
$dir = "C:\\workspaceP";
$out = "$dir\\output.csv";
$site = "http://zipinfo.com/cgi-local/zipsrch.exe?cnty=cnty&ll=ll&zip="; #put your web site here
$zip_in = "$dir\\zip.csv";
#my $mech = WWW::Mechanize->new(); #CRAIGS
#$mech->agent( 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.17.8 (KHTML, like Gecko) Version/5.0.1 Safari/533.17.8' ); #CRAIGS
open OUT, ">$out";
close OUT;
open IN, "<$zip_in";
my $a = 10;
my $b = 0;
my $c = 0;
my $d = 0;
$ip = new Net::IP ('$a.$b.$c.$d');
my $count = 1;
foreach $Z (<IN>) {
    if ($count<30) {
        chomp($Z); #MINE
        print "$Z\n";
        #$mech->get($site); #CRAIGS
        my $page = get("$site${Z}&Go=Go");
        #if($dist = $mech->submit_form(form_number=>1, fields=>{'field name'=>$Z})) #search for the forms and figure out which one you need, then find the names of the fields
        if ($page =~ /Longitude<BR>(.*?)<\/font><\/td><\/tr><\/table>/s) {
            my $info = $1;
            $info =~ s/<td align=center>/,\s/g;
            $info =~ s/(West)//g;
            $info =~ s/<.*?>//g;
            print "${info}\n";
            #if ($dist->decoded_content() =~ /find information here/s) {
            open OUT, ">>$out";
            #print OUT "$Z, $1\n";
            print OUT "$info\n"; #MINE
            close OUT;
            #}
        }
        $count++;
    else {
        #sleep(60*60*24); #sleep timer
        $count=1;
        if ($d<255){
            $d++;
        }
        else{
            $d=1;
            $c++;
        }
        $ip = new Net::IP ('$a.$b.$c.$d');
    }
}
#Take everything in between "Longitude<BR>" and "</font></td></tr></table>" (these are verified unique)
#make each "<td align=center>" into a comma (",")
#Scrap "(West)", "</th>", "</tr>", "<tr>", "</font>"
#########################################################
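A side note on the text-cleanup part of the code above, independent of the IP question: in Perl's `s///` the replacement part is a double-quoted string, so the `\s` in `,\s` is not whitespace (the unrecognized escape passes through as a plain `s`, and `use warnings` reports it), and in `s/(West)//g` the parentheses are a capture group, so only `West` is removed and an empty `()` is left behind. A minimal sketch of the same three substitutions with those two fixes, run against an invented HTML fragment because the real page's markup isn't shown in the thread (`clean_fragment` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: turn one extracted results fragment into a comma-separated row.
sub clean_fragment {
    my ($info) = @_;
    $info =~ s/<td align=center>/, /g;   # a literal ", " (a "\s" here would become a plain "s")
    $info =~ s/\(West\)//g;              # escaped parens match the literal text "(West)"
    $info =~ s/<.*?>//g;                 # strip any remaining tags
    $info =~ s/^\s+|\s+$//g;             # trim leading and trailing whitespace
    return $info;
}

# Invented sample markup; the real page's HTML is not shown in the thread.
my $sample = '<tr><td>Springfield<td align=center>37.2<td align=center>93.3 (West)</td></tr>';
print clean_fragment($sample), "\n";    # prints "Springfield, 37.2, 93.3"
```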

I'd be really grateful to anyone who can lend their insight here.

~Andrew

Re: Rotating IP Addresses for Scraping
by Plankton (Priest) on May 18, 2011 at 04:50 UTC
      Except that Net::IP won't actually do what the OP wants (i.e., automagically change their IP address to any address of their choosing when requesting a web page), nor should they be trying to do that anyway, for reasons noted in other replies below.
Re: Rotating IP Addresses for Scraping
by GrandFather (Cardinal) on May 18, 2011 at 05:16 UTC

    This looks like an attempt to abuse a site's "free trial" offer. Why should we help you do that??

    True laziness is hard work

      Didn't you read the post? It's for an "economics" project. It's for science... for academia. That makes it entirely legit.

      If this were exploitative, he'd be obliged to state that it was for a "personal economics" project... but he didn't, so it's all good.

        That's like saying "I stole a car* but it's okay because it was for an 'economics' project so it's legit."

        * Or book, CD, movie, whatever...

Re: Rotating IP Addresses for Scraping
by Anonymous Monk on May 18, 2011 at 06:13 UTC
    I'd be really grateful to anyone who can lend their insight here.

    It is cheaper to buy a membership than to abuse free trials.

Re: Rotating IP Addresses for Scraping
by dsheroh (Parson) on May 18, 2011 at 08:03 UTC
    To do this I need to change my IP address every once in a while
    Assuming you're using the site within the limits of its Terms of Service, I can't imagine any reason why you might need to do that.

      Oh, I'm sorry. I didn't know this wasn't kosher. It's not a free-trial issue, although it's similar: I need the longitude and latitude of each zip code, this is the only place I've been able to find this data, and they only allow you to submit thirty zip codes a day. At that rate this would take nearly three years.

      The project has to do with the race-perception effects of inducing slave ownership in the 1850s. If someone is "induced" to own slaves by owning optimal cotton land at the time the cotton gin spread, does this create a self-serving bias in the family in the form of racism? The longitude and latitude data are useful for computing the distance to the nearest port, which is also a factor in how well suited a particular zip code is to cotton production, since transportation costs are theoretically proportional to distance.
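For the distance-to-port idea above, the straight-line distance between two latitude/longitude points is commonly computed with the haversine (great-circle) formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km; `haversine_km`, `nearest_port`, and the short port list with its rounded coordinates are illustrative assumptions, not data from the thread (core Perl's Math::Trig also offers a `great_circle_distance` function):

```perl
use strict;
use warnings;

use constant PI              => 4 * atan2(1, 1);
use constant EARTH_RADIUS_KM => 6371.0;    # mean radius; treats the Earth as a sphere

sub deg2rad { return $_[0] / 180 * PI }

# Great-circle distance in km between two (lat, lon) points given in degrees,
# using the haversine formula.
sub haversine_km {
    my ($lat1, $lon1, $lat2, $lon2) = map { deg2rad($_) } @_;
    my $h = sin(($lat2 - $lat1) / 2)**2
          + cos($lat1) * cos($lat2) * sin(($lon2 - $lon1) / 2)**2;
    return 2 * EARTH_RADIUS_KM * atan2(sqrt($h), sqrt(1 - $h));
}

# Return (name, km) of the port closest to ($lat, $lon).
# $ports is a hash ref of name => [lat, lon].
sub nearest_port {
    my ($lat, $lon, $ports) = @_;
    my ($best_name, $best_km);
    for my $name (sort keys %$ports) {
        my $km = haversine_km($lat, $lon, @{ $ports->{$name} });
        ($best_name, $best_km) = ($name, $km)
            if !defined($best_km) || $km < $best_km;
    }
    return ($best_name, $best_km);
}

# Illustrative, rounded port coordinates (latitude north, longitude east).
my %ports = (
    'New Orleans' => [ 29.95, -90.07 ],
    'Mobile'      => [ 30.69, -88.04 ],
    'Charleston'  => [ 32.78, -79.93 ],
);
my ($port, $km) = nearest_port(32.38, -86.30, \%ports);   # a point near Montgomery, AL
printf "%s: %.0f km\n", $port, $km;
```

This is the as-the-crow-flies distance only; as a later reply points out, it is at best a proxy for actual travel distance or cost.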

      Will you consider helping me understand what to do?

        I need the longitude and latitude of each zip code and this is the only place I've been able to find this data ...

        ORLY? Free Zip Code Latitude and Longitude Database
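If you download a ready-made zip-code database like the one linked above instead of scraping, loading it in Perl takes only a few lines. A minimal sketch, assuming (hypothetically) a CSV file with a header row that names `zip`, `latitude` and `longitude` columns and no commas inside field values; the actual file's layout may differ, and for CSV with quoted commas the CPAN module Text::CSV is the safer choice. `load_zip_coords`, `unquote`, and the sample row are made up for illustration:

```perl
use strict;
use warnings;

# Strip one pair of surrounding double quotes, if present.
sub unquote {
    my ($s) = @_;
    $s =~ s/^"(.*)"\z/$1/s;
    return $s;
}

# Hypothetical loader: returns a hash ref of zip => [latitude, longitude].
# Assumes a header row naming "zip", "latitude" and "longitude" columns
# (any order, case-insensitive) and no commas inside field values.
sub load_zip_coords {
    my ($fh) = @_;
    my $header = <$fh>;
    die "empty input\n" unless defined $header;
    $header =~ s/\r?\n\z//;                      # handle Unix or Windows line endings
    my @cols = map { lc(unquote($_)) } split /,/, $header;
    my %idx;
    @idx{@cols} = 0 .. $#cols;                   # column name => position
    for my $need (qw(zip latitude longitude)) {
        die "missing column '$need'\n" unless exists $idx{$need};
    }
    my %coords;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;                # skip blank lines
        my @f = map { unquote($_) } split /,/, $line, -1;
        $coords{ $f[ $idx{zip} ] } = [ @f[ $idx{latitude}, $idx{longitude} ] ];
    }
    return \%coords;
}

# Usage with a made-up in-memory sample; a real download will differ.
my $sample = qq{"zip","latitude","longitude"\n"00000","40.0","-75.0"\n};
open my $in, '<', \$sample or die "cannot open sample: $!";
my $coords = load_zip_coords($in);
print join(', ', @{ $coords->{'00000'} }), "\n";    # prints "40.0, -75.0"
```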

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

        Your source offers US (and Canadian) postal code data as of May 2011, including lat/lon, for USD 79.95.

        The project description sounds like a graduate-level (PhD dissertation?) project. If that is indeed the case, or if it is anything other than a self-funded project, one might think that a reasonable budget would include that kind of allowance for data acquisition.

        If it doesn't, /methinks one would still NOT be entitled to steal data.

        This sounds rather like an XY problem to me. What you actually seem to want are travel distances between various pairs of locations that are unrelated to zip codes (I doubt zip codes were much in vogue in the 1850s) and actually unrelated (directly) to lat/long values. The straight-line distance between two points is not a very good indicator of the travel distance or travel difficulty between them; at best it is an approximate proxy for the data you really want. Using zip codes to specify locations is just another layer of inexact proxies.

        Maybe there are better ways of determining the information you actually want than "stealing" data to provide an approximation of an approximation to the data you really want?

        True laziness is hard work

Node Type: perlquestion [id://905411]
Approved by Corion