I'm trying to do some screen scraping. I'm using this regex to capture the variable portion of the URLs. However, my output contains only the first match, and my error messages also indicate that I'm capturing only one entry when I should be capturing about forty.
@game_array = ($gamepage =~ m/onclick="document\.location\.href='([.]+
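One common cause of getting only the first capture is a missing /g modifier: in list context, m// without /g returns the groups of the first match only, while m//g returns the groups of every match. (Note also that "$x = ~m/.../" with a space parses as assigning the bitwise negation of a match against $_, not as a binding with =~.) A minimal sketch, using a made-up page fragment and a simplified pattern rather than the exact one from the question:

```perl
use strict;
use warnings;

# Hypothetical listing-page fragment with three game links (made-up paths).
my $gamepage = join "\n",
    q{<tr onclick="document.location.href='/replays/101/'">},
    q{<tr onclick="document.location.href='/replays/102/'">},
    q{<tr onclick="document.location.href='/replays/103/'">};

# Without /g: list context returns only the captures of the FIRST match.
my @first_only = ($gamepage =~ m/href='([^']+)'/);

# With /g: list context returns the captures of EVERY match.
my @all = ($gamepage =~ m/href='([^']+)'/g);

print scalar(@first_only), " vs ", scalar(@all), "\n";   # prints "1 vs 3"
```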
Update: The data I'm trying to scrape is the URLs of the pages for individual SC2 replays on http://www.sc2rep.com/. I'm trying to scrape the individual game pages and write them to a file for use with a DBI script.
On closer inspection, the match I'm getting is incorrect. Here is a more complete script; sorry about the incomplete information.
my $page = "http://sc2rep.com/";   # URL of the page listing the game pages to scrape; must be from sc2rep.com
my $gamepage = `curl $page`;
@game_array= ($gamepage =~ m/onclick="document\.location\.href='([.]+)
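open(OUT, '>', 'sc2data') or die "Cannot open sc2data: $!";
for my $index (0 .. $#game_array) {
    # fetch each game page and append it to the output file
    my $game_data = `curl "http://sc2rep.com$game_array[$index]"`;
    print OUT $game_data . "\n end_replay \n";
    print "looped successfully.\n";
}
close OUT;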
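For comparison, here is a sketch of the same scrape-and-save flow split into small subs. It swaps the curl backticks for HTTP::Tiny (a core module since Perl 5.14) and uses [^']+ instead of [.]+ for the capture. The sub names extract_game_paths, fetch_page and save_replays are hypothetical, and the onclick layout is assumed from the question's regex, not checked against the live site.

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core module; stands in for the `curl` backticks

my $base = 'http://sc2rep.com';

# Hypothetical helper: pull every onclick target out of the listing page.
# The attribute layout is an assumption based on the question's regex.
sub extract_game_paths {
    my ($html) = @_;
    my @paths = $html =~ m/onclick="document\.location\.href='([^']+)'/g;
    return @paths;
}

# Fetch one page body via HTTP::Tiny; returns undef on failure.
sub fetch_page {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Write each fetched game page to $out, followed by the end_replay marker.
# $fetch is a code ref, so a fake fetcher can be passed in for testing.
sub save_replays {
    my ($out, $fetch, @paths) = @_;
    my $count = 0;
    for my $path (@paths) {
        my $game_data = $fetch->("$base$path");
        next unless defined $game_data;   # skip pages that failed to load
        print {$out} $game_data, "\n end_replay \n";
        $count++;
    }
    return $count;
}
```

A real run would open the output file with open(my $fh, '>', 'sc2data') and call save_replays($fh, \&fetch_page, extract_game_paths(fetch_page("$base/"))). Passing the fetcher as a code ref keeps the parsing and file-writing logic testable without network access.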
stevieb's fix made the code work: removing the square brackets from [.]+ (inside a character class, . matches only a literal dot) made it work like a charm. Thank you all for your help and time.
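The difference the brackets make can be seen on a single made-up line holding two links. [.]+ finds nothing because the URLs are not made of dots; a greedy .+ runs to the last quote on the line and merges both links into one capture; [^']+ stops at the first closing quote. Greedy .+ still works when each onclick sits on its own line, because . does not match a newline without the /s modifier. The sample line and paths below are assumptions for illustration only.

```perl
use strict;
use warnings;

# One hypothetical line holding two onclick handlers (made-up paths).
my $line = q{<a onclick="document.location.href='/replays/7/'">a</a>}
         . q{<a onclick="document.location.href='/replays/8/'">b</a>};

my $prefix = qr/onclick="document\.location\.href='/;

# [.]+ : inside a character class '.' is a literal dot, so this needs the
# URL to consist only of dots and finds nothing here.
my @class_dot = ($line =~ m/$prefix([.]+)'/g);

# .+ : greedy, runs to the LAST quote on the line, swallowing both links.
my @greedy = ($line =~ m/$prefix(.+)'/g);

# [^']+ : stops at the first closing quote, giving one capture per link.
my @negated = ($line =~ m/$prefix([^']+)'/g);

print "negated: @negated\n";   # prints "negated: /replays/7/ /replays/8/"
```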