I'm trying to do some screen scraping. I'm using this regex to capture the variable portion of the URLs. However, my output contains only the first match. My error messages also indicate that I'm capturing only one entry when I should be capturing about forty.
@game_array = ($gamepage =~ m/onclick="document\.location\.href='([.]+)'"/g);
Update: The data I'm trying to scrape is the URLs of pages for specific SC2 replays from the site http://www.sc2rep.com/. I'm trying to scrape the individual game pages and output them to a file for use with a DBI script.
On closer inspection the match I'm getting is incorrect. Here is a more complete script. Sorry about the incomplete information.
#!/usr/bin/perl -w
use strict;
use DBI;
use Data::Dumper;
my $page = "http://sc2rep.com/"; # URL of page with list of game pages to scrape, must be from sc2rep.com
my $index=0;
my $gamepage= `curl $page`;
my $game_data;
my @game_array;
my $counter;
@game_array = ($gamepage =~ m/onclick="document\.location\.href='([.]+)'"/g);
open (OUT,">sc2data") or die$!;
for($index<40)
{
$game_data = `curl "http://sc2rep.com$game_array[$index]"`;
print OUT "$game_data" . "\n end_replay \n";
print "looped successfully.\n";
$index++;
}
close OUT;
exit;
The fix by stevieb made the code work. Removing the bracket made it work like a charm. Thank you all for your help and time.
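For anyone reading later, here is a minimal sketch of why the brackets mattered. Inside a character class the dot loses its wildcard meaning, so `[.]+` matches only a run of literal dots and never a URL path. The sample markup and the `/replay/N` paths below are made up for illustration and are not the real sc2rep.com page source. The sketch also uses `[^']+` rather than the bare `.+` from the fix; that is my own substitution, chosen so the capture stops at the closing quote even if several links share one line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample markup standing in for the real page source.
my $gamepage = <<'HTML';
<tr onclick="document.location.href='/replay/1'"><td>Game 1</td></tr>
<tr onclick="document.location.href='/replay/2'"><td>Game 2</td></tr>
<tr onclick="document.location.href='/replay/3'"><td>Game 3</td></tr>
HTML

# Original pattern: [.]+ matches only literal dots, so no path is captured.
my @broken = ($gamepage =~ m/onclick="document\.location\.href='([.]+)'"/g);

# Adjusted pattern: [^']+ captures everything up to the closing single quote.
my @game_array = ($gamepage =~ m/onclick="document\.location\.href='([^']+)'"/g);

printf "bracketed pattern: %d matches\n", scalar @broken;
printf "adjusted pattern:  %d matches\n", scalar @game_array;

# Loop over what was actually captured rather than a fixed count of 40.
for my $path (@game_array) {
    print "http://sc2rep.com$path\n";
}
```

One more thing worth checking in the full script: `for($index<40)` is parsed by Perl as a foreach over a one-element list (the result of the comparison), so the body runs only once. Looping over `@game_array` directly, as in the sketch, or using `while ($index < 40)`, would visit every captured page.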