Attached below is a Perl snippet that fetches a specific recipe from allrecipes.com and writes the XML source code to a .txt file.
#!C:/Perl64/bin/perl
# The shebang line tells the system where to find the Perl interpreter
use strict;      # disallow unsafe constructs
use warnings;    # enable detailed warnings
use autodie;     # make builtins like open die on failure instead of hanging
use LWP::Simple; # Perl module for fetching web pages

my $out_file = 'recipe.txt';       # defines the output file
my $encoding = ':encoding(UTF-8)'; # encoding typical of western web pages

# Open the output file for appending, under a handle name independent of the file name
open(my $handle2, ">>$encoding", $out_file)
    or die "Could not open $out_file: $!";

# Fetch the page source (using the example recipe URL mentioned below)
my $content = get("http://www.allrecipes.com/recipe/10413/")
    or die "ouch";
print $handle2 $content . "\n";    # write the page source to the output file
My question is how I can build a program from this snippet that loops through a series of recipes (let's say from recipe 11253 to 11300) and writes the individual XML code to separate files, each with its own file name. So I just need to insert a loop somewhere in this code that pulls the source code of each recipe in the range I specify from Allrecipes.com and dumps the text into a file, with all the recipe files then collected into one of my directories. I am getting the recipe ID numbers from the Allrecipes URL: every recipe has a unique number, so www.allrecipes.com/recipe/10413/, for example, is an easy Valentine's Day cookie recipe.
I know that I will need to insert a sleep command between URL fetches, at some random interval, or I will be flagged as a bot and blocked. It will probably look like: sleep(1 + int rand 10); (note that Perl's sleep truncates a fractional argument, so a bare sleep(rand(10)) could sleep zero seconds).
I also have to choose a sensible way of naming each recipe file.
In the end, I should be able to have 10,000 named .txt files of the recipes I specify in a single directory. After this step I will parse the information that I need from the text to do my analysis.
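A minimal sketch of such a loop, assuming a target directory named recipes and a recipe_<ID>.txt naming scheme (both are my choices, not from the original post). The network loop only runs when the script is invoked with --fetch, so the naming helper can be exercised without hitting the site:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use File::Path qw(make_path);

my $dir = 'recipes';   # hypothetical directory collecting all recipe files

# One file per recipe, named after its Allrecipes ID,
# e.g. recipes/recipe_11253.txt
sub out_name {
    my ($id) = @_;
    return "$dir/recipe_$id.txt";
}

# Fetch one recipe page and dump its source into its own file
sub fetch_recipe {
    my ($id) = @_;
    my $url  = "http://www.allrecipes.com/recipe/$id/";
    my $page = get($url);
    unless (defined $page) {
        warn "Could not fetch $url, skipping\n";
        return;
    }
    open my $fh, '>:encoding(UTF-8)', out_name($id)
        or die "Could not open ", out_name($id), ": $!";
    print $fh $page;
    close $fh;
}

# Only hit the network when explicitly asked to
if (@ARGV && $ARGV[0] eq '--fetch') {
    make_path($dir);                  # create the directory if needed
    for my $id (11253 .. 11300) {     # the range of recipe IDs to pull
        fetch_recipe($id);
        sleep(1 + int rand 10);       # random 1-10 s pause between fetches
    }
}
```

Run it as perl fetch_recipes.pl --fetch. Widening the range to 10,000 recipes only means changing the loop bounds, and naming each file by its recipe ID keeps every .txt file trivially traceable back to its source URL for the later parsing step.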