PerlMonks  

Re^4: Making an array from a downloaded web page

by malomar66 (Acolyte)
on Jan 18, 2007 at 05:27 UTC


in reply to Re^3: Making an array from a downloaded web page
in thread Making an array from a downloaded web page

This will probably get me dinged in the reputation department but I have to ask.

From what I can tell of the "get" function in Net::FTP it sends the results directly out to a file. It seems I can append the various files I will need to each other provided that I explicitly name the file and provide some kind of offset (0?). How might I be able to adapt this so that I can keep the information in a string and parse it, prior to making a master file of all the links?

Re^5: Making an array from a downloaded web page
by moklevat (Priest) on Jan 18, 2007 at 16:40 UTC
    It's a reasonable question, but you still haven't posted any code, so you may indeed get dinged. In the monastery, code begets code. If your original post had included some code, you probably would have received a lot more input from the monks and might have a working solution by now.

    On to your question.

    The documentation for the get() method in Net::FTP says that in get(REMOTE_FILE[,LOCAL_FILE,WHERE]), LOCAL_FILE may be a filename or a filehandle. If you open() one filehandle for writing and pass it to every get() call, you can fetch as many index files as you want and they will be concatenated in the order they were written. WHERE is optional; it is a byte offset into the remote file, so you could use it to skip the unneeded header bytes at the start of each index file. You can also open() an "in memory" filehandle whose contents are held in a scalar, which is probably what you ultimately want. Here is a quick script that grabs the index files for all four quarters of two years (2005 and 2006) and writes the concatenated indices to a file. I have also included a commented-out option that uses a scalar as the filehandle.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::FTP;

    my $host         = 'ftp.sec.gov';
    my $username     = 'anonymous';
    my $password     = 'yourmail@domain.com';
    my $indexdir     = '/edgar/full-index';
    my @years        = qw/2005 2006/;
    my @quarters     = qw/QTR1 QTR2 QTR3 QTR4/;
    my $indexbyfirm  = 'company.idx';
    my $indexoutfile = './complete_index';

    ## This opens an "in memory" filehandle as a scalar
    #open my $indexsave, '>', \ my $pseudo_file
    #    or die "Couldn't open memory handle: $!";

    open my $indexsave, '>', $indexoutfile
        or die "Couldn't open filehandle: $!";

    my $ftp = Net::FTP->new($host, Timeout => 30, Debug => 1)
        or die "Couldn't connect: $@\n";

    $ftp->login($username, $password)
        or die "Couldn't authenticate: ", $ftp->message;

    for my $year (@years) {
        for my $quarter (@quarters) {
            # Net::FTP reports server errors through message(), not $!
            $ftp->cwd("$indexdir/$year/$quarter")
                or die "Couldn't change directories: ", $ftp->message;

            # Every get() writes to the same handle, so the files are concatenated
            $ftp->get($indexbyfirm, $indexsave)
                or die "Couldn't fetch $indexbyfirm: ", $ftp->message;
        }
    }

    $ftp->quit();
    close $indexsave or die "Couldn't close filehandle: $!";

    ## You can work with the "in memory" file like any scalar
    # print $pseudo_file;
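    As for parsing the result before writing a master file of links, here is a minimal offline sketch of the "in memory" handle idea. It does not touch the FTP server: the @fake_downloads strings are made-up stand-ins for what get() would write to the handle, and the rule "a link is the last field of a line ending in .txt" is only an assumption about the index layout, so check it against a real company.idx before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-ins for two downloaded index files (illustration only;
# the real company.idx layout may differ).
my @fake_downloads = (
    "Header for quarter 1\n-----\nACME CORP  10-K  1234  2005-03-01  edgar/data/1234/0001.txt\n",
    "Header for quarter 2\n-----\nBOLT INC   8-K   5678  2005-06-15  edgar/data/5678/0002.txt\n",
);

# Open a write handle whose "file" is the scalar $pseudo_file.
my $pseudo_file = '';
open my $indexsave, '>', \$pseudo_file
    or die "Couldn't open memory handle: $!";

# Successive writes to the same handle are appended in order.
print {$indexsave} $_ for @fake_downloads;
close $indexsave or die "Couldn't close memory handle: $!";

# Assumption: a link is the last whitespace-separated field of any
# line that ends in ".txt"; header lines do not match and are skipped.
my @links;
for my $line (split /\n/, $pseudo_file) {
    push @links, $1 if $line =~ m{(\S+\.txt)\s*$};
}

print "$_\n" for @links;
```

    Once @links holds what you need, you can print it to the master file with an ordinary open() for writing, so nothing reaches the disk until after the parsing step.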
Re^5: Making an array from a downloaded web page
by Anonymous Monk on Jan 18, 2007 at 06:30 UTC
    Try
