PerlMonks  

Split large file and upload to ftp server with multiple threads

by mdc76 (Acolyte)
on Oct 02, 2008 at 01:37 UTC
mdc76 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to split a large file into smaller chunks and then use threads to upload the chunks to an FTP server. The aim is to speed up the transfer by having multiple connections each upload a piece of the file. Here's what I've tried so far, but it's not working when using threads: the program seems to hang or block, and the file on the FTP server is 0 bytes. Any help would be appreciated.
#!/usr/bin/perl -w
use strict;
use warnings;
use Net::FTP;
use Config;
$Config{useithreads} or die('Recompile Perl with threads to run this program.');
use threads;

my $file = 'c:\sdwork\dla495.txt';
my $ftp = Net::FTP->new("ftpserver") or die "can't get ftp object\n";
$ftp->login("fakeusr", "secret") or die "can't login to ftp server\n";
$ftp->binary();
$ftp->cwd('/home/fakeusr/tmp');

open (FH, "<$file") or die "Could not open source file. $!";
binmode(FH);

my $i = 0;
my $start_byte = 0;
my @THRD_LIST;
my $thrd;

while (1) {
    my $chunk;
    print "process part $i\n";
    $i ++;
    if (!eof(FH)) {
        my $bytes_read = read(FH, $chunk, 10000); #bytes
        print "bytes read: $bytes_read\n";

        #test with threads
        $thrd = threads->create (\&transfer_chunks, $start_byte, $chunk)
            or die "Failed to start the thread: $@\n";
        push (@THRD_LIST, $thrd);

        #test without threads
        #transfer_chunks($start_byte, $chunk);

        $start_byte += $bytes_read + 1;
        print "start byte: $start_byte\n";
    }
    last if eof(FH);
}

# start the threads
for $thrd (@THRD_LIST) {
    $thrd->join();
}

$ftp->quit;

sub transfer_chunks {
    my ($start_byte, $chunk) = @_;
    # convert perl string into a filehandle
    open(TEST, '<', \$chunk);
    $ftp->restart($start_byte);
    $ftp->put(*TEST, 'testfile.txt');
    ##$ftp->append(*TEST, 'testfile.txt');
    close(TEST);
}

Re: Split large file and upload to ftp server with multiple threads
by GrandFather (Cardinal) on Oct 02, 2008 at 02:37 UTC

    Maybe Net::FTP is unhappy?

    use strict;
    use warnings;
    use Config;

    BEGIN {
        $Config{useithreads}
            or die ('Recompile Perl with threads to run this program.');
    };

    use threads;

    my $i = 0;
    my $size = 200;
    my $start_byte = 0;
    my @THRD_LIST;
    my $thrd;
    my $targetSize = $size / 10;

    while ($size) {
        my $chunk = $targetSize;

        $i++;
        $chunk = $size if $chunk > $size;
        $size -= $chunk;
        print "Process part $i. Size is $chunk\n";

        $thrd = threads->create (\&transfer_chunks, $i, $chunk)
            or die "Failed to start the thread: $@\n";
        push (@THRD_LIST, $thrd);
    }

    # start the threads
    for $thrd (@THRD_LIST) {
        $thrd->join ();
    }

    print "All done\n";

    sub transfer_chunks {
        my ($i, $chunk) = @_;

        while ($chunk--) {
            print "$i ";
            sleep 1 + rand (3);
        }

        print "\nprocess $i complete\n";
    }

    prints (for example):

    Process part 1. Size is 4
    1 Process part 2. Size is 4
    Process part 3. Size is 4
    2 Process part 4. Size is 4
    3 Process part 5. Size is 4
    4 Process part 6. Size is 4
    5 Process part 7. Size is 4
    6 Process part 8. Size is 4
    7 Process part 9. Size is 4
    8 Process part 10. Size is 4
    9 10 1 3 4 5 6 10 3 4 8 9 2 3 6 7 8 10 1 5 7 8 9
    process 3 complete
    4 10 2
    process 4 complete
    6 1 5 7
    process 8 complete
    9 2
    process 6 complete
    process 9 complete
    process 10 complete
    process 1 complete
    process 5 complete
    process 7 complete
    process 2 complete
    All done

    which is as expected on my XP system. Can you reproduce the problem without using FTP?


    Perl reduces RSI - it saves typing
      Thanks for the quick reply. The threading works ok. I'm not sure if I'm using the right FTP commands to attempt something like this.

        Have you tried passing Debug => 1 to the FTP constructor?

        Have you tried constructing a different FTP object for each thread?

        An hour? That's a slow reply!


Re: Split large file and upload to ftp server with multiple threads - SANITY CHECK
by ww (Bishop) on Oct 02, 2008 at 03:46 UTC
    As I read this (incorrectly, I hope), you are going to end up with multiple chunks (files!) at the server end.

    How do you expect to reconstitute the original file there? How will you know which chunk is first and which the next and so on?

    And second, why would you expect "multiple connections" "...to speed up the transfer..." unless you have "multiple connections" (in the sense of independent circuits which, combined, provide greater bandwidth than a single circuit)?

    If so, does the server end have resources to accept the torrent that will ensue?
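    One way to answer the ordering question is to put the chunk number into each file name and sort on it when joining. The following is a minimal sketch only: `part_name` and `reassemble` are hypothetical helper names, the `.partNNNN` suffix is an assumed convention, and the join runs wherever the parts live (it is not something FTP does for you).

```perl
use strict;
use warnings;

# Hypothetical helper: name of chunk number $index belonging to $base.
# Zero-padding makes a plain string sort agree with the numeric order.
sub part_name {
    my ($base, $index) = @_;
    return sprintf '%s.part%04d', $base, $index;
}

# Hypothetical helper: concatenate the given part files, in sorted name
# order, into $target. All handles are binary so no bytes are translated.
sub reassemble {
    my ($target, @parts) = @_;

    open my $out, '>', $target or die "Cannot write $target: $!";
    binmode $out;

    for my $part (sort @parts) {
        open my $in, '<', $part or die "Cannot read $part: $!";
        binmode $in;
        local $/ = \65536;              # read fixed 64 KiB records
        print {$out} $_ while <$in>;
        close $in;
    }

    close $out or die "Cannot close $target: $!";
    return;
}
```

    Called as `reassemble('big.dat', map { part_name('big.dat', $_) } 0 .. 9)`, this would rebuild `big.dat` from ten parts regardless of the order in which they arrived.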

      Independent circuits combined providing greater bandwidth than a single circuit is correct. I wasn't sure how to reconstitute the files either, or if it was even possible with FTP.
        Independent circuits combined providing greater bandwidth than a single circuit is correct
        Unless you have two different physical connections between the local and remote network (and that means two modems, two DSLs, or two whatever at the local side), you don't have two independent circuits.

        In the old days, when networks had lots of errors and dropped packets, using several TCP connections (all going over the same physical layer) to send data in parallel was an effective means of increasing throughput. Now that networks are quite reliable, it no longer makes sense: you will get only a marginal improvement (if any) in transfer speed.

        Effective ways to reduce the transfer time are:

        • Use compression. If you are already using compression, use a better compressor (bzip2 usually compresses better than gzip, and 7z usually better than bzip2).
        • Sometimes your data can be represented in a more compact format. For instance, CSV is better than XML in this regard.
        • Can you use rsync?
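
        How much compression helps depends on the data, so it is worth measuring on a sample before committing to a format. A minimal sketch, assuming the core IO::Compress::Gzip and IO::Compress::Bzip2 modules are available; `compressed_sizes` is a hypothetical helper name, and 7z is left out because it has no core module.

```perl
use strict;
use warnings;
use IO::Compress::Gzip  qw(gzip  $GzipError);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);

# Hypothetical helper: compress $data in memory with both formats and
# return the byte counts so they can be compared before uploading.
sub compressed_sizes {
    my ($data) = @_;

    gzip(\$data => \my $gz)
        or die "gzip failed: $GzipError";
    bzip2(\$data => \my $bz2)
        or die "bzip2 failed: $Bzip2Error";

    return (
        raw   => length $data,
        gzip  => length $gz,
        bzip2 => length $bz2,
    );
}
```

        For example, `my %size = compressed_sizes($sample);` followed by printing `$size{raw}`, `$size{gzip}` and `$size{bzip2}` shows what each format would save on that sample.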
Re: Split large file and upload to ftp server with multiple threads
by BrowserUk (Pope) on Oct 02, 2008 at 04:02 UTC
Re: Split large file and upload to ftp server with multiple threads
by Illuminatus (Curate) on Oct 02, 2008 at 14:35 UTC
    This has already been alluded to, but I wanted to say it directly: you cannot share a single FTP object between concurrent threads. As long as you open an FTP session within each thread, it should transfer fine. As for reassembling the chunks, you could use either the telnet or ssh modules to do this.
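
    The one-session-per-thread idea could look roughly like the sketch below. It is not a tested solution: the helper names (`chunk_ranges`, `upload_chunk`, `upload_in_parallel`), the keys of `%$conn` (`host`, `user`, `pass`, `dir`, `name`) and the `.partNNNN` remote names are all assumptions made for illustration. Each chunk is written to a temporary file and uploaded as its own remote file, and offsets advance by exactly the chunk length.

```perl
use strict;
use warnings;
use threads;
use Net::FTP;
use File::Temp ();

# Hypothetical helper: split $size bytes into [offset, length] pairs.
sub chunk_ranges {
    my ($size, $chunk_size) = @_;
    my @ranges;
    for (my $offset = 0; $offset < $size; $offset += $chunk_size) {
        my $length = $size - $offset;
        $length = $chunk_size if $length > $chunk_size;
        push @ranges, [$offset, $length];
    }
    return @ranges;
}

# Runs inside one thread. It opens its OWN Net::FTP connection and
# uploads one chunk as a separate remote file.
sub upload_chunk {
    my ($conn, $file, $index, $offset, $length) = @_;

    # Read this thread's slice of the source file.
    open my $in, '<', $file or die "Cannot open $file: $!";
    binmode $in;
    seek $in, $offset, 0 or die "Cannot seek in $file: $!";
    my $got = read $in, my $chunk, $length;
    die "Short read from $file" unless defined $got && $got == $length;
    close $in;

    # Stage the slice in a temporary file (removed when $tmp is destroyed).
    my $tmp = File::Temp->new;
    binmode $tmp;
    print {$tmp} $chunk;
    close $tmp or die "Cannot close temporary file: $!";

    my $ftp = Net::FTP->new($conn->{host}) or die "Cannot connect: $@";
    $ftp->login($conn->{user}, $conn->{pass})
        or die "Login failed: ", $ftp->message;
    $ftp->binary;
    $ftp->cwd($conn->{dir}) or die "cwd failed: ", $ftp->message;

    my $remote = sprintf '%s.part%04d', $conn->{name}, $index;
    $ftp->put($tmp->filename, $remote)
        or die "put failed: ", $ftp->message;
    $ftp->quit;
    return $remote;
}

# Hypothetical driver: one thread, and so one FTP session, per chunk.
# Returns the remote names of the chunks whose threads finished.
sub upload_in_parallel {
    my ($conn, $file, $chunk_size) = @_;
    my $size = -s $file;
    die "Cannot stat $file" unless defined $size;

    my $index = 0;
    my @threads = map {
        threads->create(\&upload_chunk, $conn, $file, $index++, @$_)
    } chunk_ranges($size, $chunk_size);

    return map { $_->join } @threads;
}
```

    Because every chunk gets its own thread and connection, the chunk size should be chosen so that the file splits into a handful of parts rather than thousands; the parts still have to be joined on the server afterwards.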
Re: Split large file and upload to ftp server with multiple threads
by Zenshai (Sexton) on Oct 02, 2008 at 22:53 UTC
    I'd like to suggest a great utility you can use for splitting files.

    http://unxutils.sourceforge.net/

    There's a utility named split in there that works wonders on tasks like this.
