Re: Splitting big file or merging specific files?

by GrandFather (Saint)
on Jun 30, 2017 at 04:05 UTC


in reply to Splitting big file or merging specific files?

As a general thing, only deal with the data you need to deal with immediately. In this case, for phase one, that means: open your input file, then while there is more data, read a couple of lines and write them to the next output file. For phase two, that means: while there is another file, read it and write its contents to your output.

Note there isn't a "for" there anywhere. It's all "while something". Let's see how that could look:

#!/usr/bin/perl
use strict;
use warnings;

=pod

Use this script as the input file to be cut up. We'll put the generated
files into a throw away directory that is a sub-directory to the directory
we are running the script from.

This script creates the split files and the rejoined text. The rejoined
text doesn't get saved, but is compared to the original script as a check
that everything worked.

=cut

my $subDir = './delme';

# Create a throw away sub-directory for the test. Wrapped in an eval because
# we don't care if it fails (probably because the dir already exists).
eval {mkdir $subDir};

seek DATA, 0, 0; # Set DATA to the start of this file
my $origText = do {local $/; <DATA>}; # Slurp the script text to check against
seek DATA, 0, 0; # Back to the start again

# Create the split files
my $fileNum = 0;

while (!eof DATA) {
    my $fileLines;

    $fileLines .= <DATA> for 1 .. 2;
    last if !defined $fileLines;

    ++$fileNum;
    open my $outFile, '>', "$subDir/outFile$fileNum.txt";
    print $outFile $fileLines;
    close $outFile;
}

# Join the files back up again
my $joinedText;

$fileNum = 1;
while (open my $fileIn, '<', "$subDir/outFile$fileNum.txt") {
    $joinedText .= do {local $/; <$fileIn>}; # Slurp the file
    ++$fileNum;
}

print "Saved and Loaded OK\n" if $joinedText eq $origText;
__DATA__
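One shortcut in the demo is worth flagging: the open calls in the split loop aren't checked, so a permissions problem or a full disk would fail silently. (The unchecked open in the join loop is deliberate - its failure is what ends the loop.) For real work you'd check the write-side opens, something like this sketch using the same variables as the script above:

open my $outFile, '>', "$subDir/outFile$fileNum.txt"
    or die "Can't create '$subDir/outFile$fileNum.txt': $!";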

The "slurp" bits set a Perl special variable to ignore line breaks so we can read an entire file in one hit. On modern systems with plenty of memory that works fine for files of hundreds of megabytes so it sould be fine for our toy example.

The for 1 .. 2 fetches two lines from the input file. If there is an odd number of lines in the input it doesn't matter - we end up concatenating undef to $fileLines, which amounts to a no-op, so no harm done (aside from an "uninitialized value" warning under warnings, which is noise rather than damage).
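Nothing ties the technique to two lines per file either. Here is a sketch of the split loop with the chunk size hoisted into a variable ($linesPerFile is just an illustrative name), meant to drop in place of the original while loop; the explicit defined check also sidesteps that warning when the last chunk comes up short:

my $linesPerFile = 2;    # illustrative: any positive count works

while (!eof DATA) {
    my $fileLines = '';

    for (1 .. $linesPerFile) {
        my $line = <DATA>;
        last if !defined $line;    # ran out of input mid-chunk
        $fileLines .= $line;
    }

    ++$fileNum;
    open my $outFile, '>', "$subDir/outFile$fileNum.txt"
        or die "Can't create '$subDir/outFile$fileNum.txt': $!";
    print $outFile $fileLines;
    close $outFile;
}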

Premature optimization is the root of all job security
