Re: Splitting big file or merging specific files?

by GrandFather (Saint)
on Jun 30, 2017 at 04:05 UTC


in reply to Splitting big file or merging specific files?

As a general thing, only deal with the data you need to deal with immediately. In this case, for phase one, that means: open your input file, then while there is more data, read a couple of lines and write them to the next output file. For phase two, that means: while there is another file, read it and write its contents to your output.

Note there isn't a "for" there anywhere. It's all "while something". Let's see how that could look:

#!/usr/bin/perl
use strict;
use warnings;

=pod

Use this script as the input file to be cut up. We'll put the generated
files into a throw away directory that is a sub-directory to the directory
we are running the script from.

This script creates the split files and the rejoined text. The rejoined
text doesn't get saved, but is compared to the original script as a check
that everything worked.

=cut

my $subDir = './delme';

# Create a throw away sub-directory for the test. Wrapped in an eval because
# we don't care if it fails (probably because the dir already exists).
eval {mkdir $subDir};

seek DATA, 0, 0; # Set DATA to the start of this file
my $origText = do {local $/; <DATA>}; # Slurp the script text to check against
seek DATA, 0, 0; # Back to the start again

# Create the split files
my $fileNum = 0;

while (!eof DATA) {
    my $fileLines;

    $fileLines .= <DATA> for 1 .. 2;
    last if !defined $fileLines;

    ++$fileNum;
    open my $outFile, '>', "$subDir/outFile$fileNum.txt";
    print $outFile $fileLines;
    close $outFile;
}

# Join the files back up again
my $joinedText;

$fileNum = 1;
while (open my $fileIn, '<', "$subDir/outFile$fileNum.txt") {
    $joinedText .= do {local $/; <$fileIn>}; # Slurp the file
    ++$fileNum;
}

print "Saved and Loaded OK\n" if $joinedText eq $origText;
__DATA__
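One shortcut in the demo is worth flagging: the open calls in the split loop aren't checked, so a permissions problem or a full disk would fail silently. (The unchecked open in the join loop is deliberate - its failure is what ends the loop.) For real work you'd check the write-side opens, something like this sketch using the same variables as the script above:

open my $outFile, '>', "$subDir/outFile$fileNum.txt"
    or die "Can't create '$subDir/outFile$fileNum.txt': $!";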

The "slurp" bits set a Perl special variable to ignore line breaks so we can read an entire file in one hit. On modern systems with plenty of memory that works fine for files of hundreds of megabytes so it sould be fine for our toy example.

The for 1 .. 2 fetches two lines from the input file. If there is an odd number of lines in the input it doesn't matter - we end up concatenating undef to $fileLines, which amounts to a no-op, so no harm done (aside from an "uninitialized value" warning under warnings, which is noise rather than damage).
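Nothing ties the technique to two lines per file either. Here is a sketch of the split loop with the chunk size hoisted into a variable ($linesPerFile is just an illustrative name), meant to drop in place of the original while loop; the explicit defined check also sidesteps that warning when the last chunk comes up short:

my $linesPerFile = 2;    # illustrative: any positive count works

while (!eof DATA) {
    my $fileLines = '';

    for (1 .. $linesPerFile) {
        my $line = <DATA>;
        last if !defined $line;    # ran out of input mid-chunk
        $fileLines .= $line;
    }

    ++$fileNum;
    open my $outFile, '>', "$subDir/outFile$fileNum.txt"
        or die "Can't create '$subDir/outFile$fileNum.txt': $!";
    print $outFile $fileLines;
    close $outFile;
}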

Premature optimization is the root of all job security
