PerlMonks  

Splitting a text file

by Dr Manhattan (Beadle)
on Mar 19, 2013 at 10:24 UTC (#1024249=perlquestion)
Dr Manhattan has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I am trying to split a text file so that the first 1000 lines are printed to 1.txt, the second 1000 lines to 2.txt, and so on. I tried this and it seems to be working fine:

my $length = $#array;
my $fileNR = 1;
for (my $x = 0; $x < $length; $x++) {
    open ($fileNR, ">$fileNR.txt") or die "can't open";
    print $fileNR "$array[$x]\n";
    if ($x % 1000 == 999) {
        close ($fileNR);
        $fileNR++;
    }
}

Except that it doesn't print lines 1 to 1000 to a file (as it should); it just prints every 1000th line to a file and then goes on to the next file. Any idea why this happens?

Any help would be appreciated

Re: Splitting a text file
by jethro (Monsignor) on Mar 19, 2013 at 10:41 UTC

    If you are only interested in a solution (and are on a Unix-like system), you could simply use "split", a command-line utility.

    If you are interested in learning perl, I'll give you a few hints instead of the solution on a platter:

    Your inner loop makes no sense. An inner loop is run completely for every invocation of the outer loop. So since your outer loop is going through the lines of your input file, you seem to open a new file for each line of your input file. Not good.

    What you need is a counter (similar to $fileNR, but for input lines). This counter counts to 1000. If it reaches 1000 you open a new file and reset the counter to 0. That's it

    Further optimization: If you want you can use $x as the counter. You just don't reset it and use modulo arithmetic instead, i.e. if $x modulo 1000 is 0, then open a new file. The modulo operator in perl is "%"
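The hints above can be sketched as follows. This is only a minimal illustration, not jethro's own code: the helper name `split_lines`, the `$dir` argument and the `$chunk` parameter are assumptions made for the example. It opens a new output file whenever the index modulo the chunk size is 0, and uses a lexical filehandle kept separate from the file counter.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): writes the lines in @$lines
# to "$dir/1.txt", "$dir/2.txt", ... with $chunk lines per file.
# Returns the number of files written.
sub split_lines {
    my ($lines, $dir, $chunk) = @_;
    my $file_nr = 0;
    my $out;
    for my $x (0 .. $#$lines) {
        # Index 0, $chunk, 2*$chunk, ... starts a new output file.
        if ($x % $chunk == 0) {
            if ($out) { close($out) or die "can't close: $!"; }
            $file_nr++;
            open($out, '>', "$dir/$file_nr.txt")
                or die "can't open $dir/$file_nr.txt: $!";
        }
        print {$out} "$lines->[$x]\n";
    }
    # Clean up: close the last file, if any file was opened at all.
    if ($out) { close($out) or die "can't close: $!"; }
    return $file_nr;
}
```

Calling `split_lines(\@array, '.', 1000)` would write the chunks into the current directory. Because a file is only opened when there is a line to put into it, no empty trailing file is created.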

Re: Splitting a text file
by Ratazong (Prior) on Mar 19, 2013 at 10:51 UTC

    Hi,

    you are already on the right track. You might try to adapt your algorithm as follows (to keep most of your ideas/code)

    • open output-file #1
    • loop through the array (as in your code; for (my $x = 0; $x < $length; $x++))
      • write the current line ($array[$x])
      • check if you already wrote 1000 lines (use the modulo-operator, e.g. if ($x % 1000 == 999))
        • close the current file
        • open the next one
    • clean-up: close the last file
    Hope that helps! Rata

      Hi all

      I tried this and it seems to be working fine

      my $length = $#array;
      my $fileNR = 1;
      for (my $x = 0; $x < $length; $x++) {
          open ($fileNR, ">$fileNR.txt") or die "can't open";
          print $fileNR "$array[$x]\n";
          if ($x % 1000 == 999) {
              close ($fileNR);
              $fileNR++;
          }
      }

      Except that it doesn't print lines 1 to 1000 to a file (as it should); it just prints every 1000th line to a file and then goes on to the next file. Any idea why this happens?

        Hi Dr Manhattan,

        have a look at the difference between my algorithm and your solution: you open the file in each iteration of the loop. And by opening a file for writing, you erase the previous content.

        The solution is to open the file before the loop and, additionally, inside the if-block when you want to switch to the next file.
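A sketch of how the posted loop might look with that change applied. The helper name `write_chunks` and the `$dir` argument are assumptions made for the example, not part of the thread. Two further adjustments are included: a lexical filehandle is used instead of reusing `$fileNR` as the handle, and the loop runs up to and including `$#array`, since `$#array` is the last index and `$x < $#array` would skip the final line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): writes @array to
# "$dir/1.txt", "$dir/2.txt", ... with 1000 lines per file.
# Returns the number of the last file written.
sub write_chunks {
    my ($dir, @array) = @_;
    my $fileNR = 1;
    # Open the first output file once, before the loop.
    open(my $out, '>', "$dir/$fileNR.txt") or die "can't open: $!";
    for (my $x = 0; $x <= $#array; $x++) {
        print {$out} "$array[$x]\n";
        # After every 1000th line, switch to the next file, but only
        # if lines remain (this avoids an empty trailing file).
        if ($x % 1000 == 999 and $x < $#array) {
            close($out) or die "can't close: $!";
            $fileNR++;
            open($out, '>', "$dir/$fileNR.txt") or die "can't open: $!";
        }
    }
    # Clean up: close the last file.
    close($out) or die "can't close: $!";
    return $fileNR;
}
```

Note that with an empty input this sketch still creates an empty 1.txt, because the first file is opened unconditionally.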

        HTH, Rata
Re: Splitting a text file
by daxim (Chaplain) on Mar 19, 2013 at 10:54 UTC
    Know your GNU coreutils. split already does what you want. The empty string as last argument suppresses the default prefix, the letter x.

    $ split --lines=1000 --numeric-suffixes=1 --suffix-length=5 --additional-suffix=NR.txt /the/file ''

    Even in Perl, I think you have too much code for so little work. One doesn't need to open the input file oneself, it is simpler to pipe the content from the shell and read from STDIN in Perl. One doesn't need to loop over the input oneself, -n already does that. $. contains the current line number, $_ contains the current line content, see perlvar.

    </the/file perl -MIO::File -ne'
        my $nr = 1+int($./1_000);
        $handles[$nr] = IO::File->new(sprintf("%05dNR.txt", $nr), "w")
            unless defined $handles[$nr];
        $handles[$nr]->print($_);
    '
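For readers unfamiliar with `-n`: it wraps the given code in an implicit `while (<>) { ... }` loop, so `$_` and `$.` are set for each input line. The sketch below spells that loop out explicitly. The helper name `numbered_lines` and the in-memory filehandle are assumptions made so the example runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns each line of $text
# prefixed with its line number, using the same $. and $_ variables
# that the -n one-liner relies on.
sub numbered_lines {
    my ($text) = @_;
    local $_;    # do not clobber the caller's $_
    open(my $in, '<', \$text) or die "can't open in-memory handle: $!";
    my @out;
    while (<$in>) {             # -n supplies a loop like this over <>
        chomp;
        push @out, "$.: $_";    # $. is the 1-based line number, $_ the line
    }
    close($in);
    return @out;
}
```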

      Thanks, daxim, it was instructive to me to figure out how the pieces fit together. One thing though, your Perl code as written will put only 999 lines into the first file. This can be fixed by replacing $. with ($.-1)

      </the/file perl -MIO::File -ne'
          my $nr = 1+int(($.-1)/1_000);
          $handles[$nr] = IO::File->new(sprintf("%05dNR.txt", $nr), "w")
              unless defined $handles[$nr];
          $handles[$nr]->print($_);
      '
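The off-by-one is easy to check in isolation. In this small sketch the helper names `file_nr_original` and `file_nr_fixed` are made up for the example. Each maps a 1-based line number, as found in `$.`, to a file number using the two formulas from the one-liners above.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the thread): map a 1-based line number
# to an output file number.
sub file_nr_original {
    my ($line) = @_;
    return 1 + int($line / 1_000);          # line 1000 already lands in file 2
}

sub file_nr_fixed {
    my ($line) = @_;
    return 1 + int(($line - 1) / 1_000);    # lines 1..1000 all land in file 1
}
```

With the original formula, lines 1 to 999 map to file 1 and line 1000 maps to file 2, so the first file gets only 999 lines. With the corrected formula, lines 1 to 1000 map to file 1 and line 1001 is the first to map to file 2.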
