http://www.perlmonks.org?node_id=42919

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I split a file into parts based on a specific delimiter? I have a file, and I want my program to start a new file every time it encounters a certain pattern; something like "for every instance of XXFFDDF, create a new file". Ideas? Thanks

Originally posted as a Categorized Question.

Re: How do I split a file into parts
by chipmunk (Parson) on Nov 28, 2000 at 01:48 UTC
    This is probably the simplest approach:
    $/ = 'XXFFDDF';                  # set the input record separator
    my $base = 'filename';
    my $i = 0;
    while (<>) {                     # read one section at a time
        my $filename = "$base$i";    # generate a new filename
        open(OUT, ">$filename")      # create and write a new file
            or die "Can't open $filename: $!\n";
        print OUT;
        close(OUT);
        $i++;
    }

    You can generate the filenames however you prefer. I just chose a very simple way of generating the names as an example.

    Because each read returns a whole section, a very long section is held in memory all at once. If your sections are large, you might want to read the file in smaller chunks to conserve memory.

    One final note: this puts the XXFFDDF at the end of each file. If you want to put it at the beginning, the code will need to be somewhat different.
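Following up on that last note, here is one way to put the delimiter at the beginning of each file instead. This is a minimal sketch, not chipmunk's code: the sub names `split_delim_first` and `write_part` are hypothetical, and it assumes the same `filename0`, `filename1`, ... naming. It still reads one delimiter-terminated record at a time, but chomps the delimiter off and writes it at the front of the *next* file.

```perl
use strict;
use warnings;

# Hypothetical helper: write @text to the file $name and return the name.
sub write_part {
    my ($name, @text) = @_;
    open my $out, '>', $name or die "Can't open $name: $!\n";
    print $out @text;
    close $out or die "Can't close $name: $!\n";
    return $name;
}

# Read delimiter-terminated records (as with $/ above), but move each
# delimiter to the START of the following file. Returns the names written.
sub split_delim_first {
    my ($in_fh, $delim, $base) = @_;
    local $/ = $delim;                   # one section per read, as before
    my ($i, $carry, @names) = (0, '');
    while (my $rec = <$in_fh>) {
        # chomp strips a trailing $/ and returns how many chars it removed
        my $had_delim = chomp $rec;
        push @names, write_part("$base$i", $carry, $rec);
        $carry = $had_delim ? $delim : '';
        $i++;
    }
    # Input ending in the delimiter: that delimiter begins one last file.
    push @names, write_part("$base$i", $carry) if length $carry;
    return @names;
}

# Assumed command-line use: perl split.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    split_delim_first($in, 'XXFFDDF', 'filename');
}
```

The text before the first XXFFDDF lands in filename0, and every later file starts with XXFFDDF.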

Re: How do I split a file into parts
by Fastolfe (Vicar) on Nov 22, 2000 at 18:41 UTC
    You can probably do something like this:
    local $/ = "XXFFDDF";                  # delimiter
    while (<INPUT>) {
        open(OUTPUT, "> output.$.")        # output.1, output.2, etc
            or die "output.$.: $!";
        print OUTPUT;
        close(OUTPUT);
    }
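A variation on the loop above, sketched under the same assumptions (an already-open input handle, output.1, output.2, ... as names): when `$/` holds a string, `chomp` removes that string rather than a newline, so it can strip the delimiter from each section. The sub name `split_strip_delim` is made up for this example, and it uses lexical filehandles and three-argument `open` instead of barewords.

```perl
use strict;
use warnings;

# Write each delimiter-terminated section to "$prefix.N" with the delimiter
# removed. Returns the list of file names written.
sub split_strip_delim {
    my ($in_fh, $delim, $prefix) = @_;
    local $/ = $delim;
    my @names;
    while (my $rec = <$in_fh>) {
        chomp $rec;                      # removes a trailing $delim, if present
        my $name = "$prefix.$.";         # $. counts records: prefix.1, prefix.2, ...
        open my $out, '>', $name or die "$name: $!\n";
        print $out $rec;
        close $out or die "$name: $!\n";
        push @names, $name;
    }
    return @names;
}

# Assumed command-line use: perl split.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "$ARGV[0]: $!\n";
    split_strip_delim($in, 'XXFFDDF', 'output');
}
```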
Re: How do I split a file into parts
by galande (Initiate) on Nov 23, 2000 at 10:00 UTC
    Hi,
    If you want to split one file into many files, use the pattern as your input record separator, $/.
    Try this:
    #! /usr/bin/perl -w
    use strict;
    use Fcntl;                                   # exports O_RDONLY, O_RDWR, O_CREAT, O_EXCL
    my $infil = shift or die "Usage: $0 file\n"; # file to split, from the command line
    my $separator = "XXFFDDF";
    local $/ = $separator;
    sysopen(INFIL, $infil, O_RDONLY) || die "Can't open $infil: $!.\n";
    while (<INFIL>) {
        my $out_file = "$infil.$.";
        # chomp;   ### uncomment to remove the separator too (chomp strips a trailing $/)
        sysopen(OUT, $out_file, O_RDWR|O_CREAT|O_EXCL)
            || die "Can't open $out_file for writing: $!.\n";
        print OUT $_;
        close(OUT);
    }
    close(INFIL);
      This approach works, but it has the disadvantage of holding an entire section (everything up to the next $separator) in $_ at once. The sections in the file you are splitting are probably not huge, but you never know when the code might be reused in a situation where that means pulling large amounts of data into memory.
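One way to address that concern is to read fixed-size blocks instead of whole sections. The sketch below is an illustration, not code from this thread: `split_chunked`, its 64 KiB default block size, and the `filename0`-style naming are assumptions. Setting `$/` to a reference to an integer makes `<$fh>` return records of at most that many bytes; the only subtlety is a delimiter split across two blocks, which is handled by holding back the last `length($delim) - 1` bytes.

```perl
use strict;
use warnings;

# Split on $delim while reading at most $chunk_size bytes at a time, so memory
# use does not depend on how long a section is. Like the $/ version, each
# output file ends with the delimiter. Returns the file names written.
sub split_chunked {
    my ($in_fh, $delim, $base, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;               # assumed default block size
    my $dlen = length $delim;
    my ($buf, $i, $out, @names) = ('', 0);

    # Append text to the current part, opening "$base$i" only when needed,
    # so no empty file is created after a trailing delimiter.
    my $emit = sub {
        my ($text) = @_;
        return unless length $text;
        unless ($out) {
            my $name = "$base$i";
            open $out, '>', $name or die "Can't open $name: $!\n";
            push @names, $name;
        }
        print $out $text;
    };

    local $/ = \$chunk_size;                 # <$fh> now returns fixed-size blocks
    while (defined(my $chunk = <$in_fh>)) {
        $buf .= $chunk;
        # Every complete delimiter in the buffer ends the current part.
        while ((my $pos = index($buf, $delim)) >= 0) {
            $emit->(substr($buf, 0, $pos + $dlen, ''));   # remove and write section + delimiter
            close $out or die "Can't close part $i: $!\n";
            undef $out;
            $i++;
        }
        # Hold back the last $dlen - 1 bytes: they may be the start of a
        # delimiter that finishes in the next block.
        my $keep = $dlen - 1;
        $emit->(substr($buf, 0, length($buf) - $keep, '')) if length($buf) > $keep;
    }
    $emit->($buf);                           # text after the last delimiter
    close $out or die "Can't close part $i: $!\n" if $out;
    return @names;
}

# Assumed command-line use: perl split.pl input.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    split_chunked($in, 'XXFFDDF', 'filename');
}
```

Memory use stays around one block plus the held-back tail, however far apart the delimiters are.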
Re: How do I split a file into parts
by repson (Chaplain) on Nov 23, 2000 at 17:29 UTC
    There are several ways to accomplish this task. The best one I can think of, in terms of flexibility and efficiency, is this (untested code):
    my $fil_count = 0;
    my $delim = 'XXFFDDF';
    open IN, 'in.txt' or die "Can't open in.txt: $!\n";
    open OUT, '> out0.txt' or die "Can't write to out0.txt: $!\n";
    while (<IN>) {
        # /s lets (.*) keep the line's trailing newline; \Q...\E quotes $delim
        if (/^(.*?)\Q$delim\E(.*)$/s) {
            print OUT $1 if length $1;       # text before the delimiter
            close OUT;
            $fil_count++;
            open OUT, '> out' . $fil_count . '.txt'
                or die "Can't write to out${fil_count}.txt: $!\n";
            print OUT $2 if length $2;       # text after the delimiter
        } else {
            print OUT $_;
        }
    }
    close OUT;
    close IN;
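Note that the regex above only acts on the first delimiter in a line; a second XXFFDDF on the same line ends up in the new file unchanged. A hedged sketch of one fix keeps the line-by-line reading but uses `split` with a limit of -1, which keeps trailing empty fields, so a line containing N delimiters always yields N + 1 pieces. The sub name `split_lines` is hypothetical; the `out0.txt`, `out1.txt`, ... naming follows the code above, and the delimiter is dropped.

```perl
use strict;
use warnings;

# Read line by line; every delimiter on a line (however many) starts a new
# file named "${base}N.txt". Returns the list of file names written.
sub split_lines {
    my ($in_fh, $delim, $base) = @_;
    my $n = 0;
    my $name = "${base}$n.txt";
    open my $out, '>', $name or die "Can't write to $name: $!\n";
    my @names = ($name);
    while (my $line = <$in_fh>) {
        my @pieces = split /\Q$delim\E/, $line, -1;
        print $out shift @pieces;            # text before the first delimiter
        for my $piece (@pieces) {            # each later piece begins a new file
            close $out or die "Can't close $name: $!\n";
            $n++;
            $name = "${base}$n.txt";
            open $out, '>', $name or die "Can't write to $name: $!\n";
            push @names, $name;
            print $out $piece;
        }
    }
    close $out or die "Can't close $name: $!\n";
    return @names;
}

# Assumed command-line use: perl split.pl in.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    split_lines($in, 'XXFFDDF', 'out');
}
```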
      If you can live with the entire line containing the delimiter going into the new file, there's a quick-and-dirty version:
      perl -pe 'BEGIN { $x = "aaa000"; } open STDOUT, ">".(++$x) if /XXFFDDF/' <input >initial_output

      -- Randal L. Schwartz, Perl hacker
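The one-liner above names its output files with Perl's magic string increment: a string that has only been used as a string and matches /^[a-zA-Z]*[0-9]*\z/ is incremented in place, with carry, so "aaa000" becomes "aaa001", "aaa002", and so on. A small standalone demonstration (the variable names are arbitrary):

```perl
use strict;
use warnings;

my $x = "aaa000";                 # same starting value as the one-liner
my @names;
push @names, ++$x for 1 .. 3;     # string increment: the digits count up in place
print "@names\n";                 # prints "aaa001 aaa002 aaa003"

my $carry = "aaa999";
++$carry;                         # the carry spills from the digits into the letters
print "$carry\n";                 # prints "aab000"
```

So the files produced are aaa001, aaa002, ..., while any lines before the first match go to initial_output.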

Re: How do I split a file into parts
by tedv (Pilgrim) on Nov 28, 2000 at 02:31 UTC
    Coming up with the "best" solution depends a lot on variables like how large the files are, what kind of performance you need, and how you'll come up with the new file names. However, here's the simplest way to solve it, if memory usage and run time are not a concern.
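    use strict;
    local $/ = undef;                    # grab everything from file
    open FILE, "my_file" or die $!;
    my $n = 0;
    foreach my $data_block (split /match_instance/, <FILE>) {
        $n++;
        # ">" opens for writing; the numeric suffix gives each block its own file
        open OUTPUT, "> new_file_name.$n" or die "new_file_name.$n: $!";
        print OUTPUT $data_block;
        close OUTPUT;
    }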
    Note that whatever string we look for ("match_instance" in this example) gets deleted, by the nature of split. You can enclose match_instance in parentheses if you want it included, but then you end up with a list that alternates between data and delimiters: "data", "match_instance", "more data", "match_instance", etc. So you couldn't simply use a foreach to get one block per iteration.

    -Ted
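Regarding the note about split: a zero-width lookahead, `(?=...)`, splits just before each delimiter without consuming it, so the delimiter stays at the start of each block and a plain foreach still sees one block per iteration. This is a sketch that, like Ted's code, assumes the whole file fits in memory; `blocks_keeping_delim` and the `FILE.1`, `FILE.2`, ... output names are hypothetical.

```perl
use strict;
use warnings;

# Split $text just before each occurrence of $delim. The lookahead matches an
# empty string, so nothing is removed and each block after the first begins
# with the delimiter. (A zero-width match at the very start of the string
# does not produce an empty leading field.)
sub blocks_keeping_delim {
    my ($text, $delim) = @_;
    return split /(?=\Q$delim\E)/, $text;
}

# Assumed command-line use: perl split.pl my_file  ->  my_file.1, my_file.2, ...
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Can't open $file: $!\n";
    my $text = do { local $/; <$in> };     # slurp the whole file
    my $n = 0;
    for my $block (blocks_keeping_delim($text, 'match_instance')) {
        $n++;
        open my $out, '>', "$file.$n" or die "Can't write $file.$n: $!\n";
        print $out $block;
        close $out or die "Can't close $file.$n: $!\n";
    }
}
```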