PerlMonks  

How do I split a file into parts

(#42919: categorized question)
Contributed by Anonymous Monk on Nov 22, 2000 at 18:36 UTC


Description:

How do I split a file into parts based on a specific delimiter? I have a file and I want to start a new file every time my program encounters a certain pattern. So, something like, "for every instance of "XXFFDDF", create a new file". Ideas? Thanks

Answer: How do I split a file into parts
contributed by chipmunk

This is probably the simplest approach:

    $/ = 'XXFFDDF';                    # set the input record separator
    my $base = 'filename';
    my $i = 0;
    while (<>) {                       # read one section at a time
        my $filename = "$base$i";      # generate a new filename
        open(OUT, ">$filename")        # create and write a new file
            or die "Can't open $filename: $!\n";
        print OUT;
        $i++;
    }

You can generate the filenames however you prefer. I just chose a very simple way of generating the names as an example.

If each section is very long, you might want to read the file in smaller chunks to conserve memory, since this approach holds one whole section in memory at a time.
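
One way to read in smaller chunks is sketched below. It is only an illustration, not part of the original answer: the sub name split_in_blocks, the block size, and the $emit callback are all hypothetical. It reads fixed-size blocks with read(), looks for the delimiter with index(), and holds back the last few characters of the buffer so that a delimiter straddling two blocks is still found. As in the code above, each part ends with the delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in in blocks of $block_size and hand each
# piece of output to $emit->($part_number, $text). A part ends right
# after each delimiter, matching the record-separator approach.
sub split_in_blocks {
    my ($in, $delim, $block_size, $emit) = @_;
    my $keep = length($delim) - 1;   # a delimiter may straddle two blocks
    my $part = 0;
    my $buf  = '';
    while (read($in, my $block, $block_size)) {
        $buf .= $block;
        # Flush every complete section currently in the buffer.
        while ((my $pos = index($buf, $delim)) >= 0) {
            $emit->($part, substr($buf, 0, $pos + length($delim), ''));
            $part++;
        }
        # Flush what cannot belong to a delimiter; hold back the tail.
        if (length($buf) > $keep) {
            $emit->($part, substr($buf, 0, length($buf) - $keep, ''));
        }
    }
    $emit->($part, $buf) if length $buf;   # whatever is left at EOF
    return;
}

# Demo on an in-memory file with a deliberately tiny block size.
my $text = "oneXXFFDDFtwoXXFFDDFthree";
open my $in, '<', \$text or die "Can't open in-memory file: $!";
my @parts;
split_in_blocks($in, 'XXFFDDF', 4, sub {
    my ($n, $piece) = @_;
    $parts[$n] .= $piece;
});
print "part $_: $parts[$_]\n" for 0 .. $#parts;
```

In real use the callback would append to a file named after the part number instead of collecting strings, and the block size would be much larger than 4.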

One final note: this puts the XXFFDDF at the end of each file. If you want to put it at the beginning, the code will need to be somewhat different.
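
For the delimiter-at-the-beginning variant, here is a minimal sketch under some assumptions: the sub name sections_with_leading_delim is made up for illustration, and it returns the sections as strings instead of writing files. It relies on chomp, which strips a trailing copy of $/ and returns the number of characters it removed, to move each delimiter from the end of one record to the front of the next.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sections of $in so that each section
# starts with the delimiter that introduced it.
sub sections_with_leading_delim {
    my ($in, $delim) = @_;
    local $/ = $delim;            # read up to and including each delimiter
    my @sections;
    my $prefix = '';              # delimiter carried over from the last record
    while (defined(my $chunk = <$in>)) {
        my $had_delim = chomp $chunk;   # strips a trailing $/, if any
        my $section = $prefix . $chunk;
        push @sections, $section if length $section;
        $prefix = $had_delim ? $delim : '';
    }
    push @sections, $prefix if length $prefix;   # input ended with a delimiter
    return @sections;
}

# Demo on an in-memory file.
my $text = "headerXXFFDDFoneXXFFDDFtwo";
open my $in, '<', \$text or die "Can't open in-memory file: $!";
print "$_\n" for sections_with_leading_delim($in, 'XXFFDDF');
```

Any text before the first delimiter comes back as its own section without a delimiter; each returned section could then be written to its own file as in the loop above.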

Answer: How do I split a file into parts
contributed by Fastolfe

You can probably do something like this:

    local $/ = "XXFFDDF";               # delimiter
    while (<INPUT>) {
        open(OUTPUT, "> output.$.")     # output.1, output.2, etc
            or die "output.$.: $!";
        print OUTPUT;
        close(OUTPUT);
    }

Answer: How do I split a file into parts
contributed by galande

Hi,
If you want to split one file into many files, use that pattern as your record separator, that is, $/.
Try this one ...

    #!/usr/bin/perl -w
    use Fcntl;    # provides O_RDONLY, O_RDWR, O_CREAT, O_EXCL

    my $infil = $0;    # $0 is this script itself; substitute your input file
    my $separator = "XXFFDDF";
    local $/ = $separator;
    sysopen(INFIL, $infil, O_RDONLY) || die "Can't open $infil: $!.\n";
    while (<INFIL>) {
        my $out_file = "$infil.$.";
        # s/\Q$separator\E//;
        ### if you want to remove your file separator also, uncomment above line .....
        sysopen(OUT, $out_file, O_RDWR|O_CREAT|O_EXCL)
            || die "Can't open for write $out_file: $!.\n";
        print OUT $_;
        close(OUT);
    }
    close(INFIL);

Answer: How do I split a file into parts
contributed by repson

There are several ways of accomplishing this task that I can think of. The best one I can think of in terms of flexibility and efficiency is this (untested code).

    my $fil_count = 0;
    my $delim = 'XXFFDDF';
    open IN, 'in.txt' or die "Can't open in.txt: $!\n";
    open OUT, '> out0.txt' or die "Can't write to out0.txt: $!\n";
    while (<IN>) {
        # /s lets (.*) keep the trailing newline; \Q..\E treats the delimiter literally.
        # Only the first delimiter on a line is handled.
        if (/^(.*?)\Q$delim\E(.*)$/s) {
            print OUT $1 if length $1;
            close OUT;
            $fil_count++;
            open OUT, '> out' . $fil_count . '.txt'
                or die "Can't write to out${fil_count}.txt: $!\n";
            print OUT $2 if length $2;
        }
        else {
            print OUT $_;
        }
    }
    close IN;
    close OUT;

Answer: How do I split a file into parts
contributed by tedv

Coming up with the "best" solution depends a lot on variables like how large the files are, what kind of performance you need, and how you'll come up with the new file name. However, here's the simplest way of solving that (if memory usage and time are no issues).

    use strict;

    local $/ = undef;    # grab everything from file
    open FILE, "my_file" or die $!;
    foreach my $data_block (split /match_instance/, <FILE>) {
        # "new_file_name" is a placeholder; use a distinct name for each block
        open OUTPUT, "> new_file_name" or next;
        print OUTPUT $data_block;
        close OUTPUT;
    }

Note that whatever string we look for ("match_instance" in this example) will get deleted by the nature of split. You can enclose match_instance in parentheses if you want it included. But then you'll end up with a list that looks like "data", "match_instance", "more data", "match_instance", etc., so a simple foreach would no longer see one block per iteration.
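
The parenthesized form can still be handled by walking the list two elements at a time. The following is a rough sketch and not Ted's code: the sub name split_keeping_delim is invented here, and it assumes the delimiter is a literal string (hence \Q...\E) that should stay at the front of the block following it.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $delim, keeping each delimiter at
# the front of the block that follows it.
sub split_keeping_delim {
    my ($text, $delim) = @_;
    # The capturing group makes split return the delimiters as well:
    # (data, delim, data, delim, data, ...)
    my @pieces = split /(\Q$delim\E)/, $text;
    my @blocks;
    my $first = shift @pieces;            # text before the first delimiter
    push @blocks, $first if defined $first && length $first;
    while (@pieces) {
        my $d    = shift @pieces;         # the delimiter
        my $data = shift @pieces;         # the data after it, if any
        $data = '' unless defined $data;  # split drops trailing empty fields
        push @blocks, $d . $data;
    }
    return @blocks;
}

print "$_\n" for split_keeping_delim("introXXFFDDFfirstXXFFDDFsecond", 'XXFFDDF');
```

Each returned block can then be printed to its own output file inside an ordinary foreach.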

-Ted
