Contributed by Anonymous Monk
on Nov 22, 2000 at 18:36 UTC
Description: How do I split a file into parts based on a specific delimiter? I have a file, and I want my program to start a new file every time it encounters a certain pattern. So, something like: "for every instance of "XXFFDDF", create a new file". Ideas?
Thanks
Answer: How do I split a file into parts contributed by chipmunk
This is probably the simplest approach:
local $/ = 'XXFFDDF';                 # set the input record separator to the delimiter
my $base = 'filename';
my $i = 0;
while (<>) {                          # read one section at a time
    my $filename = "$base$i";         # generate a new filename: filename0, filename1, ...
    open(my $out, '>', $filename)     # create and write a new file
        or die "Can't open $filename: $!\n";
    print $out $_;
    close($out) or die "Can't close $filename: $!\n";
    $i++;
}
You can generate the filenames however you prefer. I just chose a very simple way of generating the names as an example.
If each section is very long, you might want to read the file in smaller chunks to conserve memory.
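A sketch of that idea, not taken from chipmunk's answer: read fixed-size blocks with `read` and hold back the last `length($delim) - 1` characters of the buffer after each read, so a delimiter that straddles two reads is still found. The sub name `split_in_chunks`, its `$open_next` callback (which returns an output handle for the next piece), the 64 KB default and the in-memory demo are all illustrative assumptions. Memory use stays around one chunk no matter how long a section is, and, like the `$/` version, each piece keeps its trailing delimiter.

```perl
use strict;
use warnings;

# Split the input handle $in on the literal string $delim, reading at most
# $chunk_size bytes at a time. $open_next is a callback (an assumption of this
# sketch) that returns a new output handle for each piece; pieces keep their
# trailing delimiter.
sub split_in_chunks {
    my ($in, $delim, $open_next, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;                # assumed default: 64 KB per read
    my $keep = length($delim) - 1;            # longest partial delimiter a read can end with
    my $out;                                  # handle for the current piece, opened lazily
    my $buf = '';

    my $write = sub {
        my ($text) = @_;
        return unless length $text;           # never create an empty trailing piece
        $out ||= $open_next->();
        print {$out} $text;
    };

    while (read($in, my $block, $chunk_size)) {
        $buf .= $block;
        # Each complete delimiter in the buffer ends the current piece.
        while ((my $pos = index($buf, $delim)) >= 0) {
            $write->(substr($buf, 0, $pos + length($delim), ''));
            close $out;
            undef $out;
        }
        # Flush all but a tail that might be the start of a delimiter split across reads.
        $write->(substr($buf, 0, length($buf) - $keep, '')) if length($buf) > $keep;
    }
    $write->($buf);                           # whatever is left belongs to the last piece
    close $out if $out;
    return;
}

# Demo on an in-memory string with a tiny chunk size, so delimiters straddle reads.
my $input = "aaXXFFDDFbbbXXFFDDFc";
open my $in, '<', \$input or die "Can't open in-memory input: $!";
my @pieces;
split_in_chunks($in, 'XXFFDDF', sub {
    push @pieces, '';
    open my $fh, '>', \$pieces[-1] or die "Can't open in-memory output: $!";
    return $fh;
}, 4);
print join('|', @pieces), "\n";               # aaXXFFDDF|bbbXXFFDDF|c
```

For real files, the callback can simply open the next name in a sequence, e.g. `my $n = 0; split_in_chunks(\*STDIN, 'XXFFDDF', sub { my $name = 'out' . $n++; open my $fh, '>', $name or die "Can't open $name: $!"; $fh });`.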
One final note: this puts the XXFFDDF at the end of each file. If you want to put it at the beginning, the code will need to be somewhat different.
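One possible version, sketched here rather than taken from chipmunk's answer: `chomp` removes a trailing `$/` from each record and returns how many characters it removed, so the loop can tell whether a record ended with the delimiter and prepend the delimiter to the next piece instead. The `split_leading` name and the in-memory input are illustrative. The sub returns the pieces as a list; each one can then be written to its own file as in the loop above. If the input ends with the delimiter, that delimiter becomes a final piece of its own so no data is dropped.

```perl
use strict;
use warnings;

# Read records terminated by $delim and return them with each delimiter moved
# from the end of its record to the start of the next piece.
sub split_leading {
    my ($in, $delim) = @_;
    local $/ = $delim;                        # <$in> now returns one delimiter-terminated record
    my $prefix = '';
    my @pieces;
    while (my $record = <$in>) {
        my $removed = chomp $record;          # strips a trailing $/ and returns the number of chars removed
        push @pieces, $prefix . $record;
        $prefix = $removed ? $delim : '';
    }
    push @pieces, $prefix if length $prefix;  # input ended with the delimiter: keep it as a last piece
    return @pieces;
}

my $input = "AXXFFDDFBXXFFDDFC";
open my $in, '<', \$input or die "Can't open in-memory input: $!";
my @pieces = split_leading($in, 'XXFFDDF');
print join('|', @pieces), "\n";               # A|XXFFDDFB|XXFFDDFC
```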
Answer: How do I split a file into parts contributed by Fastolfe
You can probably do something like this:
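my $infile = 'input.txt';                 # hypothetical input file name
open(INPUT, '<', $infile) or die "Can't open $infile: $!";
local $/ = "XXFFDDF";                     # delimiter
while (<INPUT>) {
    open(OUTPUT, '>', "output.$.")        # output.1, output.2, etc. ($. is the record number)
        or die "output.$.: $!";
    print OUTPUT;
    close(OUTPUT);
}
close(INPUT);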
Answer: How do I split a file into parts contributed by galande
Hi,
If you want to split one file into many files, use that pattern as your record separator, $/.
Try this one:
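#!/usr/bin/perl -w
use strict;
use Fcntl;                              # exports O_RDONLY, O_WRONLY, O_CREAT, O_EXCL for sysopen

my $infil = @ARGV ? $ARGV[0] : $0;      # file to split (falls back to this script itself)
my $separator = "XXFFDDF";
local $/ = $separator;
sysopen(INFIL, $infil, O_RDONLY) || die "Can't open $infil: $!.\n";
while (<INFIL>) {
    my $out_file = "$infil.$.";         # e.g. data.txt.1, data.txt.2, ...
    # chomp;
    ### if you want to remove the separator too, uncomment the line above (chomp strips a trailing $/)
    # O_EXCL makes sysopen fail rather than overwrite an existing file
    sysopen(OUT, $out_file, O_WRONLY|O_CREAT|O_EXCL)
        || die "Can't open $out_file for writing: $!.\n";
    print OUT $_;
    close(OUT);
}
close(INFIL);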
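Answer: How do I split a file into parts contributed by repson
There are several ways of accomplishing this task that I can think of.
The best one I can think of in terms of flexibility and efficiency is this (untested code). It reads the input line by line and starts a new output file each time it meets the delimiter; the delimiter itself is not written out.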
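my $fil_count = 0;
my $delim = 'XXFFDDF';
open IN, '<', 'in.txt' or die "Can't open in.txt: $!\n";
open OUT, '>', 'out0.txt' or die "Can't write to out0.txt: $!\n";
while (my $line = <IN>)
{
    # \Q...\E matches the delimiter literally; /s lets (.*) keep the line's newline.
    # Loop, because one line may contain the delimiter more than once.
    while ($line =~ /^(.*?)\Q$delim\E(.*)$/s) {
        my ($before, $after) = ($1, $2);
        print OUT $before;                  # text before the delimiter ends the current file
        close OUT;
        $fil_count++;
        open OUT, '>', "out$fil_count.txt"
            or die "Can't write to out$fil_count.txt: $!\n";
        $line = $after;                     # keep scanning what follows the delimiter
    }
    print OUT $line;
}
close OUT;
close IN;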
Answer: How do I split a file into parts contributed by tedv
Coming up with the "best" solution depends a lot on variables like how large the files are, what kind of performance you need, and how you'll come up with the new file names. However, here's the simplest way of solving it (if memory usage and speed are not concerns):
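use strict;
use warnings;
my $content = do {
    local $/ = undef;                     # slurp mode: grab everything from the file
    open my $fh, '<', 'my_file' or die "Can't open my_file: $!";
    <$fh>;
};
my $n = 0;
foreach my $data_block (split /match_instance/, $content) {
    my $name = 'new_file_name.' . ++$n;   # placeholder naming scheme: one distinct file per block
    open my $out, '>', $name or die "Can't write $name: $!";
    print $out $data_block;
    close $out;
}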
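Note that whatever string we look for ("match_instance" in this example) will be deleted by the nature of split. You can enclose match_instance in parentheses if you want it included, but then the delimiters come back as separate list elements that alternate with the data: "data", "match_instance", "more data", "match_instance", etc. (if the file starts with the delimiter, the first element is an empty string). A plain foreach would then treat each delimiter as a block of its own, so you would have to process the list two elements at a time instead.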
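There are a couple of ways around this; here is a sketch using a made-up string `$text` in place of the file contents (not tedv's code). The first approach walks the captured list two elements at a time with `splice`, gluing each delimiter back onto the block before it. The second splits on a zero-width lookahead, `(?=match_instance)`, which matches just before each delimiter without consuming it, so every delimiter stays at the start of the block it introduces and a plain foreach works again.

```perl
use strict;
use warnings;

my $text = "headmatch_instancedata1match_instancedata2";   # illustrative input

# 1. Capturing parentheses: fields and delimiters alternate ("head", "match_instance", "data1", ...).
my @parts = split /(match_instance)/, $text;
my @with_captures;
while (@parts) {
    my ($block, $delim) = splice @parts, 0, 2;   # a data field and the delimiter after it, if any
    push @with_captures, $block . (defined $delim ? $delim : '');
}

# 2. Zero-width lookahead: split just before each delimiter, keeping it at the start of its block.
my @with_lookahead = split /(?=match_instance)/, $text;

print join('|', @with_captures), "\n";    # headmatch_instance|data1match_instance|data2
print join('|', @with_lookahead), "\n";   # head|match_instancedata1|match_instancedata2
```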
-Ted