http://www.perlmonks.org?node_id=843691

tomdbs98 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings perl monks,

Today my dilemma is reorganizing files. I actually have both a problem and a question. Say I'm starting with 2 files R1 and R2, as follows:
R1-something.txt    R2-something.txt
DESCRP a            DESCRP a
2 3                 3 1
DESCRP b            DESCRP c
3 5                 4 9
and I want to rearrange them so instead I have files
a.txt    b.txt    c.txt
R1a      R1b      R2c
2 3      3 5      4 9
R2a
3 1
I have the files R1, R2 stored in fileArray and I think this should output them the way I would like:
foreach my $file (@{$fileArray}) {
    open (INPUT, "< $file") || die "Could not open $file\n";
    while (<INPUT>) {
        if ($_ =~ /^DESCRP/) {
            #get name of element
            chomp (my $newFileName = substr($_, 7));
            #open/create file of that name ('or', not '||', so die runs if open fails)
            open OUTPUT, ">> $newFileName.txt" or die "Could not open $newFileName.txt\n";
            #print the specific element's name, e.g. "R1a"
            print OUTPUT substr($file,0,2).$newFileName."\n";
        } else {
            #or print the values following the element
            print OUTPUT $_;
        }
    }
    close OUTPUT;
    close INPUT;
}

First, my problem is that when I try to print the values after each element, my OUTPUT handle is closed because it is out of scope from where I open it, and I am not sure how to keep it open.
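Leaving the if block does not by itself close OUTPUT: a bareword filehandle such as OUTPUT is a package global, and a lexical handle declared before the loop likewise survives from one iteration to the next. One likely cause of a "closed filehandle" warning (an assumption, since the real input is not shown) is a value line printed while no output file is open, for example before the first DESCRP line of a file, or after an open that failed. The following minimal sketch uses a hypothetical function `copy_after_descrp`: it declares the handle outside the loop, opens it in the if branch, and warns about early lines. It writes to an in-memory string instead of real files so it can run anywhere.

```perl
use strict;
use warnings;

# Hypothetical sketch: the output handle is declared once, outside the loop,
# and opened inside the if branch. Because the variable outlives the if
# block, the handle stays open for the value lines that follow.
# Output goes to an in-memory string (open on a scalar ref), not a real file.
sub copy_after_descrp {
    my @lines  = @_;
    my $buffer = '';
    my $out;    # undefined until the first DESCRP line is seen
    for my $line (@lines) {
        if ($line =~ /^DESCRP/) {
            # Reopening an open handle closes it first; '>>' keeps earlier text
            open $out, '>>', \$buffer or die "Cannot open in-memory handle: $!";
        }
        elsif ($out) {
            print {$out} $line;    # handle is still open here
        }
        else {
            warn "Skipping line before first DESCRP: $line";
        }
    }
    close $out if $out;
    return $buffer;
}
```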

My question is: Do you have a more efficient/aesthetic way of doing this?

Thanks for your time :)

-Thomas

P.S. The data sets I will actually be working with are much larger (thousands of lines and a dozen 'R_' files), but they will generally be structured like this.
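Since the real data is only thousands of lines, another way to avoid keeping many output files open is to collect each DESCRP block in memory and write every output file once at the end. A minimal sketch, assuming the R1/R2 prefix is the part of the file name before the first '-' and that the data fits comfortably in memory; `regroup` and `write_groups` are hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: group lines from several inputs by DESCRP name.
# $inputs is an array ref of [ $filename, \@lines ] pairs.
# Returns a hash ref: name => [ output lines without newlines ].
sub regroup {
    my ($inputs) = @_;
    my %groups;
    for my $input (@$inputs) {
        my ($file, $lines) = @$input;
        # Assumption: prefix = text before the first '-' of the file's basename
        my ($prefix) = $file =~ m{([^/-]+)-[^/]*$}
            or die "Unexpected file name: $file\n";
        my $current;    # name from the most recent DESCRP line
        for my $line (@$lines) {
            chomp(my $text = $line);
            if ($text =~ /^DESCRP\s+(\S+)/) {
                $current = $1;
                push @{ $groups{$current} }, "$prefix$current";
            }
            elsif (defined $current) {
                push @{ $groups{$current} }, $text;
            }
            # lines before the first DESCRP line are skipped
        }
    }
    return \%groups;
}

# Hypothetical helper: write each group to "$dir/$name.txt", one open per file.
sub write_groups {
    my ($groups, $dir) = @_;
    for my $name (sort keys %$groups) {
        open my $out, '>', "$dir/$name.txt"
            or die "Couldn't open $dir/$name.txt: $!";
        print {$out} "$_\n" for @{ $groups->{$name} };
        close $out or die "Couldn't close $dir/$name.txt: $!";
    }
}
```

With real files, each input entry can be built by opening the file and pairing its name with `[ <$in> ]`. Because every output file is written exactly once with '>', rerunning the script replaces the old output instead of appending a second copy, which differs from the '>>' approach.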

Re: Reorganizing file contents
by rjt (Curate) on Jun 08, 2010 at 17:27 UTC

    The following should be reasonably efficient. It does keep files open, one per unique DESCRP element. Depending on how many DESCRP elements you have in your real data, you may need to rethink this (the operating system limits how many files a process can have open at once), possibly with a least-recently-used scheme.

    use warnings;
    use strict;

    my %fh_of;    # Hash of filehandles, one per DESCRP name

    foreach my $file (<R*-*.txt>) {
        open INPUT, "<$file" or die "Couldn't open $file: $!";
        my $fh;    # current output handle; stays set across lines of this file
        while (<INPUT>) {
            if (/^DESCRP\s+(.+?)$/) {
                my $des = $1;
                unless (exists $fh_of{$des}) {
                    open $fh_of{$des}, ">>$des.txt"
                        or die "Couldn't open $des.txt: $!";
                }
                $fh = $fh_of{$des};
                $file =~ /^(R.+?)-/;    # Glob guarantees match
                print $fh "$1$des\n";
                next;
            }
            print $fh $_ if $fh;    # value lines go to the current element's file
        }
        close INPUT;
    }
    close $_ for (values %fh_of);

    With the input files as you've given them, I get the expected output. All errors are fatal; you might want to handle them more gracefully depending on your application. If an input file does not start with a DESCRP line, $fh will not be defined, so I just throw away records until I see a DESCRP line. Again, you may want to handle this differently.
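The least-recently-used scheme mentioned above, for when there are too many distinct DESCRP names to keep one handle open for each, can be sketched as a small cache that never holds more than a fixed number of open handles. A minimal sketch, assuming output files are opened in append mode (so a file closed by eviction can safely be reopened later) and that a linear scan for the oldest entry is cheap enough for a small limit; `make_lru_opener` and its parameters are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical LRU cache of append-mode output handles.
# Returns two closures: $get->($name) returns an open handle for
# "$dir/$name.txt", and $close_all->() closes every handle still open.
sub make_lru_opener {
    my ($max_open, $dir) = @_;
    my (%fh, %last_used);    # name => handle, name => tick of last use
    my $tick = 0;

    my $get = sub {
        my ($name) = @_;
        unless ($fh{$name}) {
            if (scalar(keys %fh) >= $max_open) {
                # Evict the least recently used handle (smallest tick)
                my ($lru) = sort { $last_used{$a} <=> $last_used{$b} } keys %fh;
                my $old = delete $fh{$lru};
                delete $last_used{$lru};
                close $old or die "Couldn't close $dir/$lru.txt: $!";
            }
            # '>>' so a file evicted earlier is appended to, not truncated
            open $fh{$name}, '>>', "$dir/$name.txt"
                or die "Couldn't open $dir/$name.txt: $!";
        }
        $last_used{$name} = ++$tick;
        return $fh{$name};
    };

    my $close_all = sub {
        for my $name (keys %fh) {
            close $fh{$name} or die "Couldn't close $dir/$name.txt: $!";
        }
        %fh        = ();
        %last_used = ();
    };

    return ($get, $close_all);
}
```

In rjt's loop, the `exists $fh_of{$des}` / `open` lines would become `$fh = $get->($des);`, with `$close_all->()` replacing the final `close` loop.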

      Worked beautifully; unfortunately, I am out of votes. :P

      I should be able to touch up your solution just fine for the real deal.

Re: Reorganizing file contents
by choroba (Cardinal) on Jun 08, 2010 at 16:37 UTC
    If you want to keep your files open (this might become problematic if there are too many of them), you can store the filehandles in a hash, as in the following example:
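    my %handles;
    foreach my $num (1..10) {
        my $file = $num % 5;    # five output files: 0.txt .. 4.txt
        unless (exists $handles{$file}) {
            open($handles{$file}, '>>', "$file.txt")
                or die "Couldn't open $file.txt: $!";
        }
        # braces are needed when the handle is a hash element
        print {$handles{$file}} $num, "\n";
    }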
Re: Reorganizing file contents
by bluescreen (Friar) on Jun 09, 2010 at 00:32 UTC

    Just a quick suggestion: it is recommended to use indirect (lexical) filehandles instead of INPUT and OUTPUT, because bareword handles like those are global, and if another part of your application uses the same name, they might collide. The recommended syntax is:

    open(my $output_fh, '>>', 'myoutputfile') or die "Couldn't open myoutputfile: $!";
    open(my $input_fh,  '<',  'myinputfile')  or die "Couldn't open myinputfile: $!";

    That way the filehandle's scope is limited to the enclosing block (your function or method, or the whole file if it is declared at the top level). For further reading, see perlopentut.
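One practical consequence of that scoping: a lexical filehandle is closed automatically, and its buffered output flushed, when the variable goes out of scope, whereas a bareword handle like OUTPUT stays open because it is a package global. A minimal sketch; `write_then_read` is a hypothetical name, and in real code an explicit `close` is still worth writing, because an implicit close gives you no chance to check for errors:

```perl
use strict;
use warnings;

# Hypothetical demo: the output handle is never closed explicitly; it is
# closed (and flushed) when the bare block ends and $output_fh goes away.
sub write_then_read {
    my ($path, $text) = @_;
    {
        open my $output_fh, '>', $path or die "Couldn't open $path: $!";
        print {$output_fh} $text;
        # no close(): $output_fh is closed when this block ends
    }
    open my $input_fh, '<', $path or die "Couldn't open $path: $!";
    my $content = do { local $/; <$input_fh> };    # slurp the whole file
    close $input_fh;
    return $content;
}
```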

    As for your problem, you don't need a hash to keep the filehandle open; you just need to declare it in the correct scope, for example:

    my $output_fh;
    while (<$input_fh>) {
        if ($some_condition) {
            open($output_fh, '>>', 'myfile') or die "Couldn't open myfile: $!";
        } else {
            # The check covers lines that appear before $some_condition is first true
            print $output_fh "Whatever I want: $_" if $output_fh;
        }
    }
    close($output_fh) if $output_fh;
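Applied to the original task, the single-handle idea means reopening the one lexical handle in append mode whenever a new DESCRP line appears. `open` on a handle that is already open closes it first, so only one output file is open at a time, at the cost of one open/close per DESCRP block rather than one per distinct name. A minimal sketch, assuming the R1/R2 prefix is the text before the first '-' in the file's basename; `reorganize` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical sketch: one output handle per input file, reopened in append
# mode for each DESCRP block, so "a.txt" collects blocks from R1 and R2.
sub reorganize {
    my ($out_dir, @files) = @_;
    for my $file (@files) {
        # Assumption: prefix = text before the first '-' of the basename
        my ($prefix) = $file =~ m{([^/-]+)-[^/]*$}
            or die "Unexpected file name: $file\n";
        open my $input_fh, '<', $file or die "Couldn't open $file: $!";
        my $output_fh;    # lives across loop iterations
        while (my $line = <$input_fh>) {
            if ($line =~ /^DESCRP\s+(.+?)\s*$/) {
                my $name = $1;
                # closes the previous output file, if any, then appends
                open $output_fh, '>>', "$out_dir/$name.txt"
                    or die "Couldn't open $out_dir/$name.txt: $!";
                print {$output_fh} "$prefix$name\n";
            }
            elsif ($output_fh) {
                print {$output_fh} $line;    # value lines go to the current file
            }
        }
        close $output_fh if $output_fh;
        close $input_fh;
    }
}
```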

    choroba is right: the scoping advice applies to the indirect filehandles.

      True for indirect filehandles. Not so true for the filehandle scope, though: OP needs several filehandles and writes randomly to any of them.