Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re: Splitting a large file into smaller files according to indexes

by shmem (Chancellor)
on Apr 06, 2018 at 08:43 UTC ( #1212408=note: print w/replies, xml ) Need Help??


in reply to Splitting a large file into smaller files according to indexes

Can't use string ("00000140") as a symbol ref while "strict refs" in use at ./partition_file.pl line 26, <$FH> line 6.

The code you posted doesn't produce this error message, it most likely stems from an earlier version of your program, in which you e.g. used $OUT overall instead of $NEW. In this case, in line 26, you would be trying to use a string as a filehandle.

Are you sure that your "mrule" tokens are grouped? If after mrule=150 some line containing mrule=140 is found further down in the source file, you overwrite the file 00000140.log with that line.

To avoid that, open a file for every mrule token and store the open filehandles in a hash:

my %fh; while ( my $line = <$FH> ) { if ( $line =~ /mrule=([0-9]+)/){ my $mrule = sprintf("%08d.log", $1); if ( ! $fh{$mrule} ) { open my $fh, '>', $mrule or die "Can't write to '$mrule': +$!\n"; $fh{$mrule} = $fh; } print {$fh{$mrule}} $line; } } close $fh{$_} for keys %fh; # close all files

Note the extra braces in the print statement around $fh{$mrule}. These are necessary to disambiguate $fh{$mrule} as being a filehandle rather than part of the LIST for print.

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Replies are listed 'Best First'.
Re^2: Splitting a large file into smaller files according to indexes
by cryptoperl (Novice) on Apr 10, 2018 at 20:38 UTC

    > Are you sure that your "mrule" tokens are grouped ?

    Yes, I am sure the "mrule" tokens are grouped , You would not see something like this
    mpg=1 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=100 mpg=2 mrule=140 reg=7989 score=10625 rank=0 perc=100 mp_demand=40 mpg=3 mrule=150 reg=7989 score=0 rank=0 perc=100 mp_demand=20 mpg=4 mrule=150 reg=7989 score=10625 rank=0 perc=100 mp_demand=40 mpg=3 mrule=140 reg=7989 score=0 rank=0 perc=100 mp_demand=20
    That is, we would not see a line with "mrule=140" repeat before and after a different maprule.

    I tried running your code, but probably it is hitting the file limit. After creating 252 files, I get an error as

    bos-mp96h:~ jvx$ ./partition_file.pl asgn.txt Can't write to '00000021.log': Too many open files bos-mp96h:~ jvx$ ls -ltr | grep 'log' | wc -l 252
      After creating 252 files, I get an error as ...

      Summing STDOUT,STDERR,STDIN to 252 gives 255... very tight limits you've got on your machine. What OS are you running?

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
        I am running this on mac OS.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1212408]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (8)
As of 2020-12-02 14:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    How often do you use taint mode?





    Results (41 votes). Check out past polls.

    Notices?