PerlMonks  

Problem with multiple loops

by chinamox (Scribe)
on Oct 24, 2006 at 11:53 UTC ( [id://580249]=perlquestion )

chinamox has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I am finishing up a class project that involves splitting a large file into several smaller ones using IO::File. I have nearly completed the script, but when I run it on my *NIX machine I get no error messages or feedback, and I am not returned to the command line either. When I sign back on to the machine, I find that only one new file has been written to the directory. I therefore think that something is likely wrong in my second while() statement.

I have been working on this problem for a few hours and haven't gotten anywhere. I think this likely results from a very stupid newbie mistake, and thus I am asking for your help. As always, many thanks to any of you who spare me some time.

    #!/opt/bin/perl -w
    use strict;
    use IO::File;
    use File::Basename qw(basename);

    my $input = "";

    #used as counters
    my $fn_count = $ARGV[0] - ($ARGV[0] - 1);
    my $fn_cyc_count= $ARGV[0];
    my $org_fn;

    #used for File::Basename
    my $base;
    my $path;
    my $type;

    main();
    exit(0);

    sub main{
        while($fn_cyc_count >= 1){
            open(IN, $ARGV[1])||die "Cannot open $ARGV[1]:$!\n";
            die "No number of lines\n" unless($ARGV[0] =~ /^\d+$/);

            #Get $max_ln_count per file using $ARGV[0]
            $lines = 0;
            $lines ++ while(<IN>);
            close IN;
            my $max_ln_count = int($lines/$ARGV[0]);

            #Get Input from $ARGV[1]
            open(INFILE, '<' ,$ARGV[1]) ||die "Cannot open $ARGV[1] for data input!\n";
            my $fh = IO::File->new($ARGV[1], 'r') ||die "unable to open $ARGV[1] for reading.\n";
            my @all_lines = $fh->getlines();
            my @rev_all_lines = reverse (@all_lines);

            #Output one of the new files in the current directory
            while($max_ln_count >= 1){
                #Use File::Basename to get the file name.
                $org_fn = $ARGV[1];
                my $base = basename $org_fn;
                my $file = $base . $fn_count;

                #Cut and Write Output
                my $line = pop(@rev_all_lines);
                open (OUTFILE, '>' ,$file)||die "Cannot open $file to write output!\n";
                print OUTFILE "$line" . "\n";

                #Subtract one from the $max_ln_count counter
                $max_ln_count = $max_ln_count - 1;

                #Close method when counter reaches 0
                close (OUTFILE) if ($max_ln_count <= 0);
            }
            #Subtract one from the $fn_cyc_count counter
            $fn_cyc_count = $fn_cyc_count - 1;

            #Close method when counter reaches 0
            close (INFILE) if ($fn_cyc_count <= 0);
        }
    }

Also, there is one other piece of advice I would ask of more experienced Perl coders. As you can see from the code, I am determining the number of lines to be written to each new file by using the / operator. My problem is what to do if there is a remainder. Is there a way to scoop up the remainder and save it as a $scalar? Any advice on using that to add an extra line to each file until $remainder == 0?

Thank you again for any and all help,

-mox

Update

Sorry all, the command line looks something like this:

    username $: perl myprog.pl 4 /path/path/path/file.ext
    ## $ARGV[0] is the number of output files desired.

May God bless you monks, one and all!

Replies are listed 'Best First'.
Re: Problem with multiple loops
by davorg (Chancellor) on Oct 24, 2006 at 12:18 UTC

    If you indented your code a bit, it would be a lot easier to read :-)

    Looks like you're making it all a bit more complex than it needs to be. Here's a program that I wrote to do the same thing - perhaps it will be useful to you.

        #!/usr/bin/perl

        use strict;
        use warnings;
        use POSIX 'ceil';

        die "Usage: splitfile num_of_files initial_file\n" unless @ARGV >= 2;

        my ($count, $infile) = @ARGV;

        open INFILE, '<', $infile or die $!;
        my @lines = <INFILE>;

        my $lines_per_file = ceil(@lines / $count);

        my $i = 1;
        open OUTFILE, '>', "$infile.$i" or die $!;

        my $lines_left = $lines_per_file;

        foreach (@lines) {
            unless ($lines_left) {
                close OUTFILE;
                $i++;
                open OUTFILE, '>', "$infile.$i" or die $!;
                $lines_left = $lines_per_file;
            }
            print OUTFILE;
            $lines_left--;
        }

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Sorry about the indents. I went back and fixed that ASAP.

      Thank you very much for your help. As you can see, I am still very new to this whole Perl thing and I tend to overthink things just a little ;). I believe I am supposed to be learning how to use IO::File for this assignment, and so I would like to resist using solutions that are too clever for me to fully understand.

      That said, your code is amazingly useful: in the last five minutes I think I have already seen several problems I should fix and improvements I should make.

      Thanks again,

      -mox
Re: Problem with multiple loops
by liverpole (Monsignor) on Oct 24, 2006 at 12:26 UTC
    Hi chinamox,

    I fully agree with davorg on the indentation tip.

    Secondly, you didn't say how you're calling the program.  I've tried to guess what your arguments are, but it "works" (comes back to the command-line) for me every time.

    A simple way to debug is to sprinkle print statements throughout, and see which ones get executed and which ones don't.  I always use something like:

    print "TFD> [1] Got here\n";
    to tell me I got to a certain line.  The "TFD>" stands for "temporary for debug"; so I can take all the debug lines out later.
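
A minimal sketch of that debugging habit, wrapped in a helper so every marker carries its own line number. The names `tfd` and `tfd_format` are hypothetical (they are not from the thread), and sending the output to STDERR is an assumption made here so that debug lines stay separate from the program's real output.

```perl
use strict;
use warnings;

# Hypothetical helper: build the tagged debug message.
sub tfd_format {
    my ($line, $note) = @_;
    return "TFD> [line $line] $note\n";
}

# Hypothetical helper: print a debug marker to STDERR, tagged with the
# line number of the code that called tfd().
sub tfd {
    my ($note) = @_;
    my $line = (caller)[2];    # third element of caller() is the line number
    print STDERR tfd_format($line, $note);
}

tfd("entering outer loop");
tfd("about to open the output file");
```

Because every marker starts with the same "TFD>" tag, searching for that string finds all the temporary lines when it is time to remove them.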

    Hopefully that gives you a good start in finding your problem.

    If not, please give us more specific information about what your command line looks like, and what your input files look like.


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

      Thank you very much for the debugging hint. I will use that in the future. (AKA next week!)

      -mox
Re: Problem with multiple loops
by Melly (Chaplain) on Oct 24, 2006 at 12:48 UTC

    First off, use the indents, Luke. They will help.

    You seem a little unclear (judging from your variable names) whether the first param is the number of lines per output file, or the number of files to generate - I'm assuming the latter.

    I also note that you use two different forms of opening the input file for the proper read - and you seem to use them at the same time. I've gone with the standard, non-module, based version.

    First off (and assuming that your param is number of files, not max lines), just let any surplus lines be written to your final file. I'll leave you to spot the rest of the changes (comments included)...

        #!/opt/bin/perl -w
        use strict;

        # we want a valid positive integer for number of files to split the original file into
        my $fn_count = $ARGV[0];
        die "Bad number of files\n" unless($fn_count =~ /^\d+$/ and $fn_count > 0);

        main();
        exit(0);

        sub main{

            # Work out how many lines per output file - there may be some additional lines in the final file
            # e.g. a 10-line file split into 3 files will be 3 lines for the first two, then 4 in the final file
            # also, if the result of the div is 0, then make it 1
            open(IN, $ARGV[1])||die "Cannot open $ARGV[1]:$!\n";
            my $lines = 0;
            $lines ++ while(<IN>);
            close IN;
            my $lines_per_file = int($lines/$fn_count);
            $lines_per_file = 1 if !$lines_per_file;

            # Set the current line count to 0
            # and the current output file to 1
            my $ln_count = 0;
            my $file_number = 1;

            # Re-open the input file
            open(IN, $ARGV[1])||die "Cannot open $ARGV[1]:$!\n";

            # Open our first output file (original filename + _<file_number>)
            open(OUT, ">$ARGV[1]_$file_number")||die "Cannot open $ARGV[1]_$file_number for write:$!\n";

            while(<IN>){
                # If we've reached our total output for this file (providing it's not the last file)...
                if($ln_count == $lines_per_file and $file_number < $fn_count){
                    $file_number ++; #...bump this up for the next file...
                    $ln_count = 0;   # ...reset line count back to 0...
                    # ... close the current file and open the next one
                    close OUT;
                    open(OUT, ">$ARGV[1]_$file_number")||die "Cannot open $ARGV[1]_$file_number for write:$!\n";
                }
                # print to the current output file
                print OUT $_;
                # inc our line count
                $ln_count ++;
            }
            # close the final file
            close OUT;
        }

    Tom Melly, tom@tomandlu.co.uk

      Thank you for sample code with the extensive comments!

      They have helped to clear up nearly all of my questions. Without the help of programmers like you I fear I would have defenestrated my computer long ago. You have my gratitude.


      -mox

        You're welcome:

        • You didn't try to hide that this was an assignment
        • Your code showed you'd made an effort (lack of tabbing notwithstanding)
        • You explained your problem clearly
        Tom Melly, tom@tomandlu.co.uk
Re: Problem with multiple loops
by Samy_rio (Vicar) on Oct 24, 2006 at 12:30 UTC

    Hi chinamox, Try to use File::Split module.

        use File::Split;

        my $fs = File::Split->new({keepSource=>'1'});

        #####Based on Number of Lines
        my $files_out = $fs->split_file({'lines' => 1000},'E:\test\test.xml');

        #####Based on Number of Files
        my $files_out = $fs->split_file({'parts' => 10},'E:\test\test.xml');

    I have tested in Windows only. Not in Unix.

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

      Do I have to install the File::Split module or is it part of the Default Perl package?

        I'd be a little wary of using File::Split since this is an assignment... afaik it's not part of the standard distrib, so, even ignoring whether getting a module to do all your work is really going to convince your tutor, it might present problems when showing your work.

        Tom Melly, tom@tomandlu.co.uk

        It's not part of the standard distribution. You can check that by reading the perlmodlib man page that came with your version of Perl.
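
As an alternative to reading perlmodlib, the check can be scripted. This is a sketch that assumes a Perl recent enough to bundle Module::CoreList (it was not yet bundled with the Perls current when this thread was written); the helper name `core_since` is hypothetical.

```perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical helper: return the first Perl release that bundled the
# module, or undef if the module has never shipped with Perl.
sub core_since {
    my ($module) = @_;
    return Module::CoreList->first_release($module);
}

for my $module (qw(IO::File File::Basename File::Split)) {
    my $since  = core_since($module);
    my $status = defined $since
        ? "in core since Perl $since"
        : "not in core, needs installing from CPAN";
    print "$module: $status\n";
}
```

IO::File and File::Basename report a release number, while File::Split reports that it has to be installed separately.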

        --
        <http://dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

Re: Problem with multiple loops
by graff (Chancellor) on Oct 25, 2006 at 06:35 UTC
    Since you've gotten lots of sample code to look at, let me focus just on the code you posted. I recall from your earlier post about this class assignment that you are dealing with a rather large file, and since your main problem is that your script is too slow, my comments are mostly about optimizing.

    Despite advice in an earlier thread, you have chosen to read the entire file into an array in memory, rather than dealing with just one line at a time. (This is after you have already read the file once just to count the lines -- but you would know the line count by checking the size of the array.)
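
A rough sketch of the one-line-at-a-time alternative, using IO::File since the assignment calls for it. The helper name `count_lines` and the throwaway demo file are assumptions for illustration; only the current line is held in memory at any moment.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical helper: count lines by reading one line at a time, so memory
# use stays flat however large the file is.
sub count_lines {
    my ($path) = @_;
    my $fh = IO::File->new($path, 'r')
        or die "Cannot open $path for reading: $!\n";
    my $count = 0;
    $count++ while defined $fh->getline;
    $fh->close;
    return $count;
}

# Demo on a throwaway 5-line file (made-up data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line $_\n" for 1 .. 5;
close $tmp_fh;

print count_lines($tmp_name), "\n";    # prints 5
```

If the whole file has already been slurped into an array, `scalar(@all_lines)` gives the same number without reading the file again.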

    On top of that, you make a complete second copy of the file contents in memory, making a second array in reverse order. Hint: data storage in Perl always consumes more memory than you might expect. When memory required by data exceeds physical memory on the machine, your process begins to use swap space (memory-resident data needs to be swapped out to disk when not in use, and swapped back in when needed). That slows you down tremendously.

    Then, for some reason, you have chosen to open an output file, write just one line to it, and then close it, and you have to do this until you've processed the number of lines intended for the given output file, using "pop @rev_all_lines". (Why didn't you just use "shift" on the original array?)

    The OS/system-lib overhead of opening and closing files can become profoundly significant when this is done many thousands of times in a single process. The time spent per unit of data being processed becomes remarkably long when only a small amount of data is being moved for each open/close cycle. (We all prefer to open a file once, write everything to it till it's done, then close it.)

    Knowing from the earlier threads that you are just trying to split a big file into a small number of smaller files, I'm trying to figure out what is actually going on here with your arithmetic and loops. It's pretty obscure...

        my $fn_count = $ARGV[0] - ($ARGV[0] - 1);
        # doesn't that just always set $fn_count = 1 ?
        # (is it a goal of this class project to write obfuscated code?)

    But probably the worst thing is that you have two while loops, one nested within the other; the outer loop iterates over the number of output files requested on the command line. If that number is "4", then the input file must be read, beginning to end, 8 times, two copies of its contents must be created (and then destroyed/garbage-collected) 4 times, and the output file must be opened and closed as many times as there are lines to be written to that file. You absolutely do not need more than one while loop.
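
To illustrate the single-loop shape, here is a sketch built on IO::File. It is one possible arrangement, not the expected assignment answer. The name `split_file` is hypothetical, the "$infile.N" output naming is borrowed from davorg's reply, and sending surplus lines to the last file follows Melly's reply. The input is read twice (once to count, once to copy) and each output file is opened exactly once.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: split $infile into $parts files named "$infile.1",
# "$infile.2", and so on. Returns the list of output file names.
sub split_file {
    my ($infile, $parts) = @_;
    die "Bad number of files\n" unless $parts =~ /^\d+$/ && $parts > 0;

    # Pass 1: count the lines without keeping them in memory.
    my $in = IO::File->new($infile, 'r')
        or die "Cannot open $infile: $!\n";
    my $total = 0;
    $total++ while defined $in->getline;

    # Whole lines per file; never less than 1.
    my $per_file = int($total / $parts) || 1;

    # Pass 2: rewind and copy, one line at a time, in a single loop.
    $in->seek(0, 0);
    my ($out, @names);
    my $written = 0;
    while (defined(my $line = $in->getline)) {
        # Open a new output file at the start, and whenever the current one
        # is full, unless it is already the last file allowed.
        if (!$out || ($written == $per_file && @names < $parts)) {
            $out->close if $out;
            push @names, "$infile." . (@names + 1);
            $out = IO::File->new($names[-1], 'w')
                or die "Cannot open $names[-1]: $!\n";
            $written = 0;
        }
        $out->print($line);
        $written++;
    }
    $out->close if $out;
    $in->close;
    return @names;
}

# Same argument order as the original post: number of files, then the path.
if (@ARGV == 2) {
    my ($parts, $infile) = @ARGV;
    print "$_\n" for split_file($infile, $parts);
}
```

With a 10-line input and 3 parts this writes 3, 3 and 4 lines, matching the example in Melly's comments.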

    If you were running on a multi-user system, I would expect other users may have been resenting all this, because I'm sure they would have been noticing the consequences...

    Altogether, the logic of the OP code falls into the category of "delete it and start over". I hope you'll be able to do that before the due date for the assignment.

    Perl happens to be a language where decent, coherent pseudo-code can translate pretty directly into executable, efficient code. But you have to start with decent, coherent pseudo-code. The way I usually push this issue is to say: start by documenting clearly and concisely what the code will do and how it will do it (in other words, figure out and describe what the solution will look like first), then write the code according to the documentation.

    Oh... and I don't know if anyone answered this question yet:

    what to do if there is a remainder? Is there a way to scoop up the remainder and save it as a $scalar?

    That's what the "modulo" operator ("%") is for:

        $remainder = ( 12 % 5 );  # i.e. 2
        $remainder = ( 14 % 7 );  # i.e. 0
        # etc...
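
Building on the modulo hint, here is a small sketch of the "one extra line per file until the remainder runs out" idea from the original question. The helper name `lines_per_file` is hypothetical. It only does the arithmetic and leaves the actual file writing to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: return how many lines each output file should get.
# The first ($total % $parts) files receive one extra line, so the sizes
# never differ by more than 1.
sub lines_per_file {
    my ($total, $parts) = @_;
    my $base      = int($total / $parts);
    my $remainder = $total % $parts;
    return map { $base + ($_ <= $remainder ? 1 : 0) } 1 .. $parts;
}

print join(', ', lines_per_file(10, 3)), "\n";    # prints "4, 3, 3"
```

The sizes always sum to the total line count, so no line is dropped and no extra file is needed for leftovers.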
