PerlMonks  

Can I do an "inner read"?

by digger (Friar)
on Mar 10, 2004 at 21:24 UTC

digger has asked for the wisdom of the Perl Monks concerning the following question:

I am in the midst of a project that requires me to make some changes to a ~30MB file. My script will act as an lpr filter script to modify incoming PostScript jobs. Multiple instances of the script may run simultaneously, so I can't slurp the file into an array. I have examined a variety of methods to solve the following problem.

When the current line is a chapter end I have to do 3 things.
  1. Insert a chapterization command
  2. Insert a command to change the paper output tray after the next page start.
  3. On the third page following the end of a chapter, insert a command to send the newly started chapter to the binder
I can't know ahead of time how many lines or pages will be in these chapters. I have no problem using simple regexes to match and change the text. It is the chapter-level processing that has me stumped.
Here is my current code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    #since first page gets treated like the end of a chapter
    #we start with end_chapter being true.
    my $end_chapter = 1;
    my $time = time();

    #line required to redirect output to postprocessor (ie perfectbinder)
    my $to_pp = '<</OutputType(postprocessor)>>setpagedevice \n';

    #line to force chapterization
    my $chapter = 'true [] /110pProcs /ProcSet findresource /setchapters get exec \n';

    #my $outfile = "/var/spool/drop_box/autoq/".$time.".print";
    my $outfile = "d:/customer files/WPS/Quint.out";
    my $infile = $ARGV[0];

    open (OUT, ">".$outfile) or die "Can't create temp file!!!!!!!!!";
    open (IN, "<".$infile);

    while (<IN>){
        #if its the KDKHost line, skip it
        my $line = del_KDKHost($_);

        #handle chapter endings - currently denoted by null OutputType
        $line = chapterize($line);

        if ($end_chapter) {
            handleSeps();
        }
        else {
            print OUT "$line";
        }
    }

    sub del_KDKHost {
        #if this the KDKHost line, delete it
        my $line = shift;
        if ($line =~ m/^%KDKHost:/){
            $line = "";
        }
        return $line;
    }

    sub chapterize {
        my $line = shift;
        if ($line =~ m!<</OutputType \(\)>>setpagedevice!) {
            $line = $chapter;
        }
        $end_chapter = 1;
        return $line;
    }

    sub handleSeps {
        #if we just made a chapter, or this is the first page of the file
        #we have to make the next 2 pages come out of the top exit
        my $counter= 0;

        #if this is the pagenumber line, increment counter
        #we only need work with 2 pages
        while (<IN> && $counter<=3) {
            #if we have started a new page
            #increment page counter and if we are on the 3rd page
            #since chapter break, insert line for output to perfect binder
            if ($_ =~ /%%BeginPageSetup/){
                print "STARTED NEW PAGE";
                $counter++;
                if ($counter==3){
                    $_ .= "\n $to_pp";
                    $counter = 0;
                }
                #if it is the OutputType line for this page, change to top output
                elsif ($_ =~ m!<</OutputType\(Stacker\)>>setpagedevice!){
                    $_ =~ s/Stacker/top/;
                }
            }
            print OUT $_;
        }
        $end_chapter = 0;
    }
Currently, I only get the first line of the file repeated. It's as if it never reads past line 1. Is it possible to read from a file within a sub that is called from a while loop that is itself reading the same file? I got the idea and the term "inner read" from this node. Am I insane? Is there an elegant way to do this?

As always, thanks for the pointers,
digger

Replies are listed 'Best First'.
Re: Can I do an "inner read"?
by kappa (Chaplain) on Mar 11, 2004 at 13:35 UTC
    Ahhh, the same trap again :)
    while(<IN>) { print; }
    works as you expect, but
    while(<IN> && 'perl rulez') { print; }
    does not. This is documented in perlop, section "I/O Operators".
      I wish I could ++ you again. I moved the additional test inside the while loop as an if, and got the output I desired. There are a few logic errors that need to be fixed, but at least the behavior makes sense to me now.

      Thank you, thank you, and thank you again,
      digger.
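
A minimal sketch of the rule kappa points to, using an in-memory filehandle so it runs on its own. The sample data and the sub names `read_with_explicit_assign` and `read_with_inner_test` are made up for illustration. Perl only supplies the implicit `defined($_ = ...)` when the readline is the sole expression in the `while` condition, so with an extra test you either assign explicitly or move the test into the loop body, which is the route digger took.

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\nline 4\n";

# Option 1: keep the extra test in the condition, but assign the line
# explicitly. The limit is checked first so no line is read and dropped.
sub read_with_explicit_assign {
    my ($limit) = @_;
    open my $fh, '<', \$data or die "in-memory open failed: $!";
    my @seen;
    while (@seen < $limit && defined(my $line = <$fh>)) {
        push @seen, $line;
    }
    close $fh;
    return @seen;
}

# Option 2: leave the readline alone in the condition (so Perl adds the
# defined() test itself) and do the extra test inside the loop.
sub read_with_inner_test {
    my ($limit) = @_;
    open my $fh, '<', \$data or die "in-memory open failed: $!";
    my @seen;
    while (my $line = <$fh>) {
        push @seen, $line;
        last if @seen >= $limit;
    }
    close $fh;
    return @seen;
}

print read_with_explicit_assign(2);
print read_with_inner_test(2);
```

Both subs use a lexical `$line` instead of `$_`, which also sidesteps the question of whether `$_` has been localized.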
Re: Can I do an "inner read"?
by revdiablo (Prior) on Mar 10, 2004 at 22:18 UTC

    I have not dissected your code, and do not know what is causing the problem you describe, but I have a couple of general comments.

    • Add some vertical whitespace! You have what appears to be a good amount of comments, but it could really use some empty lines to break things up. It makes me dizzy to try and discern the logical chunks.
    • Always check the return status of open -- even when opening a file for input. On a related note, in your die on the output file's open call, you do not include $! in the output string. This is often very helpful.
    • Perhaps instead of using a global bareword filehandle, you could use a lexically scoped, autovivified filehandle, which you pass into your subroutines as a normal argument?

    Here's a quick example demonstrating these points:

    open my $fh, "<", $input or die "$input: $!";
    foo($fh);

    sub foo {
        my ($fh) = @_;
        my $line = <$fh>;
        print $line;
    }

    Hopefully one of these suggestions will lead you to find the problem on your own.
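
On the original question itself: a sub can keep reading from the handle the outer loop is using, because the read position belongs to the filehandle rather than to the loop. Here is a small sketch of that, assuming a made-up `BEGIN`/`END` block format and hypothetical sub names `process` and `read_block`. It is not digger's PostScript logic, only the "inner read" pattern.

```perl
use strict;
use warnings;

# Inner read: consume lines from $fh up to and including the next
# "END" line, returning the lines that came before it.
sub read_block {
    my ($fh) = @_;
    my @block;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line eq 'END';
        push @block, $line;
    }
    return @block;
}

# Outer read: hand the filehandle to read_block() whenever a "BEGIN"
# line appears, then carry on from wherever read_block() stopped.
sub process {
    my ($fh) = @_;
    my @events;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line eq 'BEGIN') {
            my @block = read_block($fh);
            push @events, 'block(' . join(',', @block) . ')';
        }
        else {
            push @events, "line($line)";
        }
    }
    return @events;
}

my $text = "a\nBEGIN\nb\nc\nEND\nd\n";
open my $fh, '<', \$text or die "in-memory open failed: $!";
print join(' ', process($fh)), "\n";   # prints: line(a) block(b,c) line(d)
close $fh;
```

The outer loop never sees `b`, `c` or `END`, since the inner loop has already consumed them by the time control returns.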

Re: Can I do an "inner read"?
by Roy Johnson (Monsignor) on Mar 10, 2004 at 22:17 UTC
    It sounds like to handle #3, you just need to set a flag/counter to zero and continue processing. Each time you hit a page, you increment your counter. When it gets to three, you're on the third page, and you do your magic.

    Have I misunderstood?


    The PerlMonk tr/// Advocate
      I originally started out with that solution in mind, but it seemed to get a little messy when passing the lines to a sub like I'm doing. I was hoping to just start reading where my main loop left off, and do all the chapter-level processing in my handleSeps sub. Since I know I won't have any of the processes from steps 1 and 2 to deal with, I could bypass those, and just handle 3 separately. It seems more "elegant" to me, and more maintainable in the future if I have to add more changes.

      Thanks for your input,
      digger
Re: Can I do an "inner read"?
by TilRMan (Friar) on Mar 11, 2004 at 01:44 UTC

    Use pseudocode to get a better general grasp of the problem.

    First, restate your project description more accurately (in English). You actually do only one of the three things when the current line is a chapter end. Here's the best I could make of it:

    1. When the current line is a chapter end, insert a chapterization command.
    2. In the first two pages of a chapter, change the paper output tray.
    3. On the third page following the end of a chapter, insert a command to send the newly started chapter to the binder.

    Then translate that to pseudocode. I don't really recommend using obscenely long variable and subroutine names, but the point is that you should factor out the little pieces so that you can look at the structure of the loops and conditionals all at once.

    my $pages_this_chapter = 0;
    while (<IN>) {
        if (current_line_is_the_kdkhost_line()) { next }

        # Don't forget the special case
        #
        if (current_line_is_a_chapter_end() || $. == 1) {
            print OUT $chapterization_command;
            $pages_this_chapter = 0;
            next; # ???
        }

        if (current_line_starts_a_new_page()) {
            $pages_this_chapter++;
            if ($pages_this_chapter == 3) { print OUT $send_to_binder }
        }

        if ($pages_this_chapter <= 2) # 3?
            { change_the_paper_output_tray() }

        print OUT;
    }

    That's not exactly what you want, but you get the idea. Now fill in the blanks. Remember that $_ and $. are visible everywhere. (You may prefer to change your subroutines to accept the current line and/or line number as arguments instead of peeking at $_ and $. .)

    sub current_line_is_the_kdkhost_line { m/^%KDKHost:/; }
    sub current_line_is_a_chapter_end    { m!<</OutputType \(\)>>setpagedevice!; }
    sub current_line_starts_a_new_page   { /%%BeginPageSetup/; }
    sub change_the_paper_output_tray {
        m!<</OutputType\(Stacker\)>>setpagedevice! and s/Stacker/top/;
    }
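
For comparison, here is one way the counter approach might look with the helpers taking the line as an argument, as suggested above. It is a sketch under assumptions, not a finished filter: `filter_lines` and the `is_*` names are hypothetical, the command strings are copied from digger's script (double-quoted so `\n` is a real newline), the start of the file is treated as a chapter start without emitting the chapterization command, and the binder command is placed after the third page-start line, as digger's original code does.

```perl
use strict;
use warnings;

# Command strings from digger's script.
my $CHAPTER_CMD = "true [] /110pProcs /ProcSet findresource /setchapters get exec\n";
my $BINDER_CMD  = "<</OutputType(postprocessor)>>setpagedevice\n";

# Line classifiers: each takes the line as its only argument.
sub is_kdkhost     { $_[0] =~ m/^%KDKHost:/ }
sub is_chapter_end { $_[0] =~ m!<</OutputType \(\)>>setpagedevice! }
sub is_page_start  { $_[0] =~ /%%BeginPageSetup/ }

# Takes the input lines and returns the filtered output lines.
sub filter_lines {
    my @in = @_;
    my @out;
    my $pages_this_chapter = 0;    # assumption: file start = chapter start

    for my $line (@in) {
        next if is_kdkhost($line);             # drop the KDKHost line

        if (is_chapter_end($line)) {           # step 1: chapterize
            push @out, $CHAPTER_CMD;
            $pages_this_chapter = 0;
            next;
        }

        my $insert_binder = 0;
        if (is_page_start($line)) {
            $pages_this_chapter++;
            $insert_binder = 1 if $pages_this_chapter == 3;   # step 3
        }

        if ($pages_this_chapter <= 2) {        # step 2: first two pages
            $line =~ s{<</OutputType\(Stacker\)>>setpagedevice}
                      {<</OutputType(top)>>setpagedevice};
        }

        push @out, $line;
        push @out, $BINDER_CMD if $insert_binder;
    }
    return @out;
}

print filter_lines(
    "%KDKHost: example\n",
    "%%BeginPageSetup\n",
    "<</OutputType(Stacker)>>setpagedevice\n",
);
```

Working on a list keeps the sketch easy to test. For a ~30MB job the same body would sit inside a `while (defined(my $line = <$fh>))` loop that prints each line as it goes.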

    -- 
    LP^>

Re: Can I do an "inner read"?
by captain_haddock (Novice) on Mar 12, 2004 at 16:11 UTC
    digger,

    Apologies that I don't have time to completely review your code, but I noticed a few things upfront.

    $end_chapter is initialized to 1 and always set to 1 by chapterize so that handleSeps will always get called on each line of input from the main loop.

    In handleSeps $counter is initialized to zero and incremented only at %%BeginPageSetup, but in the same if clause if it hits 3, it is reset to 0. So, $counter never makes it past three, and handleSeps will only return at EOF.

    Also, note that the first line of input (which is stored in $line) will never actually get printed to the output, as this only happens on the else clause of the main loop which is never hit.

    Of course the real killer is the catch that <> only assigns to $_ in a while statement if it is the only thing inside the conditional. In the absence of that, and since $_ is not localized here, it will retain its previous value: the first line read from the file. This will be continuously output (as $counter is not incremented) to your temp file.

    If you are lucky enough that the first line of your input file contains %%BeginPageSetup or Output... then a slightly different string will be output to your temp file.

    After that, the temp file will grow as large as your filesystem permits (or until you hit Ctrl-C).

    Hope that helps....

      Thanks for the analysis captain.

      I found the first problem shortly after posting.

      I found the second issue this morning when I was able to pick this little project up again.

      Issue number 3 was the "real killer" as you say. I put my solution together, and couldn't see the side effects of other errors while this issue remained. Once kappa pointed out the underlying problem, I was able to solve the other issues pretty quickly.

      Thanks again for taking the time to look at my code.
      digger
