
splitting an input stream

by Anonymous Monk
on Nov 07, 2003 at 11:27 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a Unix program, let's call it x, that processes a file and takes the following parameters:
--fileName: a text file name
--firstLine: an integer
--lastLine: an integer
So a typical invocation could be
x --fileName=foo --firstLine=50 --lastLine=100
This reads foo from line 50 to line 100 inclusive and does something with those lines.

I want to call x several times with (--firstLine,--lastLine) pairs like this:

(1,1000) (1001,2000) ...

So effectively I am processing foo in chunks of 1000 lines.

If I do this and foo has millions of lines, each invocation takes longer than the one before, because x has to scan foo from the start to reach firstLine.
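A rough count shows why. Assume (the question does not say) that x reads foo sequentially from the top and stops after lastLine. With chunk size c, chunk j (counting from 0) then costs about jc lines of skipping plus c lines of real work, so a file of N lines split into k = N/c chunks costs in total:

```latex
\sum_{j=0}^{k-1} (jc + c) = c\,\frac{k(k+1)}{2}
```

With N = 1,000,000 and c = 1000 (so k = 1000), that is 1000 * 1000 * 1001 / 2 = 500,500,000 line reads, roughly 500 times the 1,000,000 reads of a single pass.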

I cannot change x, so I want to write a Perl wrapper for x that reads foo once, from line 1 to the end, and sends each chunk of 1000 lines to x. That way foo is never scanned more than once.

Is this possible?

update (broquaint): added formatting

Re: splitting an input stream
by Corion (Patriarch) on Nov 07, 2003 at 11:32 UTC

    I'm sure that this is possible in Perl, and also relatively easy, but an even easier solution is available from the shell, if you have enough disk space to hold a second copy of the input file:

    split -l 1000 filename splitted
    for i in splitted*; do
        x --fileName="$i"
        # or, if x can't be started with only a filename
        # (the last piece may hold fewer than 1000 lines):
        # x --fileName="$i" --firstLine=1 --lastLine=1000
    done

    The split command splits your file into pieces of 1000 lines each (named splittedaa, splittedab, ...), and then the shell iterates over the resulting files and calls x for each file.
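The same split can be done in Perl without the shell. Below is a minimal sketch; the subroutine name split_file, the chunk naming scheme and the script name split_chunks.pl are hypothetical, and, as in the shell loop, it assumes x can be run with just --fileName. Like split, it still needs disk space for a second copy of the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy $path into numbered chunk files of at most $size lines each,
# reading the input only once. The naming scheme "$prefix.000",
# "$prefix.001", ... is our own choice, not split's.
# Returns the chunk file names in order.
sub split_file {
    my ( $path, $size, $prefix ) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my ( @names, $out );
    my $count = 0;
    while ( my $line = <$in> ) {
        # Start a new chunk file before lines 1, $size+1, 2*$size+1, ...
        if ( $count++ % $size == 0 ) {
            if ($out) { close $out or die "Cannot close chunk: $!" }
            my $name = sprintf '%s.%03d', $prefix, scalar @names;
            open $out, '>', $name or die "Cannot write $name: $!";
            push @names, $name;
        }
        print {$out} $line;
    }
    if ($out) { close $out or die "Cannot close chunk: $!" }
    close $in;
    return @names;
}

# Usage: perl split_chunks.pl foo
# x and its --fileName option come from the question; we assume x
# accepts a file name on its own, as in the shell loop.
if (@ARGV) {
    for my $chunk ( split_file( $ARGV[0], 1000, 'splitted' ) ) {
        system( 'x', "--fileName=$chunk" ) == 0
            or warn "x failed on $chunk: $?\n";
    }
}
```

Run without arguments, the script only defines split_file, so it can be loaded and tested without x installed.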

Re: splitting an input stream
by BrowserUk (Patriarch) on Nov 07, 2003 at 12:21 UTC

    Update: Corion ++ pointed out that I reversed the sense of your problem, sorry for my confusion.

    Instead, have your wrapper script read 1000 lines from foo, write them to a temporary file and then invoke x on that file. Repeat until done.

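    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $in, '<', 'foo' or die "Cannot open foo: $!";
    while ( not eof $in ) {
        # Read up to 1000 lines; the last chunk may be shorter.
        my @lines;
        while ( @lines < 1000 and defined( my $line = <$in> ) ) {
            push @lines, $line;
        }
        open my $tmp, '>', 'tempfoo' or die "Cannot write tempfoo: $!";
        print $tmp @lines;
        close $tmp or die "Cannot close tempfoo: $!";
        # Run x on the chunk; lastLine is the real size of this chunk.
        system( 'x', '--fileName=tempfoo', '--firstLine=1',
                '--lastLine=' . scalar(@lines) ) == 0
            or warn "x failed: $?\n";
    }
    close $in;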

    Instead of having X supply you 1000 lines at a time, ask for them all and pipe the result to your perl script. In your perl script, read 1000 lines from the pipe into an array, process them, then loop back and get the next 1000 lines. Repeat till done.

    The output of X will be blocked while your script processes each batch of 1000 lines. Your script will only ever have to hold 1000 lines in memory at a time. X will never have to backtrack or skip over any lines.

    A silly example. (One liner wrapped for display)

    perl -ne"print" junk | perl -le"
        while( not eof(STDIN) ) {
            $,=' ';
            push @a, scalar <> for 1..10;
            chomp @a;
            print reverse @a;
            @a=();
        }"

    #Outputs
    10 9 8 7 6 5 4 3 2 1
    20 19 18 17 16 15 14 13 12 11
    30 29 28 27 26 25 24 23 22 21
    40 39 38 37 36 35 34 33 32 31
    50 49 48 47 46 45 44 43 42 41
    60 59 58 57 56 55 54 53 52 51
    70 69 68 67 66 65 64 63 62 61
    80 79 78 77 76 75 74 73 72 71
    90 89 88 87 86 85 84 83 82 81
    100 99 98 97 96 95 94 93 92 91
    110 109 108 107 106 105 104 103 102 101
    ...

    The first instance of perl just reads the file junk (one integer per line) and prints it to stdout. The second instance loops, reading 10 lines, chomping them, reversing them and printing them, before emptying the array and going back for the next 10.
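The reading side of that pipe can also be written as a reusable subroutine. This is a sketch assuming one item per line; for_each_batch is a hypothetical name, and the demo feeds it an in-memory string instead of the real pipe so it runs on its own.

```perl
use strict;
use warnings;

# Consumer side of the pipe: read $fh in batches of at most $n lines
# and pass each batch (chomped) to $callback, so at most one batch is
# held in memory. Unlike `push @a, scalar <> for 1..10`, this never
# pushes undef when the input ends partway through a batch.
sub for_each_batch {
    my ( $fh, $n, $callback ) = @_;
    my @batch;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        push @batch, $line;
        if ( @batch == $n ) {
            $callback->(@batch);
            @batch = ();
        }
    }
    $callback->(@batch) if @batch;    # final, possibly short, batch
    return;
}

# Demo with an in-memory handle standing in for the pipe
# (one integer per line, like the junk file).
my $junk = join( "\n", 1 .. 25 ) . "\n";
open my $pipe, '<', \$junk or die "Cannot open in-memory handle: $!";
for_each_batch( $pipe, 10, sub { print join( ' ', reverse @_ ), "\n" } );
```

In the real pipeline, pass \*STDIN instead of the in-memory handle. For the 25-line demo the last batch is printed as 25 24 23 22 21.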


      I would have done something very similar, but you should account for the last chunk having fewer lines: if your input file has, say, 1340 lines, the first invocation will cover lines 1 to 1000 and the second lines 1 to 340 of the temporary file (that should be easy to handle).
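The chunk arithmetic can be made explicit. A minimal sketch (chunk_ranges is a hypothetical helper) that lists the question's (firstLine, lastLine) pairs, clips the last pair to the file length, and shows the 1..n range x would see once each chunk sits in its own temporary file:

```perl
use strict;
use warnings;

# List the (firstLine, lastLine) pairs for a file of $total lines cut
# into chunks of $size, as in (1,1000) (1001,2000) ... from the question.
# The last pair is clipped to $total, so it may cover fewer lines.
sub chunk_ranges {
    my ( $total, $size ) = @_;
    my @ranges;
    for ( my $first = 1; $first <= $total; $first += $size ) {
        my $last = $first + $size - 1;
        $last = $total if $last > $total;
        push @ranges, [ $first, $last ];
    }
    return @ranges;
}

# After a chunk is copied to its own temporary file, x sees it as
# lines 1 .. (last - first + 1).
for my $r ( chunk_ranges( 1340, 1000 ) ) {
    my ( $first, $last ) = @$r;
    printf "foo lines %d-%d -> tempfoo lines 1-%d\n",
        $first, $last, $last - $first + 1;
}
```

For 1340 lines this prints the ranges 1-1000 and 1001-1340, which become 1-1000 and 1-340 in the temporary file.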
Re: splitting an input stream
by ptkdb (Monk) on Nov 07, 2003 at 13:03 UTC
    Your program x could print the byte offset in the file at which it stopped when it finishes. Your wrapper script captures this, and on the next run uses seek (ref pg 779, Programming Perl 3rd Ed) to advance the filehandle to that position.

    To do that, you would add two parameters to x: --echo-offset and --start-at-offset=fileoffset

      I cannot change x
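Since x can't be changed, the offset idea can still be applied on the wrapper side. The sketch below (hypothetical helper names chunk_offsets and read_chunk_at) uses tell during one pass to record where each chunk starts; seek can then jump back to any chunk, for instance to re-run one whose x invocation failed, without rereading the lines before it. The demo uses an in-memory file so it runs standalone.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# One pass records the byte offset at which each chunk of $size lines starts.
sub chunk_offsets {
    my ( $fh, $size ) = @_;
    my @offsets;
    my $count = 0;
    while (1) {
        my $pos = tell $fh;    # offset of the line about to be read
        defined( my $line = <$fh> ) or last;
        push @offsets, $pos if $count++ % $size == 0;
    }
    return @offsets;
}

# seek() jumps straight to a recorded offset to reread one chunk.
sub read_chunk_at {
    my ( $fh, $offset, $size ) = @_;
    seek( $fh, $offset, SEEK_SET ) or die "seek failed: $!";
    my @lines;
    while ( @lines < $size and defined( my $line = <$fh> ) ) {
        push @lines, $line;
    }
    return @lines;
}

# Demo on an in-memory file of 25 numbered lines, chunks of 10.
my $text = join( "\n", 1 .. 25 ) . "\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @offsets = chunk_offsets( $fh, 10 );
print "chunk offsets: @offsets\n";
print read_chunk_at( $fh, $offsets[2], 10 );
```

On this demo file the recorded offsets are 0, 21 and 51 bytes, and the last call prints lines 21 to 25.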


Node Type: perlquestion [id://305274]
Approved by Corion