
line number ($.) problem ?

by 5mi11er (Deacon)
on Mar 27, 2007 at 16:29 UTC (#606790=perlquestion)
5mi11er has asked for the wisdom of the Perl Monks concerning the following question:


I've got a bit of an odd one here. After several attempts to use the otherwise good tool csplit on a 4.6 GB file, each of which ended in a segfault, I resorted to rolling my own short, sweet Perl script.

Essentially, I've discovered a kernel log file that hasn't been rotated since September 2005, and I'd like to keep only the data for 2007. A grep with line numbers showed that the first line for 2007 is line 19,035,437.

    #!/usr/bin/perl

    $skip = 19035437;

    while(<>) {
        if(! ($. % 1000000)) {
            printf STDERR "processed line $.\n"
        }
        last if $. == $skip;
    }

    while(<>) {
        printf STDOUT $_;
    }

When run (with a different skip value) on a different 2.5 GB log file, it worked beautifully, but on the 4.6 GB file I get this:

    processed line 1000000
    processed line 2000000
    processed line 3000000
    processed line 4000000
    processed line 5000000
    processed line 6000000
    processed line 7000000
    processed line 8000000
    processed line 9000000
    processed line 10000000
    processed line 11000000
    processed line 12000000
    processed line 13000000
    processed line 14000000
    processed line 15000000
    processed line 16000000
    processed line 17000000
    processed line 18000000
    processed line 19000000
    Modification of a read-only value attempted at line 13, <> line 20804003.

Line 13 corresponds to the "printf STDOUT $_;" line.

Thinking this might be an overflow of some sort, I checked the binary representation of 20804003: it is 1001111010111000110100011, only 25 bits wide, so it's nowhere near anything I would expect to overflow.
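The arithmetic above can be double-checked with a few lines of Perl. This is a small sketch, not part of the original post: it uses sprintf's %b conversion to get the binary form and compares the value with the largest signed 32-bit integer, 2**31 - 1 (the variable names are made up for illustration).

```perl
use strict;
use warnings;

# The input line number from the error message above.
my $n = 20804003;

# sprintf's %b conversion renders an integer in binary.
my $bits = sprintf '%b', $n;

# Largest signed 32-bit integer, for comparison.
my $max32 = 2**31 - 1;

# The formats here are constant literals, so printf is safe.
printf "%d = %s (%d bits)\n", $n, $bits, length $bits;
printf "headroom below %d: %d\n", $max32, $max32 - $n;
```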

Does anyone have any ideas of what might be happening here?

This is Perl 5.8.5 running on dual 32-bit Intel Xeon 2.4 GHz chips.


Re: line number ($.) problem ?
by kyle (Abbot) on Mar 27, 2007 at 16:55 UTC

    Why is this using printf and not print? Perhaps the line you're printing (in $_) contains a % sequence that printf is trying to interpret as a format, and that causes the failure.

    UPDATE: This gets the message you're getting:

    printf '%n';

    According to the sprintf documentation, the %n format is "special: *stores* the number of characters output so far into the next variable in the parameter list".
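A minimal sketch of both halves of that description (not from the thread; the variable names are made up): with a variable supplied, %n fills it in; with none, Perl dies, which matches the error in the original post. The exact wording of the error depends on the Perl version (old perls such as 5.8 say "Modification of a read-only value attempted", newer ones report a missing argument for %n), so the sketch only checks that the call dies.

```perl
use strict;
use warnings;

# With a variable supplied, %n stores the number of characters
# formatted so far into it ("abc" is 3 characters).
my $count;
my $out = sprintf 'abc%n', $count;
print "formatted '$out', count = $count\n";

# With no argument for %n there is nowhere to store the count, so
# Perl dies instead. The format comes from a variable here, as it did
# when $_ was passed to printf in the original script.
my $fmt  = '%n';
my $died = !eval { my $s = sprintf $fmt; 1 };
print 'no argument: ', ($died ? "died with: $@" : "no error\n");
```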

    So! What's the text on line 20804003?

      I'm relatively certain that line is:

      PIIX4: not 100% native mode: will probe irqs later


      Updated: I verified that was the problem line.
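Why that particular line? In a Perl format string, "% n" (a percent sign, the space flag, then n) is still a %n conversion, and "100% native" contains exactly that sequence. A small sketch (not from the thread) that feeds the line to sprintf both as a format and as data:

```perl
use strict;
use warnings;

# The log line that stopped the original script.
my $line = "PIIX4: not 100% native mode: will probe irqs later\n";

# Used as a format, "% n" is parsed as a %n conversion with the space
# flag; with no argument to store into, sprintf dies.
my $as_format_died = !eval { my $s = sprintf $line; 1 };

# Used as data (an explicit '%s' format, or plain print), the '%'
# is copied through untouched.
my $as_data = sprintf '%s', $line;

print 'as a format: ', ($as_format_died ? "died\n" : "no error\n");
print "as data: $as_data";
```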

Re: line number ($.) problem ?
by shmem (Canon) on Mar 27, 2007 at 16:57 UTC
    I don't think this has to do with $., since $. isn't read-only.

    It would be interesting to see what's in $_ at that moment. From perldoc -f printf (emphasis mine):

    printf FORMAT, LIST
    Equivalent to "print FILEHANDLE sprintf(FORMAT, LIST)", except that "$\" (the output record separator) is not appended. The first argument of the list will be interpreted as the "printf" format. See "sprintf" for an explanation of the format argument. If "use locale" is in effect, the character used for the decimal point in formatted real numbers is affected by the LC_NUMERIC locale. See perllocale.

    Don't fall into the trap of using a "printf" when a simple "print" would do. The "print" is more efficient and less error prone.
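The $\ difference mentioned in that quote is easy to see with a small sketch (not from the thread; the in-memory $buffer handle and the "<ORS>" marker are made up for illustration): with $\ set, print appends it and printf does not.

```perl
use strict;
use warnings;

my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";

{
    # Output record separator, in effect only inside this block.
    local $\ = "<ORS>\n";
    print {$fh} sprintf('%s', 'via print');   # $\ is appended
    printf {$fh} '%s', 'via printf';          # $\ is not appended
}
close $fh;

print $buffer, "\n";
```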

    Try using:

    print STDOUT $_;

    Though I can't see how $_, interpreted as a format with an empty argument list, could touch a read-only value, that may be just what's happening here.
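Putting that suggestion together, here is a sketch of the whole script rewritten around print (not from the thread; the sub name copy_from_line and the file names in the comment are made up). It also starts output at line $skip itself: in the original, the first loop reads line 19,035,437 and then hits last, so the second loop begins at the following line and the first 2007 line is dropped.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, starting at line number $first (1-based).
sub copy_from_line {
    my ($in, $out, $first) = @_;
    while (my $line = <$in>) {
        # Progress report every million lines. The format is a literal
        # string, so using printf here is safe.
        printf STDERR "processed line %d\n", $. if $. % 1_000_000 == 0;

        # print treats the log line purely as data, so a '%' in it
        # (e.g. "100% native") is copied through unchanged.
        print {$out} $line if $. >= $first;
    }
    return;
}

# Example invocation (file names are placeholders):
#   open my $in,  '<', 'kern.log'      or die "kern.log: $!";
#   open my $out, '>', 'kern-2007.log' or die "kern-2007.log: $!";
#   copy_from_line($in, $out, 19035437);
```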

    update: Try

    perl -le '$_ = "%n"; printf STDOUT $_'

    hmm, always late today...


Re: line number ($.) problem ?
by BrowserUk (Pope) on Mar 27, 2007 at 16:45 UTC
Re: line number ($.) problem ?
by 5mi11er (Deacon) on Mar 27, 2007 at 17:02 UTC
    Ah, yes, thanks guys; I was thinking I needed the 'f' when specifying a filehandle. I will attempt to run it again with that fixed and report back.

    Update: That was it. Thanks again.


    (And yes, you're exactly right, ikegami; I was confused by memories of my C coding days.)

      In C's stdio, I/O functions are prefixed with f (e.g. fopen, fprintf, fputs and fgetc) to avoid confusion and conflicts with the system calls and the non-file versions.

      That's not the case in Perl. Perl labels the system calls instead (e.g. sysopen and syswrite) rather than the high-level calls, since the system calls are used less often. To differentiate between the file and non-file versions, Perl uses polymorphism (e.g. print ... vs print FILE ..., printf ... vs printf FILE ...).
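A small sketch of that naming scheme (not from the thread; the strings and the in-memory $buffer are made up): the same print and printf work with no filehandle, with a bareword handle like STDERR, or with a lexical handle, and the data goes through a constant '%s' format rather than being used as the format itself.

```perl
use strict;
use warnings;

my $text = "disk 100% full\n";

# No filehandle: both go to the currently selected handle (STDOUT).
print $text;
printf '%s', $text;

# The same functions with an explicit filehandle; there is no "fprint".
print STDERR $text;
printf STDERR '%s', $text;

# Lexical filehandles work too, here one opened on an in-memory string.
my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
print {$fh} $text;
printf {$fh} "%d%%\n", 100;   # %% in a format prints a single '%'
close $fh;
```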

Re: line number ($.) problem ?
by bsdz (Friar) on Mar 27, 2007 at 17:17 UTC
    There's always more than one way to do it. I'm not sure how well tail works with large files, though:
    $ cat my.log
    2005
    2005
    2005
    2006
    2006
    2006
    2007
    2007
    2007
    2007
    $ ls -l my.log
    -rw-r--r-- 1 bsdz mygroup 50 Mar 27 18:00 my.log
    $ grep -bm 1 2007 my.log
    30:2007
    $ tail -c $((50-30)) my.log
    2007
    2007
    2007
    2007
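The same idea can be done in a single pass in Perl, keyed on the content instead of a precomputed byte offset. This is a sketch only: the sub name copy_from_first_match is made up, and it assumes, as in the toy my.log, that the first line matching the pattern marks the start of the wanted data.

```perl
use strict;
use warnings;

# Copy everything from the first line matching $re onward, without
# knowing the line number or byte offset in advance. Returns true if
# a matching line was found.
sub copy_from_first_match {
    my ($in, $out, $re) = @_;
    my $found = 0;
    while (my $line = <$in>) {
        # Once a line has matched, $found stays true for the rest.
        $found ||= $line =~ $re;
        # print, not printf, so '%' in the data is never a format.
        print {$out} $line if $found;
    }
    return $found;
}
```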
Re: line number ($.) problem ?
by Errto (Vicar) on Mar 28, 2007 at 15:40 UTC
    I would just use tail +19035437 (tail -n +19035437 in the portable POSIX syntax, which also starts output at that line).

Node Type: perlquestion [id://606790]
Approved by kyle