Overhead of __DATA__?

by dws (Chancellor)
on Sep 05, 2001
I seek the wisdom of those with knowledge of The Source.

What overhead is incurred when a script reads from <DATA>?

Does the runtime actually open the script file and seek to a known (remembered) location? Or is some deeper optimization at work?

Is it any more or less expensive to put data after __DATA__ vs. storing it in a separate file? Are there advantages beyond convenience and the assumption that the script will still be in the kernel's disk cache?

Re: Overhead of __DATA__?
by stefp (Vicar) on Sep 05, 2001 at 02:36 UTC
    There is less overhead than storing the data in a separate file. No seek is necessary, because the filehandle is already at the right position once the script has been compiled. Also, if your data section is very long and never explicitly read through the DATA handle, it will never be loaded from disk (except for the beginning, through the OS's normal read-ahead).
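    A minimal sketch of the point about positioning, assuming an ordinary script run from a file: tell() reports where the DATA handle sits before anything has been read from it, and the loop then reads the data section with no open or seek of its own. The data lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ask where the DATA handle is before reading anything from it.
# The handle is already positioned just past the __DATA__ line.
my $offset = tell(DATA);
print "DATA starts at byte offset $offset of $0\n";

# Reading picks up from that position; no open() or seek() is needed.
while ( my $line = <DATA> ) {
    print "read: $line";
}
__DATA__
alpha
beta
```

    The offset printed depends on the length of the script text above __DATA__, so it changes whenever the code is edited.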

    In fact, if you "rewind" it with seek( DATA, 0, 0) and then do print <DATA>, you print the whole script:

    seek( DATA, 0, 0);
    print <DATA>;
    __DATA__
    this is no quine but downright cheating

    A note, to refute a remark on the CB:

    To my own surprise, you can pipe this script to perl and you will get the same result as executing it from a file, because of the IO buffering. Indeed, the Perl seek() does not translate into a seek() system call. So, if buffering permits, you can perl-seek a non-seekable stream!!!

Re: Overhead of __DATA__?
by clintp (Curate) on Sep 05, 2001 at 02:43 UTC
    As far as I can tell (and I'm not an expert on the Perl sources), during tokenization when __DATA__ is encountered Perl creates a filehandle for the current package called DATA and leaves the file pointer right there.
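    The per-package behaviour can be checked with a small sketch. "Fixture" is a made-up package name used only for illustration; the point is that the handle is created in whichever package is current when __DATA__ is reached, so code in main has to name it in full.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# This code runs in package main and reads the handle by its
# fully qualified name.
my @lines = <Fixture::DATA>;
chomp @lines;
print "from Fixture::DATA: @lines\n";

# "Fixture" (hypothetical) is the current package when __DATA__ is
# reached, so the handle is created as Fixture::DATA, not main::DATA.
package Fixture;
__DATA__
one
two
```

    This is also why a module can carry its own data section: the handle is named after the module's package, not after the script that loads it.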

    Other nonsense happens if you're on a platform that cares about binmode() and whether the script was found on stdin...<handwave/>

    (Curiously, DATA is assumed to be untainted. I wonder if this could be used to a JAPH's advantage...)

      Actually, Cirollo did just that with an Obfu... very interesting, and the first I had ever encountered it.
      What is curious about DATA being untainted when read? Would you rather the program consider itself tainted? =)

