Static Data ("__DATA__" vs. "our")

by ChrisR (Hermit)
on Sep 18, 2003 at 18:52 UTC (#292489, perlquestion)
ChrisR has asked for the wisdom of the Perl Monks concerning the following question:

I have noticed an increase in the use of the __DATA__ token in posts on PerlMonks. I understand that the data following the __DATA__ token is not loaded until requested, which could be an advantage if you have a large amount of static data. However, I believe the data can only be read once, and only sequentially, which could be a disadvantage in certain situations. I suppose you could set the file pointer using seek to get a specific portion of the data or to re-read it, but that just doesn't sound good to me. With our declarations, the data is easily reusable in a random-access manner, but it gets loaded at the beginning whether it's needed later on or not.

I am curious what others think about the use of __DATA__ versus the use of our, including any specific advantages or disadvantages of either approach. The following examples don't have much data, but it's the idea I'm interested in.
    #!/usr/bin/perl -w
    use strict;

    our @data = ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a0");
    foreach (@data) {
        print "$_\n";
    }
    exit;

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        print "$_";
    }
    exit;

    __DATA__
    a1
    a2
    a3
    a4
    a5
    a6
    a7
    a8
    a9
    a0

Replies are listed 'Best First'.
Re: Static Data ("__DATA__" vs. "our")
by davido (Archbishop) on Sep 18, 2003 at 19:06 UTC
    I find your second example easier to read and maintain. The first is just a lot of extra typing (quotes around everything, etc.). Of course it is a simple example, and the first example could be simplified somewhat with the qw// mechanism.

    Also, remember that just as the first example assigns a list of values to an array, you could also slurp in the entire contents of the __DATA__ section with the simple my @array = <DATA>; construct. __DATA__ gives you an option: slurp, don't slurp, iterate over it several times, etc. And unlike a typical array assignment, you can manipulate the data all you want inside of the variable you read it into, and still have the original dataset there if you need to revert back to the original.
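    A minimal sketch of the slurp approach described above, with the qw// form alongside for comparison. The variable names and the three sample lines are illustrative only and do not come from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The list form, written with qw// so each element needs no quotes.
our @from_qw = qw(a1 a2 a3);

# The __DATA__ form: slurp every line at once, then strip the newlines.
chomp( my @from_data = <DATA> );

# Work on a copy; the slurped original stays untouched.
my @working = map { uc } @from_data;

print "qw:       @from_qw\n";
print "DATA:     @from_data\n";
print "modified: @working\n";

__DATA__
a1
a2
a3
```

    Once the lines are in an array they can be indexed in any order, which covers the random-access concern raised in the question.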

    __DATA__ can also be used to roughly simulate a here document. That too can be very convenient.


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

Re: Static Data ("__DATA__" vs. "our") (handles)
by tye (Sage) on Sep 18, 2003 at 19:15 UTC

    If you use __DATA__ in a module, then please be careful to close DATA soon after the module is first required.

    Otherwise you are tying up one of the process's finite supply of file handles. It is easy for a Perl script to use a lot more modules than it does file handles, so this has the potential to be a big problem.
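    A rough sketch of what that could look like, assuming a hypothetical module saved as My/Colors.pm. The module name, the hex_for function, and the data are invented for illustration. Inside a package, the bareword DATA refers to that package's own handle (here My::Colors::DATA).

```perl
package My::Colors;    # hypothetical module name
use strict;
use warnings;

# Read the static data once, when the module is first loaded.
my %hex_for;
while ( my $line = <DATA> ) {
    chomp $line;
    next unless $line =~ /\S/;              # skip blank lines
    my ( $name, $hex ) = split ' ', $line;
    $hex_for{$name} = $hex;
}

# Release the file handle right away instead of holding it open
# for the life of the process.
close DATA;

# Look up a colour by name; returns undef for unknown names.
sub hex_for {
    my ($name) = @_;
    return $hex_for{$name};
}

# When run directly (perl My/Colors.pm) rather than loaded with
# require/use, print one lookup as a quick check.
print hex_for('red'), "\n" unless caller;

1;

__DATA__
red    ff0000
green  00ff00
blue   0000ff
```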

                    - tye
Re: Static Data ("__DATA__" vs. "our")
by tcf22 (Priest) on Sep 18, 2003 at 19:08 UTC
    I usually use DATA for small templates for my script output. That is really the only time I use it in the real world. I think it is just a matter of preference, not performance, but only Benchmark can tell the truth about that.

    I think the reason you see it a lot here, is that it is very easy to show what is going on when file processing questions are asked, because you can read it in like a Filehandle, but it is in the script, so you can post with the code.

    - Tom

      I think the reason you see it a lot here, is that it is very easy to show what is going on when file processing questions are asked, ....
      Yes. Oftentimes we are trying to help posters who are having quite enough difficulty grasping basic Perl syntax, without asking them to understand why setting up a data structure and manipulating it in a particular way is 'like' reading a file. Using __DATA__ is the _gentlest_ way of helping them to understand the point of the blessed reply, without looking like we are cursing at them in perLatin.
      I'm new to this __DATA__ concept. Is there an __END__ to define the end of the data, or does the end of file define the end of data?

      This would mean that only one file can be included in one file of code?

      or can you use anything like:

      my @lines = <PEANUT>;
      __PEANUT__
      Bing
      Bong
      Bang
      ___ /\__\ "What is the world coming to?" \/__/
        From perldata:
          __END__ and __DATA__ may be used to indicate the logical
          end of the script before the actual end of file. Any
          following text is ignored ...
          For compatibility with older scripts written before
          __DATA__ was introduced, __END__ behaves like __DATA__
          in the toplevel script (but not in files loaded with
          "require" or "do")
        There is no token to define the end of data, but there is Inline::Files. :)
        use Inline::Files;

        print while <PEANUT>;
        print while <BUTTER>;

        __PEANUT__
        Bing
        Bong
        Bang
        __BUTTER__
        Foo
        Bar
        Baz
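        To illustrate the perldata passage quoted above, here is a small sketch of a top-level script that ends with __END__ instead of __DATA__ and still reads the trailing text through the DATA handle. The sample lines are made up; the data simply runs to the end of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In the top-level script, text after __END__ is readable via <DATA>,
# exactly as if the marker had been __DATA__.
while ( my $line = <DATA> ) {
    print "read: $line";
}

__END__
Bing
Bong
Bang
```

        Per the same passage, this only holds for the main script; in a file loaded with require or do, use __DATA__ instead.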


        (the triplet paradiddle with high-hat)
Re: Static Data ("__DATA__" vs. "our")
by sgifford (Prior) on Sep 18, 2003 at 19:38 UTC
    I think the reason __DATA__ is frequently used on PerlMonks (probably more than in real code) is because it's particularly well-suited for giving sample code. Many questions here are about how to process data from a sequential file, and putting the data into a __DATA__ section gives file semantics for operating on the data, but without having to include in your post "Now put this data into a file and then run the program with standard input redirected from that file".
      To similar ends, __DATA__ can also be used to test code. If you want to verify that a particular file format is being handled correctly by your program, either use an external file, or just include an example of the file's format as the __DATA__ section of the code.

      I haven't tried this, but it's possible that another use is to get clever and redirect STDIN to __DATA__. With that technique, you should be able to provide hard-wired simulated user interaction to automate testing. Say, for example, you want to test some portion of the code that requires that the user jump through several hoops before that portion of the code is executed. Just temporarily redirect STDIN to __DATA__ and then restore it as you get to that key section of code. Then hardwire into the __DATA__ section the keystrokes that will get you there. Just an untested thought. ;)
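      One way the idea above might be tried is to alias the STDIN glob to DATA for the duration of a block, rather than redirecting at the file-descriptor level. This is only a sketch under that assumption: the ask_name_and_age routine and the canned answers are hypothetical, and the aliasing affects only Perl-level reads such as <STDIN>, not child processes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical interactive routine that reads its answers from STDIN.
sub ask_name_and_age {
    print "Name? ";
    chomp( my $name = <STDIN> );
    print "Age? ";
    chomp( my $age = <STDIN> );
    return ( $name, $age );
}

my ( $name, $age );
{
    # Inside this block, reads from STDIN are served from the
    # __DATA__ section; local restores the real STDIN on exit.
    local *STDIN = *DATA;
    ( $name, $age ) = ask_name_and_age();
}

print "\nGot name=$name age=$age\n";

__DATA__
Alice
42
```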


      "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

Re: Static Data ("__DATA__" vs. "our")
by ChrisR (Hermit) on Sep 18, 2003 at 21:09 UTC
    Thanks to all for your replies. It appears that the __DATA__ token is quite often a better way to handle static data simply because there are more options for what you can do with the data. Thanks again for the info.
Re: Static Data ("__DATA__" vs. "our")
by simonm (Vicar) on Sep 19, 2003 at 17:52 UTC
    suppose you could set the file pointer using seek

    One thing to watch out for -- the DATA filehandle is open to the original source file, so if you seek back to the beginning, you'll need to scan forward to the __DATA__ marker again to read the contents a second time.

    You can use this to turn your script into a quine, that is, to enable it to print its own source code:

    seek DATA, 0, 0;
    print <DATA>
    __DATA__

    (I recently found this trick in the TinyWiki source code...)
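    For re-reading only the data, one option that avoids rescanning for the marker is to record the handle's position with tell before the first read and seek back to that offset later. A minimal sketch, with illustrative variable names and sample lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Before anything is read, DATA is positioned at the first line
# after the __DATA__ marker; remember that offset.
my $data_start = tell(DATA);

my @first_pass = <DATA>;

# Rewind to the start of the data section, not the start of the file.
seek( DATA, $data_start, 0 );

my @second_pass = <DATA>;

print "first pass:  ", scalar(@first_pass),  " lines\n";
print "second pass: ", scalar(@second_pass), " lines\n";

__DATA__
a1
a2
a3
```

    Both passes should report the same number of lines, since the second read starts from the saved offset.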
