The Trick

Frequently I find myself with a one-off data file that I need to analyze with a one-off script. It's nice to keep the script with the data, so I generally put the script in the data file, with a __DATA__ marker separating the two parts.

At this point, I have another problem. One-off scripts benefit tremendously from Perl's command-line flags that write code for you, the '-p' and '-n' flags in particular. Those two flags wrap your code in a while (<>) { ... } loop, which unfortunately reads all the files listed on the command line, or STDIN if there aren't any. My data, needless to say, is in the DATA file handle. I mentioned this in the chatterbox, and choroba had the answer:

BEGIN { *ARGV = *DATA unless @ARGV }

I especially like the unless clause, since it lets me override the data source. I can see using this for test cases, where I have a bunch of default test data but can easily test against other data files as well.
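To make the idea concrete, here is a minimal sketch of such a one-off script. The column layout, the sample data and the variable names are assumptions made up for this illustration; only the BEGIN line comes from the post. The '-n' on the shebang line supplies the while (<>) loop, and the loop reads the lines after __DATA__ when no file is named on the command line.

```perl
#!/usr/bin/perl -n
use strict;
use warnings;

# Fall back to the embedded data only when no files were named.
BEGIN { *ARGV = *DATA unless @ARGV }

# Everything below runs once per input line (the -n loop body).
our $total;                          # package variable, so END can see it
my (undef, $count) = split;          # second whitespace-separated field
$total += $count;

END { print "total: $total\n" }      # runs once, after the last line
__DATA__
apples 3
pears 4
plums 5
```

Run as "perl script.pl", this should sum the embedded lines (3 + 4 + 5); run as "perl script.pl other.txt", it reads other.txt instead and ignores the embedded data.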

Why It Works

We're overwriting one typeglob (*ARGV) with another (*DATA). A typeglob contains Perl's internal representation of everything known about the given name, which includes any scalars, arrays, hashes, or filehandles. In this case, the ARGV set of variables has several "magical" properties, which are listed in perlvar:

$ARGV
Contains the name of the current file when reading from <>.
@ARGV
The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself. See $0 for the command name.
ARGV
The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in @ARGV.

The assignment *ARGV = *DATA replaces all of these with the only slightly less magical DATA values, which are cleverly not mentioned in perlvar, only in perldata. In this case, only the filehandle has any special properties. This means that the assignment also replaces $ARGV and @ARGV with $DATA and @DATA, which are normally undefined and empty, but I can't see many cases where you'd need those values once ARGV is gone. If I'm wrong, however, ambrus has pointed out that you could change the IO slot only, with *ARGV = *DATA{IO}.
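The difference between assigning a whole typeglob and assigning just its IO slot can be seen with two ordinary globs. This is only a sketch: SRC and DST are hypothetical names standing in for DATA and ARGV, and an in-memory file replaces the __DATA__ section so that the snippet is self-contained.

```perl
use strict;
use warnings;

# SRC and DST are hypothetical names used only for this illustration.
our ($SRC, @SRC) = ('src scalar', 1, 2, 3);
our ($DST, @DST) = ('dst scalar', 9);

# Give SRC a filehandle by opening an in-memory file on it.
my $text = "line from SRC\n";
open SRC, '<', \$text or die "cannot open in-memory file: $!";

# Assigning an IO reference replaces only the filehandle slot of *DST.
*DST = *SRC{IO};
my $line         = <DST>;   # read through SRC's handle
my $scalar_after = $DST;    # the scalar slot is untouched

# Assigning the whole glob aliases every slot at once.
*DST = *SRC;
my $scalar_alias = $DST;    # now the same variable as $SRC
my $array_size   = @DST;    # now the same array as @SRC

print "read via DST: $line";
print "after IO-only assignment: $scalar_after\n";
print "after whole-glob assignment: $scalar_alias ($array_size elements)\n";
```

After the IO-only assignment the scalar still holds its own value; only after the whole-glob assignment do $DST and @DST become aliases of their SRC counterparts.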

See Also...

'perl -e' and '__DATA__' What's wrong?

Re: $. - smarter than you might think

Many Thanks to...

First and foremost, choroba presented the idea in chatterbox.

shmem prodded me to write the "Why It Works" section, and also provided two of the "See also" links.

ambrus reminded us how to overwrite just one slot in a typeglob.

Re: Using the DATA file handle for ARGV
by Voronich (Hermit) on May 30, 2013 at 14:39 UTC

    I really love this idea. I've been frustrated by the situation that this solves too many times to count (in most scripts in fact.)

    It's funny. I've always thought in terms of 'arguments overriding defaults' and coded with that in mind. Hijacking ARGV with defaults would never have occurred to me if I hadn't seen it here.

Re: Using the DATA file handle for ARGV
by choroba (Abbot) on May 30, 2013 at 15:21 UTC
    And now I have the credit and you gather the XP... ;-)
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hey! Go look at my profile, and tell me who needs those XP more? :-)
Re: Using the DATA file handle for ARGV
by Grimy (Monk) on Jun 01, 2013 at 02:21 UTC

    This snippet is really interesting and could prove useful later, thanks for sharing (:

    There's something I find strange. Typeglob aliasing usually transfers all magic. Try this:

    *a = *\; $a = 42; print; # prints 42

    However, in this case, it looks like @ARGV and $ARGV are stripped of all magic by the manipulation (@DATA and $DATA ain't getting any magic neither). Can somebody explain what's happening there?

      However, in this case, it looks like @ARGV and $ARGV are stripped of all magic by the manipulation (@DATA and $DATA ain't getting any magic neither). Can somebody explain what's happening there?

      Why would there be "magic", and what would this "magic" do?

        @ARGV = 'echo 42 |';
        print <>;      # prints 42
        print $ARGV;   # prints echo 42 |

        Unless I'm mistaken, this illustrates that both @ARGV and $ARGV are usually magic.

        But I think I got it. There isn't actually any magic in @ARGV and $ARGV. It's just the magic *ARGV{IO} that shifts @ARGV and writes $ARGV. Once the magic *ARGV{IO} is gone, those variables are nothing special. Or am I still completely mistaken?

      it looks like @ARGV and $ARGV are stripped of all magic by the manipulation

      No, they were never magical in the first place.

      >perl -MDevel::Peek -e"Dump($ARGV,1);" 2>&1 | find "MAGIC"
      >perl -MDevel::Peek -e"Dump(\@ARGV,1);" 2>&1 | find "MAGIC"
      >perl -MDevel::Peek -e"Dump(\%ENV,1);" 2>&1 | find "MAGIC"
        MAGIC = 0x4c7ed8
          MG_TYPE = PERL_MAGIC_env(E)

      They are simply set by reading from the magical file handle ARGV, which is no longer being read from.
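A small sketch of that behaviour, assuming a throwaway temporary file whose contents are made up for the demonstration: @ARGV and $ARGV stay as they are until the first read through <>, and it is the read that changes them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway input file (contents invented for this demo).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "one\ntwo\n";
close $tmp_fh or die "close failed: $!";

@ARGV = ($path);
my $pending_before = @ARGV;   # the file name is still queued
my $name_before    = $ARGV;   # nothing has been read yet

my $first = <>;               # the magical handle opens the next file here

my $pending_after = @ARGV;    # the name has been shifted off @ARGV
my $name_after    = $ARGV;    # and stored in $ARGV

printf "before read: %d pending, \$ARGV is %s\n",
    $pending_before, defined $name_before ? 'set' : 'undef';
printf "after read:  %d pending, \$ARGV is %s\n",
    $pending_after, $name_after eq $path ? 'the file name' : 'something else';
```

Nothing in the two variables themselves reacts to the read; the ARGV filehandle does the bookkeeping, which is why they become ordinary variables once that handle is replaced.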

Re: Using the DATA file handle for ARGV
by GotToBTru (Hermit) on Nov 13, 2013 at 17:16 UTC

    I found a simple example to test this on.

    use strict;
    use warnings;
    BEGIN { *ARGV = *DATA unless @ARGV }

    my $timestamp = $ARGV[0];
    my ($year,$month,$day,$hour,$minute,$second) =
      ($timestamp =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/);
    printf "%s is %d seconds\n",$timestamp,
      ((($year * 365 + $day) * 24 + $hour) * 60 + $minute) * 60 + $second;
    __DATA__
    620731142301

    $timestamp is not picking up the value from DATA. What am I doing wrong?

      The trick influences how the diamond operator <> works, i.e. what *ARGV{IO} does. The behaviour of @ARGV is a different story and is not related to the trick. To see it work, change line 5 to
      my $timestamp = <>;
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        This works for DATA, but not for an argument. The point of this trick, as I understood it, was to allow the program to use DATA if an argument was not passed.

        use strict;
        use warnings;
        BEGIN { *ARGV = *DATA unless @ARGV }

        my $timestamp = <>;
        my ($year,$month,$day,$hour,$minute,$second) =
          ($timestamp =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/);
        printf "%s is %d seconds\n",$timestamp,
          ((($year * 365 + $day) * 24 + $hour) * 60 + $minute) * 60 + $second;
        __DATA__
        620731142301
        $: perl secsis.pl
        620731142301 is 1957962181 seconds
        $: perl secsis.pl 131113135710
        Can't open 131113135710: No such file or directory at secsis.pl line 5.
        Use of uninitialized value in pattern match (m//) at secsis.pl line 7.
        ...

      The point of this is to replace *ARGV with *DATA so you can use the <> operator to read from the DATA filehandle instead of files named on the command line. So your input line should be:

      my $timestamp = <>;

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.
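For the case raised above, where the command-line argument is the value itself rather than a file name, the <> operator is the wrong tool, because it treats every argument as a file to open. One possible approach, shown here only as a sketch, is to pick the source explicitly. value_or_default is a hypothetical helper, and the in-memory handle stands in for DATA so that the snippet runs without a __DATA__ section; in a real script the call would presumably be value_or_default(\@ARGV, \*DATA).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first argument if there is one,
# otherwise the first line read from the fallback filehandle.
sub value_or_default {
    my ($args, $fallback_fh) = @_;
    my $value = @$args ? $args->[0] : readline($fallback_fh);
    chomp $value if defined $value;
    return $value;
}

# Stand-in for the DATA handle (assumption: one default value per line).
my $default_text = "620731142301\n";
open my $fake_data, '<', \$default_text
    or die "cannot open in-memory file: $!";

my $from_default = value_or_default([], $fake_data);
my $from_arg     = value_or_default(['131113135710'], $fake_data);

print "no argument:   $from_default\n";
print "with argument: $from_arg\n";
```

This keeps the "arguments override defaults" behaviour without touching *ARGV at all, so it is a different technique from the typeglob trick, not a variant of it.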