
Using the DATA file handle for ARGV

by samwyse (Scribe)
on May 30, 2013 at 14:30 UTC (#1036065=perlmeditation)

The Trick

Frequently I find myself with a one-off data file that I need to analyze with a one-off script. It's nice to keep the script with the data, so I generally put the script in the data file, with a __DATA__ marker separating the two parts.

At this point, I have another problem. One-off scripts benefit tremendously from Perl's command-line flags that write code for you, the '-p' and '-n' flags in particular. Those two flags wrap your code in a while (<>) ... loop, which unfortunately reads all the files listed on the command line, or STDIN if there aren't any. My data, needless to say, is in the DATA file handle. I mentioned this in the chatterbox, and choroba had the answer:

BEGIN { *ARGV = *DATA unless @ARGV }

I especially like the unless clause, since it lets me override the data source. I can see using this for test cases, where I have a bunch of default test data but can easily test against other data files as well.
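As a minimal sketch of the whole trick in a single file (the file name sum.pl and the "name count" records are made up for illustration), the -n flag can sit on the #! line, so that running perl sum.pl reads the embedded data and perl sum.pl other.txt reads other.txt instead:

```perl
#!/usr/bin/perl -n
# Sketch: total the second column of the embedded data,
# or of any files named on the command line.
BEGIN { *ARGV = *DATA unless @ARGV }

my (undef, $count) = split;   # hypothetical "name count" records
$total += $count;             # package variable, so the END block can see it

END { print "total: $total\n" }
__DATA__
apples 3
pears 4
```

With no arguments this should print "total: 7". The END block is the usual way to run code once after the implicit loop has finished.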

Why It Works

We're overwriting one typeglob (*ARGV) with another (*DATA). A typeglob contains Perl's internal representation of everything known about the given name, which includes any scalars, arrays, hashes, or filehandles. In this case, the ARGV set of variables has several "magical" properties, which are listed in perlvar:

$ARGV: Contains the name of the current file when reading from <>.
@ARGV: The array @ARGV contains the command-line arguments intended for the script. $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself. See $0 for the command name.
ARGV: The special filehandle that iterates over command-line filenames in @ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in @ARGV.

The assignment *ARGV = *DATA replaces all of these with the only slightly less magical DATA values, which are cleverly not mentioned in perlvar, only in perldata. Of those, only the filehandle has any special properties. This means that the assignment also replaces $ARGV and @ARGV with $DATA and @DATA, which are undefined and empty respectively, but I can't see many cases where you'd need those values once ARGV is gone. If I'm wrong, however, ambrus has pointed out that you can change the IO slot only, with *ARGV = *DATA{IO}.
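A rough sketch of the IO-slot-only variant follows. It assumes a hypothetical use where the command-line arguments are parameters rather than file names, so @ARGV has to survive. The assignment is done at run time rather than in a BEGIN block because, as far as I can tell, the DATA handle is only opened once the compiler reaches __DATA__, so *DATA{IO} would most likely still be empty at BEGIN time. That also means this form does not combine directly with -n or -p, whose implicit loop reads before any run-time statement executes.

```perl
use strict;
use warnings;

# Replace only the filehandle slot; $ARGV and @ARGV are left alone.
*ARGV = *DATA{IO};

# Hypothetical parameter: a minimum value taken from the command line.
my $min = @ARGV ? $ARGV[0] : 0;

my @kept;
while (my $line = <>) {       # reads from DATA, not from files in @ARGV
    chomp $line;
    push @kept, $line if $line >= $min;
}
print "kept: @kept\n";
__DATA__
3
10
42
```

Had the whole typeglob been aliased instead, @ARGV would have become the empty @DATA and the parameter would have been lost.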

See Also...

'perl -e' and '__DATA__' What's wrong?

Re: $. - smarter than you might think

Many Thanks to...

First and foremost, choroba presented the idea in chatterbox.

shmem prodded me to write the "Why It Works" section, and also provided two of the "See also" links.

ambrus reminded us how to overwrite just one slot in a typeglob.

Replies are listed 'Best First'.
Re: Using the DATA file handle for ARGV
by choroba (Chancellor) on May 30, 2013 at 15:21 UTC
    And now I have the credit and you gather the XP... ;-)
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hey! Go look at my profile, and tell me who needs those XP more? :-)
Re: Using the DATA file handle for ARGV
by Voronich (Hermit) on May 30, 2013 at 14:39 UTC

    I really love this idea. I've been frustrated by the situation that this solves too many times to count (in most scripts, in fact).

    It's funny. I've always thought in terms of 'arguments overriding defaults' and coded with that in mind. Hijacking ARGV with defaults would never have occurred to me if I hadn't seen it here.

Re: Using the DATA file handle for ARGV
by Grimy (Pilgrim) on Jun 01, 2013 at 02:21 UTC

    This snippet is really interesting and could prove useful later, thanks for sharing (:

    There's something I find strange. Typeglob aliasing usually transfers all magic. Try this:

    *a = *\; $a = 42; print; # prints 42

    However, in this case, it looks like @ARGV and $ARGV are stripped of all magic by the manipulation (@DATA and $DATA ain't getting any magic neither). Can somebody explain what's happening there?

      it looks like @ARGV and $ARGV are stripped of all magic by the manipulation

      No, they were never magical in the first place.

      >perl -MDevel::Peek -e"Dump($ARGV,1);" 2>&1 | find "MAGIC"
      >perl -MDevel::Peek -e"Dump(\@ARGV,1);" 2>&1 | find "MAGIC"
      >perl -MDevel::Peek -e"Dump(\%ENV,1);" 2>&1 | find "MAGIC"
        MAGIC = 0x4c7ed8
          MG_TYPE = PERL_MAGIC_env(E)

      They are simply set by reading from the magical file handle ARGV, which is no longer being read from.

      However, in this case, it looks like @ARGV and $ARGV are stripped of all magic by the manipulation (@DATA and $DATA ain't getting any magic neither). Can somebody explain what's happening there?

      Why would there be "magic", and what would this "magic" do?

        @ARGV = 'echo 42 |';
        print <>;     # prints 42
        print $ARGV;  # prints echo 42|

        Unless I'm mistaken, this illustrates that both @ARGV and $ARGV are usually magic.

        But I think I got it. There isn't actually any magic in @ARGV and $ARGV. It's just the magic *ARGV{IO} that shifts @ARGV and writes $ARGV. Once the magic *ARGV{IO} is gone, those variables are nothing special. Or am I still completely mistaken?

Re: Using the DATA file handle for ARGV
by GotToBTru (Prior) on Nov 13, 2013 at 17:16 UTC

    I found a simple example to test this on.

    use strict;
    use warnings;
    BEGIN { *ARGV = *DATA unless @ARGV }

    my $timestamp = $ARGV[0];
    my ($year,$month,$day,$hour,$minute,$second) =
     ($timestamp =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/);
    printf "%s is %d seconds\n",$timestamp,
     ((($year * 365 + $day) * 24 + $hour) * 60 + $minute) * 60 + $second;
    __DATA__
    620731142301

    $timestamp is not picking up the value from DATA. What am I doing wrong?

      The trick influences how the diamond operator <> works, i.e. what *ARGV{IO} does. The behaviour of @ARGV is a different story and is not related to the trick. To see it work, change line 5 to
      my $timestamp = <>;
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        This works for DATA, but not for an argument. The point of this trick, as I understood it, was to allow the program to use DATA if an argument was not passed.

        use strict;
        use warnings;
        BEGIN { *ARGV = *DATA unless @ARGV }

        my $timestamp = <>;
        my ($year,$month,$day,$hour,$minute,$second) =
         ($timestamp =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/);
        printf "%s is %d seconds\n",$timestamp,
         ((($year * 365 + $day) * 24 + $hour) * 60 + $minute) * 60 + $second;
        __DATA__
        620731142301

        $: perl
        620731142301 is 1957962181 seconds
        $: perl 131113135710
        Can't open 131113135710: No such file or directory at line 5.
        Use of uninitialized value in pattern match (m//) at line 7.
        ...

      The point of this is to replace *ARGV with *DATA so you can use the <> operator to read from the DATA filehandle instead of files named on the command line. So your input line should be:

      my $timestamp = <>;

      Aaron B.
      Available for small or large Perl jobs and *nix system administration; see my home node.
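For the case GotToBTru describes, where the command-line argument is a value rather than a file name, the <> operator is the wrong tool, because it treats every argument as a file to open. A minimal sketch that is not from the thread: take the value from @ARGV when one is given and fall back to the first line of DATA otherwise. The arithmetic is carried over unchanged from the post above; the die message is an addition.

```perl
use strict;
use warnings;

# Use the command-line value if there is one, else the first DATA line.
# Plain <DATA> is enough here; no typeglob aliasing is involved.
my $timestamp = @ARGV ? shift @ARGV : <DATA>;
chomp $timestamp;

my ($year, $month, $day, $hour, $minute, $second) =
    $timestamp =~ /(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/
    or die "Unrecognized timestamp: $timestamp\n";

printf "%s is %d seconds\n", $timestamp,
    ((($year * 365 + $day) * 24 + $hour) * 60 + $minute) * 60 + $second;
__DATA__
620731142301
```

Run without arguments, this should print "620731142301 is 1957962181 seconds"; run with a 12-digit argument, it uses that value instead.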
