PerlMonks  

read ARGV ==> read on unopened filehandle

by ambrus (Abbot)
on Sep 17, 2005 at 13:15 UTC ( #492858=perlmeditation )

This meditation is about a peculiarity of perl we discussed in the CB ages ago. I'm sorry, but I can't remember who answered my question at the time. Anyway, I think this is worthwhile to write down, so that anyone else who runs into the same problem can find it with Super Search.

You have to take care if you try to read the special ARGV filehandle, as some functions – including read – don't know about it being special.

For example, the following code

read ARGV, $b, 1<<12 or die "cannot read ARGV: $!";

dies with the following message, whether or not it is given a file name as an argument:

read() on unopened filehandle ARGV at -e line 1.
cannot read ARGV: Bad file descriptor at -e line 1.

To correct this, a simple way is to use the eof function which does know about ARGV:

() = eof(); read ARGV, $b, 1<<12 or die "cannot read ARGV: $!";
This code does what you mean, i.e. it reads at most 4K bytes from the standard input or the file given as command line argument.

(The () = is there only to suppress the warning about eof in void context.)
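A minimal, self-contained sketch of the workaround above. It assumes a throwaway temporary file standing in for the command-line argument; the file contents and the variable names `$buf` and `$got` are made up for illustration. It also checks `read` with `defined`, since `read` returns 0 at end of file and undef only on error.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file standing in for a command-line argument.
my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "hello from ARGV\n";
close $fh or die "cannot close $name: $!";

@ARGV = ($name);

# eof() with empty parentheses examines @ARGV and opens the first file
# on the ARGV handle. The empty-list assignment only avoids the
# void-context warning.
() = eof();

my $buf;
my $got = read(ARGV, $buf, 1 << 12);
defined $got or die "cannot read ARGV: $!";
print "read $got bytes: $buf";
```

Without the `eof()` call, the same `read` fails with "read() on unopened filehandle ARGV", as described above.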

(Update: spelling corrected, as noted by cchampion)

Update: the consensus is currently that using read on ARGV is bad practice. Nevertheless, even that can be valuable information for someone who is struggling with the same bug, so this meditation still has a point. Thanks for all explanations.

Re: read ARGV ==> read on unopened filehandle
by pg (Canon) on Sep 17, 2005 at 20:32 UTC

    This behavior is actually well documented in perlfunc.

    "Using eof() with empty parentheses is very different. It refers to the pseudo file formed from the files listed on the command line and accessed via the <> operator. Since <> isn't explicitly opened, as a normal filehandle is, an eof() before <> has been used will cause @ARGV to be examined to determine if input is available. Similarly, an eof() after <> has returned end-of-file will assume you are processing another @ARGV list, and if you haven't set @ARGV, will read input from STDIN;"

    You can simply read ARGV by doing the following (if your file is line-by-line):

    use strict;
    while (<>) {
        print;
    }
Re: read ARGV ==> read on unopened filehandle
by sauoq (Abbot) on Sep 18, 2005 at 00:39 UTC

    EEEEeeeek! Please don't do that...

    I'm sorry ambrus, but you have been badly misled. Even pg has missed the point here. If he had gone on to say that you should only use <> to read from ARGV, he would have been on the mark. The correction you offer is no correction at all. ARGV doesn't maintain its magical properties outside a diamond operator (<>). And, if you aren't using those properties, then you are probably better off not using ARGV.

    To see what I mean, put your read() in a while loop, save it in script.pl, dump some data in two test files, and try calling your script as script.pl test1.txt test2.txt. You will find that your script never gets to the data in test2.txt.
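The experiment described above can be sketched as a single self-contained script. This is only an illustration: it creates two temporary files instead of test1.txt and test2.txt and assigns their names to `@ARGV` directly, and the helper `make_file` and the variable `$seen` are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small throwaway file and return its name (illustrative helper).
sub make_file {
    my ($text) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "cannot close $name: $!";
    return $name;
}

@ARGV = (make_file("test1 line1\n"), make_file("test2 line1\n"));

() = eof();    # opens only the first file on ARGV

my $seen = '';
while (read(ARGV, my $chunk, 1 << 12)) {
    $seen .= $chunk;
}

# read() stops at the end of the first file; it never advances ARGV
# to the second file the way <> would.
print $seen;
```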

    Generally speaking, if you find yourself trying to use a useless call like () = eof(); in order to fix something... there is almost certainly a better way. And, in this case, even if your proposed fix did work without introducing the potential bugs that it does, you really wouldn't be buying much for the price you paid with obfuscated code.

    Speaking of obfuscation... using 1<<12 instead of 4096 is, uhm, perhaps a bit misguided. I'm not recommending this, but even 2**12 would be better! Using notation like that makes some sense when you are, for example, enumerating bit flags¹ but otherwise it's needlessly confusing.

    Going back to the issue at hand, though, you almost never need to reference ARGV explicitly. You can do it in your <> for clarity, of course. And doing an explicit close(ARGV); is another reason. You could pass it to a function that used <>, but that should be avoided because it's just a bug waiting to happen when someone goes and changes the implementation of that function to use read() or something. So, the bottom line is, if you want to use ARGV, just use <> and be happy you don't have to write more code. If you really need read(), then you'll have to do a bit more work.
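As one possible shape for that "bit more work", here is a hedged sketch that opens each file explicitly and calls read() on an ordinary filehandle. The helper name `read_chunks`, its callback interface, and the fallback to STDIN when no files are given are assumptions made for this example, not something proposed in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: pass successive chunks of at most $size bytes from
# each named file to $callback; with no files, read from STDIN instead.
sub read_chunks {
    my ($size, $callback, @files) = @_;
    @files = ('-') unless @files;
    for my $file (@files) {
        my $fh;
        if ($file eq '-') {
            $fh = \*STDIN;
        }
        else {
            open $fh, '<', $file or die "cannot open $file: $!";
        }
        while (1) {
            my $n = read($fh, my $buf, $size);
            die "cannot read $file: $!" unless defined $n;
            last if $n == 0;    # end of this file, move on to the next
            $callback->($buf);
        }
        if ($file ne '-') {
            close $fh or die "cannot close $file: $!";
        }
    }
    return;
}

# Demo with two throwaway files standing in for command-line arguments.
my @names;
for my $text ("test1 line1\n", "test2 line1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "cannot close $name: $!";
    push @names, $name;
}

my $collected = '';
read_chunks(4 * 1024, sub { $collected .= $_[0] }, @names);
print $collected;
```

In a real script the last argument would be `@ARGV` rather than the demo list. Unlike read() on ARGV, this version does reach the data in the second file.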

    1. I.e. something like:

    use constant F_FOO => 1 << 0;
    use constant F_BAR => 1 << 1;
    use constant F_BAZ => 1 << 2;
    use constant F_QUX => 1 << 3;
    And so on... It's okay here because it's obvious what you are doing and why. And the shift contains useful information: the position of the bit associated with the flag.
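For illustration, a short sketch of how flags defined this way are typically combined and tested; the variable `$flags` and the printed messages are made up for the example.

```perl
use strict;
use warnings;

use constant F_FOO => 1 << 0;
use constant F_BAR => 1 << 1;
use constant F_BAZ => 1 << 2;

# Combine flags with bitwise OR: this sets bits 0 and 2.
my $flags = F_FOO | F_BAZ;

# Test individual flags with bitwise AND.
print "BAZ is set\n"     if $flags & F_BAZ;
print "BAR is not set\n" unless $flags & F_BAR;
```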

    Update: changed "><" to the intended "<>" in last para.

    -sauoq
    "My two cents aren't worth a dime.";
    
      "To see what I mean, put your read() in a while loop, save it in script.pl, dump some data in two test files, and try calling your script as script.pl test1.txt test2.txt. You will find that your script never gets to the data in test2.txt."

      I know that you clearly stated read(), but I still want to mention specifically that this is not an issue with <>, so that nobody gets confused.

      Create two data files, test1.txt:

      test1 line1
      test1 line2
      test1 line3

      And test2.txt:

      test2 line1
      test2 line2
      test2 line3

      Use the same code that I mentioned in my first post in this thread, and run perl -w blah.pl test1.txt test2.txt, and you get:

      test1 line1
      test1 line2
      test1 line3
      test2 line1
      test2 line2
      test2 line3

        Uh, yes pg... that's sort of the point. You are, I think, restating the obvious. Essentially, you are saying that ARGV works correctly when used correctly. That's true, of course. It just isn't exactly what this discussion was about.

        The original node was proposing a solution to using ARGV with read() and the point is that there is no good solution to that. That's not how ARGV should be used. Instead, it should be used with <> only.

        -sauoq
        "My two cents aren't worth a dime.";
        

      You are saying that I shouldn't use read on ARGV. I think you are right here.

      I was using it as a short notation to avoid an explicit open. This was a one-liner. It didn't matter if it could handle only one file, as the one-liner didn't even have a loop: it called read once only. However, it's quite stupid to do this, as it's much easier to read from STDIN instead and use shell redirection. In a script (not a one-liner), it's of course better to use an explicit open.

      As for using 4096, I disagree with you. It doesn't really matter whether I use 2**12 or 1 << 12; they mean the same to me. (Except that 1 << 12 is a bit more verbose, as it often needs to be parenthesized.) However, there's no way I'll use 4096 instead of these, even in a constant definition like sub HEADER_SIZE { 4096 }. The reason is simple: I once wrote a script where I had to read a string of 256 records of 32 bytes each. I wrote 8092 instead of 8192, and I had a very bad time searching for the bug. So, I've learnt that if I want to read four kilobytes, I write 4*1024 or 4<<10 or 1<<12, but never calculate 4096 in my head.

        As a shorter notation, I don't think you gained much, unless I'm missing something, which is entirely possible. From here it looks like you only saved 1 char, give or take some spacing.

        () = eof(); read ARGV, $b, 1<<12 or die "cannot read ARGV: $!";
        open(FH,shift) or die "cannot read file: $!";read FH, $b, 1<<12;

        ___________
        Eric Hodges

Node Type: perlmeditation [id://492858]
Approved by gmax