Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^3: Perl Rename

by afoken (Parson)
on Jul 04, 2010 at 08:02 UTC ( #847959=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Perl Rename
in thread Perl Rename

You've hit a much deeper problem here:

Passing Arguments

On Systems derived from Unix (Linux, *BSD, and nearly every operating system with an X in its name, except for Windows XP), a program is invoked with an (ordered) list of arguments. This list was called the argument vector (from the mathematical term "vector" as a one-dimensional array of values), or short argv in C or @ARGV in perl. There is a minor difference between argv and @ARGV: The first element of argv is the name of the invoked program, the remaining elements are the "real" arguments. Perl's @ARGV contains only the "real" arguments, the program name is in $0, the result of a simple shift operation happening long before your code starts.

On Systems derived from DOS (MS-DOS, DR-DOS, FreeDOS, all versions of Windows, OS/2, and nearly every operating system with "DOS" in its name), things are different: A program is invoked with a single argument string. In old times, it was limited to as few as 126 bytes, but I think the new limit is larger with the Windows API. There originally was no way to know the name that was used to invoke the program. Late DOS versions stored that information in a well-known location in memory, Windows has an API function to get that information.

The Shells

Imagine something simple like perl whipeout.pl *.secret typed into the shell:

On DOS systems (including all Windows versions), the shell searches for perl.bat, perl.com, perl.exe (and a few more extensions on NT-based Windows versions), and invokes the first found, giving it a command line string of whipeout.pl *.secret. That's all.

On Unix systems, the shell searches for a file named perl that has at least one of the executable bits (x in ls -l output) set. Then, it expands all arguments. The string whileout.pl contains no placeholders, so it is not modified. *.secret is replaced with a list of all files in the current directory whose names end with .secret if such files exist, else it is passed as is.

Runtime Library

On Unix systems, nothing special happens. After some housekeeping for the runtime, control is passed to main().

On DOS systems, the runtime splits the command line string into elements, and some rare implementations also expand the elements. Both splitting and expanding depend on the runtime library implementation, there is no formal specification. Most split on space and tabs, and if they expand, they use rules similar to those the shell uses for its build-in commands. Finally, control is passed to main(), with the splitted and sometimes expanded command line string passed in argv. Depending on the compiler and the compiler flags used, it is also possible that the WinMain() function is invoked instead, without any processing of command line arguments, leaving that to the application. In both cases, the application can get the original command line string and parse it independant from the runtime.

Application

Perl uses the main() interface, so it expects its arguments splitted into argv.

On both systems, the Perl interpreter starts, consideres argv, initialises $0 and @ARGV from argv, loads whipeout.pl from the current directory, and runs it.

On Unix, @ARGV is either a one-element array containing *.secret, this happens when the shell could not expand *.secret, because there were no *.secret files. Or @ARGV is an array containing one or more file names, all ending with .secret. Note that the first case highly depends on the shell, and it would be ok for a shell to pass no value at all in this case.

On DOS, @ARGV is always a one-element array containing *.secret, simply because no code expanded it.

The big difference here is that on Unix, the shell is responsible for parsing and expanding user input, whereas on DOS, the shell parses the user input only partially, and passes the remaining input (nearly) unmodified to the application. It is the job of each single application to parse and expand user input. This causes lots of trouble, since the parsing is not predictable. It depends on the runtime library and the application itself.

Workaround

One workaround is to use Win32::Autoglob, which takes care of expanding @ARGV automatically only on Windows and not when the cygwin perl is used (cygwin expands the arguments). Unlike most other modules from the Win32:: namespace, it should run with any perl 5 on any operating system. Note that Win32::Autoglob does not work on DOS and OS/2 ports of perl, simply because the author forgot to check ($^O eq 'dos') or ($^O eq 'os2').

Simply calling glob for every element of @ARGV that contains * or ? is a bad idea, especially when it is done independant from the operating system. In the best case, you are only wasting time, in the worst case, you are damaging your arguments. Remember that Unix shells may also pass * and ? literally when the user told them to do so, like in perl -e 'print join " ",@ARGV' \* \<-- this is a star, isn\'t it\?

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Comment on Re^3: Perl Rename
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://847959]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (10)
As of 2014-12-27 18:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (177 votes), past polls