PerlMonks  

Why are there no errors when opening a filename that contains colons on Win10

by Lotus1 (Vicar)
on Oct 09, 2019 at 19:51 UTC ( #11107261=perlquestion )

Lotus1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm running Perl 5.24.1 x86 on a Windows Server 2012 R2 system (Edit: not Windows 10, as I first wrote). My question is: why are there no errors when I open a filename that contains a colon ( ':' ) character? The open succeeds and printing to the filehandle appears to succeed, but the file that is created has a truncated name and no output ends up in that file. The filename is truncated at the colon character.

The reason I'm asking is that before this I trusted open to tell me if there was a problem with creating an output file. I was trying to test my logic in a larger program and put ':' in the filename to trigger an open error, but no error appeared. The program silently failed with a garbled filename for the logfile.

Testcase number 8 in this program shows the problem. All the other tests either work or produce a file open error.

    use warnings;
    use strict;

    my $count = 1;
    my @test_characters = qw( * . " / \ [ ] : ; | );
    push @test_characters, ' ';
    push @test_characters, ',';

    foreach my $char (@test_characters){
        print "char = \'$char\'\n";
        my $logfile = $0;
        $logfile =~ s/\.pl$/_$count-$char---\.log/;
        $count++;
        print "logfile = >$logfile<\n";

        if(not open my $fh_log, ">", $logfile) {
            print "Error *** Couldn't open logfile for output: $logfile - $! : $^E ---\n",'-'x79,"\n";
            next;
        }
        else {
            print "-- Opened logfile for output: $logfile ---\n";
            if( not print $fh_log "-- Opened logfile for output: $logfile ---\n",'-'x79,"\n" ){
                print "Couldn't print to file handle.$!--\n";
            }
            print '-'x79,"\n";
        }
    }

This is what is printed to the display:

    char = '*'
    logfile = >testlog_1-*---.log<
    Error *** Couldn't open logfile for output: testlog_1-*---.log - Invalid argument : The filename, directory name, or volume label syntax is incorrect ---
    -------------------------------------------------------------------------------
    char = '.'
    logfile = >testlog_2-.---.log<
    -- Opened logfile for output: testlog_2-.---.log ---
    -------------------------------------------------------------------------------
    char = '"'
    logfile = >testlog_3-"---.log<
    Error *** Couldn't open logfile for output: testlog_3-"---.log - Invalid argument : The filename, directory name, or volume label syntax is incorrect ---
    -------------------------------------------------------------------------------
    char = '/'
    logfile = >testlog_4-/---.log<
    Error *** Couldn't open logfile for output: testlog_4-/---.log - No such file or directory : The system cannot find the path specified ---
    -------------------------------------------------------------------------------
    char = '\'
    logfile = >testlog_5-\---.log<
    Error *** Couldn't open logfile for output: testlog_5-\---.log - No such file or directory : The system cannot find the path specified ---
    -------------------------------------------------------------------------------
    char = '['
    logfile = >testlog_6-[---.log<
    -- Opened logfile for output: testlog_6-[---.log ---
    -------------------------------------------------------------------------------
    char = ']'
    logfile = >testlog_7-]---.log<
    -- Opened logfile for output: testlog_7-]---.log ---
    -------------------------------------------------------------------------------
    char = ':'
    logfile = >testlog_8-:---.log<
    -- Opened logfile for output: testlog_8-:---.log ---
    -------------------------------------------------------------------------------
    char = ';'
    logfile = >testlog_9-;---.log<
    -- Opened logfile for output: testlog_9-;---.log ---
    -------------------------------------------------------------------------------
    char = '|'
    logfile = >testlog_10-|---.log<
    Error *** Couldn't open logfile for output: testlog_10-|---.log - Invalid argument : The filename, directory name, or volume label syntax is incorrect ---
    -------------------------------------------------------------------------------
    char = ' '
    logfile = >testlog_11- ---.log<
    -- Opened logfile for output: testlog_11- ---.log ---
    -------------------------------------------------------------------------------
    char = ','
    logfile = >testlog_12-,---.log<
    -- Opened logfile for output: testlog_12-,---.log ---
    -------------------------------------------------------------------------------

Here are the contents of the folder, showing the truncated filename 'testlog_8-':

     Volume in drive D is APPS
     Volume Serial Number is XXX

     Directory of D:\scripts\clear_dmp_files\test

    10/09/2019  03:28 PM    <DIR>          .
    10/09/2019  03:28 PM    <DIR>          ..
    10/09/2019  03:28 PM                 0 testdir.txt
    10/09/2019  03:01 PM               833 testlog.pl
    10/09/2019  03:27 PM               136 testlog_11- ---.log
    10/09/2019  03:27 PM               136 testlog_12-,---.log
    10/09/2019  03:27 PM               135 testlog_2-.---.log
    10/09/2019  03:27 PM               135 testlog_6-[---.log
    10/09/2019  03:27 PM               135 testlog_7-]---.log
    10/09/2019  03:27 PM                 0 testlog_8-
    10/09/2019  03:27 PM               135 testlog_9-;---.log
    10/09/2019  03:27 PM             2,629 testlog_out.txt
                  10 File(s)          4,274 bytes
                   2 Dir(s)  129,251,168,256 bytes free

Replies are listed 'Best First'.
Re: Why are there no errors when opening a filename that contains colons on Win10
by rjt (Deacon) on Oct 09, 2019 at 20:25 UTC

    You asked Windows to create a file, and gave it a name. Windows said, "OK, no problem!", and you got a successful return from open, but behind the scenes, Windows munged the filename. That's why you don't get an error, and Perl probably can't be reasonably expected to catch this sort of thing, as it runs on over a hundred different platforms, and does not guarantee filename in = filename out.

    The colon is of course the volume separator on Win32, so while it's not a valid Win32 filename character, it is a valid path character supplied to open, (just as you might open C:\Windows\Crash). It of course isn't doing what you expect, and then I don't see how a "volume" of testlog_8- makes any sense. Chopping off the : and anything that follows, and saving a file with what would be the volume name makes even less sense, but it is what it is, I guess. DOS is, well, a bit different. :-)

    Win32 naming is, well, a bit convoluted. This MSDN naming guide article illustrates the rules.

    I'm not sure why you'd end up with random characters in your output logfile when you are generating the filename, but the solution is probably one of either a) making sure your filename generator function doesn't use those characters, or b) filtering out invalid characters before the call to open (or erroring out on invalid characters, perhaps).

    If you need cross-platform support, it's a little trickier, but you could also just be extra-picky and allow only alnum, underscore, and dash, for example. I'm not aware of a module that portably processes filenames, otherwise that's what I'd recommend. In practice, a simple regex usually does the trick: $filename =~ s/[^\w-]//g;, but see File::Spec for some help with volume and path components, if need be. Mind the encoding.
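    A minimal sketch of both options (erroring out on invalid characters, or filtering them) might look like the following. The helper names filename_problems and sanitize_filename are hypothetical, not from any module. The reserved-character set is the one listed in the Win32 naming rules (< > : " / \ | ? * and control characters), and unlike the one-line regex above, this sanitizer keeps the dot so that an extension survives; that is a choice made here for illustration.

```perl
use strict;
use warnings;

# Characters reserved by the Win32 naming rules, plus control characters.
my $WIN32_RESERVED = qr{[<>:"/\\|?*\x00-\x1f]};

# Hypothetical helper: list the problems found in a bare file name
# (no directory part). An empty list means the name passed these checks.
sub filename_problems {
    my ($name) = @_;
    my @problems;
    push @problems, 'empty name'            if $name eq '';
    push @problems, 'reserved character'    if $name =~ $WIN32_RESERVED;
    push @problems, 'trailing space or dot' if $name =~ /[ .]\z/;
    return @problems;
}

# Hypothetical helper: the "extra-picky" approach. Keeps only ASCII
# letters, digits, underscore, dash and dot, and drops everything else.
sub sanitize_filename {
    my ($name) = @_;
    (my $clean = $name) =~ s/[^A-Za-z0-9_.-]//g;
    return $clean;
}

for my $name ('testlog_2-.---.log', 'testlog_8-:---.log', 'testlog_10-|---.log') {
    my @problems = filename_problems($name);
    if (@problems) {
        print "REJECT '$name': @problems (sanitized: '",
              sanitize_filename($name), "')\n";
    }
    else {
        print "OK     '$name'\n";
    }
}
```

    Rejecting is usually the safer default for a logfile name, since silently dropping characters can make two different inputs collide on the same file.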

    use strict; use warnings; omitted for brevity.

      rjt, thanks for responding.

      The colon is of course the volume separator on Win32, so while it's not a valid Win32 filename character, it is a valid path character supplied to open, (just as you might open C:\Windows\Crash). It of course isn't doing what you expect, and then I don't see how a "volume" of testlog_8- makes any sense. Chopping off the : and anything that follows, and saving a file with what would be the volume name makes even less sense, but it is what it is, I guess. DOS is, well, a bit different. :-)

      Have a look at the output from test characters 4 and 5. Those were '/' and '\'. For those two the error given was: No such file or directory : The system cannot find the path specified. It was trying to create a file called '---.log' in a folder that did not exist. If the colon is a valid path character on Windows I would expect the same thing to happen.
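      One way to see how Perl's own path helpers split these names is the sketch below. It uses File::Spec::Win32, which applies Win32 path syntax and can be loaded on any platform. It only models path syntax; it says nothing about what the file system will do with the name, so treat it as an illustration of the '/' and '\' versus ':' distinction rather than as a validator.

```perl
use strict;
use warnings;
use File::Spec::Win32;    # Win32 path syntax, loadable on any platform

# Split each test name into volume, directory and file parts.
for my $path ('testlog_4-/---.log', 'testlog_5-\\---.log',
              'testlog_8-:---.log', 'C:\\Windows\\Crash') {
    my ($volume, $dir, $file) = File::Spec::Win32->splitpath($path);
    print "path=>$path< volume=>$volume< dir=>$dir< file=>$file<\n";
}
```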

      I'm not sure why you'd end up with random characters in your output logfile when you are generating the filename,[...],

      I don't follow what you mean about random characters in the output files. The print statement wrote to the files that were created successfully except for test character 8, the colon. In that case the file is empty. It seems to be a valid filehandle but the print didn't write to the file and the print seemed to succeed since it returned a true value.

        Exactly, that goes back to what I'm saying: every operating system has its own rules for valid pathnames, and its own unique semantics for what happens when you step outside of those rules. Perl, for the most part, respects those semantics, leaving it to the programmer to decide how best to handle them. If you ask to open a file, Perl passes that request along to the OS, and the return value you get is a function of what the OS itself returns. The errors for your other cases are coming from Windows, not Perl.

        I completely agree this Win32 behavior is complex and weird in spots, but it's Win32, not Perl that is giving you this behavior.

        I don't follow what you mean about random characters in the output files. The print statement wrote to the files that were created successfully except for test character 8, the colon. In that case the file is empty. It seems to be a valid filehandle but the print didn't write to the file and the print seemed to succeed since it returned a true value.

        See the Alternate Data Streams discussion in the MSDN article I linked. And add that to the list of surprises in support of validating your filenames! If you read back the file with the same script, you will actually get the contents back, even though foo appears to exist but be empty (and type foo outputs nothing):

        use autodie;
        use feature 'say';    # enables say

        my $filename = 'foo:bar.txt';

        if ($ARGV[0]) {
            say "Skipping write.";
        }
        else {
            say "Writing $filename. Run $0 -skip to skip writing.";
            open my $fh, '>', $filename;
            say $fh 'Test text';
            close $fh;
        }

        # Read the first line back through the same name.
        open my $read, '<', $filename;
        print "$filename: " . <$read>;
        close $read;

        __END__
        Test text

        This "works" because we're opening the Alternate Data Stream "bar.txt" of the file "foo". If you delete "foo", then "foo:bar.txt" will no longer be readable either.

        And this nicely highlights the general issue: Perl doesn't know or care what you are opening, only whether it succeeds. Perl trusts you to know what you are asking the OS to do. Similarly, if the Windows system call confirms 12 bytes were written, Perl will take Windows' word for it. It's up to you to verify the bytes written if desired, which isn't always possible (e.g., when writing to sockets). Knowing the semantics of your target platform(s) is incumbent on you whenever you are dealing with system-level code.
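        As a rough sketch of what "verify it yourself" could look like for a logfile: write the data, then check that the directory really lists an entry with exactly the requested name and the expected size. The helper name write_and_verify is hypothetical. The sketch writes in raw mode so that the size on disk equals the string length for ASCII text. I would expect the directory check to fail on NTFS for a name like foo:bar.txt, since the listing shows only foo, but that expectation is an assumption and was not tested here.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to $name inside $dir, then confirm that
# the directory lists an entry with that exact name and the expected size.
sub write_and_verify {
    my ($dir, $name, $text) = @_;
    my $path = "$dir/$name";

    # Raw mode avoids newline translation, so byte count == length for ASCII.
    open my $fh, '>:raw', $path or die "open '$path' failed: $! ($^E)\n";
    print {$fh} $text           or die "print to '$path' failed: $!\n";
    close $fh                   or die "close '$path' failed: $!\n";

    # Does the directory contain an entry with exactly the requested name?
    opendir my $dh, $dir or die "opendir '$dir' failed: $!\n";
    my $listed = grep { $_ eq $name } readdir $dh;
    closedir $dh;
    return 0 unless $listed;

    # Does that entry hold as many bytes as we wrote?
    my $size = -s $path;
    return defined $size && $size == length $text;
}

my $ok = write_and_verify('.', 'verify_demo.log', "Test text\n");
print $ok ? "verified\n"
          : "NOT verified: name or size differs from what was requested\n";
unlink 'verify_demo.log';
```

        The directory and name are passed separately on purpose, so the check does not depend on how a path-splitting routine treats the colon on a given platform.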

        Lastly, I do understand that this is experimental for you, and you have uncovered some interesting Win32 behavior. I hope these insights are helpful in answering your question.

Re: Why are there no errors when opening a filename that contains colons on Win10
by soonix (Abbot) on Oct 10, 2019 at 05:35 UTC

      soonix, thanks! Your explanation and links helped me to finally understand. The Microsoft page provided a simple demonstration at the command prompt:

      > echo hello > test:stream
      > more < test:stream
Re: Why are there no errors when opening a filename that contains colons on Win10
by BillKSmith (Prior) on Oct 10, 2019 at 15:32 UTC

      Hi Bill.

      You expect open to identify and reject all invalid names.

      What I actually said in my OP was:

      [...] before this I trusted the open to tell me if there was a problem with creating an output file.

      In this case with a colon in the filename, a file was created successfully and output was correctly written to the Alternate Data Stream as the other monks demonstrated to me. I bet I'm not the only one who had never heard of this newer feature of Windows. I was surprised and curious about this. My current script doesn't really need to completely validate the filenames since the names are not from user inputs. However I have coworkers who sometimes make clumsy changes so I try to make things bulletproof.

      Have you seen cases where the open() function did not correctly identify and reject invalid names?

        Four of your ten test cases were valid filenames which opened correctly. Five of the six invalid names correctly failed to open. For the sixth invalid name, Windows took the strange corrective action you describe as a 'feature'. I do not have a clue what makes this name special. We have no idea how other invalid names may be handled. Code which makes any assumptions about invalid names is simply not 'bullet-proof'. Your simple test probably finds enough errors to be useful. You may even consider it an advantage that it allows some special cases that are not strictly valid.
        Bill
Re: Why are there no errors when opening a filename that contains colons on Win10
by Anonymous Monk on Oct 11, 2019 at 19:26 UTC
