
Perl stops reading __DATA__ when meeting SUB character on Windows

by yfnecz (Novice)
on Jan 02, 2014 at 23:06 UTC (#1069043=perlquestion)
yfnecz has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I need to write a script that carries a tar.gz archive inside itself and, when run, saves that archive to a file on the filesystem.

This script will be run in a Windows environment.

I've decided to use the __DATA__ section to store the file contents.

However, I cannot later read the whole contents. The file I embed is binary (I tried both a .dll and a .tar.gz), and both contain the SUB character (Notepad++ shows it that way). As I understand it, SUB is the same as Ctrl-Z (0x1A), and perl treats it as EOF and stops reading the rest of DATA.

But I need the whole data to be read and written to file. Any ideas how that can be accomplished?

Here is sample code (I already tried like 20-30 different ways, all with the same result):

my $outfile = "$HF_DIR/binary_copy.dat";
open (OUTFILE, ">", $outfile) or die "Not able to open the file for writing. \n";
binmode (OUTFILE);
binmode(DATA);
do { local $/; print OUTFILE <DATA>; };
close (DATA) or die "Not able to close the file: DATA \n";
close (OUTFILE) or die "Not able to close the file: $outfile \n";

and at the end there is __DATA__ section, with all the binary data:

__DATA__
..binary data....SUB character......binary data... ......

So perl only reads and writes up to the first SUB character, and I need it to read and write all of the data.

Any help would be very much appreciated! Thank you in advance...

UPDATE: I decided to use base64 encoding and now it all works. Thanks for all your help, guys!
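The base64 approach from the update can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the OP's actual script: `write_decoded` is a hypothetical helper name, the sample bytes are made up, and an in-memory handle stands in for `DATA` so that the snippet runs on its own. `MIME::Base64` ships with core Perl.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use File::Temp qw(tempfile);

# Hypothetical helper: slurp base64 text from $in, decode it, and write
# the resulting bytes to $out_path in binary mode.
sub write_decoded {
    my ($in, $out_path) = @_;
    my $encoded = do { local $/; <$in> };
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    binmode $out;
    print {$out} decode_base64($encoded);
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Made-up sample payload containing SUB (0x1A), NUL and a CRLF pair.
my $bytes   = "PK\x1A\x00\xFF\r\nrest of archive";
my $encoded = encode_base64($bytes);    # plain ASCII text, no SUB in it

# Stand-in for the DATA handle (assumption made so the sketch is testable).
open my $in, '<', \$encoded or die "Cannot open in-memory handle: $!";
my ($tmp_fh, $out_path) = tempfile(UNLINK => 1);
close $tmp_fh;
write_decoded($in, $out_path);

# Read the file back in binary mode to check the round trip.
open my $check, '<', $out_path or die "Cannot open $out_path: $!";
binmode $check;
my $restored = do { local $/; <$check> };
close $check;
print $restored eq $bytes ? "round trip ok\n" : "round trip FAILED\n";
```

In a real script the base64 text would sit after `__DATA__` and the call would be `write_decoded(\*DATA, $outfile)`. Because the encoded text is pure ASCII, no SUB byte ever appears in the source file.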

Replies are listed 'Best First'.
Re: Perl stops reading __DATA__ when meeting SUB character on Windows (buffering)
by tye (Sage) on Jan 03, 2014 at 02:28 UTC

    binmode(DATA) would likely solve the problem if you could do it soon enough.

    My bet is that the Perl interpreter has already read far enough past __END__ / __DATA__ (reading from disk is done using buffers of several kB, even if only one line is returned to the caller) for the CTRL-Z to have caused EOF to be detected. Throwing in binmode at that point doesn't clear the EOF and you are still left with only being able to read the bytes that have been left in the handle's buffer.

    You can work around this by re-opening your sourcecode file and doing binmode() immediately. But it gets tricky to seek to the exact offset for "immediately after __END__ / __DATA__", because reading w/o binmode on Win32 means "\r" characters likely got stripped and so tell is likely to be off by several bytes.

    I'd just re-open the source code and read until I found "\n__END__\s*\n" or "\n__DATA__\s*\n" and then read all of the bytes after that. It isn't hard to write Perl code such that you never include "__END__" at the front of any line before the one where you used it to mark the end of your Perl code.
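The re-open idea above could look something like this. It is only a sketch: `read_data_section` is a hypothetical helper name, the demo file contents are made up, and it assumes the marker sits alone on its own line and does not occur earlier at the start of a line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the raw bytes that follow the first line
# consisting only of __DATA__ or __END__ in the file at $path.
sub read_data_section {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # set before any read, so no CRLF or ^Z translation applies
    my $source = do { local $/; <$fh> };
    close $fh;
    if ($source =~ /^__(?:DATA|END)__[ \t]*\r?\n/m) {
        return substr($source, $+[0]);    # $+[0] is the offset just past the match
    }
    die "No __DATA__ or __END__ marker found in $path\n";
}

# Demo on a throwaway file standing in for the script itself.
my $payload = "head\x1Atail\r\n\x00\xFF";
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "print qq{hello\\n};\r\n__DATA__\r\n", $payload;
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $recovered = read_data_section($tmp_path);
print $recovered eq $payload ? "payload intact\n" : "payload damaged\n";
```

Inside the real script the call would presumably be `read_data_section($0)`. Slurping the whole file and searching for the marker sidesteps the problem of `tell` offsets being off after text-mode reads.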

    - tye        

      Thanks, I'll try that; it will probably work. I just wanted to know whether there is some simpler way to work around this :)
      By the way, when I read the same binary content from a regular file instead of from DATA, everything gets read and written once binmode is used. So that approach should help.

      I don't think it matters when reading from a file

      If memory serves Ctrl-Z only ever worked on STDIN, as in

      $ perl -
      use Data::Dump qw/ dd /;
      while(<DATA>){
          dd($_);
      }
      __DATA__
      a
      "a\n"
      asdf^Z
      "asdf\32\n"
      ^Z
Re: Perl stops reading __DATA__ when meeting SUB character on Windows
by roboticus (Chancellor) on Jan 02, 2014 at 23:51 UTC


    This used to be a common problem 15 years ago. I haven't seen it lately, though.

    Anyway, back in the bad old days of CP/M, directory entries didn't contain the exact file length. Instead they contained the number of sectors in the file. ^Z was used as a marker at the end of text files to show where the actual end of the document was.

    Various compilers and operating systems had hackarounds to emulate the behavior to "simplify" the porting of software to newer systems. In all cases, I believe it caused more trouble than it ever saved. ;^(

    It sounds like you somehow have a component of your system that's in that mode. I tried to replicate it, but can't.

    Anyway, to get around it, I'd expect that you could use the fixed-length record mode of reading the data, or perhaps using sysopen with the appropriate flag to ensure that it's opened in binary mode.

    # Using fixed-length records is something like...
    do {
        local $/ = \100;
        while (<DATA>) {
            print OUTFILE $_;
        }
    }

    Note that if you use fixed length records, you'll need to process the data further if you want it split on newlines...
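For reference, here is how record-mode reading behaves in isolation. This is a sketch on an in-memory handle with made-up sample bytes; it only shows the mechanics of `$/ = \100` and does not claim to get around the ^Z handling on the `DATA` handle.

```perl
use strict;
use warnings;

# Made-up sample: 250 bytes with values 0..249, so it contains 0x1A (SUB)
# as well as "\n" and "\r".
my $bytes = pack 'C*', 0 .. 249;

open my $in, '<', \$bytes or die "Cannot open in-memory handle: $!";
binmode $in;

my @records;
{
    local $/ = \100;    # a reference to an integer means: read 100-byte records
    while (my $chunk = <$in>) {
        push @records, $chunk;
    }
}
close $in;

my @sizes = map { length($_) } @records;
print "@sizes\n";
```

Each read returns up to 100 bytes regardless of any newlines in the data, and the final record is simply shorter.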


    When your only tool is a hammer, all problems look like your thumb.

      Thanks, I tried that, but it didn't help either... It still stopped reading after SUB.
Re: Perl stops reading __DATA__ when meeting SUB character on Windows
by MidLifeXis (Monsignor) on Jan 03, 2014 at 13:22 UTC

    How about uuencoding / uudecoding the __DATA__ section?


      I thought about base64, but I'm not sure that all the binary data can be encoded and decoded correctly. What do you think?

        That is its function - to armor binary data in an ASCII-only environment. It is (IIRC) how binaries were distributed on NNTP, for example.
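A quick way to convince yourself that the armoring is lossless is to round-trip every byte value. The sketch below uses Perl's built-in `pack`/`unpack` with the `u` template (uuencoding without the `begin`/`end` header lines); the variable names are just for illustration.

```perl
use strict;
use warnings;

# All 256 byte values, including 0x1A (SUB), NUL, CR and LF.
my $bytes = pack 'C*', 0 .. 255;

my $armored  = pack 'u', $bytes;      # uuencoded text lines
my $restored = unpack 'u', $armored;  # back to the original bytes

print $restored eq $bytes ? "lossless\n" : "data changed\n";
print "armored text is ", length($armored), " characters long\n";
```

The armored text contains only printable ASCII and newlines, which is why it survives text-mode reading from `__DATA__`.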


Re: Perl stops reading __DATA__ when meeting SUB character on Windows (ddumper)
by Anonymous Monk on Jan 03, 2014 at 01:05 UTC
    Sure it doesn't :) ... prove it
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Tiny qw/ path /;
    path( 'yada' )->spew_raw( "#!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    while(<DATA>){
        dd( \$_ );
    }
    __DATA__
    before sub
    the \32
    after sub" );
    system $^X, 'yada';
    path( 'yada' )->remove;
Re: Perl stops reading __DATA__ when meeting SUB character on Windows
by ikegami (Pope) on Jan 05, 2014 at 18:37 UTC

    That's very odd. I've been using Perl on Windows, and ^Z has never interrupted reading, even without binmode.

    Without binmode, the following outputs 10 ("foo\n\cZ\nbar\n"), and 13 with ("foo\r\n\cZ\r\nbar\r\n").

    #binmode(DATA);
    local $/;
    $x = <DATA>;
    print(length($x), "\n");   # 10, 13 with binmode
    __DATA__
    foo
    <Character 1A here>
    bar
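The 10-versus-13 difference comes from CRLF translation, which can be reproduced on any platform with PerlIO layers. This is a sketch using an in-memory handle rather than `DATA`; `slurp_with_layer` is a hypothetical helper, and the example assumes a PerlIO-enabled perl, where the `:crlf` layer is documented as not treating Ctrl-Z as end-of-file.

```perl
use strict;
use warnings;

# The bytes as they would sit on disk in a file with Windows line endings.
my $on_disk = "foo\r\n\cZ\r\nbar\r\n";

# Hypothetical helper: slurp $bytes through the given PerlIO layer.
sub slurp_with_layer {
    my ($layer, $bytes) = @_;
    open my $fh, "<$layer", \$bytes or die "Cannot open in-memory handle: $!";
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $text_len = length slurp_with_layer(':crlf', $on_disk);  # text-mode view
my $raw_len  = length slurp_with_layer(':raw',  $on_disk);  # binmode view
print "text mode: $text_len, binary mode: $raw_len\n";
```

The text-mode view collapses each "\r\n" pair into "\n", while the raw view keeps every byte, including the three carriage returns.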

    I use ActivePerl. How did you build your Perl? If binmode doesn't solve your problem, you have a broken perl. I'd be interested in seeing the output of perl -V. (That's an uppercase "V".)

      I've been using Perl on Windows, and ^Z has never interrupted reading, even without binmode.

      Perl on MS Windows has a long history of stopping reading at CTRL-Z (yes, even in files). That it predates your history with Perl on MS Windows (or just your memory) doesn't cause that history to no longer exist.

      Your program produces 4 for me, whether binmode is commented out or not (v5.12).

      I recall MS Windows changing its (default) treatment of CTRL-Z in files. This thread makes it clear that Perl did as well.

      Some experiments lead me to believe that Perl has/had separate logic for stopping at CTRL-Z when reading Perl source code (and that binmode doesn't/didn't disable/reset that). I'll leave it to others to find that in the Perl source code and determine when that got removed as well (probably after v5.12 but before ikegami's version, which he doesn't mention).

      At least, the DATA that Perl v5.12 opened for me stops at CTRL-Z whether binmode has been called on it or not (even after I control for the buffering problem I had experienced before and described elsewhere in this thread) while a DATA that I open myself reads CTRL-Z just fine, whether binmode is used or not.

      - tye        

        That it predates your history with Perl on MS Windows (or just your memory) doesn't cause that history to no longer exist.

        1. The OP is using binmode. That should never stop at ^Z!

        2. Unless you're saying the OP is using a Perl older than 5.6, I don't see your point. I consider anything older than 5.8 irrelevant unless explicitly mentioned.

        Your program produces 4 for me, whether binmode is commented out or not (v5.12).

        Then I ask you the same question I asked the OP: Could I see your perl -V output?

        Update: I was sure I had an old version of Perl that didn't treat ^Z specially, but that doesn't seem to be the case. It does appear to be a new fix, so disregard the request. I couldn't test until now.

      perl -v
      This is perl 5, version 12, subversion 1 (v5.12.1) built for MSWin32-x64-multi-thread
      Copyright 1987-2010, Larry Wall
      P.S. I do not personally use this perl; it is the perl shipped to customers together with our software, so that we can use it for executing scripts.
      >perl\bin\perl -v

      This is perl 5, version 12, subversion 1 (v5.12.1) built for MSWin32-x64-multi-thread

      Copyright 1987-2010, Larry Wall

      Perl may be copied only under the terms of either the Artistic License or the
      GNU General Public License, which may be found in the Perl 5 source kit.

      Complete documentation for Perl, including FAQ lists, should be found on
      this system using "man perl" or "perldoc perl". If you have access to the
      Internet, point your browser at, the Perl Home Page.

      >perl\bin\perl -V
      Summary of my perl5 (revision 5 version 12 subversion 1) configuration:

        Platform:
          osname=MSWin32, osvers=6.1, archname=MSWin32-x64-multi-thread
          uname='Win32 strawberryperl #1 Thu Jul 29 19:04:04 2010 x64'
          config_args='undef'
          hint=recommended, useposix=true, d_sigaction=undef
          useithreads=define, usemultiplicity=define
          useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
          use64bitint=define, use64bitall=undef, uselongdouble=undef
          usemymalloc=n, bincompat5005=undef
        Compiler:
          cc='gcc', ccflags =' -s -O2 -DWIN32 -DHAVE_DES_FCRYPT -DWIN64 -DCONSERVATIVE -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fno-strict-aliasing -mms-bitfields -DPERL_MSVCRT_READFIX',
          optimize='-s -O2',
          cppflags='-DWIN32'
          ccversion='', gccversion='4.4.3', gccosandvers=''
          intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678
          d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
          ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
          alignbytes=8, prototype=define
        Linker and Libraries:
          ld='g++', ldflags ='-s -L"c:\strawberry64\perl\lib\CORE" -L"c:\strawberry64\c\lib"'
          libpth=c:\strawberry64\c\lib c:\strawberry64\c\x86_64-w64-mingw32\lib
          libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
          perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
          libc=, so=dll, useshrplib=true, libperl=libperl512.a
          gnulibc_version=''
        Dynamic Linking:
          dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
          cccdlflags=' ', lddlflags='-mdll -s -L"c:\strawberry64\perl\lib\CORE" -L"c:\strawberry64\c\lib"'

      Characteristics of this binary (from libperl):
        Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                              PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                              PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_64_BIT_INT
                              USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_PERL_ATOF
                              USE_SITECUSTOMIZE
        Built under MSWin32
        Compiled at Jul 29 2010 19:18:08
        @INC:
          ................... .
