
binmode i/o for perl -pi in-place editing

by pryrt (Prior)
on Dec 07, 2016 at 23:24 UTC (#1177452, perlquestion)

pryrt has asked for the wisdom of the Perl Monks concerning the following question:

How do you set binmode for the input and output of a perl -pi script.pl *.file in-place edit? I had hoped that doing binmode ${^LAST_FH}; binmode STDOUT; would cover it...

pi.pl:

use warnings;
use strict;
use Data::Dumper;
my ($d0, $count);
BEGIN {
    $Data::Dumper::Terse = $Data::Dumper::Useqq = 1;
    $Data::Dumper::Indent = 0;
    $/ = qq(\0);
    $main::total = 0;
    $main::l = 0;
    binmode STDIN;
    binmode STDOUT;
};
binmode ${^LAST_FH};
$d0 = Dumper($_);
$main::total += $count = s/w/v/g || 0;
$main::l += length($_);
printf STDERR qq(%-32s vs %-32s: cnt=%d tot=%d l=%d\n), $d0, Dumper($_), $count, $main::total, $main::l;

windows cmd.exe:

@set PROMPT=$G
rem create binary file
perl -e "binmode STDOUT; $,=qq(\0); print qq(bin\n), qw(encoded file), qq(r:\r\0n:\n\0nr:\n\r\0rn:\r\n), qw(with EOL-like sequences)" > src.bin
rem process
perl -pi.orig pi.pl src.bin
rem show the file sizes are different
dir src.bin*
rem use xxd.exe from gvim for windows to show hexdump
xxd src.bin.orig
xxd src.bin
@set PROMPT=$P$G

output:

>rem create binary file
>perl -e "binmode STDOUT; $,=qq(\0); print qq(bin\n), qw(encoded file), qq(r:\r\0n:\n\0nr:\n\r\0rn:\r\n), qw(with EOL-like sequences)" > src.bin
>rem process
>perl -pi.orig pi.pl src.bin
"bin\n\0"                        vs "bin\n\0"                       : cnt=0 tot=0 l=5
"encoded\0"                      vs "encoded\0"                     : cnt=0 tot=0 l=13
"file\0"                         vs "file\0"                        : cnt=0 tot=0 l=18
"r:\r\0"                         vs "r:\r\0"                        : cnt=0 tot=0 l=22
"n:\n\0"                         vs "n:\n\0"                        : cnt=0 tot=0 l=26
"nr:\n\r\0"                      vs "nr:\n\r\0"                     : cnt=0 tot=0 l=32
"rn:\r\n\0"                      vs "rn:\r\n\0"                     : cnt=0 tot=0 l=38
"with\0"                         vs "vith\0"                        : cnt=1 tot=1 l=43
"EOL-like\0"                     vs "EOL-like\0"                    : cnt=0 tot=1 l=52
"sequences"                      vs "sequences"                     : cnt=0 tot=1 l=61
>rem show the file sizes are different
>dir src.bin*
 Volume in drive C is System
 Volume Serial Number is 309C-2FED

 Directory of C:\Users\peter.jones\Documents\HX27\LocalDatalogs\Gage\PASS2

12/07/2016  03:11 PM                65 src.bin
12/07/2016  03:11 PM                61 src.bin.orig
               2 File(s)            126 bytes
               0 Dir(s)  82,748,403,712 bytes free
>rem use xxd.exe from gvim for windows to show hexdump
>xxd src.bin.orig
0000000: 6269 6e0a 0065 6e63 6f64 6564 0066 696c  bin..encoded.fil
0000010: 6500 723a 0d00 6e3a 0a00 6e72 3a0a 0d00  e.r:..n:..nr:...
0000020: 726e 3a0d 0a00 7769 7468 0045 4f4c 2d6c  rn:...with.EOL-l
0000030: 696b 6500 7365 7175 656e 6365 73         ike.sequences
>xxd src.bin
0000000: 6269 6e0d 0a00 656e 636f 6465 6400 6669  bin...encoded.fi
0000010: 6c65 0072 3a0d 006e 3a0d 0a00 6e72 3a0d  le.r:..n:...nr:.
0000020: 0a0d 0072 6e3a 0d0d 0a00 7669 7468 0045  ...rn:....vith.E
0000030: 4f4c 2d6c 696b 6500 7365 7175 656e 6365  OL-like.sequence
0000040: 73                                       s

I've gotten the input as close to binmode as I can: for the sample input above, the lengths of $_ sum to 61, which matches the size of the input file. But putting binmode STDOUT in the BEGIN block, the main body, or both does not make the output file the same length. I tried perl -C0 -pi... (from perlrun), hoping to make it all :raw equivalent, but nothing changed. I looked through perlvar for an output equivalent of ${^LAST_FH}, but didn't see anything.
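The four-byte difference (61 vs 65) is consistent with a CRLF-translating output layer: the sample contains four 0x0A bytes, and the hexdump of src.bin shows each of them written as 0x0D 0x0A. The following is a minimal sketch that simulates that translation in memory only; it does not touch the real I/O layers, and simulate_crlf_write is a hypothetical helper name, not anything from perl itself.

```perl
use strict;
use warnings;

# Rebuild the 61-byte sample from the question: fields joined by NUL.
my $sample = join "\0",
    "bin\n", qw(encoded file),
    "r:\r\0n:\n\0nr:\n\r\0rn:\r\n",
    qw(with EOL-like sequences);

# Hypothetical helper: mimic what a text-mode (CRLF) output layer does
# to a byte string, i.e. every LF byte is written as CR LF.
sub simulate_crlf_write {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/\n/\r\n/g;
    return $out;
}

# Count the LF bytes, then compare the sizes before and after.
my $lf_count = () = $sample =~ /\n/g;
printf "input: %d bytes, LF bytes: %d, after simulated text-mode write: %d bytes\n",
    length($sample), $lf_count, length(simulate_crlf_write($sample));
# prints: input: 61 bytes, LF bytes: 4, after simulated text-mode write: 65 bytes
```

So the input side was already effectively binary in this run; only the output handle was still translating.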

In the end, I'll probably just stop trying the magic perl -pi and make a manual in-place loop, where I open the ARGV input and output handles myself, so I have full control. But after this much investigation, I'd really like to find out if it's possible to make the whole in-place pipeline binmode.

ps: No, converting 'w' to 'v' is not my end goal :-). I just used that as an SSCCE to show the basic binmode issue: I want to be able to do all my work (my actual manipulations will preserve byte length) without changing the EOL-like characters in this binary file.

Replies are listed 'Best First'.
Re: binmode i/o for perl -pi in-place editing
by choroba (Archbishop) on Dec 07, 2016 at 23:40 UTC
    Similar to binmode and one-liners, isn't it?

      Oops, yes. Thanks. Either I didn't hit Next> enough times when I Super Searched before, or I had a typo, because I didn't find that or a few other similar ones (especially in place edit - how to do).

      Tomorrow, I'll have to try the ARGVOUT filehandle from the latter, and the -Mopen=IN,:bytes,OUT,:bytes derived from the former, and see whether one or both help. The other thought I had, while driving home, was that maybe I could grab the select-returned filehandle and see whether that's the current loop's output handle (I would think so; I'll test it out tomorrow).

      And I'm really disappointed I didn't notice the ARGVOUT when reading perlrun, because it was literally just a few lines down from where I was reading in the perldocs, and I should have seen it. :-( Oh, well. Hopefully, good news tomorrow.

        So, what I learned:

        • \*ARGV and ${^LAST_FH} do refer to the same file
        • \*ARGVOUT and my $sh = select; do refer to the same file
        • Almost sufficient: perl -pi.orig -e "binmode ARGV; binmode ARGVOUT; s/w/v/g" src.bin : this seemed to confirm that the source file and output file were the same length.
        • I couldn't get perl -pi.orig -Mopen=IN,:bytes,OUT,:bytes -e "s/w/v/g" src.bin to work the same as the dual binmode (I tried :raw as well as :bytes, and with only IN, only OUT, or both).
        • Unfortunately, even the dual binmode wasn't sufficient. Changing the input file so that the first null-separated record (aka "line") contains all the various EOL-like sequences proved me wrong: perl -e "binmode STDOUT; $,=qq(\0); print qq(n\nr\rnr\n\rrn\r\nend), qw(encoded file), qq(r:\r\0n:\n\0nr:\n\r\0rn:\r\n), qw(with EOL-like sequences)" > src.bin followed by perl -pi pi.pl src.bin & dir src.bin* ⇒ the output is one byte shorter than the input, because the binmode(ARGV) comes too late for the first read (as was mentioned in binmode and one-liners). Also, I'd lost the $/=qq(\0) from the BEGIN block at some point; when I re-inserted it, I found it wasn't working (without it, the first record had only been read up to the first newline instead of the first null, which masked the fact that binmode wasn't applied for the first read).
        • I tried doing some tricks with eof (with or without empty parentheses), to no avail.
        • I tried doing BEGIN { ...; binmode ARGV; }, but it didn't help.
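For reference, the "almost sufficient" one-liner from the list can be written as a script by setting $^I and @ARGV locally, which is the in-program equivalent of the -i switch. This is only a sketch of that attempt, not a fix: edit_in_place is a hypothetical helper name, and the caveat from the list still applies, since ARGV and ARGVOUT are only opened by the first <>, so on a CRLF platform the first record of each file is read before binmode takes effect.

```perl
use strict;
use warnings;

# Hypothetical helper: script form of
#   perl -pi.orig -e "binmode ARGV; binmode ARGVOUT; s/w/v/g" FILES
# using NUL as the record separator.
sub edit_in_place {
    my (@files) = @_;
    return unless @files;                     # never fall back to STDIN
    local $/ = "\0";                          # NUL-separated records
    local ($^I, @ARGV) = ('.orig', @files);   # same effect as -i.orig
    while (<>) {
        # Both handles exist only after <> has opened a file, so these
        # calls run after the first record was already read.
        binmode ARGV;
        binmode ARGVOUT;
        s/w/v/g;
        print;                                # default output is ARGVOUT here
    }
    return;
}

edit_in_place(@ARGV);
```

On platforms without CRLF translation both binmode calls are harmless no-ops, so the sketch only shows a behavioral difference where the default layers translate line endings.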

        At this point, I'm just going to give in and do it without the magic -pi. It frustrates me that I cannot get this, but since I have a less magic way working, I guess I'll have to live with it.
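A "less magic" version could look roughly like the sketch below. It is not the code actually used in this thread; binary_edit and its arguments are hypothetical names. The point is that the :raw layer is given in the open mode, so it applies before the first byte is read or written, which is exactly what the -pi attempts could not guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: edit $file "in place" without -pi, keeping the
# original as $file . $backup_ext (for example '.orig').
sub binary_edit {
    my ($file, $backup_ext) = @_;
    my $backup = $file . $backup_ext;
    rename $file, $backup or die "rename $file -> $backup: $!";

    # ':raw' in the open mode is in effect from the very first read/write.
    open my $in,  '<:raw', $backup or die "open $backup: $!";
    open my $out, '>:raw', $file   or die "open $file: $!";

    local $/ = "\0";                      # NUL-separated records
    while (my $rec = <$in>) {
        $rec =~ s/w/v/g;                  # stand-in for the real edit
        print {$out} $rec or die "write $file: $!";
    }
    close $in;
    close $out or die "close $file: $!";
    return;
}

binary_edit($_, '.orig') for @ARGV;
```

Renaming first and writing a fresh file mirrors what -i.orig does; an alternative design would write to a temporary file and rename it over the original only after a successful close.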

Re: binmode i/o for perl -pi in-place editing
by Marshall (Abbot) on Dec 09, 2016 at 00:16 UTC
    I didn't get this working with the -pi switches, but looks to me like shmem has a solution. If all else fails, what needs to be done without the -pi complication is straightforward:
    #!/usr/bin/perl
    use strict;
    use warnings;

    open (IN, '<','src.bin') or die $!;
    binmode IN;
    open (OUT, '>','srcnew.bin') or die $!;
    binmode OUT;

    my $inbuf = do {local $/; <IN> };
    $inbuf =~ s/w/v/;
    print OUT $inbuf;
    __END__
    C:\PROJEC~1\testing>debug src.bin
    -d
    62 69 6E 0A 00 65 6E 63-6F 64 65 64 00 66 69 6C   bin..encoded.fil
    65 00 72 3A 0D 00 6E 3A-0A 00 6E 72 3A 0A 0D 00   e.r:..n:..nr:...
    72 6E 3A 0D 0A 00 77 69-74 68 00 45 4F 4C 2D 6C   rn:...with.EOL-l
    69 6B 65 00 73 65 71 75-65 6E 63 65 73 00 00 00   ike.sequences...
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    -q

    C:\PROJEC~1\testing>debug srcnew.bin
    -d
    62 69 6E 0A 00 65 6E 63-6F 64 65 64 00 66 69 6C   bin..encoded.fil
    65 00 72 3A 0D 00 6E 3A-0A 00 6E 72 3A 0A 0D 00   e.r:..n:..nr:...
    72 6E 3A 0D 0A 00 76 69-74 68 00 45 4F 4C 2D 6C   rn:...vith.EOL-l
    69 6B 65 00 73 65 71 75-65 6E 63 65 73 00 00 00   ike.sequences...
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
    -q
    Update: edited out beginning line info (byte offset, etc) from the debug output in the hopes that the lines will display better on Perl Monks. "debug" is a standard Windows command from the Command Line.

    I will also add as a comment that although it is possible to preserve the line endings in a file, I almost always find that this is not desirable. You can wind up with mixed line endings that Perl can deal with, but other programs cannot. Perl will write a line ending appropriate for the platform being used for that "write". So: read the line, print the line is often the best way. That "normalizes" the line endings.

      Yeah, I ended up using something similar to your non-pi script to actually do the processing yesterday. (And today, I was able to follow shmem's suggestions to get the -pi working)

      My searches indicate that debug.exe hasn't shipped with Win7(1,2) (or probably beyond), though it did ship with Windows XP(3). Regardless of normal, my Win7 64bit installation doesn't include it. I just used xxd, because it was faster than rolling my own hex dumper in perl -- though I should have, once I was trying to bring fellow Monks to help me. :-)

      I was just using null-separated text as easy-to-create dummy data. Really, it is a binary data format, which can have the bytes 0x0A and 0x0D anywhere (not really functioning as newlines), but occasionally has embedded strings. I was trying to edit some of those occasional strings. I couldn't get my GnuWin32 sed.exe to do the changes I wanted (I was probably not escaping something correctly on the command line), so I switched to perl, because I thought it would be easier. I also wanted to learn more about the -pi options, since I'd never used that combination; I'd normally just rolled my own file loop, like I ended up doing.

      Again, everyone, thanks for your help and suggestions. I've learned more, which is always my goal here.


      1 http://superuser.com/questions/510671/is-there-debug-exe-equivalent-for-windows7: implies it doesn't ship with Win7
      2 http://www.computerhope.com/forum/index.php?topic=129058.0: implies it doesn't ship with Win7
      3 https://technet.microsoft.com/en-us/library/bb491040.aspx: did ship with XP

        Really, it is a binary data format, which can have the bytes 0x0A and 0x0D anywhere (not really functioning as newlines), but occasionally has embedded strings; I was trying to edit some of those occasional strings...

        Something like that was my very first use of -pi back in the '90s - monkey-patching a different hostid into a binary bound to a specific Sun workstation, when due to sudden death of one machine they had to be transferred to the other which was not licensed. The only other method would have been manipulating the hostid at the OpenBoot PROM, which was deemed too dangerous (and it changes the MAC Address as well).

        Then monkey-patching Windows DLLs containing incomplete translations (making sure that the replacement string is no longer than the original, of course)... that type of fun stuff.
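The size constraint mentioned above can be enforced in code. The sketch below is an illustration only: patch_same_length is a hypothetical helper, and padding a shorter replacement with NUL bytes is an assumption that suits NUL-terminated strings but may not suit every binary format.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of $old in $data with $new
# while keeping the total byte length unchanged. A shorter $new is padded
# with $pad (a single byte, NUL by default); a longer $new is rejected.
sub patch_same_length {
    my ($data, $old, $new, $pad) = @_;
    $pad = "\0" unless defined $pad;
    die "pad must be exactly one byte\n" unless length($pad) == 1;
    die "replacement is longer than the original string\n"
        if length($new) > length($old);

    my $padded = $new . ($pad x (length($old) - length($new)));
    (my $out = $data) =~ s/\Q$old\E/$padded/g;   # \Q..\E: match literally
    return $out;
}

# Usage: the patched buffer has the same length as the input buffer.
my $before = "\x01\x02Cancel\0\x0A\x0D";
my $after  = patch_same_length($before, "Cancel", "Quit");
printf "%d bytes before, %d bytes after\n", length($before), length($after);
```

The buffer itself should be read and written with :raw handles, otherwise the care taken over byte length is undone by line-ending translation.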

        Anyways. I am/was glad I could help ;-)

