Re: Weird Character in File Makes Perl Think it's EOF

periapt
in reply to Weird Character in File Makes Perl Think it's EOF

You could certainly read the file in using binmode but, as wol noted, you do lose end-of-line handling. Depending on what is happening with your file before the parsing stage, you may want to try preprocessing it before the parse step.

Assuming that your file should only have word characters in it (as defined by \w = [a-zA-Z0-9_]), you could try this one-liner:

perl -i.orig -p -e "s/\W+/?/g;" <yourfile>

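This will save the original file as <yourfile>.orig and replace every run of non-word characters with a single question mark. Be aware that \W also matches whitespace, including the newline at the end of each line, so use [^\w\s] in place of \W if spaces and line breaks must survive. I am assuming here that you want to mark the location of the offending bytes. If you don't, simply write s/\W+//g instead of s/\W+/?/g.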

If you want to write the output to STDOUT instead, say, before passing the data to another process, you can omit the -i.orig flag.

Of course, you could do it with sed or gawk but this is PerlMonks ;o).

use strict; use warnings; use diagnostics;

Re^2: Weird Character in File Makes Perl Think it's EOF
Jim
    My pre-processing suggestion would be to use tr, either to delete the SUB character (octal 032):

    tr -d "\032" < infile > outfile

    or to replace it with a space:

    tr "\032" " " < infile > outfile

    If you use Gawk, you have to set its BINMODE.

    Using ActivePerl for Windows, I've never had to use binmode to handle nasty ASCII control characters like NUL (0x00) and SUB (0x1A). It seems to read and write them in text mode just fine.

    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}"

    foo
    bar
    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}" | perl -ne "print if m/foo/"
    foo
    D:\>

