Re: Weird Character in File Makes Perl Think it's EOF

by periapt (Hermit)
on Oct 17, 2008 at 20:32 UTC (#717852)

in reply to Weird Character in File Makes Perl Think it's EOF

You could certainly read the file in using binmode but, as wol noted, you do lose end-of-line handling. Depending on what is happening with your file before the parsing stage, you may want to try preprocessing it first.

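Assuming that your file should only have word characters in it (as defined by \w = [a-zA-Z0-9_]), you could try this one-liner

perl -i.orig -l -p -e "s/\W/?/g;" <yourfile>

This will rename the original file to <yourfile>.orig and change every occurrence of a non-word character to a question mark. The -l switch strips the newline from each line before the substitution and puts it back afterwards; without it the newline, which is itself a non-word character, would be replaced too. I am assuming here that you want to retain the relative location of the offending byte, which is why the pattern is \W rather than \W+ (the latter would collapse a run of bad bytes into a single question mark). If you don't, simply write s/\W+//g instead of s/\W/?/g.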

If you want to write the output to STDOUT instead, say before passing the data to another process, you can omit the -i.orig flag.

Of course, you could do it with sed or gawk but this is PerlMonks ;o).
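For comparison, the sed route mentioned above might look something like the sketch below. It is not from the original post: the file names sample_words.txt and cleaned_words.txt are made up for illustration, and it assumes a POSIX sed run in the C locale so that the input is treated as plain bytes. Because sed never sees the trailing newline in its pattern space, line endings survive without any extra switch.

```shell
# Hypothetical sample input: one line containing a SUB byte (octal 032,
# hex 0x1A) and one clean line made only of word characters.
printf 'foo\032bar\nbaz_42\n' > sample_words.txt

# Replace every byte that is not a letter, digit or underscore with "?".
# LC_ALL=C makes sed treat the data as single bytes, not multibyte text.
LC_ALL=C sed 's/[^[:alnum:]_]/?/g' sample_words.txt > cleaned_words.txt

# Show the cleaned result.
cat cleaned_words.txt
```

The bracket expression [^[:alnum:]_] plays the role of \W here, but only for ASCII data in the C locale.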

use strict; use warnings; use diagnostics;

Replies are listed 'Best First'.
Re^2: Weird Character in File Makes Perl Think it's EOF
by Jim (Curate) on Oct 19, 2008 at 20:40 UTC
    My pre-processing suggestion would be to use tr, either to delete the SUB character (octal 032, hex 0x1A):

    tr -d "\032" < infile > outfile

    or to replace it with a space:

    tr "\032" " " < infile > outfile
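A quick way to see what the two tr variants do is to run them on a throwaway file and compare the bytes. This is only a sketch under stated assumptions: sample.txt, deleted.txt and spaced.txt are hypothetical names, and it assumes a POSIX shell with printf, tr, od and wc available.

```shell
# Build a small sample file containing a SUB byte (octal 032, hex 0x1A).
# All three file names are made up for this illustration.
printf 'foo\032bar\n' > sample.txt

# Variant 1: delete the SUB byte entirely (the file gets one byte shorter).
tr -d '\032' < sample.txt > deleted.txt

# Variant 2: replace the SUB byte with a space (byte offsets stay the same).
tr '\032' ' ' < sample.txt > spaced.txt

# Dump each file byte by byte to compare them.
od -c sample.txt
od -c deleted.txt
od -c spaced.txt

# Compare the sizes: deleting shrinks the file, replacing does not.
wc -c sample.txt deleted.txt spaced.txt
```

Which variant to prefer depends on whether later processing relies on fixed column positions: deleting shifts everything after the bad byte, replacing keeps the layout.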

    If you use Gawk, you have to set its BINMODE.

    Using ActivePerl for Windows, I've never had to use binmode to handle nasty ASCII control characters like NUL (0x00) and SUB (0x1A). It seems to read and write them in text mode just fine.

    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}"

    foo
    bar

    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}" | perl -ne "print if m/foo/"
    foo

    D:\>
