PerlMonks  

Re: Weird Character in File Makes Perl Think it's EOF

by periapt (Hermit)
on Oct 17, 2008 at 20:32 UTC (#717852)


in reply to Weird Character in File Makes Perl Think it's EOF

You could certainly read the file in using binmode but, as wol noted, you do lose end-of-line handling. Depending on what happens to your file before the parsing stage, you may want to preprocess it before the parse step.

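Assuming that your file should only have word characters (as defined by \w = [a-zA-Z0-9_]) and whitespace in it, you could try this one-liner:

perl -i.orig -p -e "s/[^\w\s]/?/g;" <yourfile>

This will save the original file as <yourfile>.orig and change every occurrence of any other character to a question mark. Note that a plain \W would also match spaces and the newline at the end of each line, and \W+ would collapse a run of bad bytes into a single question mark, which is why the pattern is [^\w\s] with no +. I am assuming here that you want to retain the relative location of the offending byte. If you don't, simply write s/[^\w\s]//g instead of s/[^\w\s]/?/g.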

If you want to write the output to STDOUT instead, say, before passing the data to another process, you can omit the -i.orig flag.
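If you would rather keep this as a script than a one-liner, here is a minimal sketch of the same preprocessing idea. The sub name scrub and the file name scrub.pl are made up for illustration. It assumes that whitespace and line endings should be kept, and it reads with binmode, so CR bytes pass through untouched and no line-ending translation happens.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every character that is neither a word character nor
# whitespace with '?'. Keeping whitespace is an assumption of this
# sketch; adjust the character class to suit your data.
sub scrub {
    my ($data) = @_;
    $data =~ s/[^\w\s]/?/g;
    return $data;
}

# Hypothetical usage: perl scrub.pl infile > outfile
for my $path (@ARGV) {
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;        # read raw bytes, no text-mode translation
    binmode STDOUT;     # write the bytes back out unchanged
    while (my $line = <$in>) {
        print scrub($line);
    }
    close $in;
}
```

Because the offending byte becomes a single question mark, the rest of each line stays at the same offset.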

Of course, you could do it with sed or gawk but this is PerlMonks ;o).


PJ
use strict; use warnings; use diagnostics;


Re^2: Weird Character in File Makes Perl Think it's EOF
by Jim (Curate) on Oct 19, 2008 at 20:40 UTC
    My pre-processing suggestion would be to use tr:

    tr -d "\032" < infile > outfile

    ...or...

    tr "\032" " " < infile > outfile

    If you use Gawk, you have to set its BINMODE.

    Using ActivePerl for Windows, I've never had to use binmode to handle nasty ASCII control characters like NUL (0x00) and SUB (0x1A). It seems to read and write them in text mode just fine.

    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}"
    
    foo
    bar
    D:\>perl -e "print qq{\x00\x1A\nfoo\nbar\x1A\x00\n}" | perl -ne "print if m/foo/"
    foo
    D:\>
    Jim
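Since this is PerlMonks, the two tr commands in Jim's reply can also be sketched with Perl's own tr operator. The sub names and the file name strip_sub.pl below are hypothetical, and reading with binmode is an assumption made so that the bytes arrive untranslated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delete every SUB byte (octal 032, hex 1A), like: tr -d "\032"
sub delete_sub_bytes {
    my ($data) = @_;
    $data =~ tr/\032//d;
    return $data;
}

# Turn every SUB byte into a space, like: tr "\032" " "
sub sub_bytes_to_spaces {
    my ($data) = @_;
    $data =~ tr/\032/ /;
    return $data;
}

# Hypothetical usage: perl strip_sub.pl infile > outfile
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;        # read raw bytes
    binmode STDOUT;     # write raw bytes
    while (my $line = <$in>) {
        print delete_sub_bytes($line);
    }
    close $in;
}
```

Swap in sub_bytes_to_spaces inside the loop if the byte offsets of the remaining data need to stay the same.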
