PerlMonks  

Removing Control Characters from a File

by Banky (Acolyte)
on Jun 04, 2002 at 15:19 UTC (#171508=perlquestion)
Banky has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm receiving a data file that I read line by line. It has a few embedded ^X's (Ctrl-X's), and the script reading the file stops after the first line that contains one. Right now I replace them with spaces in Vim, but in the future I'd like to automate this and do it with Perl. I was thinking a signal handler might be the way to go, but I don't know where to begin. Any suggestions?

Re: Removing Control Characters from a File
by Abigail-II (Bishop) on Jun 04, 2002 at 15:35 UTC
    This will remove ^X's from your file
    perl -0777lpwi -030e0 your_file

    Study of man perlrun will explain why this works.

    Abigail


      Very nice. This merits a closer look:
      perl -0777lpwi -030e0 file

      0777  Set $/ to 0777. Any value of 0400 or above means "slurp the whole file", so $/ becomes undefined.
      l     Chomp $/ from each input record and set $\ = $/. Thus $\ is undefined as well.
      p     Loop over the file and print.
      w     Turn warnings on.
      i     Make the changes in-place.
      030   Set $/ to 030 (^X). $\ is unchanged, because -l assigned it when that switch was processed.
      e0    Empty program.

      So in effect this splits the file into records divided by ^X, chomps the ^X from the end of each record and prints out the records with $\ appended at the end. Since $\ is undefined, nothing is appended. Thus, the ^X's are removed from the file.

      So in a way the program is functionally equivalent to:

          perl -i -pe 's/\30//g' file

      I say functionally equivalent since there is no aesthetic equivalence. :-)

      --
      John.
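      For readers who want the switches spelled out, here is a minimal sketch of the same record-separator trick as an ordinary subroutine. The name strip_ctrl_x and the in-memory filehandle are assumptions made for illustration; the one-liner itself works on a real file in place.

```perl
use strict;
use warnings;

# Hypothetical helper: strips Ctrl-X the way the one-liner does, by
# treating "\030" as the input record separator and chomping it off.
sub strip_ctrl_x {
    my ($text) = @_;
    local $/ = "\030";    # records end at Ctrl-X (octal 030), like -030
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    my $out = '';
    while (defined(my $record = <$in>)) {
        chomp $record;    # removes a trailing "\030" if there is one
        $out .= $record;  # nothing is appended, like an undefined $\
    }
    close $in;
    return $out;
}

print strip_ctrl_x("abc\030def\nghi\030\n");
```

      The sample string comes back as "abcdef" and "ghi" on two lines: newlines are left alone because they are no longer the record separator.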

Re: Removing Control Characters from a File
by Joost (Canon) on Jun 04, 2002 at 15:39 UTC
    This shouldn't happen unless you explicitly end your script when it receives a ^X. Perl will eat anything you throw at it, even "\00" bytes and stuff like that, and even when reading with something like while (<>) {.

    This seems to suggest your problem is in some other part of your pipeline (maybe some other program is processing your output?), or you don't use the standard while(<>) method of reading lines from a file.

    You can of course filter out the ^X characters by doing something like:

    while (<>) {
        s/\30//g;
        # ... do stuff
    }

    Update: of course, the program you're referring to might not be yours, and it might not even be Perl (booh! hiss!). In that case, see the comment by Abigail-II (never knew there were 2).

    -- Joost
    downtime n. The period during which a system is error-free and immune from user input.
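    One way to check the claim that Perl reads straight through control characters is a small counting test. This is only a sketch: count_lines and the sample string are made up for illustration, and an in-memory filehandle stands in for the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines Perl reads from a string, to
# check that an embedded Ctrl-X or NUL does not end the loop early.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Four newline-terminated lines; line 2 holds ^X, line 3 holds a NUL.
my $sample = "first\nsec\030ond\nthird\0line\nfourth\n";
print count_lines($sample), "\n";
```

    If this prints 4, the read loop is not what stops at the ^X, and the cause is more likely elsewhere in the pipeline.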
Re: Removing Control Characters from a File
by rbc (Curate) on Jun 04, 2002 at 16:15 UTC
    You could use \c
    while (<>) {
        s/\cX//g;   # removes ^X's
        s/\cM//g;   # removes ^M's
        ...
    }
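    As a rough illustration of the \c notation, the sketch below folds both substitutions into one character class. The subroutine name strip_cx_and_cr is hypothetical; "\cX" is the same character as "\030" (decimal 24) and "\cM" is a carriage return (decimal 13).

```perl
use strict;
use warnings;

# Hypothetical helper: removes every ^X and ^M from a line in one pass.
sub strip_cx_and_cr {
    my ($line) = @_;
    $line =~ s/[\cX\cM]//g;   # \cX is chr(24), \cM is chr(13)
    return $line;
}

# \cX and the octal escape \030 name the same character.
print "same character\n" if "\cX" eq "\030";
print strip_cx_and_cr("one\cXtwo\cM\n");
```

    The second print outputs "onetwo" followed by a newline, since only the two control characters are dropped.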
Re: Removing Control Characters from a File
by Anonymous Monk on Jun 04, 2002 at 23:18 UTC
    Hi, it depends on how much data you are reading line by line. If it's not too much, why not slurp all of it up in one go? my (@data) = <FILE_HANDLE>; Then clean up the data (process the array) in memory? Just a thought :)
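    The slurp-then-clean idea might look like the sketch below. The name read_clean_lines and the temporary sample file are assumptions for illustration, and a lexical filehandle is used in place of FILE_HANDLE.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads the whole file into an array, then strips
# every ^X from each element in memory.
sub read_clean_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my @data = <$fh>;       # list context: one element per line
    close $fh;
    s/\cX//g for @data;     # $_ aliases each element, so @data changes
    return @data;
}

# Build a small sample file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "al\030pha\nbeta\n";
close $tmp_fh;

print read_clean_lines($tmp_name);
```

    This holds the whole file in memory at once, so it only suits data files of modest size, as the post says.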
Re: Removing Control Characters from a File
by Aristotle (Chancellor) on Jun 05, 2002 at 08:39 UTC

    That seems very odd to me. Why is your script coughing on C-X characters? Mine don't, and I don't think C-X is understood as a special character on Unix. If you are using a strange OS that does treat it specially, you may want to binmode the filehandle before reading from it. The other case I can think of is data coming down a serial port with Xon/Xoff enabled, in which case the problem is not with your script but with the port's configuration. As I haven't done any serial port work, I'm not sure what you'd do to change the configuration to suit your needs, but maybe with this pointer you will be able to look in the right place.

    If it's something completely different, then please do tell us on what system you're doing this and where your data is coming from.

    Makeshifts last the longest.
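    For the binmode suggestion, a minimal sketch follows. The helper name read_lines_binary and the temporary sample file are illustrative assumptions; whether binmode changes anything depends on the platform, and on Unix it normally makes no difference.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads a file line by line with binmode applied,
# so no text-mode translation is done on the handle.
sub read_lines_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    binmode $fh;            # must be called before the first read
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Build a small sample file containing a ^X so the sketch runs alone.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "one\030two\nthree\n";
close $tmp_fh;

my @lines = read_lines_binary($tmp_name);
print scalar(@lines), " lines read\n";
```

    Both lines should come back, with the ^X still embedded in the first one. Removing it is a separate step.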

Node Type: perlquestion [id://171508]
Approved by derby
Front-paged by jmcnamara