Banky has asked for the wisdom of the Perl Monks concerning the following question:

I'm receiving a data file, which I read line by line, that contains a few embedded ^X's (Ctrl-X's). These cause the script reading the file to stop after the first line that contains one. For now I have replaced them with spaces in VIM, but in the future I'd like to automate this and do it with Perl. I was thinking a signal handler might be the way to go, but I don't know where to begin. Any suggestions?

Replies are listed 'Best First'.
Re: Removing Control Characters from a File
by Abigail-II (Bishop) on Jun 04, 2002 at 15:35 UTC
    This will remove ^X's from your file
    perl -0777lpwi -030e0 your_file

    Study of man perlrun will explain why this works.


      Very nice. This merits a closer look:
      perl -0777lpwi -030e0 file

        0777  Set $/ to undef. (0777 is not a legal byte value, so perlrun uses it to request slurp mode.)
        l     Chomp $/ from each input record and set $\ = $/. Thus $\ is undef.
        p     Loop over the file and print each record.
        w     Turn warnings on.
        i     Make the changes in place.
        030   Set $/ to octal 030 (^X). $\ is still undef.
        e0    The program is just the constant 0, so it is effectively empty.

      So in effect this splits the file into records divided by ^X, chomps the ^X from the end of each record, and prints each record followed by $\. Since $/ was undefined at the moment -l copied it into $\, nothing is appended. Thus the ^X's are removed from the file.

      So in a way the program is functionally equivalent to:
      perl -i -pe 's/\30//g' file

      I say functionally equivalent since there is no aesthetic equivalence. :-)
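For anyone who prefers to see the switches spelled out, here is a minimal sketch of the same record-separator trick written as ordinary Perl. The helper name strip_ctrl_x is made up for illustration, and it works on an in-memory string rather than editing a file in place, so treat it as a model of the one-liner and not a replacement for it.

```perl
use strict;
use warnings;

# Hypothetical helper: remove ^X from a string by treating ^X as the
# input record separator, as the -030 switch does in the one-liner.
sub strip_ctrl_x {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\030";        # -030: records end at ^X (octal 030)
    my $out = '';
    while (my $record = <$in>) {
        chomp $record;        # -l: drop the trailing ^X, if there is one
        $out .= $record;      # -p: emit the record; nothing is appended
    }
    close $in;
    return $out;
}

print strip_ctrl_x("foo\030bar\030\nbaz\n");   # prints "foobar\nbaz\n"
```

Note that newlines are just ordinary data here; only ^X delimits records.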


Re: Removing Control Characters from a File
by Joost (Canon) on Jun 04, 2002 at 15:39 UTC
    This shouldn't happen unless you explicitly end your script when it receives a ^X. Perl will eat anything you throw at it, even "\00" bytes and stuff like that, even when reading in with something like while (<>) {.

    This seems to suggest your problem is in some other part of your pipeline (maybe some other program is processing your output?), or you don't use the standard while(<>) method of reading lines from a file.
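A quick way to check this claim is to feed Perl some lines containing ^X and NUL bytes and count how many come back. The sketch below uses made-up sample data and an in-memory filehandle instead of a real file, and count_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Made-up sample data: three lines, one with an embedded ^X, one with NUL.
my $sample = "first\030line\nsecond\0line\nthird line\n";

# Hypothetical helper: count the lines Perl reads from a string.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

print count_lines($sample), "\n";   # prints 3
```

If a loop like this reads every line but your real script still stops early, that points at something other than the read loop itself.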

    You can of course filter out the ^X characters by doing something like:

    while (<>) {
        s/\30//g;
        # ... do stuff
    }
    Update: of course, the program you're referring to might not be yours; it might not even be Perl (booh! hiss!). In that case, see the comment by Abigail-II. (Never knew there were 2)
    -- Joost downtime n. The period during which a system is error-free and immune from user input.
Re: Removing Control Characters from a File
by rbc (Curate) on Jun 04, 2002 at 16:15 UTC
    You could use \c
    while (<>) {
        s/\cX//g;   # removes ^X's
        s/\cM//g;   # removes ^M's
        ...
    }
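If more than a couple of control characters need to go, listing them one substitution at a time gets tedious. A possible alternative is a single tr///d over a range. The sketch below assumes you want to keep tab, newline and carriage return, which is a choice made for this example and not something stated above; strip_controls is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: delete C0 control characters and DEL, but keep
# tab (\x09), newline (\x0A) and carriage return (\x0D).
sub strip_controls {
    my ($line) = @_;
    $line =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F\x7F//d;
    return $line;
}

print strip_controls("a\cXb\tc\n");   # prints "ab\tc\n"
```

To drop ^M as well, as in the loop above, add \x0D to the tr list.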
Re: Removing Control Characters from a File
by Aristotle (Chancellor) on Jun 05, 2002 at 08:39 UTC

    That seems very odd to me. Why is your script coughing on C-X characters? Mine don't, and I don't think C-X is treated as a special character on Unix. If you are using a strange OS that does so, you may want to binmode the filehandle before reading from it. The other case I can think of is data coming down a serial port with XON/XOFF enabled, in which case the problem is not with your script but with the port's configuration. As I haven't done any serial port work, I'm not sure how you'd change the configuration to suit your needs, but maybe this pointer will help you look in the right place.
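For the binmode suggestion, a minimal sketch might look like the following. The helper name read_clean_lines and the temporary file are made up for the demonstration; whether binmode changes anything depends on the platform, and on Unix-like systems it is typically a no-op for plain byte files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file in binary mode and strip ^X per line.
sub read_clean_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;   # read raw bytes, with no text-mode translation
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\cX//g;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Demo with a made-up temporary file containing one embedded ^X.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "one\cXtwo\nthree\n";
close $tmp_fh;

print read_clean_lines($tmp_path);   # prints "onetwo\nthree\n"
```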

    If it's something completely different, then please do tell us on what system you're doing this and where your data is coming from.

    Makeshifts last the longest.

Re: Removing Control Characters from a File
by Anonymous Monk on Jun 04, 2002 at 23:18 UTC
    Hi, it depends on how much data you are reading line by line. If it's not too much, why not slurp all of it up in one go?
    my (@data) = <FILE_HANDLE>;
    Then clean up the data (process the array) in memory? Just a thought :)
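As a rough sketch of the slurp-then-clean idea, assuming the file fits comfortably in memory: the helper name slurp_and_clean and the sample text are made up, and a lexical filehandle on an in-memory string stands in for FILE_HANDLE.

```perl
use strict;
use warnings;

# Hypothetical helper: read every remaining line from a handle, then
# strip ^X from each line in memory.
sub slurp_and_clean {
    my ($fh) = @_;
    my @data = <$fh>;        # list context reads all lines at once
    s/\cX//g for @data;      # clean each array element in place
    return @data;
}

# Made-up sample input.
my $text = "alpha\cX\nbe\cXta\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print slurp_and_clean($fh);   # prints "alpha\nbeta\n"
close $fh;
```

Note that slurping does not help if the read itself stops at the first ^X; it only changes where the cleanup happens.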