ascii problem

kermit393 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: ascii problem by duff (Parson) on Jun 01, 2006 at 20:03 UTC
If on Unix, `cat filename | col -b` should do the trick. Unless, of course, it's actually the two characters `^` and `H`.

duff
Re^2: ascii problem by crashtest (Curate) on Jun 01, 2006 at 20:46 UTC
"Unless, of course, it's actually the two characters `^` and `H`."

... in which case, staying with an all-UNIX solution, you could first convert the two characters `^` and `H` to the control character `^H` ¹:

    cat filename | sed 's!\^H!\x08!g' | col -b

¹ Hex escapes like `\x08` might not work in all versions of sed.
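Where the local sed does not understand `\x08`, the same conversion could be sketched in Perl instead. This assumes the input really holds the two literal characters `^` and `H`; the sub name `caret_h_to_backspace` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the two-character sequence "^H" into a real
# backspace (0x08), so that later steps only have to deal with one form.
sub caret_h_to_backspace {
    my ($line) = @_;
    $line =~ s/\^H/\x08/g;    # \^ is a literal caret, \x08 is the backspace
    return $line;
}

# Single quotes: the sample contains a literal caret and a literal H.
my $converted = caret_h_to_backspace('ass^H^Hwesome');
printf "%d characters after conversion\n", length $converted;
```

As a filter in front of `col -b`, the same substitution would look like `perl -pe 's/\^H/\x08/g' filename | col -b`.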
Re: ascii problem by samtregar (Abbot) on Jun 01, 2006 at 20:07 UTC
Sounds like a job for a regular expression:

    #!/usr/bin/perl
    my $test = "hello there ass\x08\x08wesome";
    print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
    while ($test =~ /\x08/) {
        $test =~ s/.\x08//;
    }
    print "AFTER: $test, ", "(length: " . length($test) . ")\n";

On my terminal that prints:

    BEFORE: hello there awesome, (length: 23)
    AFTER: hello there awesome, (length: 19)

You'd need to use a hex editor (or if you're elite, hexl-mode in Emacs) to confirm that the first string actually had backspaces, but the lengths match my expectations.

The regular expression works by finding any character (`.`) followed by a backspace (`\x08`) and replacing them with nothing (i.e. deleting them from the string). This is wrapped up in a loop to repeat the process as long as backspaces are present.

-sam
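Short of a hex editor, one way to confirm that a string holds real backspaces is to print each character's code in hex from Perl itself. This is only a sketch; `hex_dump` is a hypothetical helper, not something from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as two hex digits.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $string;
}

my $test = "ass\x08\x08wesome";
print hex_dump($test), "\n";
# A real backspace shows up as 08; a literal "^H" would show up as 5e 48.
```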
Re^2: ascii problem by davido (Cardinal) on Jun 01, 2006 at 21:00 UTC
That's good, but somehow I just don't like seeing you needing to test with the m// operator, and then perform a nearly equal match for the substitution. To get away with invoking the regexp engine only one time instead of twice on each loop iteration, you can do this instead:

    use strict;
    use warnings;

    my $test = "hello there ass\x08\x08wesome";
    print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
    while ($test =~ /.\x08/) {
        $test = substr( $test, 0, $-[0] ) . substr( $test, $+[0] );
        # Note: The preceding line is the same as:
        #     $test = $` . $';
        # but avoids using $` and $', side-stepping the global
        # performance penalty associated with their use.
    }
    print "AFTER: $test, ", "(length: " . length($test) . ")\n";

Dave
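For anyone who hasn't met `@-` and `@+` before, here is a small sketch of what they hold after a successful match. The variable names `$start`, `$end` and `$rest` are only for illustration.

```perl
use strict;
use warnings;

my $test = "ass\x08\x08wesome";
my ($start, $end, $rest);

if ($test =~ /.\x08/) {
    # $-[0] is the offset where the whole match begins,
    # $+[0] is the offset just past where it ends.
    ($start, $end) = ($-[0], $+[0]);
    print "match spans offsets $start to $end\n";

    # Everything before the match, joined to everything after it.
    $rest = substr($test, 0, $start) . substr($test, $end);
    print "remaining length: ", length($rest), "\n";
}
```

One pass removes a single character and its backspace, which is why the loop in the post keeps going until the pattern stops matching.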
Re^3: ascii problem by runrig (Abbot) on Jun 01, 2006 at 21:46 UTC
Or just: `1 while $test =~ s/.\x08//;`
Re^4: ascii problem by davido (Cardinal) on Jun 01, 2006 at 23:03 UTC
Re: ascii problem by japhy (Canon) on Jun 01, 2006 at 23:10 UTC
Just because...

    my $bs;
    $bs = qr{ . (??{ $bs })? \cH }x;
    $str =~ s/$bs//g;

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^2: ascii problem by chibiryuu (Beadle) on Jun 02, 2006 at 16:20 UTC
And more "Just because..."

    $str =~ s/((?>[^\cH]*))((?>\cH+))/substr $1, 0, length($1) - length($2)/eg;
Re: ascii problem by Tobin Cataldo (Monk) on Jun 01, 2006 at 20:05 UTC
The basic regex would be

    s/.\^H//

This would get the inner match. Then...

    while ($var =~ /.\^H/) {
        $var =~ s/.\^H//;
    }

forgot the 's', sorry.
Re^2: ascii problem by samtregar (Abbot) on Jun 01, 2006 at 20:11 UTC
That only works if he really has two characters for backspace - "^" and "H". More likely he's got real backspaces, aka \x08 in Perl (among many ways to write that, of which ^H is not one).

-sam
Re^3: ascii problem by bart (Canon) on Jun 01, 2006 at 20:22 UTC
"among many ways to write that, of which ^H is not one"

But `"\cH"` is.
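To make the "many ways" concrete, here is a sketch listing a few spellings that all produce the backspace character in a double-quoted Perl string. Note that `\b` only means backspace in a string (or inside a regex character class); elsewhere in a regex it is a word boundary.

```perl
use strict;
use warnings;

# Several spellings of the same character (code point 8).
my @spellings = ("\x08", "\cH", "\010", "\b", chr(8));

# True when none of the spellings differs from "\x08".
my $all_same = !grep { $_ ne "\x08" } @spellings;
print $all_same ? "all spellings are the same character\n" : "mismatch\n";

# By contrast, a single-quoted '^H' is two ordinary characters.
my $caret_h_length = length('^H');
print "length of '^H': $caret_h_length\n";
```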
Re^3: ascii problem by Tobin Cataldo (Monk) on Jun 01, 2006 at 20:16 UTC
If it is in a string then it isn't a backspace character. It is the characters '^' and 'H'.
Re^4: ascii problem by samtregar (Abbot) on Jun 01, 2006 at 20:19 UTC
Re^4: ascii problem by gellyfish (Monsignor) on Jun 01, 2006 at 20:25 UTC
Re: ascii problem by eye (Chaplain) on Jun 05, 2006 at 11:29 UTC
Borrowing from samtregar's response, I'd like to extend it in the following way:

    while ($test =~ /\x08/) {
        $test =~ s/^\x08+//;
        $test =~ s/[^\x08]\x08//g;
    }

This deals with the problem of the user backspacing past the beginning of the line.

S
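A quick way to try that loop out is to wrap it in a sub and feed it input that starts with backspaces. The sub name `apply_backspaces` and the sample strings are assumptions for the sake of the sketch.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop from the post above.
sub apply_backspaces {
    my ($text) = @_;
    while ($text =~ /\x08/) {
        $text =~ s/^\x08+//;          # leading backspaces have nothing to erase
        $text =~ s/[^\x08]\x08//g;    # a character plus the backspace erasing it
    }
    return $text;
}

# Two stray backspaces at the start, then "abc", one backspace, then "d".
my $cleaned = apply_backspaces("\x08\x08abc\x08d");
print "$cleaned\n";
```

Stripping the leading backspaces first matters because a loop that only looks for "any character followed by a backspace" can never consume a backspace sitting at offset 0.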
Re: ascii problem by Hue-Bond (Priest) on Jun 23, 2006 at 21:55 UTC
In almost all the Perl solutions posted so far, there is the common denominator `while (/.\x08/)`. I don't like it because it rescans parts of the string that have already been cleaned up. So I've worked out something using `m/\G.../g`, which continues scanning where it left off:

    my $text="ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP\x08\x08\x08\x08\x08\x08\x08QRST";
    while ($text =~ /\G.*?(\x08+)/g) {
        my $c2r = 2 * length $1;        ## number of chars to remove
        my $st = (pos $text) - $c2r;    ## start
        $st++, $c2r-- while $st < 0;    ## maybe there are too many \x08's near the beginning
        substr $text, $st, $c2r, '';    ## wipe erased characters along with their corresponding \x08's
        pos $text = $st;                ## make \G resume in a sensible place
    }
    print "$text\n";
    __END__
    DEFQRST

--
David Serrano
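Another way to get a single left-to-right pass, swapping the regex for a plain array used as a stack, might look like the sketch below. This is a different technique from the `\G` approach in the post, and `erase_with_stack` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: apply backspaces in one pass over the characters.
sub erase_with_stack {
    my ($text) = @_;
    my @kept;
    for my $char (split //, $text) {
        if ($char eq "\x08") {
            pop @kept;            # does nothing when there is nothing to erase
        }
        else {
            push @kept, $char;
        }
    }
    return join '', @kept;
}

# Same sample data as in the post above.
my $sample = "ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP"
           . "\x08\x08\x08\x08\x08\x08\x08QRST";
print erase_with_stack($sample), "\n";
```

Each character is visited once, and extra backspaces at the beginning fall out naturally because popping an empty array is harmless.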

