Re: ascii problem
by duff (Parson) on Jun 01, 2006 at 20:03 UTC
|
If on unix, cat filename | col -b should do the trick.
Unless, of course, it's actually the two characters ^ and H.
|
Unless, of course, it's actually the two characters ^ and H.
... in which case, staying with an all-UNIX solution, you could first convert the two characters ^ and H to the control character ^H [1]:
cat filename | sed 's!\^H!\x08!g' | col -b
[1] The \x08 escape is a GNU sed extension, so inserting the character this way might not work in all versions of sed.
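For comparison, the same conversion can be sketched in Perl, where \x08 in the replacement does not depend on the sed version. The sub name caret_h_to_backspace is made up for this illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the two-character sequence "^H" into a real
# backspace (ASCII 8). The caret is escaped because it is a regex anchor.
sub caret_h_to_backspace {
    my ($text) = @_;
    $text =~ s/\^H/\x08/g;
    return $text;
}

my $converted = caret_h_to_backspace('ass^H^Hwesome');
print length($converted), "\n";   # 11: each "^H" pair became one character
```

The result still contains the backspaces; it only makes them real, so they can then be handed to col -b or to one of the Perl loops in this thread.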
Re: ascii problem
by samtregar (Abbot) on Jun 01, 2006 at 20:07 UTC
|
Sounds like a job for a regular expression:
#!/usr/bin/perl
my $test = "hello there ass\x08\x08wesome";
print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
while ($test =~ /\x08/) {
    $test =~ s/.\x08//;
}
print "AFTER: $test, ", "(length: " . length($test) . ")\n";
On my terminal that prints:
BEFORE: hello there awesome, (length: 23)
AFTER: hello there awesome, (length: 19)
You'd need to use a hex editor (or if you're elite, hexl-mode in Emacs) to confirm that the first string actually had backspaces, but the lengths match my expectations.
The regular expression works by finding any character (.) followed by a backspace (\x08) and replacing them with nothing (i.e. deleting them from the string). This is wrapped up in a loop to repeat the process as long as backspaces are present.
-sam
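A small sketch of why the repetition matters: a single s///g pass cannot handle consecutive backspaces, because the first deletion consumes the only character in front of the second backspace. The two sub names are hypothetical, and the looped version uses the statement-modifier form of the loop rather than the exact code above.

```perl
use strict;
use warnings;

# One global pass: each match is found left to right, without rescanning.
sub strip_single_pass {
    my ($text) = @_;
    $text =~ s/.\x08//g;
    return $text;
}

# Repeat the single substitution until it no longer matches.
sub strip_looped {
    my ($text) = @_;
    1 while $text =~ s/.\x08//;
    return $text;
}

my $input = "ass\x08\x08wesome";
printf "single pass: %d chars, looped: %d chars\n",
    length(strip_single_pass($input)), length(strip_looped($input));
# prints "single pass: 9 chars, looped: 7 chars"
```

The single pass leaves "as", one backspace, and "wesome"; only the repeated version arrives at "awesome".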
|
That's good, but I don't like testing with the m// operator and then performing a nearly identical match for the substitution. To invoke the regexp engine only once per loop iteration instead of twice, you can do this instead:
use strict;
use warnings;
my $test = "hello there ass\x08\x08wesome";
print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
while ($test =~ /.\x08/) {
    $test = substr( $test, 0, $-[0] ) . substr( $test, $+[0] );
    # Note: The preceding line is the same as:
    # $test = $` . $';
    # but avoids using $` and $', side-stepping the global
    # performance penalty associated with their use.
}
print "AFTER: $test, ", "(length: " . length($test) . ")\n";
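To make the $-[0] and $+[0] usage above more concrete, here is a minimal sketch that only reports the offsets Perl records after a successful match. The helper name first_pair_offsets is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start and end offsets of the first
# "character followed by backspace" pair, or an empty list if there is none.
sub first_pair_offsets {
    my ($text) = @_;
    return unless $text =~ /.\x08/;
    # $-[0] is where the whole match starts, $+[0] is the offset just past it.
    return ( $-[0], $+[0] );
}

my ( $start, $end ) = first_pair_offsets("ass\x08\x08wesome");
print "match spans offsets $start to $end\n";   # 2 to 4
```

Everything before offset 2 and everything from offset 4 onward is what the two substr calls stitch back together.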
|
1 while $test =~ s/.\x08//;
|
Re: ascii problem
by japhy (Canon) on Jun 01, 2006 at 23:10 UTC
|
my $bs;
$bs = qr{ . (??{ $bs })? \cH }x;
$str =~ s/$bs//g;
|
And more "Just because..."
$str =~ s/((?>[^\cH]*))((?>\cH+))/substr $1, 0, length($1) - length($2)/eg;
Re: ascii problem
by Tobin Cataldo (Monk) on Jun 01, 2006 at 20:05 UTC
|
s/.\^H//
This would get the inner match. Then...
while ($var =~ /.\^H/) {
    $var =~ s/.\^H//;
}
forgot the 's', sorry.
|
That only works if he really has two characters for backspace - "^" and "H". More likely he's got real backspaces, aka \x08 in Perl (among many ways to write that, of which ^H is not one).
-sam
|
among many ways to write that, of which ^H is not one
But "\cH" is.
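A quick sketch of the spellings in question, assuming they appear in double-quoted strings (the is_backspace helper is made up for the example):

```perl
use strict;
use warnings;

# Each of these should be a one-character string holding ASCII 8.
my %spellings = (
    '"\x08"' => "\x08",
    '"\cH"'  => "\cH",
    '"\010"' => "\010",
    '"\b"'   => "\b",      # backspace in a double-quoted string, word boundary in a regex
    'chr(8)' => chr(8),
);

# Hypothetical helper: true only for a single backspace character.
sub is_backspace {
    my ($s) = @_;
    return length($s) == 1 && ord($s) == 8;
}

for my $name ( sort keys %spellings ) {
    printf "%-8s %s\n", $name,
        is_backspace( $spellings{$name} ) ? "backspace" : "something else";
}
printf "%-8s %s\n", q{'^H'},
    is_backspace('^H') ? "backspace" : "two characters";
```

All five spellings in the hash report "backspace", while the literal '^H' is a caret followed by an H.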
|
If ^H appears literally in a string, then it isn't a backspace character. It is the two characters '^' and 'H'.
|
|
Re: ascii problem
by eye (Chaplain) on Jun 05, 2006 at 11:29 UTC
|
Borrowing from samtregar's response, I'd like to extend it in the following way:
while ($test =~ /\x08/) {
    $test =~ s/^\x08+//;
    $test =~ s/[^\x08]\x08//g;
}
This deals with the problem of the user backspacing past the beginning of the line.
S
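As a rough illustration of the case described above, the loop can be wrapped in a sub and fed a string that starts with backspaces. The name apply_backspaces is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above.
sub apply_backspaces {
    my ($text) = @_;
    while ( $text =~ /\x08/ ) {
        $text =~ s/^\x08+//;          # nothing left to erase before these: drop them
        $text =~ s/[^\x08]\x08//g;    # erase each character/backspace pair
    }
    return $text;
}

print apply_backspaces("\x08\x08ab\x08c"), "\n";   # prints "ac"
```

Without the first substitution, a backspace at the very start of the string has no character in front of it for s/.\x08// to match, so a loop that only tests for /\x08/ would never finish.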
Re: ascii problem
by Hue-Bond (Priest) on Jun 23, 2006 at 21:55 UTC
|
In almost all the Perl solutions posted so far, there is the common denominator while (/.\x08/). I don't like it because it rescans parts of the string that have already been cleaned up. So I've worked out something using m/\G.../g, which continues scanning where it left off:
my $text="ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP\x08\x08\x08\x08\x08\x08\x08QRST";
while ($text =~ /\G.*?(\x08+)/g) {
    my $c2r = 2 * length $1;         ## number of chars to remove
    my $st = (pos $text) - $c2r;     ## start
    $st++, $c2r-- while $st < 0;     ## maybe there are too many \x08's near the beginning
    substr $text, $st, $c2r, '';     ## wipe erased characters as long as their corresponding \x08's
    pos $text = $st;                 ## make \G resume in a sensible place
}
print "$text\n";
__END__
DEFQRST
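A different way to avoid rescanning, not the \G technique above, is to walk the string once and treat the kept characters as a stack. This is only a sketch under that assumption, and the sub name apply_backspaces_stack is made up.

```perl
use strict;
use warnings;

# Hypothetical single-pass alternative: push ordinary characters,
# pop the most recent one whenever a backspace is seen.
sub apply_backspaces_stack {
    my ($text) = @_;
    my @kept;
    for my $char ( split //, $text ) {
        if ( $char eq "\x08" ) {
            pop @kept;    # pop on an empty array is a harmless no-op
        }
        else {
            push @kept, $char;
        }
    }
    return join '', @kept;
}

my $text = "ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP\x08\x08\x08\x08\x08\x08\x08QRST";
print apply_backspaces_stack($text), "\n";   # prints "DEFQRST"
```

Each character is visited exactly once, and backspaces past the beginning of the string fall out naturally because there is nothing left to pop.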