 
PerlMonks  

ascii problem

by kermit393 (Novice)
on Jun 01, 2006 at 19:43 UTC

kermit393 has asked for the wisdom of the Perl Monks concerning the following question:

I have a keylog file that contains many strings like this: "strigs^H^Hngs". I need a way to delete the backspace ASCII characters (^H) along with the characters they erase, so that this test string ends up as "strings": the first ^H erases the second s, and the second ^H erases the g.

Replies are listed 'Best First'.
Re: ascii problem
by duff (Parson) on Jun 01, 2006 at 20:03 UTC

    If on unix, cat filename | col -b should do the trick.

    Unless, of course, it's actually the two characters ^ and H.

      Unless, of course, it's actually the two characters ^ and H.
      ... in which case -- staying with an all-UNIX solution -- you could first convert the two characters ^ and H to the control character ^H [1]:
      cat filename | sed 's!\^H!\x08!g' | col -b

      [1] Inserting a control character with an escape like \x08 might not work in all versions of sed.
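If the keylog really does hold the literal two-character sequence, the same conversion can be done in Perl before applying any of the backspace-handling code in this thread. This is only a sketch; the sub name caret_to_backspace is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the literal two-character sequence "^H"
# into a real backspace character (ASCII 8).
sub caret_to_backspace {
    my ($s) = @_;
    $s =~ s/\^H/\x08/g;    # \^ matches a literal caret
    return $s;
}

my $raw   = 'strigs^H^Hngs';    # single quotes: literally ^ and H
my $fixed = caret_to_backspace($raw);
printf "%d -> %d characters\n", length $raw, length $fixed;
```

Each "^H" shrinks from two characters to one, so the converted string is shorter than the raw one by the number of backspaces.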

Re: ascii problem
by samtregar (Abbot) on Jun 01, 2006 at 20:07 UTC
    Sounds like a job for a regular expression:

    #!/usr/bin/perl
    my $test = "hello there ass\x08\x08wesome";
    print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
    while ($test =~ /\x08/) {
        $test =~ s/.\x08//;
    }
    print "AFTER: $test, ", "(length: " . length($test) . ")\n";

    On my terminal that prints:

    BEFORE: hello there awesome, (length: 23)
    AFTER: hello there awesome, (length: 19)

    You'd need to use a hex editor (or if you're elite, hexl-mode in Emacs) to confirm that the first string actually had backspaces, but the lengths match my expectations.

    The regular expression works by finding any character (.) followed by a backspace (\x08) and replacing them with nothing (i.e. deleting them from the string). This is wrapped up in a loop to repeat the process as long as backspaces are present.

    -sam

      That's good, but somehow I just don't like seeing you needing to test with the m// operator, and then perform a nearly equal match for the substitution. To get away with invoking the regexp engine only one time instead of twice on each loop iteration, you can do this instead:

      use strict;
      use warnings;
      my $test = "hello there ass\x08\x08wesome";
      print "BEFORE: $test, ", "(length: " . length($test) . ")\n";
      while ($test =~ /.\x08/) {
          $test = substr( $test, 0, $-[0] ) . substr( $test, $+[0] );
          # Note: The preceding line is the same as:
          # $test = $` . $';
          # but avoids using $` and $', side-stepping the global
          # performance penalty associated with their use.
      }
      print "AFTER: $test, ", "(length: " . length($test) . ")\n";
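The special arrays @- and @+ used above hold the start and end offsets of the last successful match, so $-[0] is where the whole match begins and $+[0] is the offset just past its end. A small sketch of what they contain for one iteration (the variable names $start, $end and $rest are illustrative only):

```perl
use strict;
use warnings;

my $test = "ass\x08\x08wesome";
my ($start, $end, $rest);
if ($test =~ /.\x08/) {
    ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
    print "match spans offsets $start to $end\n";
    # Everything before the match plus everything after it:
    $rest = substr($test, 0, $start) . substr($test, $end);
    print "length after one deletion: ", length($rest), "\n";
}
```

The leftmost character that is directly followed by a backspace is the second "s", which is why a single pass removes only that pair and the loop has to run again for the remaining backspace.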

      Dave

Re: ascii problem
by japhy (Canon) on Jun 01, 2006 at 23:10 UTC
      And more "Just because..."
      $str =~ s/((?>[^\cH]*))((?>\cH+))/substr $1, 0, length($1) - length($2)/eg;
Re: ascii problem
by Tobin Cataldo (Monk) on Jun 01, 2006 at 20:05 UTC
    The basic regex would be
    s/.\^H//
    This would get the inner match. Then...
    while ($var =~ /.\^H/) { $var =~ s/.\^H//; }
    forgot the 's', sorry.
      That only works if he really has two characters for backspace - "^" and "H". More likely he's got real backspaces, aka \x08 in Perl (among many ways to write that, of which ^H is not one).

      -sam

        among many ways to write that, of which ^H is not one

        But "\cH" is.

        If it is in a string then it isn't a backspace character. It is the characters '^' and 'H'.
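To make the distinction in this sub-thread concrete, here is a short sketch comparing several spellings of the backspace character in a double-quoted Perl string with the literal text "^H". The variable names are made up for the example.

```perl
use strict;
use warnings;

# Five spellings of the same character, ASCII 8 (backspace).
# Note that "\b" means backspace in a string, but a word boundary in a regex.
my @spellings = ("\x08", "\cH", "\010", "\b", chr(8));
my $all_same  = !grep { $_ ne "\x08" } @spellings;
print "all spellings are the same character\n" if $all_same;

# Inside a string, ^H is just a caret followed by an H.
my $literal = "^H";
print "length of caret-H: ", length($literal), "\n";
```

A regex such as /.\^H/ therefore only matches the two-character form, while /.\x08/ or /.\cH/ matches a real backspace.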
Re: ascii problem
by eye (Chaplain) on Jun 05, 2006 at 11:29 UTC
    Borrowing from samtregar's response, I'd like to extend it in the following way:
    while ($test =~ /\x08/) {
        $test =~ s/^\x08+//;
        $test =~ s/[^\x08]\x08//g;
    }
    This deals with the problem of the user backspacing past the beginning of the line.
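As a rough check of that claim, the loop can be wrapped in a sub and fed an input that starts with backspaces. The simpler s/.\x08// loop would never finish on such input, because nothing precedes the first backspace. The sub name apply_backspaces is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above.
sub apply_backspaces {
    my ($s) = @_;
    while ($s =~ /\x08/) {
        $s =~ s/^\x08+//;           # backspaces with nothing left to erase
        $s =~ s/[^\x08]\x08//g;     # a character plus the backspace that erases it
    }
    return $s;
}

print apply_backspaces("\x08\x08strigs\x08\x08ngs"), "\n";    # strings
```

Each pass either strips leading backspaces or removes at least one character/backspace pair, so the loop always makes progress while a backspace remains.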

    S
Re: ascii problem
by Hue-Bond (Priest) on Jun 23, 2006 at 21:55 UTC

    In almost all the Perl solutions posted so far, the common denominator is while (/.\x08/). I don't like it because it rescans parts of the string that have already been cleaned up. So I've worked out something using m/\G.../g, which continues scanning where it left off:

    my $text="ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP\x08\x08\x08\x08\x08\x08\x08QRST";
    while ($text =~ /\G.*?(\x08+)/g) {
        my $c2r = 2 * length $1;        ## number of chars to remove
        my $st = (pos $text) - $c2r;    ## start
        $st++, $c2r-- while $st < 0;    ## maybe there are too many \x08's near the beginning
        substr $text, $st, $c2r, '';    ## wipe erased characters as well as their corresponding \x08's
        pos $text = $st;                ## make \G resume in a sensible place
    }
    print "$text\n";
    __END__
    DEFQRST

    --
    David Serrano
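For comparison, the same "never rescan" goal can also be reached without a regex by walking the string once and keeping the surviving characters on a stack. This is a different technique from the \G approach above, not a restatement of it, and the sub name erase_single_pass is invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical single-pass alternative: surviving characters live on a stack.
sub erase_single_pass {
    my ($s) = @_;
    my @kept;
    for my $ch (split //, $s) {
        if ($ch eq "\x08") {
            pop @kept;          # does nothing when there is nothing left to erase
        } else {
            push @kept, $ch;
        }
    }
    return join '', @kept;
}

my $text = "ABC\x08\x08\x08\x08\x08DEFGHIJKL\x08\x08M\x08NOP\x08\x08\x08\x08\x08\x08\x08QRST";
print erase_single_pass($text), "\n";    # DEFQRST
```

Unlike the regex versions that rely on ".", this sketch treats a newline like any other character, so a backspace can erase it; whether that is wanted depends on how the keylog records line breaks.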

Node Type: perlquestion [id://553129]
Approved by polettix
Front-paged by moklevat