PerlMonks  

Removing ANSI Color Codes

by narse (Pilgrim)
on May 31, 2003 at 06:06 UTC

narse has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to build a regex that will remove those pesky ANSI color codes from text (e.g. 0;30). My problem, however, is that I am not sure what to match, as I cannot see which codes are being used. I have tried using:
$line =~ s/\d;\d\dm//g;
This eliminates some of the colors, but some still remain and I am not sure what else to match. Is there a way to see the text without the terminal interpreting the color codes? Which colors does this pattern miss?

Thanks in advance.

Re: Removing ANSI Color Codes
by castaway (Parson) on May 31, 2003 at 06:20 UTC
    Real ANSI colour codes look like ESC[<number>m, where ESC is character 27 (0x1B) and can be matched with \e in a Perl regular expression. The number selects the colour or attribute; possible codes are:
    // common defines
    #define NORMAL            "\e[0m"
    #define BOLD              "\e[1m"
    #define UNDERSCORE        "\e[4m"
    #define BLINK             "\e[5m"
    #define INVERSE           "\e[7m"
    // foreground colours
    #define ANSI_BLACK        "\e[30m"
    #define ANSI_RED          "\e[31m"
    #define ANSI_GREEN        "\e[32m"
    #define ANSI_YELLOW       "\e[33m"
    #define ANSI_BLUE         "\e[34m"
    #define ANSI_PURPLE       "\e[35m"
    #define ANSI_CYAN         "\e[36m"
    #define ANSI_WHITE        "\e[37m"
    // background colours
    #define ANSI_BACK_BLACK   "\e[40m"
    #define ANSI_BACK_RED     "\e[41m"
    #define ANSI_BACK_GREEN   "\e[42m"
    #define ANSI_BACK_YELLOW  "\e[43m"
    #define ANSI_BACK_BLUE    "\e[44m"
    #define ANSI_BACK_PURPLE  "\e[45m"
    #define ANSI_BACK_CYAN    "\e[46m"
    #define ANSI_BACK_WHITE   "\e[47m"
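The same table can be expressed as a Perl lookup hash. This is a hypothetical sketch, not code from the MUD: the hash %ANSI and the helper colourise() are invented names, and only the codes listed above are included. Like the list, it emits one ESC[<number>m sequence per attribute rather than combining them with semicolons.

```perl
use strict;
use warnings;

# Hypothetical lookup table mirroring the #defines above:
# attribute name => SGR number. "\e" is ESC, i.e. chr(27).
my %ANSI = (
    normal      => 0,  bold        => 1,  underscore  => 4,
    blink       => 5,  inverse     => 7,
    black       => 30, red         => 31, green       => 32,
    yellow      => 33, blue        => 34, purple      => 35,
    cyan        => 36, white       => 37,
    back_black  => 40, back_red    => 41, back_green  => 42,
    back_yellow => 43, back_blue   => 44, back_purple => 45,
    back_cyan   => 46, back_white  => 47,
);

# Emit one ESC[<number>m sequence per attribute (the style the
# list above uses), then the text, then ESC[0m to reset.
sub colourise {
    my ($text, @names) = @_;
    for my $name (@names) {
        die "unknown attribute '$name'\n" unless exists $ANSI{$name};
    }
    my $prefix = join '', map { "\e[$ANSI{$_}m" } @names;
    return $prefix . $text . "\e[$ANSI{normal}m";
}

print colourise('hello', 'bold', 'red'), "\n";
```

Here colourise('hello', 'bold', 'red') returns "\e[1m\e[31mhello\e[0m", which a terminal shows as bold red "hello".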
    (Yup, I grabbed that from a MUD.)
    So to match ANSI colours you need:
    $line =~ s/\e\[\d+m//g;
    To see the colour codes, at least on Unix/Linux, just view the file with 'less'. That makes them show up as ESC[1mESC[31m here.
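Perl itself can make the codes visible in the same way. A minimal sketch, assuming you just want each ESC character replaced by the literal text "ESC" (the sub name show_escapes is hypothetical):

```perl
use strict;
use warnings;

# Replace each invisible ESC character (chr 27) with the literal
# text "ESC", so the codes show up instead of being interpreted.
sub show_escapes {
    my ($text) = @_;
    (my $visible = $text) =~ s/\e/ESC/g;
    return $visible;
}

my $sample = "\e[1;31mwarning\e[0m: disk almost full\n";
print show_escapes($sample);
# prints: ESC[1;31mwarningESC[0m: disk almost full
```

For a whole file, the one-liner perl -pe 's/\e/ESC/g' file.txt does the same thing line by line (file.txt being whatever file holds the coloured text).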

    If that doesn't help, you'll need to show an example somehow.

    C.

Re: Removing ANSI Color Codes
by Aristotle (Chancellor) on May 31, 2003 at 13:33 UTC
    castaway's list is no substitute for a specification of ANSI escapes, and her resulting regex suffers from the same problem as yours, although they break on opposite cases (yours needs exactly two numbers, hers exactly one): neither takes into account that you can put any number of colours (including just one) in the same escape sequence by separating them with semicolons. Yours will also match more than just escape sequences. Use
    s/\e\[\d+(?>(;\d+)*)m//g;
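To see these failure modes side by side, here is a small test harness. It is a sketch: the sample strings are made up, and the three patterns are the original poster's, castaway's, and Aristotle's, applied unchanged.

```perl
use strict;
use warnings;

# The three substitutions from this thread, each wrapped in a sub
# that works on a copy of its argument.
sub strip_op        { my $s = shift; $s =~ s/\d;\d\dm//g;            return $s }
sub strip_castaway  { my $s = shift; $s =~ s/\e\[\d+m//g;            return $s }
sub strip_aristotle { my $s = shift; $s =~ s/\e\[\d+(?>(;\d+)*)m//g; return $s }

# Made-up samples: one combined sequence, MUD-style separate
# sequences, and plain text that merely resembles a code.
my @samples = (
    "\e[0;30mdark\e[0m",
    "\e[1m\e[31mred\e[0m",
    "room 4;12m wide",
);

my @strippers = (
    [ op        => \&strip_op ],
    [ castaway  => \&strip_castaway ],
    [ aristotle => \&strip_aristotle ],
);

for my $s (@samples) {
    for my $pair (@strippers) {
        my ($name, $strip) = @$pair;
        # Show leftover ESC characters as the text "ESC".
        (my $shown = $strip->($s)) =~ s/\e/ESC/g;
        printf "%-10s %s\n", $name, $shown;
    }
    print "\n";
}
```

The original pattern leaves ESC[darkESC[0m for the first sample, ignores the second, and turns the third into "room  wide"; castaway's leaves ESC[0;30mdark; only Aristotle's yields dark, red and the untouched plain text.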

    Makeshifts last the longest.

      Point taken, thanks Aristotle. (MUDs don't ever use the semicolon syntax, they just hang the sequences one after the other; that's my excuse anyway. ;)

      C.

      I'm confused by your use of (?> ... ) here, Aristotle... any chance you could clear it up? (I did read the description in perlre; it didn't help.) Specifically, what does this pattern accomplish that s/\e\[\d+(;\d+)*m//g; doesn't?



        Nothing, in this case. *g* It will just fail a notch faster in cases where it can't match.

        The reason is that once the (;\d+)* stops, if what follows isn't an m, the regex engine will backtrack, giving up a bit of what (;\d+)* matched, trying to find an m. Of course we know that neither the semicolon nor \d can match something that is an m, so no backtracking in the world is going to help and make it match.

        What (?>re) does is throw away all the intermediate states once re has matched, so if backtracking seems necessary, the engine will not remember how to backtrack into the middle of re. Effectively, if the engine fails to find an m after the (?>re), it will unmatch re all at once, rather than waste time doing so character by character.
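Since Perl 5.10 the same effect can also be written with a possessive quantifier; perlre documents PAT*+ as equivalent to (?>PAT*). A sketch checking that the plain, atomic and possessive spellings strip the same made-up samples, including one that cannot match (the names %patterns and strip_with are hypothetical):

```perl
use 5.010;
use strict;
use warnings;

# Three spellings of the same pattern:
#   plain      - ordinary greedy group; the engine may backtrack into it
#   atomic     - Aristotle's (?>...) version; no backtracking into it
#   possessive - *+ quantifier (Perl 5.10+), shorthand for (?>...*)
my %patterns = (
    plain      => qr/\e\[\d+(;\d+)*m/,
    atomic     => qr/\e\[\d+(?>(;\d+)*)m/,
    possessive => qr/\e\[\d+(?:;\d+)*+m/,
);

# Remove every match of $re from a copy of $text.
sub strip_with {
    my ($re, $text) = @_;
    $text =~ s/$re//g;
    return $text;
}

# The last sample has no terminating m, so every version fails on it.
my @samples = ("\e[0;30mdark\e[0m", "\e[1;4;31mloud\e[0m", "\e[1;2x not a code");

for my $name (sort keys %patterns) {
    my @results;
    for my $s (@samples) {
        (my $shown = strip_with($patterns{$name}, $s)) =~ s/\e/ESC/g;
        push @results, $shown;
    }
    say "$name: ", join(' | ', @results);
}
```

All three lines report dark, loud and the unchanged ESC[1;2x text; the atomic and possessive forms differ only in giving up sooner on the third sample.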


Node Type: perlquestion [id://262044]
Approved by castaway
Front-paged by halley