PerlMonks  

"em" - Emphasize text using regular expressions

by FloydATC (Deacon)
on Apr 18, 2013 at 19:16 UTC [id://1029408]

Not sure if this counts as "cool use", but this little thing has become one of my favorite tools over the past few years. Pass any plaintext data through it and emphasize any text you want using regular expressions. Useful for tcpdump, server logs... anything, really.

Update: After I posted version 1, the problem of overlapping matches bugged me so much that I finally found a solution. Instead of straight regex substitution, I now do all the matching first and apply the colors afterwards. I've tested it quite a bit, but it's still experimental.

#!/usr/bin/perl

use strict;
use warnings;
use Term::ANSIColor;

my @rules = @ARGV;

while (my $line = <STDIN>) {
  print rewrite($line, @rules);
}

exit;

# Process a single line
sub rewrite {
  my $line = shift;
  my @rules = @_;
  my @marks = ();

  # Process each rule and find areas to mark
  while (@rules) {
    my $regex = shift @rules;
    my $color = shift @rules || 'bold yellow';
    $color = color('reset').color($color);

    while ($line =~ /$regex/ig) {
      my $reset = undef;

      # Scan match area to find last color
      foreach my $i (reverse $-[0] .. $+[0]) {
        if (defined $marks[$i]) {
          $reset = $marks[$i] unless defined $reset;
          $marks[$i] = undef; # Cancel previous color
        }
      }

      # If necessary, keep scanning to beginning of line
      unless (defined $reset) {
        foreach my $i (reverse 0 .. $-[0]) {
          if (defined $marks[$i]) {
            $reset = $marks[$i];
            last;
          }
        }
      }

      # Mark area
      $marks[$-[0]] = $color;
      $marks[$+[0]] = $reset || color('reset');
    }
  }

  # Apply color codes to the string (from the right, so offsets stay valid)
  foreach my $i (reverse 0 .. $#marks) {
    substr($line, $i, 0, $marks[$i]) if defined $marks[$i];
  }
  return $line;
}

=pod

=head1 NAME

em - console emphasis tool version 2

=head1 DESCRIPTION

em is a command line tool for visually emphasizing text in log files etc.
by colorizing the output matching regular expressions.

=head1 SYNOPSIS

  em REGEX1 [COLOR1] [REGEX2 [COLOR2]] ... [REGEXn [COLORn]]

=head1 USAGE

REGEX is any regular expression recognized by Perl. For some shells this
must be enclosed in double quotes ("") to prevent the shell from
interpolating special characters like * or ?.

COLOR is any ANSI color string accepted by Term::ANSIColor, such as
'green' or 'bold red'.

Any number of REGEX-COLOR pairs may be specified. If the number of
arguments is odd (i.e. no COLOR is specified for the last REGEX) em will
use 'bold yellow'.

Overlapping rules are supported. For characters that match multiple
rules, only the last rule will be applied.

=head1 EXAMPLES

In a system log, emphasize the words "error" and "ok":

=over

  tail -f /var/log/messages | em error red ok green

=back

In a mail server log, show all email addresses between <> in white,
successes in green:

=over

  tail -f /var/log/maillog | em "(?<=\<)[\w\-\.]+?\@[\w\-\.]+?(?=\>)" "bold white" "stored message|delivered ok" "bold green"

=back

In a web server log, show all URIs in yellow:

=over

  tail -f /var/log/httpd/access_log | em "(?<=\"get).+?\s"

=back

=head1 BUGS AND LIMITATIONS

Multi-line matching is not implemented.

All regular expressions are matched without case sensitivity.

=head1 AUTHOR

Andreas Lund <floyd@atc.no>

=head1 COPYRIGHT AND LICENSE

Copyright 2009-2013 Andreas Lund <floyd@atc.no>.

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

=cut
1. I would love for someone to adopt this and put it on CPAN so myself and others can get easy access to it
2. There's one annoying limitation; overlapping matches don't behave the way they should, and I can't find a way to fix it.

Update: Another cool way to use this tool is regex testing. Simply type "em" followed by the regex you want to test. Example:
em "0x[0-9a-f]+"
Now input your test strings one by one, and "em" will show you exactly what matches and what doesn't. Hit Ctrl+D (EOF) to exit.
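If no color terminal is at hand, the same kind of check can be done by marking matches with plain text. The sketch below is a hypothetical helper, not part of em: show_matches() wraps every match in >> and <<, and, like em, matches case-insensitively. A single regex has no overlap problem, so a plain substitution is enough here.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every case-insensitive match of $regex
# in >> and << so the matched spans are visible without colors.
sub show_matches {
    my ($regex, $text) = @_;
    $text =~ s/($regex)/>>$1<</gi;
    return $text;
}

print show_matches('0x[0-9a-f]+', "mov eax, 0x1F; jmp 0XFFh"), "\n";
```

This prints mov eax, >>0x1F<<; jmp >>0XFF<<h, showing that the trailing "h" is not part of the second match.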

-- Time flies when you don't know what you're doing

Re: "em" - Emphasize text using regular expressions
by kcott (Archbishop) on Apr 18, 2013 at 20:45 UTC

    G'day FloydATC,

    "Not sure if this counts as "cool use" ..."

    I thought so. I now have a copy in ~ken/local/bin/. Thanks. (++ when the Vote Fairy next visits)

    "I would love for someone to adopt this and put it on CPAN so myself and others can get easy access to it"

    Is there a reason you can't do this yourself? Take a look at How to submit a script to CPAN.

    "There's one annoying limitation; overlapping matches don't behave the way they should, and I can't find a way to fix it."

    You'll need to document this in a little more detail than "the result can be... interesting" and "don't behave the way they should".

    -- Ken

      Is there a reason you can't do this yourself?
      I'm lazy. And a little bit out of my depth.
      You'll need to document this in a little more detail
      Update: This problem has been fixed. Finally :-)

      The problem was really simple; it was just a bit difficult to describe precisely and even more difficult to fix properly.

      For every expression that matched, the script would prefix the match with an ANSI code to change the color, then suffix it with another ANSI code to "reset" to the default color. If this overlapped a previous match on the same line, the ANSI "reset" code would cancel any other color codes that should still have been in effect.

      Example 1:
      echo "aabbccddee" | em bbccdd red cc blue
      would produce
      aa[RED]bb[BLUE]cc[RESET]dd[RESET]ee
      which is incorrect, because the RESET after "cc" also turns off the RED that should still apply to "dd".

      Example 2:
      echo "aabbccddee" | em cc blue bbccdd red
      would produce
      aabb[BLUE]cc[RESET]ddee
      because, after the first rule, the text reads "bb[BLUE]cc[RESET]dd", which no longer matches "bbccdd".
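      Both failures can be reproduced with a naive per-rule substitution. This is a minimal sketch, not the original version 1 code: naive_em() is a hypothetical name, and readable tags such as [RED] and [RESET] stand in for the ANSI escape codes.

```perl
use strict;
use warnings;

# Naive approach (hypothetical sketch): for each REGEX => TAG pair,
# wrap every match in "[TAG]" ... "[RESET]". Tags stand in for ANSI codes.
sub naive_em {
    my ($line, @rules) = @_;
    while (my ($regex, $tag) = splice(@rules, 0, 2)) {
        $line =~ s/($regex)/"[$tag]" . $1 . '[RESET]'/gie;
    }
    return $line;
}

print naive_em('aabbccddee', 'bbccdd' => 'RED',  'cc' => 'BLUE'), "\n";
print naive_em('aabbccddee', 'cc' => 'BLUE', 'bbccdd' => 'RED'),  "\n";
```

      The two printed lines, aa[RED]bb[BLUE]cc[RESET]dd[RESET]ee and aabb[BLUE]cc[RESET]ddee, are exactly Examples 1 and 2.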

      I feared the only real solution would be to replace the simple regex substitution with a state machine that uses a color stack to work out which color to reset to. Unfortunately, that would turn a quick and simple tool into a complicated piece of code that could fail even harder.

      Instead of a simple substitution, I now keep an array of the codes I would insert. For each match, I scan this array to find the proper "reset" color code instead of just inserting a plain ANSI reset. Only when all the matching has finished do I apply the codes with substr(), working from the end of the line backwards so the earlier offsets stay valid.
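      To see what the marks array actually computes, the posted rewrite() can be ported to insert readable tags instead of ANSI codes. This is a sketch under assumptions: rewrite_tags() and the [TAG] strings are hypothetical, and the rules are assumed to come in complete REGEX => TAG pairs (there is no 'bold yellow' default here).

```perl
use strict;
use warnings;

# Port of em's rewrite() (hypothetical name) that inserts "[TAG]"
# strings instead of ANSI escape codes. @rules: REGEX => TAG pairs.
sub rewrite_tags {
    my ($line, @rules) = @_;
    my @marks;    # $marks[$i] = tag to insert before character $i

    while (my ($regex, $tag) = splice(@rules, 0, 2)) {
        while ($line =~ /$regex/gi) {
            my ($start, $end) = ($-[0], $+[0]);
            my $reset;

            # Cancel earlier marks inside this match, keeping the
            # right-most one as the color to restore afterwards
            for my $i (reverse $start .. $end) {
                next unless defined $marks[$i];
                $reset //= $marks[$i];
                $marks[$i] = undef;
            }

            # Otherwise restore the nearest mark before the match
            unless (defined $reset) {
                for my $i (reverse 0 .. $start) {
                    if (defined $marks[$i]) { $reset = $marks[$i]; last; }
                }
            }

            $marks[$start] = "[$tag]";
            $marks[$end]   = $reset // '[RESET]';
        }
    }

    # Insert from the right so the remaining offsets stay valid
    for my $i (reverse 0 .. $#marks) {
        substr($line, $i, 0, $marks[$i]) if defined $marks[$i];
    }
    return $line;
}

print rewrite_tags('aabbccddee', 'bbccdd' => 'RED',  'cc' => 'BLUE'), "\n";
print rewrite_tags('aabbccddee', 'cc' => 'BLUE', 'bbccdd' => 'RED'),  "\n";
```

      For Example 1 this gives aa[RED]bb[BLUE]cc[RED]dd[RESET]ee, so "dd" goes back to RED; for Example 2 it gives aa[RED]bbccdd[RESET]ee, i.e. the last rule wins for the overlapping characters, as the POD says.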


        Why are you using the color() function when there's colored(), which can limit the scope of the text you want colored?
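        For comparison, Term::ANSIColor provides both: color() returns only the escape sequence for the given attributes, while colored() wraps a string and appends the reset itself. A minimal sketch of the two (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Term::ANSIColor qw(color colored);

# color() returns only the escape sequence; the caller adds the reset.
my $manual = color('bold red') . 'error' . color('reset');

# colored() wraps the text and appends the reset itself.
my $wrapped = colored('error', 'bold red');

# Alternative calling form: attributes in an array ref, then the text.
my $same = colored(['bold red'], 'error');

print "$manual\n$wrapped\n$same\n";
```

        Note that colored() still ends with a plain reset, so nesting one colored() string inside another cuts the outer color short, which is the same overlap problem described in the thread.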

Re: "em" - Emphasize text using regular expressions
by umasuresh (Hermit) on Apr 18, 2013 at 20:37 UTC
    Very useful indeed. I think it would be useful to add a flag for line numbers, similar to grep -n.
      No lines are added or removed, so you can just add grep -n '' to the pipeline if you need line numbers.

Re: "em" - Emphasize text using regular expressions
by reisinge (Hermit) on Apr 20, 2013 at 13:29 UTC

    GNU grep, used with the --color option and the GREP_COLORS environment variable, can do something similar.

    Well done is better than well said. -- Benjamin Franklin

      With this tool you can just type
      tail -f logfile | em "interesting"

      ...instead of
      export GREP_COLOR="33;1" && tail -f logfile | grep "interesting" --colour

      And you can use more than one regex/color pair in the same filter.

Re: "em" - Emphasize text using regular expressions
by taint (Chaplain) on Nov 06, 2013 at 23:47 UTC
    Greetings, FloydATC.
    Nice utility!
    "1. I would love for someone to adopt this and put it on CPAN so myself and others can get easy access to it"

    I believe you're right; it should be available. It could be really handy in a lot of situations.
    In fact, I took the liberty of creating a CPAN package for you. I'll be happy to upload it, but before I do, I'd prefer your inspection and approval. :) Check your message box here for all the scary details. :)

    Best wishes.

    --Chris

    #!/usr/bin/perl -Tw
    use perl::always;
    my $perl_version = (5.12.5);
    print $perl_version;
      FWIW
      I changed its name to reflect the acronym (console emphasis tool) in an effort to avoid conflicts with the *BSD em driver and related man pages, so it's packaged as cet-2. I also put .pl on the end to make it easier to distinguish. Both can easily be changed before registering it on CPAN, should you be interested.
      Just thought I'd mention it.

      Best wishes.

      --Chris

        Good thinking. Thanks for adopting it and feel free to upload :-)

        -- FloydATC


Node Type: CUFP [id://1029408]
Approved by kcott
Front-paged by kcott