String Matching

by stevbutt (Novice)
on Aug 13, 2012 at 23:41 UTC ( #987241=perlquestion )
stevbutt has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

Please help with some wise and efficient string matching wisdom

Input :

May  2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= [] I=[]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"

I want to grab the IP address (without the square brackets) and the email address (which always follows <=).

So far I have the IP address, but with the square brackets, using:

my ($srvrip) = $remainder =~ m/H=.+?(\[.+?\])/;

How can I extract the email address ?

I have a lot of lines in the log files so need this to be as efficient as possible and am also restricted to perl 5.8.4

Hope you can help

Replies are listed 'Best First'.
Re: String Matching
by davido (Archbishop) on Aug 14, 2012 at 01:51 UTC


    Here it is with nicer formatting and a basic explanation:

    m/
        <=\s*
        (\S+)       # Capture the email address following <=
        [^[]+\[     # Skip to the first subsequent square bracket.
        ([^\]]+)    # Capture until a closing bracket.
    /x

    You can tinker with it yourself here.

    The email address will be in $1 and the IP will be in $2, following a successful match.
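
For anyone who wants to see that pattern in context, here is a minimal sketch that wraps it in a sub. The name parse_arrival and the sample line are made up for illustration (the address and IP are placeholders, not the OP's data), and it sticks to syntax that should be fine on 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (email, ip), or an empty list when no match.
sub parse_arrival {
    my ($line) = @_;
    if ( $line =~ m/
            <=\s*
            (\S+)       # email address following <=
            [^[]+\[     # skip to the next opening square bracket
            ([^\]]+)    # everything up to the closing bracket
        /x ) {
        return ( $1, $2 );
    }
    return;
}

# Made-up sample line; the address and IP are placeholders.
my $line = 'May  2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS '
         . '<= sender@example.com H=mail.example.com [192.0.2.10] I=[192.0.2.1]:25 P=esmtp S=13333';

my ( $email, $ip ) = parse_arrival($line);
print "Email: $email\nIP: $ip\n" if defined $email;
```

Returning an empty list on failure means a caller can test the result before touching $1 or $2, which are only meaningful after a successful match.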

    Update: Silly me for trusting the OP's spec. Kenosis mentioned to me that the exim record could, in addition to <=, also contain any of ==, **, =>, *>, ->, and possibly some others. So the <= anchor is probably not ideal, but could be improved upon with (?:<=|==|\*\*|=>|\*>|->) (plus whatever others are legal).
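
If the flag is not known in advance, one option is to capture it along with the token that follows. This is only a sketch: parse_flagged and the sample lines are hypothetical, and the alternation covers just the flags named in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (flag, token) for the first flagged field.
sub parse_flagged {
    my ($line) = @_;
    if ( $line =~ m/
            \s (<=|==|\*\*|=>|\*>|->)   # one of the flags named in this thread
            \s+ (\S+)                   # the token right after the flag
        /x ) {
        return ( $1, $2 );
    }
    return;
}

# Made-up sample lines; addresses are placeholders.
for my $line (
    '2012-07-03 07:06:15 1SPPtO-0004en-PS <= sender@example.com H=h [192.0.2.10]',
    '2012-07-03 07:06:16 1SPPtO-0004en-PS => rcpt@example.com R=dnslookup T=remote_smtp',
    '2012-07-03 07:06:17 1SPPtO-0004en-PS ** bad@example.com: Unrouteable address',
) {
    my ( $flag, $addr ) = parse_flagged($line);
    print "$flag $addr\n" if defined $flag;
}
```

Requiring whitespace on both sides of the flag keeps it from matching inside something like an id= field.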


      Thanks Dave,

      The spec is correct. This is already inside an if/elsif statement where we know whether we are dealing with ==, **, etc., so what you have shown me is just perfect.

      many thanks


        Fantastic! My faith in humanity is restored. ;) ...and I'm glad it worked for you.


Re: String Matching
by GrandFather (Sage) on Aug 14, 2012 at 01:12 UTC

    What have you tried?

    As an aside don't fall for the "efficient as possible" tripe. Getting wrong answers fast is not generally considered a good solution. Work on getting the correct answers first then (and only if the solution takes too long to run) consider how you can make it faster.

    True laziness is hard work

      This is just so true.

Re: String Matching
by rpnoble419 (Pilgrim) on Aug 14, 2012 at 07:08 UTC
    If the layout is fixed (that is, if the data changes but the position of the data does not change), then try this:
    $_='May 2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= [] I=[6.5.14.4]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"';
    my @data= split(/ /);
    my $Email=$data[10];
    my $IP=$data[12];
    $IP=~s/\[//g;
    $IP=~s/\]//g;
    print "Email: $Email\n";
    print "IP: $IP\n";
    As you are limited to Perl 5.8.4, regexes are not as fast as in 5.10 and up, so I would try to limit the data I perform a regex on, as you never know what will change and cause your program to bomb (usually at 3:00am on a Sunday morning). I would split your data into its many parts and then run whatever regex you need on a smaller data chunk. For the email you don't even need a regex. The square brackets can be removed in any number of ways; I chose the lazy way in my example.
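
A variation on the split idea, in case the field positions are not completely stable: find the fields by what they look like rather than by index. This is a sketch under assumptions, not tested against real logs; fields_by_marker and the sample line are made up, and it assumes the address is the field right after <= and the IP is the first later field that starts with a square bracket.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (email, ip); either may be undef if not found.
sub fields_by_marker {
    my ($line) = @_;
    my @f = split ' ', $line;    # awk-style split, collapses runs of whitespace
    my ( $email, $ip );
    for my $i ( 0 .. $#f ) {
        if ( $f[$i] eq '<=' && $i < $#f ) {
            $email = $f[ $i + 1 ];             # field right after the marker
        }
        elsif ( defined $email && !defined $ip && $f[$i] =~ /^\[/ ) {
            ( $ip = $f[$i] ) =~ tr/[]//d;      # delete both bracket characters
        }
    }
    return ( $email, $ip );
}

# Made-up sample line; the address and IP are placeholders.
my $line = 'May  2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS '
         . '<= sender@example.com H=mail.example.com [192.0.2.10] I=[192.0.2.1]:25';

my ( $email, $ip ) = fields_by_marker($line);
print "Email: $email\nIP: $ip\n" if defined $email && defined $ip;
```

Using split ' ' instead of split(/ /) avoids the empty field that a double space (as in "May  2") would otherwise produce.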
Re: String Matching
by 2teez (Priest) on Aug 14, 2012 at 07:43 UTC

    If your logfile has its data in fixed-width fields, then the unpack function can really come in handy! And you really wouldn't need to bother about which Perl version you are using. See this:

    use warnings;
    use strict;

    my $str = 'May 2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= [] I=[]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"';
    my ( $e_mail, $ip ) = unpack "x82A13x21A9", $str;
    print "EMAIL: ", $e_mail, "\nIP: ", $ip, $/;

    # OR
    while (<DATA>) {
        my ( $e_mail, $ip ) = unpack "x82A13x21A9", $_;
        print "EMAIL: ", $e_mail, "\nIP: ", $ip, $/;
    }

    __DATA__
    May 2 04:06:15 exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= [] I=[]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"

    Check perldoc perlpacktut for more info.

    UPDATE: Oops, my bad! I missed this, but Kenosis pointed it out: if the length of the field to be extracted varies, then the unpack function will NOT work.
    However, I had mentioned previously that the logfile's data MUST have a FIXED WIDTH.
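
To make the fixed-width caveat concrete, here is a small sketch. The layout (a 20-column address, one space, a 15-column IP) and the name parse_fixed are invented for the demonstration; they are not the real exim format.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout: 20-column address, one space, 15-column IP.
my $template = 'A20 x1 A15';

sub parse_fixed {
    my ($record) = @_;
    return unpack $template, $record;    # 'A' also strips trailing spaces
}

# A record that fits the layout.
my $good = sprintf '%-20s %-15s', 'sender@example.com', '192.0.2.10';
my ( $email, $ip ) = parse_fixed($good);
print "Email: $email\nIP: $ip\n";

# A longer address shifts every later column, so the same template
# quietly returns the wrong substrings instead of failing.
my $shifted = sprintf '%-20s %-15s', 'a.much.longer.sender@example.com', '192.0.2.10';
my ( $bad_email, $bad_ip ) = parse_fixed($shifted);
print "Shifted email: $bad_email\nShifted IP: $bad_ip\n";
```

The second call is the failure mode to watch for: unpack does not complain, it just hands back whatever sits in those columns.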

Re: String Matching
by linuxkid (Sexton) on Aug 14, 2012 at 15:15 UTC

    Remember that perl doesn't do greedy matching, so try:

    /.*<=(.*)\s*\[(\d?\d\d\.\d?\d\d\.\d?\d\d\.\d?\d\d).*/

    $1 will be the email, and $2 will be the IP.

      Just for the record, the standard quantifiers in Perl regular expressions are, indeed, greedy.

        They are greedy, but not too clever

        my $str = "xaxaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
        $str =~ /(a+)/;
        print "$1\n"; # prints "a" not "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
        The quantifiers are greedy, but they refuse to let go of something they found unless forced even if they could get more someplace later.
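
Put another way, the engine takes the leftmost position where a match is possible and only then is a+ greedy. If the longest run is what you want, you have to ask for it explicitly. A minimal sketch with a shortened string (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $str = 'xaxaaaa';

# Leftmost match wins: the first run of a's is a single character.
my ($first) = $str =~ /(a+)/;

# To get the longest run, collect every match and keep the longest.
my $longest = '';
for my $run ( $str =~ /(a+)/g ) {
    $longest = $run if length($run) > length($longest);
}

print "first: $first\nlongest: $longest\n";
```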

        Enoch was right!
        Enjoy the last years of Rome.

Node Type: perlquestion [id://987241]
Approved by GrandFather