PerlMonks  

RegEX Doubt

by sandy105 (Scribe)
on Aug 19, 2014 at 11:39 UTC

sandy105 has asked for the wisdom of the Perl Monks concerning the following question:

I have a log file with the following line signature. I need to extract specific portions of each line, namely the contents of the first and second square brackets and the last chunk of text after the third square bracket.

Log signature:

[part1-date] info - [..part2..] [..part3..] part4

[part1-date] log - [..part2..] [..part3..] part4

foreach (@lines) {
    $_ =` /\[([^]]+)\] \[([^]]+)\] \[([^]]+)\] (.*)/ || next
    my ($part1,$part2,$part3) = ($1,$2,$4);

This is what I have, but I can't seem to get the correct result.

Do I have to account for " info - " or " log - " in the regex?

Re: RegEX Doubt
by hippo (Bishop) on Aug 19, 2014 at 12:12 UTC

    Yes, you do have to account for all the characters in your regex. It might be simpler just to split on the delimiters, though:

    echo '[part1-date] log - [..part2..] [..part3..] part4' | perl -ne 'my @r = split (/[[\]]/, $_); print @r[1,3,6];'
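Expanding hippo's one-liner into a small script may make the field numbering easier to follow. This is only a sketch: the helper name split_parts and the sample lines are illustrative, and trimming the surrounding whitespace from part 4 is an addition the one-liner does not do. It assumes part 4 never contains a square bracket, since a bracket there would shift the field indices.

```perl
use strict;
use warnings;

# Split one log line on '[' and ']' and return (part1, part2, part4).
# For "[p1] info - [p2] [p3] rest" the split fields are:
#   0: ''   1: 'p1'   2: ' info - '   3: 'p2'   4: ' '   5: 'p3'   6: ' rest'
# Assumes part 4 itself contains no square brackets.
sub split_parts {
    my ($line) = @_;
    chomp $line;
    my @r = split /[[\]]/, $line;
    (my $part4 = $r[6] // '') =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return ($r[1], $r[3], $part4);
}

for my $line ("[part1-date] info - [..part2..] [..part3..] part4\n",
              "[part1-date] log - [..part2..] [..part3..] init code finished\n") {
    my ($part1, $part2, $part4) = split_parts($line);
    print "$part1 | $part2 | $part4\n";
}
```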
Re: RegEX Doubt
by Athanasius (Archbishop) on Aug 19, 2014 at 12:07 UTC

    Hello sandy105,

    Yes, you have to allow for text such as “info - ” between the bracketed parts. But you don’t have to capture it. No need to capture the contents of the third bracketed part either, if you don’t need it:

    #! perl
    use strict;
    use warnings;

    while (<DATA>)
    {
        / \[ ([^]]+) \] .* \[ ([^]]+) \] .* \[ [^]]+ \] \s+ (.*) /x or next;
        my ($part1, $part2, $part3) = ($1, $2, $3);
        print "1: |$part1| 2: |$part2| 3: |$part3|\n";
    }

    __DATA__
    [part1-dateA] info - [..part2..] [..part3..] part4
    [part1-dateB] log - [..part2..] [..part3..] part4

    Output:

    22:04 >perl 972_SoPW.pl
    1: |part1-dateA| 2: |..part2..| 3: |part4|
    1: |part1-dateB| 2: |..part2..| 3: |part4|
    22:04 >

    Note: I’ve added the /x modifier to the regex, and whitespace within it, to make it easier to read.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thank you very much. For anyone reading this later: /x is a modifier that lets you put whitespace (and # comments) inside a regex. Possible uses include commenting the regex and spacing it out to improve legibility.
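To illustrate the /x modifier, here is Athanasius's regex from the reply above written out with one comment per piece. The pattern itself is unchanged; the variable name $log_re and the loop around it are just for illustration.

```perl
use strict;
use warnings;

# Athanasius's pattern, one piece per line. Under /x, whitespace outside
# bracketed character classes is ignored and an unescaped '#' starts a comment.
my $log_re = qr/
    \[ ([^]]+) \]     # part 1: text inside the first bracket pair
    .*                # whatever comes next, e.g. " info - " or " log - "
    \[ ([^]]+) \]     # part 2: text inside the second bracket pair
    .*                # separator before part 3
    \[ [^]]+ \]       # part 3: matched but not captured
    \s+ (.*)          # part 4: the rest of the line
/x;

for my $line ("[part1-dateA] info - [..part2..] [..part3..] part4",
              "[part1-dateB] log - [..part2..] [..part3..] part4") {
    my ($part1, $part2, $part4) = $line =~ $log_re or next;
    print "1: |$part1| 2: |$part2| 3: |$part4|\n";
}
```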

Re: RegEX Doubt
by Laurent_R (Canon) on Aug 19, 2014 at 14:54 UTC
    Hi,

    You have been given valuable answers already, but I would suggest two possible improvements in terms of readability and ease of regex construction:

    - using non-greedy quantifiers rather than a negated character class to match what is between the square brackets

    - first creating a sub-regex and then using it three times.

    Possibly something like this:

    $_ = "[part1-date] info - [..part2..] [..part3..] part4";
    $part = qr /\[(.+?)]/;   # subregex using non-greedy quantifier (slightly easier than /\[([^]]+)\]/)
    print "$1 $2 $3" if /$part \w+ - $part $part \w+/;   # prints "part1-date ..part2.. ..part3.."
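Laurent's snippet captures part 3 rather than part 4. A possible variant, assuming part 4 is simply everything after the third bracketed section, adds a final (.*) capture. The helper name parse_line is hypothetical, and the second sample line borrows a phrase from later in the thread.

```perl
use strict;
use warnings;

# Laurent's sub-regex: one bracketed section, contents captured non-greedily.
my $part = qr/\[(.+?)\]/;

# Hypothetical helper: returns (part1, part2, part4), or an empty list if the
# line does not fit the format. \w+ accepts "info" as well as "log";
# $part supplies capture groups 1 to 3 and the final (.*) is group 4.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^$part \w+ - $part $part (.*)$/;
    return ($1, $2, $4);
}

for my $line ("[part1-date] info - [..part2..] [..part3..] part4",
              "[part1-date] log - [..part2..] [..part3..] batch process code started") {
    my ($part1, $part2, $part4) = parse_line($line) or next;
    print "$part1 | $part2 | $part4\n";
}
```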
Re: RegEX Doubt
by soonix (Canon) on Aug 19, 2014 at 12:44 UTC

    Hi sandy105,

  • I wanted to suggest putting [^[]* instead of the spaces between your [...sections...], but the other Monks were quicker :-)
  • I can't find =` in perlop. Most probably you mean =~?

      Yes, that's correct: "=~".
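soonix's first suggestion can be sketched by taking the question's regex, using =~ instead of =`, and replacing each space between the bracketed sections with [^[]*. This assumes, as the question's pattern does, that part 4 follows the third section after exactly one space; the variable name $re and the sample lines are illustrative.

```perl
use strict;
use warnings;

# The question's pattern with each space between the bracketed sections
# replaced by [^[]* ("any run of characters other than an opening bracket"),
# so " info - " and " log - " are absorbed without being spelled out.
my $re = qr/\[([^]]+)\][^[]*\[([^]]+)\][^[]*\[([^]]+)\] (.*)/;

for my $line ("[part1-date] info - [..part2..] [..part3..] part4",
              "[part1-date] log - [..part2..] [..part3..] part4") {
    $line =~ $re or next;                          # =~ is the binding operator
    my ($part1, $part2, $part4) = ($1, $2, $4);    # part 3 ($3) is not needed
    print "$part1 | $part2 | $part4\n";
}
```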

Re: RegEX Doubt
by sandy105 (Scribe) on Aug 19, 2014 at 16:11 UTC

    I don't know if I should create another thread for this, but there is another hiccup. I need to match the last part ("PART4") against a few strings, e.g. "init code finished", "batch process code started".

    Right now I am checking part 4 in a series of if statements, but it looks messy. Is there a better way to match part 4 against, say, the strings in an array @match?

      I think you should do it in two steps. Step 1: just capture everything that comes after [..part 3..] (which you can do when you apply the first regex). Step 2: look at what you've captured, presumably in $4, at the end of the line. Otherwise, the rest of your regex (what you already have) might fail because you are trying to match more things, and those things may or may not be there.

        Yes, I am capturing the fourth part and then checking whether it matches any of those keywords, but I end up with something like n if statements. Is there a better way to search/compare that?
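One common way to avoid the chain of if statements is to build a single regex alternation from the array, escaping each string with quotemeta, and test part 4 against it once. This sketch assumes part 4 only needs to contain one of the strings rather than equal it exactly; which_phrase is a hypothetical helper name, and @match holds the example strings from the question.

```perl
use strict;
use warnings;

# Hypothetical list of phrases to look for in part 4 (taken from the question).
my @match = ('init code finished', 'batch process code started');

# One alternation built from the list replaces the chain of if statements.
# quotemeta escapes any regex metacharacters in the phrases; longer phrases
# come first so that, at a given position, a phrase is not cut short by a
# shorter one that happens to be its prefix.
my $alt      = join '|', map { quotemeta($_) } sort { length $b <=> length $a } @match;
my $match_re = qr/($alt)/;

# Returns the phrase found in $part4, or undef if none of them occurs.
sub which_phrase {
    my ($part4) = @_;
    return $part4 =~ $match_re ? $1 : undef;
}

for my $part4 ('init code finished', 'something else', 'batch process code started ok') {
    my $found = which_phrase($part4);
    print defined $found ? "matched: $found\n" : "no match: $part4\n";
}
```

If each phrase needs its own handling, a hash mapping each phrase to a code reference (a dispatch table) can be indexed with the captured phrase instead of another chain of ifs. If the comparison should be exact rather than a substring match, a plain hash lookup such as exists $wanted{$part4} is simpler still.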

Node Type: perlquestion [id://1097966]
Approved by Athanasius