Re: Split string variable of log input and output pieces in text file

by Marshall (Canon)
on Apr 17, 2016 at 16:59 UTC ( [id://1160734]=note )


in reply to Split string variable of log input and output pieces in text file

For fixed formats like this one it is sometimes (not always) easier to use split followed by a list slice instead of one big regex. You only have to select what you need. The first arg to split is still a regex, but a simple one. Here is a demo of that. Notice the index of -2. That is completely fine in Perl and means 2nd from the end. This isn't a good example of it, but I usually put the vars on the left side of the split in the order that I need them later and then arrange the indices in the slice to match.
#!/usr/bin/perl
use warnings;
use strict;

my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)';

my ($protocol, $ip, $port, $size) = (split /[\s:()]+/, $line)[6,7,8,-2];
print join ",", ($protocol, $ip, $port, $size);
print "\n";

__END__
Prints:
tcp,115.239.248.245,1751,312

To test easily to find the index numbers:
my @x = split /[\s:()]+/, $line;
print join "\n", @x;
use the line number from text editor to see indices without counting
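The index hunt described in the notes above can also be done without a text editor. This is a minimal sketch, assuming the same log layout; the name @fields and the shortened sample line (the two hash fields are left out for brevity) are illustrative assumptions, not part of the original demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The log line from the demo, with the two hash fields left out for brevity.
my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 (312 bytes)';

my @fields = split /[\s:()]+/, $line;

# Print each field with its index counted from the front and from the end,
# so both positive and negative slice indices can be read off directly.
for my $i (0 .. $#fields) {
    printf "%3d %4d  %s\n", $i, $i - @fields, $fields[$i];
}
```

The second column is the negative index, which is handy when the fields you want sit at a fixed distance from the end of a line whose middle varies in length.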

Replies are listed 'Best First'.
Re^2: Split string variable of log input and output pieces in text file
by firepro20 (Novice) on Apr 17, 2016 at 18:06 UTC

    Thank you very much for this. If you could explain how the regex works with the // pattern matching, that would be great, especially since I have other log files with different formats.

      OK, for split /[\s:()]+/,$line please read http://perldoc.perl.org/functions/split.html.

      split takes a string as input and returns a list of pieces cut according to the split regex. The split regex defines what constitutes a boundary between array elements. During the split process the "separators" are "consumed", meaning they are deleted and do not show up in the result.
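A small sketch of the "consumed" behaviour described above. The record string is a made-up miniature, not taken from the log in this thread.

```perl
use strict;
use warnings;

# Hypothetical miniature record, not taken from the log above.
my $record = 'tcp 10.0.0.1:80';

# Runs of whitespace or colons mark the boundaries and are thrown away.
my @parts = split /[\s:]+/, $record;

print join('|', @parts), "\n";    # tcp|10.0.0.1|80
```

Neither the space nor the colon survives in @parts; only the text between the separators does.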

      The regex above says: "if I see a run of one or more whitespace characters, colons, left parens or right parens", delete that run and make whatever was to the left of it the next array element. This part: [6,7,8,-2] says OK, I have lots of stuff, but I only want the 7th, 8th and 9th things and the 2nd one from the end. Perl arrays are indexed from zero, so the first element is index [0]. Run my "hint" code and see what happens if you delete () from the regex. Experimentation is key. Run some examples and report back.
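The experiment suggested above can be sketched like this, using only the tail end of the log line so the difference is easy to see. The names @with and @without are illustrative.

```perl
use strict;
use warnings;

# Tail end of the log line from the demo.
my $tail = 'f653f9669495 (312 bytes)';

my @with    = split /[\s:()]+/, $tail;   # parentheses are separators
my @without = split /[\s:]+/,   $tail;   # parentheses stay in the data

print "with:    $with[-2]\n";      # 312
print "without: $without[-2]\n";   # (312
```

With the parentheses removed from the character class they are no longer consumed, so they stay glued to the neighbouring fields.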

      This is not a perfect analogy, but if you had an old style typewriter and hit "carriage return" every time you saw the matching regex, you would wind up with my "hint" code.

        Thank you, Marshall, for explaining this. Now I am able to adapt the split regex to the other log formats that I have.

        Hi Marshall, if I now have a log where I want the separators [] to be consumed as well, do I need to change the delimiters? Or can I just modify the code like so:

         split /[[]\s:()]+/,$line
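No reply to this question survives in the thread as captured here, so the following is only a hedged sketch and not Marshall's answer. In the pattern as posted, the first ] closes the character class early, so it no longer means "any one of these characters". Escaping the brackets inside the class avoids that. The log line below is a hypothetical format invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical bracketed log line, invented for illustration.
my $line = '[2016-04-17 10:12:27] [error] tcp 10.0.0.1:80';

# \[ and \] put literal square brackets into the character class.
my @f = split /[\[\]\s:()]+/, $line;

# The line starts with a separator, so split returns an empty first field;
# every index is therefore one higher than a quick count would suggest.
print join("\n", map { "$_: '$f[$_]'" } 0 .. $#f), "\n";
```

Printing the fields with their indices, as in the earlier hint code, is the quickest way to check that the slice still picks the right elements after changing the separators.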
