perlpal has asked for the wisdom of the Perl Monks concerning the following question:
Hi
I am looking for a better way to perform the double split operation.
My requirement is to extract the date from the following table:
Timestamp 10.72.218.82:cpu_busy
-------------------------------------------------------------------------------
2009-11-05 17:59:52 1.501
The table is held in a scalar variable. Hence, split is performed twice:
1: split on newline.
2: split on one or more spaces.
The code that I have written to achieve this seems to be very basic:
my @output = split(/\n/,$cmd_output);
my @return_values = split(/\s+/,$output[-1]);
my $date = $return_values[0];
Is there a better way to perform a double split operation?
Is there a better way to extract the date from the table?
Thanks in advance!
Re: Better way to perform "double split" ?
by holli (Abbot) on Nov 10, 2009 at 09:02 UTC
my @result = map { [split /\s+/] } split /\n/, $cmd_output;
That leaves you with an array of arrays like [ ['2009-11-05', '17:59:52', '1.501'], ...]. If you prefer a hash keyed by timestamp, a plain split won't do, because it yields three fields per row; capture the two parts instead:
my %result = map { /^(\d\S*\s+\S+)\s+(\S+)/ } split /\n/, $cmd_output;
You probably have to ensure you exclude the header lines in your input, like so:
my @result = map { [split /\s+/] } grep { /^\d+/ } split /\n/, $cmd_output;
This only processes lines that start with one or more digits. It should be sufficient in this case. Read up about grep, map and sort. They're powerful tools.
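To see the pieces working together, here is a minimal sketch run against sample data. The second data row and the variable names @rows and %cpu_busy are made up for illustration; only the first row comes from the question.

```perl
use strict;
use warnings;

# Sample data modelled on the question; the second data row is invented.
my $cmd_output = <<'END';
Timestamp            10.72.218.82:cpu_busy
---------------------------------------------
2009-11-05 17:59:52  1.501
2009-11-05 18:00:52  2.250
END

# Keep only lines that start with a digit, then split each on whitespace.
my @rows = map { [ split /\s+/ ] } grep { /^\d/ } split /\n/, $cmd_output;

# Date of the last row, which is what the question asked for.
my $date = $rows[-1][0];

# Hash keyed by "date time" with the reading as value. A failed match
# returns an empty list, so header lines contribute nothing.
my %cpu_busy = map { /^(\d\S*\s+\S+)\s+(\S+)/ ? ($1 => $2) : () }
               split /\n/, $cmd_output;

print "$date\n";
printf "%s => %s\n", $_, $cpu_busy{$_} for sort keys %cpu_busy;
```

The grep is what keeps the header and the dashed rule out of @rows; without it they would show up as one- and two-element rows.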
holli
You can lead your users to water, but alas, you cannot drown them.
Re: Better way to perform "double split" ?
by 7stud (Deacon) on Nov 10, 2009 at 09:17 UTC
use strict;
use warnings;
use 5.010;
my $str =
'Timestamp 10.72.218.82:cpu_busy
----------------------------------
2009-11-05 17:59:52 1.501
';
my ($date) = $str =~ /(^\d{4}-\d{2}-\d{2})/m;
say "-->$date<--";
--output:--
-->2009-11-05<--
my ($date)
provides a 'list context'. In other words, $date is part of a list of variables, where the list happens to be of length 1. That list of variables expects to be assigned a list of values. In response to that demand, the match operator m// returns a list of the substrings matched by the parenthesized groups in the regex.
If m/// happened to return more than one value, because there were multiple parenthesized groupings in the regex, then the rules of list assignment would take over: extra values on the right hand side of a list assignment are discarded. Here is an example:
use strict;
use warnings;
use 5.010;
my ($first, $second) = (1, 2, 3);   # 3 is discarded
say $first;  #1
say $second; #2
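The same contrast can be shown with the match operator itself. This is only a sketch; the sample string and the names $matched, $day and $time are assumptions for illustration.

```perl
use strict;
use warnings;

my $str = "Timestamp cpu_busy\n2009-11-05 17:59:52 1.501\n";

# List context: the match returns the captured text.
my ($date) = $str =~ /^(\d{4}-\d{2}-\d{2})/m;

# Scalar context: the match only reports success (1) or failure.
my $matched = $str =~ /^(\d{4}-\d{2}-\d{2})/m;

# Two capture groups fill two variables, in order.
my ($day, $time) = $str =~ /^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})/m;

print "list:   $date\n";
print "scalar: $matched\n";
print "pair:   $day $time\n";
```

Forgetting the parentheses around $date is a common slip: the code still runs, but the variable ends up holding 1 instead of the date.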
Oh, yeah... the 'm' flag allows the ^ to match at the start of every line in the string, not just at the start of the string.
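A quick way to convince yourself is to run the same pattern with and without the flag. The sample string and the variable names below are assumptions for illustration.

```perl
use strict;
use warnings;

my $str = "Timestamp cpu_busy\n2009-11-05 17:59:52 1.501\n";

# Without /m, ^ matches only at the very start of the string, where
# "Timestamp" sits, so the match fails and nothing is captured.
my ($without_m) = $str =~ /^(\d{4}-\d{2}-\d{2})/;

# With /m, ^ also matches right after each embedded newline.
my ($with_m) = $str =~ /^(\d{4}-\d{2}-\d{2})/m;

print defined $without_m ? "without /m: $without_m\n" : "without /m: no match\n";
print "with /m:    $with_m\n";
```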
Re: Better way to perform "double split" ?
by JavaFan (Canon) on Nov 10, 2009 at 10:51 UTC
Is there a better way to perform a double split operation?
Not in my book. I know different ways, but if I were to use a double split, I'd do it more or less in the same way.
Is there a better way to extract the date from the table?
If it's just to retrieve the first sequence of non-space characters after the penultimate newline, you could also write:
my ($date) = /(\S+)[^\n]*$/;
But that doesn't mean it's faster. Or clearer.
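For anyone who wants to try it, here is a sketch of that pattern bound to a variable instead of $_. The sample string is an assumption modelled on the question and ends with a newline, as command output usually does.

```perl
use strict;
use warnings;

my $cmd_output = "Timestamp cpu_busy\n-------\n2009-11-05 17:59:52 1.501\n";

# Without /m, $ matches only at the end of the string (or just before a
# final newline), and [^\n]* cannot cross a newline, so the match is
# forced onto the last line.
my ($date) = $cmd_output =~ /(\S+)[^\n]*$/;

print "$date\n";
```

Earlier lines are tried first but fail, because after consuming the rest of a line the engine is not yet at the end of the string.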
Re: Better way to perform "double split" ?
by rovf (Priest) on Nov 10, 2009 at 09:36 UTC
|
| [reply] [d/l] [select] |
|
Well, I thought of that too, but it is a requirement to store the output in a scalar.
| [reply] |
|
Well, thinking of it, you could of course also operate on substr($cmd_output,rindex($cmd_output,"\n")+1), which would also eliminate one split (provided $cmd_output does not end with a newline; chomp it first if it does), but I admit that this doesn't look very elegant either. But if the format is always as shown in your example (in particular, the usage of white space), you could get the date by
(split(/\s+/m,$cmd_output))[-3]
(BTW, I believe that the m modifier can even be left out here, since the pattern contains neither ^ nor $ - what do the experts say?).
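The substr/rindex idea might look like the sketch below. The sample string and the names $text and $last_line are assumptions; the trailing newline is stripped first, because otherwise rindex would find that final newline and substr would return an empty string.

```perl
use strict;
use warnings;

my $cmd_output = "Timestamp cpu_busy\n-------\n2009-11-05 17:59:52 1.501\n";

# Work on a copy without trailing newlines.
(my $text = $cmd_output) =~ s/\n+\z//;

# Everything after the last remaining newline is the last line.
# If there is no newline at all, rindex returns -1 and the whole
# string is used.
my $last_line = substr $text, rindex($text, "\n") + 1;

# A single split on whitespace is then enough to get the date.
my ($date) = split ' ', $last_line;

print "$date\n";
```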
--
Ronald Fischer <ynnor@mm.st>
Re: Better way to perform "double split" ?
by oha (Friar) on Nov 11, 2009 at 09:41 UTC
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;
my $g = Parse::RecDescent->new(<<'EOG');
main: head row(s) /\Z/ { [$item[1], @{$item[2]}]; }
| <error>
head: /.*/ /-*/ { $item[1]; }
row: date time num { [@item[1..3]]; }
date: /\d\d\d\d-\d\d-\d\d/
time: /\d\d:\d\d:\d\d/
num: /[\d\.]+/
EOG
my $data = join '', <DATA>;
print Dumper($g->main($data));
__DATA__
Timestamp 10.72.218.82:cpu_busy
-------------------------------------------------------------------------------
2009-11-05 17:59:52 1.501
2009-10-15 17:39:52 2.501
2009-12-25 17:19:52 3.501
This returns the following, skipping empty lines and spaces, or generates a detailed error if the data is invalid.
$VAR1 = [
'Timestamp 10.72.218.82:cpu_busy',
[
'2009-11-05',
'17:59:52',
'1.501'
],
[
'2009-10-15',
'17:39:52',
'2.501'
],
[
'2009-12-25',
'17:19:52',
'3.501'
]
];
Update: changing the row: rule to return a DateTime object would also verify that the date is valid, improving error detection while parsing.