PerlMonks  

Better way to perform "double split" ?

by perlpal (Scribe)
on Nov 10, 2009 at 08:51 UTC (node #806155)

perlpal has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am looking for a better way to perform a "double split" operation.

My requirement is to extract the date from the following table:

Timestamp 10.72.218.82:cpu_busy
-------------------------------------------------------------------------------
2009-11-05 17:59:52 1.501

The table is read into a scalar variable, so split is performed twice:
1: split on newlines.
2: split on one or more whitespace characters.
The code I have written to achieve this seems very basic:

    my @output        = split(/\n/, $cmd_output);
    my @return_values = split(/\s+/, $output[-1]);
    my $date          = $return_values[0];
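For comparison, both steps can be folded into a single expression with list slices. This is a minimal sketch, assuming the output is in $cmd_output and the date is always the first field of the last line. The sample string is reconstructed from the table above:

```perl
use strict;
use warnings;

# Sample command output reconstructed from the table above (assumed to end in "\n").
my $cmd_output = "Timestamp 10.72.218.82:cpu_busy\n"
               . ("-" x 79) . "\n"
               . "2009-11-05 17:59:52 1.501\n";

# Inner split: break the text into lines and slice off the last one.
# split drops trailing empty fields, so the final "\n" adds no empty line.
# Outer split: ' ' is the awk-style pattern, which also ignores leading whitespace.
my $date = (split ' ', (split /\n/, $cmd_output)[-1])[0];

print "$date\n";
```

It is shorter, but not necessarily clearer than the three-line version.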

Is there a better way to perform a double split operation?

Is there a better way to extract the date from the table?

Thanks in advance!

Re: Better way to perform "double split" ?
by holli (Abbot) on Nov 10, 2009 at 09:02 UTC
    That's what map is for:
    my @result = map { [split /\s+/] } split /\n/, $cmd_output;
    That leaves you with an array of arrays like [ ['2009-11-05', '17:59:52', '1.501'], ... ], because the space between the date and the time is also a split point. If you prefer a hash keyed on the full timestamp, match the fields with a regex instead of splitting (lines that don't match, such as the headers, contribute an empty list):
    my %result = map { /^(\S+\s+\S+)\s+(\S+)/ } split /\n/, $cmd_output;
    With the array version you probably have to exclude the header lines of your input, like so:
    my @result = map { [split /\s+/] } grep { /^\d+/ } split /\n/, $cmd_output;
    This only processes lines that start with a digit, which should be sufficient in this case. Read up on grep, map and sort; they're powerful tools.
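    A runnable sketch of the map/grep pipeline on the sample table (reconstructed from the question), together with a hash variant that captures "date time" and value with a regex, since each line splits into three fields rather than a key/value pair:

```perl
use strict;
use warnings;

# Sample input reconstructed from the question (assumption: same layout).
my $cmd_output = "Timestamp 10.72.218.82:cpu_busy\n"
               . ("-" x 79) . "\n"
               . "2009-11-05 17:59:52 1.501\n";

# Keep only lines that start with a digit, then split each into its fields.
my @result = map { [ split /\s+/ ] } grep { /^\d/ } split /\n/, $cmd_output;

# Each row holds three fields: date, time and value.
my ($date, $time, $value) = @{ $result[-1] };

# Hash variant: the regex returns (timestamp, value) on a match and an
# empty list otherwise, so the header lines drop out without a grep.
my %by_time = map { /^(\S+\s+\S+)\s+(\S+)/ } split /\n/, $cmd_output;

print "$date | $time | $value\n";
print "$_ => $by_time{$_}\n" for sort keys %by_time;
```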


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Better way to perform "double split" ?
by 7stud (Deacon) on Nov 10, 2009 at 09:17 UTC

    Is there a better way to extract the date from the table?

    How about:

    use strict;
    use warnings;
    use 5.010;

    my $str = 'Timestamp 10.72.218.82:cpu_busy
    ----------------------------------
    2009-11-05 17:59:52 1.501
    ';

    my ($date) = $str =~ /(^\d{4}-\d{2}-\d{2})/m;
    say "-->$date<--";

    --output:--
    -->2009-11-05<--

      By way of explanation:

      my ($date)

      provides a 'list context'. In other words, $date is part of a list of variables, where the list happens to be of length 1. That list of variables expects to be assigned a list of values. In response, the match operator m// returns a list of the substrings captured by the parenthesized groups in the regex.

      If m// returned more than one value, because the regex had several parenthesized groups, the rules of list assignment would take over: extra values on the right-hand side of a list assignment are discarded. For example:

      use strict;
      use warnings;
      use 5.010;

      my ($x, $y) = (1, 2, 3);   # the 3 is discarded
      say $x;   # 1
      say $y;   # 2

        Oh, and the /m flag makes ^ match at the start of every line in the string, not just at the start of the string.
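      A small sketch contrasting the cases described above: list context with /m, list context without /m, and scalar context. The sample string is reconstructed from the example above, with explicit "\n" escapes:

```perl
use strict;
use warnings;

my $str = "Timestamp 10.72.218.82:cpu_busy\n"
        . ("-" x 34) . "\n"
        . "2009-11-05 17:59:52 1.501\n";

# List context: the match returns the captured group, so $with_m gets the date.
my ($with_m) = $str =~ /(^\d{4}-\d{2}-\d{2})/m;

# Without /m, ^ only matches at the very start of the string ("Timestamp..."),
# so the match fails, returns an empty list and $without_m stays undefined.
my ($without_m) = $str =~ /(^\d{4}-\d{2}-\d{2})/;

# Scalar context: the same match only reports success (1) or failure ("").
my $matched = $str =~ /(^\d{4}-\d{2}-\d{2})/m;

print "with /m:    ", (defined $with_m    ? $with_m    : 'undef'), "\n";
print "without /m: ", (defined $without_m ? $without_m : 'undef'), "\n";
print "scalar:     $matched\n";
```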
Re: Better way to perform "double split" ?
by JavaFan (Canon) on Nov 10, 2009 at 10:51 UTC
    Is there a better way to perform a double split operation?
    Not in my book. I know different ways, but if I were to use a double split, I'd do it more or less in the same way.
    Is there a better way to extract the date from the table?
    If it's just to retrieve the first sequence of non-space characters after the penultimate newline, you could also write:
    my ($date) = $cmd_output =~ /(\S+)[^\n]*$/;
    But that doesn't mean it's faster. Or clearer.
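    A quick sketch that runs this pattern against the sample table (reconstructed from the question) with and without a trailing newline. $ matches at the end of the string or just before a final "\n", so both cases land on the last line:

```perl
use strict;
use warnings;

my $body = "Timestamp 10.72.218.82:cpu_busy\n"
         . ("-" x 79) . "\n"
         . "2009-11-05 17:59:52 1.501";

my @dates;
for my $cmd_output ($body, "$body\n") {
    # (\S+) grabs a run of non-space characters; [^\n]*$ requires the rest of
    # that line to reach the end of the string, so only the last line qualifies.
    my ($date) = $cmd_output =~ /(\S+)[^\n]*$/;
    push @dates, $date;
}
print "$_\n" for @dates;
```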
Re: Better way to perform "double split" ?
by rovf (Priest) on Nov 10, 2009 at 09:36 UTC
    Is it a requirement that the table be stored in a scalar? If, for instance, you get it from an `external command` and you know that the date line is always the last one, you could capture the output in an array instead (backticks in list context return one element per line) and use [-1] to get the last line without splitting.
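    A sketch of the array approach. To avoid depending on a real command, it reads the sample output (reconstructed from the question) through an in-memory filehandle, which in list context also yields one element per line. With an actual command you would write `my @lines = `some_command`;`, where some_command is a placeholder:

```perl
use strict;
use warnings;

# Stand-in for the command's output (hypothetical sample, same layout as the question).
my $fake_output = "Timestamp 10.72.218.82:cpu_busy\n"
                . ("-" x 79) . "\n"
                . "2009-11-05 17:59:52 1.501\n";

open my $fh, '<', \$fake_output or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;    # list context: one element per line, newlines kept
close $fh;

chomp @lines;
# Only the last line is split; no split on newlines is needed.
my ($date) = split ' ', $lines[-1];
print "$date\n";
```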

    -- 
    Ronald Fischer <ynnor@mm.st>
      Well, I thought of that too, but storing the output in a scalar is a requirement.

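        Well, thinking about it, you could of course also operate on substr($cmd_output, rindex($cmd_output, "\n") + 1), which would also eliminate one split (chomp the output first, though, or rindex finds the trailing newline and you get an empty string). I admit that this doesn't look very elegant either. But if the format is always as shown in your example (in particular, three whitespace-separated fields on the last line), you could get the date by

        (split /\s+/, $cmd_output)[-3]

        (No /m modifier is needed: it only changes how ^ and $ match, and this pattern uses neither. Don't put capturing parentheses in the pattern either, as in /(\s|\n)/: split then returns the separators as extra fields, which throws off the [-3] index.)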
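        A sketch that tries both ideas on the sample output (reconstructed from the question and assumed to end in a newline), and shows what a capturing split pattern does to the index:

```perl
use strict;
use warnings;

my $cmd_output = "Timestamp 10.72.218.82:cpu_busy\n"
               . ("-" x 79) . "\n"
               . "2009-11-05 17:59:52 1.501\n";

# substr/rindex on a chomped copy: otherwise rindex finds the trailing
# newline and the "last line" comes back empty.
chomp(my $trimmed = $cmd_output);
my $last_line = substr($trimmed, rindex($trimmed, "\n") + 1);

# Index from the end of the whole text: the last line has three fields,
# so the date is the third field from the end.
my $date_by_index = (split /\s+/, $cmd_output)[-3];

# With capturing parentheses, split also returns the separators,
# so [-3] no longer lands on the date.
my $captured = (split /(\s)/, $cmd_output)[-3];

print "last line:  $last_line\n";
print "split [-3]: $date_by_index\n";
print "with (\\s):  '$captured'\n";
```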
Re: Better way to perform "double split" ?
by oha (Friar) on Nov 11, 2009 at 09:41 UTC
    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;

    my $g = Parse::RecDescent->new(<<'EOG');
    main: head row(s) /\Z/ { [$item[1], @{$item[2]}]; }
        | <error>
    head: /.*/ /-*/        { $item[1]; }
    row:  date time num    { [@item[1..3]]; }
    date: /\d\d\d\d-\d\d-\d\d/
    time: /\d\d:\d\d:\d\d/
    num:  /[\d\.]+/
    EOG

    my $data = join '', <DATA>;
    print Dumper($g->main($data));

    __DATA__
    Timestamp 10.72.218.82:cpu_busy
    -------------------------------------------------------------------------------
    2009-11-05 17:59:52 1.501
    2009-10-15 17:39:52 2.501
    2009-12-25 17:19:52 3.501
    This returns the following, skipping blank lines and whitespace, or reports a detailed error if the data is invalid:
    $VAR1 = [
              'Timestamp 10.72.218.82:cpu_busy',
              [ '2009-11-05', '17:59:52', '1.501' ],
              [ '2009-10-15', '17:39:52', '2.501' ],
              [ '2009-12-25', '17:19:52', '3.501' ]
            ];
    Update: changing the row: rule to return a DateTime object would also check that the date is valid, improving error detection while parsing.

Approved by Corion