PerlMonks  

Multiple line records from a command string

by Preceptor (Deacon)
on Feb 23, 2010 at 14:50 UTC ( [id://824873]=perlquestion )

Preceptor has asked for the wisdom of the Perl Monks concerning the following question:

I'm probably being a little daft, but I'm sort of trying to get my head around the most elegant way to accomplish this, rather than one that 'just works'.
If I run a particular command I get a (long) list of entries that look something like:

    Symmetrix ID:
    Device Physical Name : \\.\PHYSICALDRIVE1
    Device Symmetrix Name : 0062 (VCM)
    Device Serial ID : 6000062081
    Symmetrix ID : 002020202220
    Attached BCV Device : N/A
    Attached VDEV TGT Device : N/A
    Vendor ID : EMC
    Product ID : SYMMETRIX
    Product Revision : 5772
    Device WWN : 222222222222222222
    Device Emulation Type : FBA
    Device Defined Label Type: N/A
    Device Defined Label : N/A
    Device Sub System Id : 0x0001
    Cache Partition Name : DEFAULT_PARTITION
    Device Block Size : 512
    Device Capacity
    {
        Cylinders : 6
        Tracks : 90
        512-byte Blocks : 11520
        MegaBytes : 6
        KiloBytes : 5760
    }
    Device Configuration : 2-Way Mir (Non-Exclusive Access)
    Device is WORM Enabled : No
    Device is WORM Protected : No
    SCSI-3 Persistent Reserve: Disabled
    Dynamic Spare Invoked : No
    Dynamic RDF Capability : None
    STAR Mode : No
    STAR Recovery Capability : None
    STAR Recovery State : NA
    Device Service State : Normal
This pattern is repeated a lot of times, once for each volume in my storage array.
What I'd really like to be able to do, is define a multi-line regexp, that matches 3 or 4 lines out of it (Device Symmetrix Name, Symmetrix ID, any one of the capacity entries, and the state of the SCSI-3 Persistent Reserve:) and then allows me to do some manner of 'foreach loop'.
So far, I've figured I can just do a line by line 'foreach', and set variables based on matching particular lines:
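    my $symm_id;
    foreach ( `symdev list -v` ) {
        if ( m/Symmetrix ID/ ) {
            # parentheses on the left give list context, so $symm_id gets
            # the captured digits; a scalar assignment would only store 1
            ( $symm_id ) = m/Symmetrix ID\s+:\s+(\d+)/;
        }
    }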
That sort of thing. But that seems ... well, just rather inelegant really, when I'm pretty sure I can do multi-line regexps. I just can't quite get my head around how to turn a multi-line RE into 'something useful' for a 'foreach' loop, with multiple values.
I've found the 'm' flag on the pattern match, and can knock together something that matches a single record.
    my $text_to_parse = `symdev list -v`;
    my ( $dev, $sym, $config, $SCSI3 ) = ( $text_to_parse =~
        m/Device Symmetrix Name\s+\:\s+([A-F0-9]+).*Symmetrix ID\s+\:\s+(\d+).*Device Capacity.*Cylinders\s+\:\s+(\d+).*SCSI-3 Persistent Reserve\s+\:\s+(\w+)/m );
    print "$dev, $sym, $config, $SCSI3\n";
Well, approximately - that's not quite what I want to do, as the greedy matches will eat the intervening chunks of the command output. But essentially, I want to do that 'per record'. Given the output of this particular command generates around 400Mb, I may end up dumping it to a file first.

Replies are listed 'Best First'.
Re: Multiple line records from a command string
by BrowserUk (Patriarch) on Feb 23, 2010 at 15:18 UTC

    How's this?

    #! perl -slw
    use strict;

    $/ = "\nS";

    m[
        Device \s Symmetrix \s Name \s+ : \s+ ( [^\n]+ ) .+?
        Device \s Serial \s ID \s+ : \s+ ( \d+ ) .+?
        KiloBytes \s+ : \s+ ( \d+ ) .+?
        SCSI-3 \s Persistent \s Reserve : \s+ ( \S+ ) .+?
    ]smx
        and print "name:$1\nID:$2\ncap:$3\nRes:$4"
            while <DATA>;

    __DATA__
    ******** Two copies of the supplied example ************

    Output:

    c:\test>junk75
    name:0062 (VCM)
    ID:6000062081
    cap:5760
    Res:Disabled
    name:0062 (VCM)
    ID:6000062081
    cap:5760
    Res:Disabled

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      (for Preceptor and others' sake)

      In BrowserUk's code:

      m[
          Device \s Symmetrix \s Name \s+ : \s+ ( [^\n]+ ) .+?
          Device \s Serial \s ID \s+ : \s+ ( \d+ ) .+?
          KiloBytes \s+ : \s+ ( \d+ ) .+?
          SCSI-3 \s Persistent \s Reserve : \s+ ( \S+ ) .+?
      ]smx . . .

      The regex flags smx mean the following:

      s makes . match anything, including newlines. (In Perl, by default, . does not match newlines.)

      m causes the characters ^ and $ to match at the beginning and end of each line, respectively, instead of only at the beginning and end of the string.

      x is for "Extended Formatting" or "Whitespace is not significant." Among other things, this allows the expression to span several lines without matching literal white space.

      Using these three flags in Perl regexes is a best practice described by Damian Conway in Perl Best Practices pages 236-241.
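
      As a minimal sketch of what each flag changes, the snippet below runs the same kind of pattern with and without the flag. The two-line $text string and the variable names are made up for illustration and are not part of the symdev output.

```perl
use strict;
use warnings;

# Hypothetical two-line input, used only to show the flag behaviour.
my $text = "alpha : 1\nbeta : 2\n";

# Without /s, '.' stops at a newline, so the match cannot span both lines.
my $without_s = $text =~ /alpha.+beta/  ? 1 : 0;
# With /s, '.' also matches "\n".
my $with_s    = $text =~ /alpha.+beta/s ? 1 : 0;

# Without /m, '^' anchors only at the very start of the string.
my $without_m = $text =~ /^beta/  ? 1 : 0;
# With /m, '^' also matches just after each newline.
my $with_m    = $text =~ /^beta/m ? 1 : 0;

# With /x, whitespace and comments inside the pattern are ignored,
# so whitespace in the data has to be matched explicitly with \s.
my ($value) = $text =~ m{
    beta \s+ : \s+   # field name and separator
    ( \d+ )          # capture the number
}x;

print "s: $without_s -> $with_s, m: $without_m -> $with_m, x: $value\n";
```

      This should print "s: 0 -> 1, m: 0 -> 1, x: 2".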

      You could also break this up a bit and handle the input per record the way you would like. Personally I also find this a little easier to read and maintain. The key from above is setting the input record separator ($/):

      $/ = "\nS"; # or $/ = "Symmetrix ID:\n"

      This lets you open the command output as a filehandle and use a while loop in which each iteration reads one individual record. You can then easily parse out the information you want:

      open(FH, "-|", "symdev list -v") or die $!;
      while (<FH>) {
          my $symm_id = my $others = undef; # or '';
          if ( m/Symmetrix ID/ ) {
              # parentheses on the left give list context, so $symm_id gets the captured digits
              ( $symm_id ) = m/Symmetrix ID\s+:\s+(\d+)/;
          }
          # etc
      }
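
      To see what the record separator does without running the real command, here is a small sketch that reads from an in-memory filehandle. The $sample text is a hypothetical, cut-down stand-in for the symdev output, and it assumes each record begins with a bare "Symmetrix ID:" line.

```perl
use strict;
use warnings;

# Hypothetical two-record sample standing in for `symdev list -v` output.
my $sample = <<'END';
Symmetrix ID:
    Device Symmetrix Name    : 0062
    Symmetrix ID             : 002020202220
Symmetrix ID:
    Device Symmetrix Name    : 0063
    Symmetrix ID             : 002020202220
END

# Read from an in-memory filehandle so the sketch runs anywhere.
open( my $fh, '<', \$sample ) or die "Cannot open in-memory handle: $!";

my @names;
{
    # Localise the change so other reads in the program are unaffected.
    local $/ = "Symmetrix ID:\n";
    while ( my $record = <$fh> ) {
        chomp $record;                  # strips the trailing separator, if present
        next unless $record =~ /\S/;    # the chunk before the first record is empty
        my ($name) = $record =~ /Device Symmetrix Name\s+:\s+(\S+)/;
        push @names, $name if defined $name;
    }
}
close $fh;

print "Found devices: @names\n";
```

      Because $/ is only changed inside the block, chomp and the readline both see the same separator, and only one record is held in memory at a time.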

      HTH!

        You're obviously welcome to your view, but I cannot see the purpose in making it so complicated (20+ lines instead of 7).

        Or so needlessly inefficient. Calling the regex engine 8 times (twice each for every thing you want to match), rather than once.

        And there is absolutely no reason to initialise my vars to undef.


      How's this?

      ++ Brilliant. Perfect example for translating sample input into short, concise, readable code.

        I think BrowserUk is aware of that :D
      That's lovely, and almost exactly what I was looking for. And way less ugly than what I was trying to do with 'line based' setting of variables.
Re: Multiple line records from a command string
by toolic (Bishop) on Feb 23, 2010 at 15:51 UTC
    Here is my approach, not using complex regular expressions. Stuff all colon-separated lines into a hash. When a new start-of-record is encountered, print out what you want from the hash, then clear the hash. This operates line-by-line, and only one record is in memory at a time.
    use strict;
    use warnings;

    my %data;
    while (<DATA>) {
        if (/Symmetrix ID:/) {
            # start of new record
            print_record(%data) if %data;
            %data = ();
        }
        elsif (/:/) {
            chomp;
            my ($k, $v) = split /\s*:\s*/;
            $k =~ s/^\s+//;
            $data{$k} = $v;
        }
    }
    print_record(%data) if %data;

    sub print_record {
        my (%data) = @_;
        my @params = (
            'Device Serial ID',
            'Vendor ID',
        );
        for (@params) {
            print "$_: $data{$_}\n";
        }
        print "\n";
    }

    __DATA__
    Prints for 2 records:
    Device Serial ID: 6000062081
    Vendor ID: EMC

    Device Serial ID: 6000062082
    Vendor ID: EMC2
Re: Multiple line records from a command string
by zwon (Abbot) on Feb 23, 2010 at 15:10 UTC

    I don't think that using a regexp in this case is what I'd call an "elegant way". Also, I'm not sure that it would be an efficient solution. I'd rather write a parser for this quite simple format that turns the output into a data structure. This is also a more scalable approach: if you eventually need more fields from the output, you can just use them without fiddling with the regexp.
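
    A rough sketch of such a parser might look like the following. The parse_symdev helper and the sample text are hypothetical; the code assumes every record starts with a bare "Symmetrix ID:" line and that every field is a "Name : value" line, with the capacity entries stored flat alongside the other fields.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the command's text output into a list of
# hashes, one hash per record.
sub parse_symdev {
    my ($text) = @_;
    my ( @records, $current );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\s*Symmetrix ID:\s*$/ ) {
            # A bare "Symmetrix ID:" line starts a new record.
            $current = {};
            push @records, $current;
        }
        elsif ( $current && $line =~ /^\s*([^:{}]+?)\s*:\s*(.*?)\s*$/ ) {
            # "Name : value" line; lines without a colon are skipped.
            $current->{$1} = $2;
        }
    }
    return \@records;
}

# Cut-down, made-up sample in the same shape as the posted output.
my $sample = <<'END';
Symmetrix ID:
    Device Symmetrix Name    : 0062 (VCM)
    Symmetrix ID             : 002020202220
    Device Capacity
      {
      KiloBytes              : 5760
      }
    SCSI-3 Persistent Reserve: Disabled
END

my $records = parse_symdev($sample);
for my $r (@$records) {
    # A hash slice pulls out whichever fields are wanted.
    print join( ', ',
        @{$r}{ 'Device Symmetrix Name', 'KiloBytes', 'SCSI-3 Persistent Reserve' } ),
        "\n";
}
```

    For the sample this should print "0062 (VCM), 5760, Disabled". Note that this version takes the whole text as one string, so for very large output it would need to be adapted to read record by record.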
