
Regex to extract certain lines only from command output/text file.

by perl514 (Pilgrim)
on Mar 07, 2013 at 17:58 UTC (#1022282)
perl514 has asked for the wisdom of the Perl Monks concerning the following question:

Respected Monks,

Given below is an extract from a file that I wish to extract certain portions from.

    7 hostname12 Generic-legacy 10000000AB210ACF6 ---
        10000000AB210ACF4 2:5:4
        10000000AB210ACF4 2:3:4
        10000000AB210ACF6 3:5:4
    9 hostname13 Generic 10000000AB2A3006A 3:5:2
        10000000AB2A30068 2:5:2
    20 hostname14 Generic-legacy 10000000AB2A3000C ---
        10000000AB2A3000E 3:3:1
    21 HOSTNAME Generic
    22 hsname12 Generic-legacy 10000000ABCDE004A 3:3:3
        10000000ABCDE004A 3:5:2
        10000000ABCDE0048 2:3:3
    23 srvernam Generic-legacy 5001438002A3004A 3:3:3
        5001438002A3004A ---
        5001438002A30048 2:3:3
        5001438002A30048 2:5:2
        5001438002A30048 2:5:2

This is the 3par showhost command output. It basically shows the Host ID, the name of the host, the host setting, the HBA WWN and the 3par Storage Array port.

I just need to extract a host's portion if it contains a "---" or a blank entry, such as

    7 hostname12 Generic-legacy 10000000AB210ACF6 ---
        10000000AB210ACF4 2:5:4
        10000000AB210ACF4 2:3:4
        10000000AB210ACF6 3:5:4

or

    23 srvernam Generic-legacy 5001438002A3004A 3:3:3
        5001438002A3004A ---
        5001438002A30048 2:3:3
        5001438002A30048 2:5:2
        5001438002A30048 2:5:2

or

21 HOSTNAME Generic

In other words, the output should leave out any host whose entries all end in d:d:d, like so:

    9 hostname13 Generic 10000000AB2A3006A 3:5:2
        10000000AB2A30068 2:5:2

I tried a lot of stuff, and none of it worked. So I thought, "Why not take everything that starts with a digit, up to the part that ends just before the next digit, and see if a "---" or a blank space is in there?"

Here is my script. I tried a lot of stuff, tried setting the $/ to "" and then to undef and then to \n\n, but it doesn't seem to work. So I removed the $/ from my script and started over (once again). As of now, my brain is completely a tangled mess, so please do not get pissed with the script below.

    #!/usr/bin/perl
    use warnings;
    use strict;

    while (<>) {
        chomp;
        next if /Id Name Persona -WWN\/iSCSI_Name- Port/;
        if(/(?<loggedout>[d]+.*?\p{Hex}{16}.*?(--- |\s+))/sm) {
            print "$+{loggedout}\n";
        }
    }

The script above does not output anything. I can easily pull out just the line with the "---"; that's not a problem. What I want is the whole host block associated with it. Any pointers would help. I am not looking for pre-written code; I really badly need to crack this on my own, but I have miserably misunderstood the regex...please help me untangle my brain :).

Perlpetually Indebted To PerlMonks

use Learning::Perl; use Beginning::Perl::Ovid; print "Awesome Books";
http://dwimperl.com/windows.html is a boon for Windows.

Re: Regex to extract certain lines only from command output/text file.
by arnaud99 (Beadle) on Mar 07, 2013 at 19:22 UTC

    Hi

    I have taken another approach to what I think you are trying to do. Hopefully you can use this to achieve what you want.

        use strict;
        use warnings;

        my @tmp;
        my @keep;
        my $first_time = 1;

        while (my $a_line = <DATA>) {
            chomp $a_line;
            push @tmp, $a_line;
            if ( $a_line =~ /^\d+\s+\D+/ ) {
                if ($first_time) {
                    $first_time = 0;
                }
                else {
                    #we have a new host master record
                    #process the data from the previous host
                    process_previous(\@tmp);
                    @keep = (@keep, @tmp);
                    @tmp =();
                }
            }
        }
        process_previous(\@tmp);
        @keep = (@keep, @tmp);
        @tmp=();

        print "$_\n" for @keep;
        exit(0);

        #----------------- SUBS ----------------------------
        sub process_previous {
            my $array_ref = shift;
            my $keep_this_data = 0;

            foreach my $elem(@$array_ref) {
                if ($elem !~ /\d:\d:\d$/ ) {
                    $keep_this_data = 1;
                    last;
                }
            }

            if (!$keep_this_data) {
                #empty the array
                @$array_ref = ();
            }
        }

        __DATA__
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        21 HOSTNAME Generic
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2

    output

        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2
        21 HOSTNAME Generic

    I hope this helps.

    Arnaud

      arnaud99
      Check your script; the output shown in your post differs from what your code actually produces. Below is the output of your script:

          7 hostname12 Generic-legacy 10000000AB210ACF6 ---
              10000000AB210ACF4 2:5:4
              10000000AB210ACF4 2:3:4
              10000000AB210ACF6 3:5:4
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              5001438002A3004A ---
              5001438002A30048 2:3:3
              5001438002A30048 2:5:2
              5001438002A30048 2:5:2
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2
          21 HOSTNAME Generic

      According to the OP,

          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2

      is not supposed to be shown in the final output. Your script even modified some info, like this:

          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              5001438002A3004A ---
              5001438002A30048 2:3:3
              5001438002A30048 2:5:2
              5001438002A30048 2:5:2
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2

      The OP has only one "9 hostname13 Generic", so how did it become two in the final output?

        Hi,

        In response to: The OP has only one "9 hostname13 Generic", so how did it become two, in the final output.

        I added an extra one in the __DATA__ section, for testing purposes. Only one was showing in the output (the last one in __DATA__), due to an empty line at the end of the same __DATA__ section.

        (The fix is also posted in one of the replies)

        Thanks.

        Arnaud

Re: Regex to extract certain lines only from command output/text file.
by arnaud99 (Beadle) on Mar 07, 2013 at 19:53 UTC

    Reformatted the code and added a few comments so it looks cleaner.

        use strict;
        use warnings;

        my @tmp;
        my @keep;

        while (my $a_line = <DATA>) {
            chomp $a_line;
            if ( $a_line =~ /^\d+\s+\D+/ ) {
                #we have a new host master record
                #process the data from the previous host
                process_previous(\@tmp);
                #on return @tmp may be empty
                @keep = (@keep, @tmp);
                @tmp =(); #now we empty it anyway
            }
            push @tmp, $a_line;
        }
        process_previous(\@tmp);
        @keep = (@keep, @tmp);
        @tmp=();

        print "$_\n" for @keep;
        exit(0);

        #----------------- SUBS ----------------------------
        sub process_previous {
            my $array_ref = shift;
            my $keep_this_data = 0;

            foreach my $elem(@$array_ref) {
                if ($elem !~ /\d:\d:\d$/ ) {
                    #found a line NOT terminating in 3 digits, each separated by a
                    #colon, so we want to keep the whole info about this host
                    $keep_this_data = 1;
                    last;
                }
            }

            if (!$keep_this_data) {
                #empty the array
                @$array_ref = ();
            }
        }

        __DATA__
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        21 HOSTNAME Generic
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2

      Your second "approach" does not fully solve the OP's problem either. See the output:

          7 hostname12 Generic-legacy 10000000AB210ACF6 ---
              10000000AB210ACF4 2:5:4
              10000000AB210ACF4 2:3:4
              10000000AB210ACF6 3:5:4
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2
          20 hostname14 Generic-legacy 10000000AB2A3000C ---
              10000000AB2A3000E 3:3:1
          21 HOSTNAME Generic
          23 srvernam Generic-legacy 5001438002A3004A 3:3:3
              5001438002A3004A ---
              5001438002A30048 2:3:3
              5001438002A30048 2:5:2
              5001438002A30048 2:5:2

      Hi

      Thanks for your comments. The error is due to an empty line at the end of the __DATA__ section. I must have added it when I pasted the code.

      Since the empty line did not match /\d:\d:\d$/, the data for that host was considered to be worth keeping.
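      A minimal sketch of that effect, using a hypothetical helper all_ports_logged_in (it is not part of the script above): a block whose lines all end in d:d:d is dropped, but appending a single empty line is enough to flip the verdict to "keep".

```perl
use strict;
use warnings;

# Hypothetical helper: true when every line in the block ends in d:d:d.
sub all_ports_logged_in {
    my @lines  = @_;
    my $misses = grep { !/\d:\d:\d$/ } @lines;   # count lines that do NOT match
    return $misses == 0;
}

my @block = (
    '9 hostname13 Generic 10000000AB2A3006A 3:5:2',
    '    10000000AB2A30068 2:5:2',
);

# Same block twice: once as-is, once with a trailing empty line.
for my $case ( [ 'clean block', @block ], [ 'with empty line', @block, '' ] ) {
    my ( $label, @lines ) = @$case;
    my $verdict = all_ports_logged_in(@lines) ? 'drop' : 'keep';
    print "$label: $verdict\n";
}
```

      The empty string fails the /\d:\d:\d$/ test just like a "---" line would, which is why skipping blank lines before the check matters.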

      I trust the issue is now sorted. I simply added

      next if $a_line =~ /^\s*$/; #ignore empty lines

      after reading each line.

      Here is the full code, with the extra line check, and the empty __DATA__ line.

          use strict;
          use warnings;
          use 5.010;

          my @tmp;
          my @keep;

          while (my $a_line = <DATA>) {
              next if $a_line =~ /^\s*$/; #ignore empty lines
              chomp $a_line;
              if ( $a_line =~ /^\d+\s+\D+/ ) {
                  #we have a new host master record
                  #process the data from the previous host
                  process_previous(\@tmp);
                  #on return @tmp may be empty
                  @keep = (@keep, @tmp);
                  @tmp =(); #now we empty it anyway
              }
              push @tmp, $a_line;
          }
          process_previous(\@tmp);
          @keep = (@keep, @tmp);
          @tmp=();

          print "$_\n" for @keep;
          exit(0);

          #----------------- SUBS ----------------------------
          sub process_previous {
              my $array_ref = shift;
              my $keep_this_data = 0;

              foreach my $elem(@$array_ref) {
                  if ($elem !~ /\d:\d:\d$/ ) {
                      #found a line NOT terminating in 3 digits, each separated by a
                      #colon, so we want to keep the whole info about this host
                      $keep_this_data = 1;
                      last;
                  }
              }

              if (!$keep_this_data) {
                  #empty the array
                  @$array_ref = ();
              }
          }

          __DATA__
          7 hostname12 Generic-legacy 10000000AB210ACF6 ---
              10000000AB210ACF4 2:5:4
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2
          23 srvernam Generic-legacy 5001438002A3004A 3:3:3
              5001438002A3004A ---
              5001438002A30048 2:3:3
              5001438002A30048 2:5:2
              5001438002A30048 2:5:2
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2
          21 HOSTNAME Generic
          9 hostname13 Generic 10000000AB2A3006A 3:5:2
              10000000AB2A30068 2:5:2

      Kind regards

      Arnaud.
Re: Regex to extract certain lines only from command output/text file.
by 2teez (Priest) on Mar 08, 2013 at 00:32 UTC

    Something like this:

        #!/usr/bin/perl
        use warnings;
        use strict;

        my %data_needed;
        my $key;

        while (<DATA>) {
            chomp;
            if (/^(\d.+?)(\d{8,}.+?)?$/) {
                $key = $1;
                push @{ $data_needed{$key} }, $2;
            }
            else {
                s/^\s+//;
                push @{ $data_needed{$key} }, $_;
            }
        }

        {
            no warnings 'uninitialized';
            for ( keys %data_needed ) {
                my $space = " " x length($_);
                my $value = join "\n$space", @{ $data_needed{$_} };
                printf "%s%s\n\n", $_, $value
                  if $value =~ /---/
                  or $value eq '';
            }
        }

        __DATA__
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        20 hostname14 Generic-legacy 10000000AB2A3000C ---
            10000000AB2A3000E 3:3:1
        21 HOSTNAME Generic
        22 hsname12 Generic-legacy 10000000ABCDE004A 3:3:3
            10000000ABCDE004A 3:5:2
            10000000ABCDE0048 2:3:3
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2
    Output:
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2
        21 HOSTNAME Generic
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        20 hostname14 Generic-legacy 10000000AB2A3000C ---
            10000000AB2A3000E 3:3:1
    Note:
    Please note that the order of your output might differ from what I have here, because hash keys are not returned in insertion order. Of course, the desired display order is left to the OP.
    Hope this helps
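
    If the original order matters, one option (a sketch only, not part of the script above) is to sort the hash keys by their leading host Id. The helper name host_id is hypothetical, and the sample keys below merely imitate the shape of the keys the script builds; this assumes every key starts with a numeric Id.

```perl
use strict;
use warnings;

# Sample keys imitating those captured by $1 in the script above (assumed shape).
my @keys = (
    '23 srvernam Generic-legacy ',
    '21 HOSTNAME Generic',
    '7 hostname12 Generic-legacy ',
    '20 hostname14 Generic-legacy ',
);

# Hypothetical helper: pull the leading host Id out of a key, 0 if absent.
sub host_id {
    my ($key) = @_;
    return $key =~ /^(\d+)/ ? $1 : 0;
}

# Numeric sort on the Id, so 7 comes before 20 (a string sort would not do that).
my @ordered = sort { host_id($a) <=> host_id($b) } @keys;
print "$_\n" for @ordered;
```

    In the script itself this would amount to iterating over the sorted list instead of over keys %data_needed directly.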

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Regex to extract certain lines only from command output/text file.
by Kenosis (Priest) on Mar 08, 2013 at 00:57 UTC

    Perhaps the following will be helpful:

        use strict;
        use warnings;

        local $/;
        my @record;

        for ( split /(.+Generic.+)/m, <> ) {
            next unless /\S/;
            s/\n+$//g;
            push @record, $_;
            if ( @record == 2 ) {
                $record[1] =~ s/(\n?.+?Generic\n)/print $1; ''/ge;
                print "@record\n" if "$record[0]$record[1]" =~ /---/;
                undef @record;
            }
        }

    Output on your data set:

        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        20 hostname14 Generic-legacy 10000000AB2A3000C ---
            10000000AB2A3000E 3:3:1
        21 HOSTNAME Generic
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2

    Usage: perl script.pl inFile [>outFile]

Re: Regex to extract certain lines only from command output/text file.
by kcott (Abbot) on Mar 08, 2013 at 12:38 UTC

    G'day perl514,

    It seems to me that you only need two short regexes here: qr{^\s*\d+\s} to identify the start of each block of host data and qr{\d:\d:\d\n$} to check the line endings. Here's how I used these in a script:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my $host_start_re = qr{^\s*\d+\s};
        my $triple_re = qr{\d:\d:\d\n$};

        my @buffered_lines;
        my $print_buffer = 0;

        while (<DATA>) {
            if (/$host_start_re/) {
                print @buffered_lines if $print_buffer;
                @buffered_lines = ();
                $print_buffer = 0;
            }
            push @buffered_lines, $_;
            $print_buffer = 1 unless /$triple_re/;
        }

        print @buffered_lines if $print_buffer;

        __DATA__
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        9 hostname13 Generic 10000000AB2A3006A 3:5:2
            10000000AB2A30068 2:5:2
        20 hostname14 Generic-legacy 10000000AB2A3000C ---
            10000000AB2A3000E 3:3:1
        21 HOSTNAME Generic
        22 hsname12 Generic-legacy 10000000ABCDE004A 3:3:3
            10000000ABCDE004A 3:5:2
            10000000ABCDE0048 2:3:3
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2

    Output:

        $ pm_3par_extract.pl
        7 hostname12 Generic-legacy 10000000AB210ACF6 ---
            10000000AB210ACF4 2:5:4
            10000000AB210ACF4 2:3:4
            10000000AB210ACF6 3:5:4
        20 hostname14 Generic-legacy 10000000AB2A3000C ---
            10000000AB2A3000E 3:3:1
        21 HOSTNAME Generic
        23 srvernam Generic-legacy 5001438002A3004A 3:3:3
            5001438002A3004A ---
            5001438002A30048 2:3:3
            5001438002A30048 2:5:2
            5001438002A30048 2:5:2

    -- Ken

Re: Regex to extract certain lines only from command output/text file.
by perl514 (Pilgrim) on Mar 08, 2013 at 18:18 UTC

    Hi arnaud99,2teez,Kenosis and Kcott

    Thank you so very much for your help. All of you have taken very different approaches and yet the result is the same.

    I have no words to thank you all. You guys make PerlMonks an awesome place !!

    I must admit that some of the solutions seem very complicated to me and I will be inserting a lot of print statements to understand what's going on, but that's ok.

    Thank you all once again.

    Perlpetually Indebted To PerlMonks

    use Learning::Perl; use Beginning::Perl::Ovid; print "Awesome Books";
    http://dwimperl.com/windows.html is a boon for Windows.

Re: Regex to extract certain lines only from command output/text file.
by perl514 (Pilgrim) on Mar 30, 2013 at 08:24 UTC

    Hi Monks,

    First of all, thank you very much for your help on this issue. I tried all the options above and they worked great. However, I wanted to crack the issue without relying much on the techniques mentioned above, so I tried a different approach and it worked !! What also helped is that, this time, I could safely ignore the host line that has blanks next to it...

    So given below is the line of output.

        C:\perlscripts>more 3par.txt
        Id Name Persona -WWN/iSCSI_Name- Port
        2 hpux-host2 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 ---
            1122334455667788 ---
            1122334455667799 4:3:1
        3 hpux-host3 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 ---
            1122334455667788 ---
            1122334455667799 ---
        4 hpux-host4 HPUX-legacy 1122334455667788 ---
            1122334455667799 3:3:1
            1122334455667788 1:3:1
            1122334455667799 4:3:1
        5 hpux-host5 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 3:3:1
            1122334455667788 1:3:1
            1122334455667799 4:3:1
        5 hpux-host6 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 3:3:1
            1122334455667788 1:3:1
            1122334455667799 4:3:1
        7 hpux-host HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 3:3:1
            1122334455667788 ---
            1122334455667799 4:3:1

    From this, I only had to take out those hosts which have one or more --- in the output. Here's my attempt:

        #!/usr/bin/perl
        use warnings;
        use strict;

        my $space_sep_FH;
        my @para;

        open($space_sep_FH,'>', 'space_sep.txt') or die "Error:$!.$^E";
        while (<>) {
            next if /Id Name Persona -WWN\/iSCSI_Name- Port/;
            s/(^[ 0-9])+/\n\n$1/;
            print $space_sep_FH "$_";
        }
        print "Done writing to space_sep.txt.\n";

        $/ = "\n\n";
        open($space_sep_FH,'<', 'space_sep.txt') or die "Error:$!.$^E";
        @para = <$space_sep_FH>;
        foreach my $para (@para) {
            print "$para" if $para =~ /---/;
        }

    And here is the output!!

        C:\perlscripts>perl test_3par_tp.pl 3par.txt
        Done writing to space_sep.txt.
        2 hpux-host2 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 ---
            1122334455667788 ---
            1122334455667799 4:3:1
        3 hpux-host3 HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 ---
            1122334455667788 ---
            1122334455667799 ---
        4 hpux-host4 HPUX-legacy 1122334455667788 ---
            1122334455667799 3:3:1
            1122334455667788 1:3:1
            1122334455667799 4:3:1
        7 hpux-host HPUX-legacy 1122334455667788 2:3:1
            1122334455667799 3:3:1
            1122334455667788 ---
            1122334455667799 4:3:1

        C:\perlscripts>

    It now feels better to crack it in another way !!

    Maybe my newly bought T430 was an inspiration :)

    Perl is really a great language, it lets you do things your way.
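
    For comparison, the same "one paragraph per host" idea can probably be done in memory, without the intermediate space_sep.txt file. This is only a sketch under one stated assumption: a host line carries at least three columns (Id, Name, Persona), while a continuation line carries just a WWN and a port or "---". The function name logged_out_blocks and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Split showhost-style text into one chunk per host and keep the chunks
# that contain '---'. Assumption: host lines have >= 3 columns on one line,
# continuation lines have exactly 2 (WWN plus port or '---').
sub logged_out_blocks {
    my ($text) = @_;
    # Zero-width split just before each host line; [ \t] (not \s) keeps the
    # lookahead from running across a newline into the next line.
    my @blocks = split /^(?=[ \t]*\d+[ \t]+\S+[ \t]+\S+)/m, $text;
    return grep { /---/ } @blocks;
}

# Illustrative sample in the same shape as 3par.txt (not the real file).
my $sample = <<'END';
2 hpux-host2 HPUX-legacy 1122334455667788 2:3:1
    1122334455667799 ---
5 hpux-host5 HPUX-legacy 1122334455667788 2:3:1
    1122334455667799 3:3:1
7 hpux-host HPUX-legacy 1122334455667788 2:3:1
    1122334455667788 ---
END

my @hits = logged_out_blocks($sample);
print @hits;
```

    The column-count test is used here because an all-digit WWN such as 1122334455667788 followed by a space would otherwise look like a host Id.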

    Perlpetually Indebted To PerlMonks

    use Learning::Perl; use Beginning::Perl::Ovid; print "Awesome Books";
    http://dwimperl.com/windows.html is a boon for Windows.

Node Type: perlquestion [id://1022282]
Approved by Corion
Front-paged by 2teez