PerlMonks  

Best way to parse my data

by sierpinski (Chaplain)
on Sep 30, 2009 at 16:06 UTC

sierpinski has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a data manipulation issue, and I'm sure it's a super-easy thing to do in Perl, I just can't get it to work... first is the data I am reading from a command output:

    B group1 host1 ONLINE
    B group1 host2 OFFLINE
    B group2 host2 ONLINE
    B group2 host3 OFFLINE
    B group3 host3 ONLINE
    B group4 host1 ONLINE
    B group5 host3 ONLINE
    ...
    G group2
    G group3

Now, what this means is: the 'B' section lists which groups are running on which hosts (this may seem familiar to anyone who has used VCS before), and the 'G' section, later in the same command output, lists the groups that are frozen.

What I'm looking for is to be able to say "GroupX is frozen on hostY".

What I've tried:
- Using hashes to store 'B' information as keys, and 'G' information as values
- Arrays with grep
- Complex grep/awk statements
- A few other things I can't recall offhand

I know there has to be a really simple solution to this, but I may be thinking about it too much. There is a caveat though. These checks are happening on multiple servers at once, so if host1, host2, and host3 are all in the same cluster, the same frozen group will get reported on all 3 hosts, so I want to only report the ones that have ONLINE in the 'B' section.

Hopefully that makes sense. Thanks for any ideas you can provide!
/\ Sierpinski

Replies are listed 'Best First'.
Re: Best way to parse my data
by moritz (Cardinal) on Sep 30, 2009 at 16:18 UTC
    I'd go with a hash and an array:

    The hash (let's say %groups) has the group name as the key, and the host information as value.

    The array just holds the names of the frozen groups. For each such name you can look up the host information in %groups and find out which host it is on.
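A minimal sketch of that hash-plus-array idea, using only core Perl. The helper names parse_status() and frozen_report() are made up for illustration, and the input is assumed to be the whitespace-separated lines shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split the command output into a hash of 'B'
# information and an array of frozen ('G') group names.
sub parse_status {
    my @lines = @_;
    my (%groups, @frozen);
    for my $line (@lines) {
        my ($section, $group, $host, $state) = split ' ', $line;
        next unless defined $section;           # skip blank lines
        if ($section eq 'B') {
            $groups{$group}{$host} = $state;    # group => { host => state }
        }
        elsif ($section eq 'G') {
            push @frozen, $group;               # frozen groups, in input order
        }
    }
    return (\%groups, \@frozen);
}

# Hypothetical helper: for each frozen group, report the hosts where it is ONLINE.
sub frozen_report {
    my ($groups, $frozen) = @_;
    my @report;
    for my $group (@$frozen) {
        my $hosts = $groups->{$group} or next;  # frozen group with no 'B' lines
        for my $host (sort keys %$hosts) {
            push @report, "$group is frozen on $host"
                if $hosts->{$host} eq 'ONLINE';
        }
    }
    return @report;
}

my @sample = (
    'B group1 host1 ONLINE',
    'B group1 host2 OFFLINE',
    'B group2 host2 ONLINE',
    'B group2 host3 OFFLINE',
    'B group3 host3 ONLINE',
    'B group4 host1 ONLINE',
    'B group5 host3 ONLINE',
    'G group2',
    'G group3',
);
my ($groups, $frozen) = parse_status(@sample);
print "$_\n" for frozen_report($groups, $frozen);
```

With the sample data from the question this should print "group2 is frozen on host2" and "group3 is frozen on host3". Keeping the state in the hash, rather than dropping OFFLINE lines while reading, leaves the ONLINE filter in one place at report time.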

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Best way to parse my data
by kennethk (Abbot) on Sep 30, 2009 at 16:24 UTC
    First off, it's nearly always a good idea to use a CSV module instead of rolling your own. As you have posted no code, I'm assuming you haven't used one. There are a large number, such as Text::CSV, which is available from CPAN (it is not a core module; thanks toolic).

    The interrelationships in your data are not entirely clear to me based on what you've posted, so sorry if I'm off point. Once you've used a CSV module to split your data into a list of lists, you could search through the data for matches using either map/grep or nested foreach loops; the loops tend to be far more legible.
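As a rough illustration of the map/grep route over a list of lists, here is a sketch. It assumes the rows were produced with a plain split rather than a CSV module, and frozen_online_pairs() is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper. Each row is an array ref: [section, group, host, state]
# ('G' rows only carry the first two fields).
sub frozen_online_pairs {
    my @rows = @_;

    # Names of the frozen groups, taken from the 'G' rows.
    my %is_frozen = map { $_->[1] => 1 } grep { $_->[0] eq 'G' } @rows;

    # Keep 'B' rows that are ONLINE and whose group is frozen,
    # and reduce each to a [group, host] pair.
    return map  { [ $_->[1], $_->[2] ] }
           grep { $_->[0] eq 'B' && $_->[3] eq 'ONLINE' && $is_frozen{ $_->[1] } }
           @rows;
}

my @rows = map { [ split ' ', $_ ] } (
    'B group1 host1 ONLINE',
    'B group2 host2 ONLINE',
    'B group2 host3 OFFLINE',
    'B group3 host3 ONLINE',
    'G group2',
    'G group3',
);
printf "%s is frozen on %s\n", @$_ for frozen_online_pairs(@rows);
```

The two passes (one to collect frozen names, one to filter) mean the 'G' rows do not have to come before or after the 'B' rows.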

    If your issue is ultimately "I have a list of bad groups (G, term 1) that I need to map to a list of hosts (B, term 2)", you should probably use a hash of lists keyed on group, where each value is the array of hosts in the group. If you don't want to consider OFFLINE hosts, don't add those elements to your lists.

    Update: code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %group_map = ();

    while (<DATA>) {
        my @list = split;
        if ($list[0] =~ /B/) {
            my ($group,$host,$online) = @list[1..3];
            next if $online =~ /OFFLINE/; # filter "OFFLINE"
            push @{$group_map{$group}}, $host;
            next;
        }
        if ($list[0] =~ /G/) {
            my $group = $list[1];
            foreach my $host (@{$group_map{$group}}) {
                print "$group is frozen on $host\n";
            }
            next;
        }
    }

    __DATA__
    B group1 host1 ONLINE
    B group1 host2 OFFLINE
    B group2 host2 ONLINE
    B group2 host3 OFFLINE
    B group3 host3 ONLINE
    B group4 host1 ONLINE
    B group5 host3 ONLINE
    G group2
    G group3
      such as Text::CSV, which is a core module.
      Minor nitpick: Text::CSV is not listed as a Core module according to the latest official documentation page (currently 5.10.1). However, as your link makes clear, it is available for download from CPAN.
      Unfortunately we don't have Text::CSV installed, and I can't figure out why for the life of me. We're not permitted to install or use our own modules. It's a political battle that makes no sense and we always lose.

      Looks like this works perfectly too, thanks to everyone for the responses!
      /\ Sierpinski
Re: Best way to parse my data
by BioLion (Curate) on Sep 30, 2009 at 16:35 UTC

    Maybe a different hash approach? :

    use warnings;
    use strict;
    use Data::Dumper qw/Dumper/;

    my %data = ();
    while (<DATA>){
        chomp( my $line = $_ );
        my @stuff = split /\s+/, $line;
        ## separate the classes
        if ( $stuff[0] eq 'B'){
            $data{$stuff[0]}{$stuff[1]}{$stuff[2]} = $stuff[3];
        }
        else{
            $data{$stuff[0]}{$stuff[1]} = 1;
        }
    }

    print Dumper \%data;

    for my $group ( keys %{ $data{'G'} } ){
        if ( !exists$data{'B'}{ $group } ){
            print "Frozen Group \'$group\' is not defined for \'B\' Section.\n";
        }
        else{
            for ( keys %{ $data{'B'}{ $group } } ){
                if ( $data{'B'}{ $group }{$_} eq 'ONLINE' ){
                    print "Group \'$group\' is frozen on host \'$_\'.\n";
                }
            }
        }
    }

    __DATA__
    B group1 host1 ONLINE
    B group1 host2 OFFLINE
    B group2 host2 ONLINE
    B group2 host3 OFFLINE
    B group3 host3 ONLINE
    B group4 host1 ONLINE
    B group5 host3 ONLINE
    G group2
    G group3

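    Gives (after the Data::Dumper dump of %data; hash order is not guaranteed):

    Group 'group2' is frozen on host 'host2'.
    Group 'group3' is frozen on host 'host3'.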

    Probably not the most elegant, but simple usually works best. Maybe I missed what you were after, but hopefully this is helpful anyway...

    Update : Could also change read in of 'G' data as suggested by moritz and include the next if ($line =~ m/OFFLINE/); as suggested by ccn.

    Just a something something...
      That looks to be exactly what I need... If anyone was interested, here is what I've tried:

      # $frozen_cmd contains the command that provides the data I posted originally in the 'G' section
      $ssh->send("$frozen_cmd");
      my %frozen;
      while ( defined ($line = $ssh->read_line()) ) {
          $frozen{$line} = '';
      }
      my $key;
      foreach $key (%frozen) {
          # $tmpcmd1 contains the command to look for just the specific entry of "B" for that one group
          $ssh->send($tmpcmd1);
          my $frozhost = $ssh->read_line();
          $frozen{$key} = $frozhost;
      }
      I'll try your method out and see if that does the trick, but it looks like you got it.. Thanks so much!
      /\ Sierpinski
Re: Best way to parse my data
by ccn (Vicar) on Sep 30, 2009 at 16:18 UTC
    #!/usr/bin/perl -wlan

    next if /OFFLINE/; # I want to only report the ones that have ONLINE

    if ($F[0] eq 'B') {
        push @{ $groups{ $F[1] } }, $F[2];
    }
    if ($F[0] eq 'G') {
        print "$F[1] is frozen on hosts " . join ', ', @{ $groups{ $F[1] } };
    }
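The switches on that shebang line do much of the work: -n wraps the script in a loop over the input lines, -a autosplits each line on whitespace into @F, and -l chomps the line ending on input and appends a newline on print. A roughly equivalent spelled-out version, as a sketch that runs under strict, could look like this. The sub name report_frozen() is made up, and returning the lines instead of printing them is a choice made here for easier reuse.

```perl
use strict;
use warnings;

# Hypothetical helper: explicit form of the -lan script above.
sub report_frozen {
    my @lines = @_;
    my (%groups, @out);
    for my $line (@lines) {
        next if $line =~ /OFFLINE/;         # only ONLINE entries are of interest
        my @F = split ' ', $line;           # what -a does; also drops the newline
        next unless @F;                     # skip blank lines
        if ($F[0] eq 'B') {
            push @{ $groups{ $F[1] } }, $F[2];
        }
        if ($F[0] eq 'G') {
            # "|| []" avoids a strict-refs error for a frozen group
            # that has no ONLINE host recorded
            push @out, "$F[1] is frozen on hosts "
                     . join(', ', @{ $groups{ $F[1] } || [] });
        }
    }
    return @out;
}

print "$_\n" for report_frozen(
    "B group2 host2 ONLINE\n",
    "B group2 host3 OFFLINE\n",
    "B group3 host3 ONLINE\n",
    "G group2\n",
    "G group3\n",
);
```

Like the one-liner, this relies on the 'G' lines coming after the 'B' lines in the output.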
Re: Best way to parse my data
by grizzley (Chaplain) on Oct 01, 2009 at 07:30 UTC

    You can do it with one regexp actually:

    use warnings;
    use strict;

    $_ = join"", <DATA>;
    while(/^B\s+(\w+)\s+(\w+)\s+(ONLINE)(?=.*^G\s+\1\b)/msg) {
        print "Group '$1' is frozen on host '$2'.\n"
    }

    __DATA__
    B group1 host1 ONLINE
    B group1 host2 OFFLINE
    B group2 host2 ONLINE
    B group2 host3 OFFLINE
    B group3 host3 ONLINE
    B group4 host1 ONLINE
    B group5 host3 ONLINE
    G group2
    G group3
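For readers unpicking that pattern, here is the same idea as a sketch with the /x flag so the pieces can be commented. /m lets ^ match at the start of every line, /s lets . cross newlines, and the lookahead requires a later 'G' line naming the same group without consuming any input. The sub name frozen_pairs_by_regex() is made up. Note that \w+ assumes group and host names contain only word characters, so names with dots or hyphens would need a wider character class.

```perl
use strict;
use warnings;

# Hypothetical helper: return [group, host] pairs for ONLINE 'B' lines
# whose group also appears on a later 'G' line.
sub frozen_pairs_by_regex {
    my ($text) = @_;
    my @pairs;
    while ($text =~ /
            ^B \s+ (\w+) \s+ (\w+) \s+ ONLINE   # an ONLINE B line: group, then host
            (?= .* ^G \s+ \1 \b )               # later on: a G line with the same group
        /msgx) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $output = join '', map { "$_\n" } (
    'B group1 host1 ONLINE',
    'B group2 host2 ONLINE',
    'B group2 host3 OFFLINE',
    'B group3 host3 ONLINE',
    'G group2',
    'G group3',
);
print "Group '$_->[0]' is frozen on host '$_->[1]'.\n"
    for frozen_pairs_by_regex($output);
```

The \b after the backreference keeps a 'B' line for group2 from matching a 'G' line for, say, group20. The lookahead rescans the rest of the text for every candidate line, which is fine for short command output but slower than the hash approaches on large inputs.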
Re: Best way to parse my data
by 123_321 (Initiate) on Sep 30, 2009 at 16:19 UTC

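    Read the data into a nested hash, as below:

    # @lines is assumed to hold the lines of command output
    my %data;
    for my $i (0 .. $#lines) {
        my ($sectionname, $groupname, $hostname, $status) = split ' ', $lines[$i];
        $data{$sectionname}{$groupname}{$hostname} = $status if defined $status;   # 'B' lines
    }

    When reading values back out of the hash, look them up by the same keys.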
