PerlMonks  

Getting indices of the same value that occurs multiple times in an array...

by reubs85 (Acolyte)
on Aug 04, 2011 at 11:27 UTC ( [id://918505]=perlquestion )

reubs85 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Could I pick your brains regarding some array wrangling?

I have an input file that looks something like:

Family lacM taba mori glyB gly4
OG_1 1 0 1 0 0
OG_2 0 1 0 1 0

and I would like to know, for each family, which two members are '1', possibly storing them as a hash in which the key = family (eg OG_1) and the value = 'lacM mori' or something similar...

I am able to manipulate this such that I can get the name of the first incidence of '1', by using (the input file has been read into the array @COUNT):

my $a=shift @COUNT;
my @families=split(/\s+/,$a);
my %participants;
foreach (@COUNT) {
    my @a=split(/\s+/,$_);
    my $og=$a[0];
    my $search_for="1";
    ## search for participants in that OG
    my ( $index )= grep { $a[$_] eq $search_for } 0..$#a;
    $participants{$og}=$index;
}
my %og_to_gid;
foreach my $k (keys %participants) {
    $og_to_gid{$k}=$families[$participants{$k}];
}

which gives, within the %og_to_gid hash, a result whereby the key = OG_1 and the value = lacM, in this case (sorry about the weird variable names; I'm a genome biologist (and genome biologists are weird anyway)), which was fine for a previous script but now I really need to know the names of BOTH the members for each family.

Any ideas would be most welcome! Is there something within, for example, List::Util that could do it? I had a look but I couldn't see anything...

Thanks once again,

Reuben

Replies are listed 'Best First'.
Re: Getting indices of the same value that occurs multiple times in an array...
by moritz (Cardinal) on Aug 04, 2011 at 11:39 UTC
    Try something like
    my @indexes = grep { $a[$_] eq $search_for } 0..$#a;
    $participants{$og} = \@indexes;

    This puts an array reference of all found indexes into the hash. See perlreftut for more information about references, and why you need them here.
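
As a minimal sketch of how the stored array reference might then be used: the snippet below is self-contained, with the sample data inlined instead of read from a file, and the sub name names_for is made up for illustration. It collects all matching indices per family and dereferences them to look up the member names.

```perl
use strict;
use warnings;

# Sample data inlined for illustration; the original post reads these lines from a file.
my @COUNT = (
    'Family lacM taba mori glyB gly4',
    'OG_1 1 0 1 0 0',
    'OG_2 0 1 0 1 0',
);

my @families = split /\s+/, shift @COUNT;

my %participants;
for my $line (@COUNT) {
    my @fields = split /\s+/, $line;
    my $og     = $fields[0];
    # Start at index 1 so the family name in column 0 is never tested.
    my @indexes = grep { $fields[$_] eq '1' } 1 .. $#fields;
    $participants{$og} = \@indexes;    # store a reference to the list of indices
}

# Hypothetical helper: dereference the stored array ref and map indices to names.
sub names_for {
    my ($og) = @_;
    return map { $families[$_] } @{ $participants{$og} };
}

for my $og (sort keys %participants) {
    print "$og: ", join(' ', names_for($og)), "\n";
}
```

The `@{ ... }` around `$participants{$og}` is what turns the stored reference back into a list; without it you would get the reference itself rather than the indices.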

Re: Getting indices of the same value that occurs multiple times in an array...
by thenaz (Beadle) on Aug 04, 2011 at 12:20 UTC

    Here is what I would do (save you a loop).

    my %og_to_gid;
    foreach (@COUNT) {
        my @a = split /\s+/;
        my $og = $a[0];
        my $search_for = "1";
        my @indices = grep { $a[$_] eq $search_for } 0..$#a;
        $og_to_gid{$og} = join ' ', map { $families[$_] } @indices;
    }

Re: Getting indices of the same value that occurs multiple times in an array...
by AnomalousMonk (Archbishop) on Aug 04, 2011 at 14:55 UTC
    >perl -wMstrict -le
    "my @COUNT = (
       'Family lacM taba mori glyB gly4',
       'OG_1 1 0 1 0 0',
       'OG_2 0 1 0 1 0',
       );
     ;;
     my @families = split /\s+/, shift @COUNT;
     ;;
     my %participants;
     for my $record (@COUNT) {
       my @fields = split /\s+/, $record;
       $participants{$fields[0]} =
         [ map { $families[$_] } grep { $fields[$_] } 1 .. $#fields ];
       }
     ;;
     use Data::Dumper;
     print Dumper \%participants;
    "
    $VAR1 = {
              'OG_2' => [
                          'taba',
                          'glyB'
                        ],
              'OG_1' => [
                          'lacM',
                          'mori'
                        ]
            };

    Update: The expression
        [ map { $families[$_] } grep { $fields[$_] } 1 .. $#fields ]
    is more concise and perhaps a bit faster as
        [ map $families[$_], grep $fields[$_], 1 .. $#fields ]

Re: Getting indices of the same value that occurs multiple times in an array...
by Marshall (Canon) on Aug 05, 2011 at 10:35 UTC
    I think that it is easier to use the indexes() function in List::MoreUtils.

    The snippet "indexes{/^1$/}@colData" yields the index numbers of the elements in @colData that contain just a "1". This is an XS module, like List::Util, and runs very fast. Getting the textual name is a simple array look-up. These names are pushed into a hash of arrays. So, 2 lines of code that do a lot of work!

    #!/usr/bin/perl -w
    use strict;
    use List::MoreUtils qw(indexes);
    use Data::Dumper;

    my %family_data;
    my @dataNames = qw( lacM taba mori glyB gly4);

    <DATA>; #skip the header line
    while (<DATA>)
    {
        my ($family, @colData) = split;
        push @{$family_data{$family}},
             map{$dataNames[$_]}indexes{/^1$/}@colData;
    }
    print Dumper(\%family_data);

    =prints
    $VAR1 = {
              'OG_2' => [
                          'taba',
                          'glyB'
                        ],
              'OG_1' => [
                          'lacM',
                          'mori'
                        ]
            };
    =cut
    __DATA__
    Family lacM taba mori glyB gly4
    OG_1 1 0 1 0 0
    OG_2 0 1 0 1 0

    Of course map{$dataNames[$_]}indexes{/^1$/}@colData;
    could be map{$dataNames[$_]}indexes{$_}@colData;
    but I thought the regex was less confusing, albeit a bit slower.
    Either way works; take your pick.
Re: Getting indices of the same value that occurs multiple times in an array...
by Don Coyote (Hermit) on Aug 05, 2011 at 06:03 UTC

    I can see some answers have been supplied to solve the problem of entering the values into a hash via a mapped grep. I have looked at the initial problem of identifying multiple values within each line. The aligned families can be retrieved using an array reading loop. However, I can imagine that further data manipulation would make having the families in a hash more helpful for future reference. Here is how I would initially extract the multiple values via an array loop.

    use strict;
    use warnings;

    my $gendata = './gen.dat';
    open GENDAT, "< $gendata" or die "can't open $gendata $!";
    my @count;
    while (<GENDAT>){
        push @count, $_;
    }
    close GENDAT;

    my $a=shift @count;
    my @families=split(/\s+/,$a);
    my $b=0;
    foreach my $c(@count){
        my @state;
        push @state, split(/\s+/,$c);
        print $state[0].' ';
        for($b=0;$b<=$#state;$b++){
            print $families[$b].' ' if $state[$b] eq '1';
        }
        print $/;
    }
    exit (0);

    this gives:

    OG_1 lacM mori
    OG_2 taba glyB

    A different approach might be to assign a value to each of the families, extract the 0/1 strings, convert them into a binary number, and then return the relevant family pair as determined by that binary value. Of course, this would only work if there were exactly 2 unique families per 'OG'.
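
The binary idea might be sketched as follows. This is only one possible reading of it, not the poster's own code: each family column is assigned one bit, the 0/1 flags of a row are packed into an integer, and the set bits are decoded back to names. The sub names mask_for and members_of and the bit assignment (lacM = 1, taba = 2, mori = 4, and so on) are assumptions made for the example.

```perl
use strict;
use warnings;

my @families = qw(lacM taba mori glyB gly4);

# Assumption: family i gets bit value 2**i (lacM = 1, taba = 2, mori = 4, ...).
# Pack a row of 0/1 flags into a single integer.
sub mask_for {
    my @flags = @_;
    my $mask  = 0;
    for my $i (0 .. $#flags) {
        $mask |= 1 << $i if $flags[$i] eq '1';
    }
    return $mask;
}

# Decode an integer back into the family names whose bits are set.
sub members_of {
    my ($mask) = @_;
    return map { $families[$_] } grep { $mask & (1 << $_) } 0 .. $#families;
}

for my $line ('OG_1 1 0 1 0 0', 'OG_2 0 1 0 1 0') {
    my ($og, @flags) = split ' ', $line;
    my $mask = mask_for(@flags);
    printf "%s mask=%d members=%s\n", $og, $mask, join(' ', members_of($mask));
}
```

Decoding bit by bit like this does not depend on there being exactly two members per row; that restriction would apply if the mask were instead used as a key into a precomputed table of pairs.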
