PerlMonks  

Getting indices of the same value that occurs multiple times in an array...

by reubs85 (Acolyte)
on Aug 04, 2011 at 11:27 UTC (#918505, perlquestion)
reubs85 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Could I pick your brains regarding some array wrangling?

I have an input file that looks something like:

    Family lacM taba mori glyB gly4
    OG_1   1    0    1    0    0
    OG_2   0    1    0    1    0

and I would like to know, for each family, which two members are '1', possibly storing them as a hash in which the key = family (eg OG_1) and the value = 'lacM mori' or something similar...

I am able to manipulate this such that I can get the name of the first incidence of '1', by using (the input file has been read into the array @COUNT):

    my $a=shift @COUNT;
    my @families=split(/\s+/,$a);
    my %participants;
    foreach (@COUNT) {
        my @a=split(/\s+/,$_);
        my $og=$a[0];
        my $search_for="1";
        ## search for participants in that OG
        my ( $index )= grep { $a[$_] eq $search_for } 0..$#a;
        $participants{$og}=$index;
    }
    my %og_to_gid;
    foreach my $k (keys %participants) {
        $og_to_gid{$k}=$families[$participants{$k}];
    }

This gives, within the %og_to_gid hash, a result whereby the key = OG_1 and the value = lacM, in this case. (Sorry about the weird variable names; I'm a genome biologist, and genome biologists are weird anyway.) That was fine for a previous script, but now I really need to know the names of BOTH the members of each family.

Any ideas would be most welcome! Is there something within, for example, List::Util that could do it? I had a look but I couldn't see anything...

Thanks once again,

Reuben

Re: Getting indices of the same value that occurs multiple times in an array...
by moritz (Cardinal) on Aug 04, 2011 at 11:39 UTC
    Try something like
        my @indexes = grep { $a[$_] eq $search_for } 0..$#a;
        $participants{$og} = \@indexes;

    This puts an array reference of all found indexes into the hash. See perlreftut for more information about references, and why you need them here.
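A minimal sketch of how the stored array references might be used afterwards, assuming the sample data from the question. The names `@records` and `%names_for` are illustrative only and do not appear in the original code.

```perl
use strict;
use warnings;

# Sample data from the question: a header line, then one record per family.
my @records = (
    'Family lacM taba mori glyB gly4',
    'OG_1 1 0 1 0 0',
    'OG_2 0 1 0 1 0',
);

my @families   = split /\s+/, shift @records;   # index 0 holds the 'Family' label
my $search_for = '1';
my %participants;

for my $record (@records) {
    my @fields = split /\s+/, $record;
    my $og     = $fields[0];

    # In list context grep returns every matching index, not just the first.
    my @indexes = grep { $fields[$_] eq $search_for } 0 .. $#fields;
    $participants{$og} = \@indexes;             # a hash value must be a scalar, so store a reference
}

# Dereference with @{ ... } and use an array slice to turn indexes into names.
my %names_for;
for my $og (sort keys %participants) {
    my @names = @families[ @{ $participants{$og} } ];
    $names_for{$og} = "@names";
    print "$og: @names\n";
}
```

The slice `@families[ ... ]` works here because the header line and the data lines share the same column positions, including the leading label column.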

Re: Getting indices of the same value that occurs multiple times in an array...
by thenaz (Beadle) on Aug 04, 2011 at 12:20 UTC

    Here is what I would do (it saves you a loop).

        my %og_to_gid;
        foreach (@COUNT) {
            my @a = split /\s+/;
            my $og = $a[0];
            my $search_for = "1";
            my @indices = grep { $a[$_] eq $search_for } 0..$#a;
            $og_to_gid{$og} = join ' ', map { $families[$_] } @indices;
        }
Re: Getting indices of the same value that occurs multiple times in an array...
by AnomalousMonk (Abbot) on Aug 04, 2011 at 14:55 UTC
        >perl -wMstrict -le
        "my @COUNT = (
           'Family lacM taba mori glyB gly4',
           'OG_1 1 0 1 0 0',
           'OG_2 0 1 0 1 0',
           );
         ;;
         my @families = split /\s+/, shift @COUNT;
         ;;
         my %participants;
         for my $record (@COUNT) {
           my @fields = split /\s+/, $record;
           $participants{$fields[0]} =
             [ map { $families[$_] } grep { $fields[$_] } 1 .. $#fields ];
           }
         ;;
         use Data::Dumper;
         print Dumper \%participants;
        "
        $VAR1 = {
                  'OG_2' => [
                              'taba',
                              'glyB'
                            ],
                  'OG_1' => [
                              'lacM',
                              'mori'
                            ]
                };

    Update: The expression
        [ map { $families[$_] } grep { $fields[$_] } 1 .. $#fields ]
    is more concise and perhaps a bit faster as
        [ map $families[$_], grep $fields[$_], 1 .. $#fields ]
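One thing that may be worth keeping in mind with `grep { $fields[$_] }`: it tests for truth rather than for equality with '1', so any non-zero value in a column would also be selected. A small sketch of the difference, using a made-up record (the `OG_3` line and its '2' are hypothetical and do not occur in the original data):

```perl
use strict;
use warnings;

my @families = qw(Family lacM taba mori glyB gly4);
my @fields   = qw(OG_3 1 0 2 0 0);   # hypothetical record containing a '2'

# Truth test: every non-zero column counts as a hit.
my @truthy = map { $families[$_] } grep { $fields[$_] } 1 .. $#fields;

# Equality test: only columns that are exactly '1' count.
my @exact  = map { $families[$_] } grep { $fields[$_] eq '1' } 1 .. $#fields;

print "truthy: @truthy\n";
print "exact:  @exact\n";
```

For strictly 0/1 input, as in the question, the two forms select the same columns.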

Re: Getting indices of the same value that occurs multiple times in an array...
by Don Coyote (Monk) on Aug 05, 2011 at 06:03 UTC

    I can see some answers have been supplied that solve the problem of entering the values into a hash via a mapped grep. I have looked at the initial problem of identifying multiple values within each line. The aligned families can be retrieved using an array-reading loop. However, I can imagine that further data manipulation would make having the families in a hash more helpful for future reference. Here is how I would initially extract the multiple values via an array loop.

        use strict;
        use warnings;

        my $gendata = './gen.dat';

        open GENDAT, "< $gendata" or die "can't open $gendata $!";

        my @count;

        while (<GENDAT>){
            push @count, $_;
        }

        close GENDAT;

        my $a=shift @count;
        my @families=split(/\s+/,$a);
        my $b=0;

        foreach my $c(@count){
            my @state;
            push @state, split(/\s+/,$c);
            print $state[0].' ';

            for($b=0;$b<=$#state;$b++){
                print $families[$b].' ' if $state[$b] eq '1';
            }
            print $/;
        }

        exit (0);

    this gives:

        OG_1 lacM mori
        OG_2 taba glyB

    A different approach might be to assign a value to each of the families, extract the 0/1 strings, convert them into binary numbers, and then return the relevant family pair as determined by the binary value. Of course, this would only work if there were definitely exactly two unique families per 'OG'.
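A rough sketch of how that binary idea could look, under the assumption that each record's 0/1 columns are read as one binary number with the leftmost column as the most significant bit. The names `%bit_for` and `%pair_for` are invented for this illustration, and the sample data is copied from the question.

```perl
use strict;
use warnings;

my @families = qw(lacM taba mori glyB gly4);
my @records  = ('OG_1 1 0 1 0 0', 'OG_2 0 1 0 1 0');

# Give each family one bit; the leftmost column is the most significant bit.
my %bit_for;
@bit_for{@families} = map { 1 << ($#families - $_) } 0 .. $#families;

# Precompute a lookup from the bitmask of every possible pair to its names.
my %pair_for;
for my $i (0 .. $#families - 1) {
    for my $j ($i + 1 .. $#families) {
        my $mask = $bit_for{ $families[$i] } | $bit_for{ $families[$j] };
        $pair_for{$mask} = "$families[$i] $families[$j]";
    }
}

my %og_to_gid;
for my $record (@records) {
    my ($og, @flags) = split /\s+/, $record;
    my $value = oct('0b' . join '', @flags);    # e.g. '10100' becomes 20
    # Records without exactly two set bits have no entry in the lookup.
    $og_to_gid{$og} = $pair_for{$value} // 'no unique pair';
    print "$og $og_to_gid{$og}\n";
}
```

The lookup table grows with the number of possible pairs, so for many families the plain grep over indexes is probably the simpler choice.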

Re: Getting indices of the same value that occurs multiple times in an array...
by Marshall (Prior) on Aug 05, 2011 at 10:35 UTC
    I think that it is easier to use the indexes() function in List::MoreUtils.

    The snippet "indexes{/^1$/}@colData" yields the index numbers of the elements in @colData that contain just a "1". List::MoreUtils is an XS module, like List::Util, and runs very fast. Getting the textual name is a simple array look-up. These names are pushed into a hash of arrays. So, 2 lines of code that do a lot of work!

        #!/usr/bin/perl -w
        use strict;
        use List::MoreUtils qw(indexes);
        use Data::Dumper;

        my %family_data;
        my @dataNames = qw( lacM taba mori glyB gly4);

        <DATA>;  #skip the header line
        while (<DATA>)
        {
           my ($family, @colData) = split;
           push @{$family_data{$family}},
                map{$dataNames[$_]}indexes{/^1$/}@colData;
        }
        print Dumper(\%family_data);

        =prints
        $VAR1 = {
                  'OG_2' => [
                              'taba',
                              'glyB'
                            ],
                  'OG_1' => [
                              'lacM',
                              'mori'
                            ]
                };
        =cut

        __DATA__
        Family lacM taba mori glyB gly4
        OG_1 1 0 1 0 0
        OG_2 0 1 0 1 0
    Of course
        map{$dataNames[$_]}indexes{/^1$/}@colData;
    could be
        map{$dataNames[$_]}indexes{$_}@colData;
    but I thought the regex was less confusing, albeit a bit slower.
    Either way is plausible; take your pick.
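If installing List::MoreUtils is not an option, roughly the same result can be sketched with core Perl only. The helper `indexes_of` below is hypothetical and not part of any module; it only imitates the way `indexes` is called in this reply, and the records are inlined instead of being read from `DATA`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the indices of the elements for which the
# code block, called with $_ set to the element, returns true.
sub indexes_of (&@) {
    my ($test, @list) = @_;
    my @found;
    for my $i (0 .. $#list) {
        local $_ = $list[$i];
        push @found, $i if $test->();
    }
    return @found;
}

my @dataNames = qw(lacM taba mori glyB gly4);
my %family_data;

for my $line ('OG_1 1 0 1 0 0', 'OG_2 0 1 0 1 0') {
    my ($family, @colData) = split ' ', $line;
    push @{ $family_data{$family} },
        map { $dataNames[$_] } indexes_of { /^1$/ } @colData;
}

print "$_: @{ $family_data{$_} }\n" for sort keys %family_data;
```

The `(&@)` prototype is what allows the block-first call syntax without a comma, so the sub has to be declared before the point where it is used.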

Node Type: perlquestion [id://918505]
Front-paged by Arunbear