Beefy Boxes and Bandwidth Generously Provided by pair Networks Russ
Perl: the Markov chain saw
 
PerlMonks

help needed in modifying the code for counting possible combinations

by BhariD (Novice)
 | Log in | Create a new user | The Monastery Gates | Super Search | 
 | Seekers of Perl Wisdom | Meditations | PerlMonks Discussion | 
 | Obfuscation | Reviews | Cool Uses For Perl | Perl News | Q&A | Tutorials | 
 | Poetry | Recent Threads | Newest Nodes | Donate | What's New | 

on Oct 30, 2009 at 16:59 UTC ( #804197=perlquestion: print w/ replies, xml ) Need Help??
BhariD has asked for the wisdom of the Perl Monks concerning the following question:

I have the following input file. The input file include pairs of two id's separated by tab. Each line with a pair is a combination and has relevance.

DATA

YP_01 NP_02

NP_02 YP_01

YP_01 NP_03

NP_03 YP_01

NP_02 NP_03

NP_03 NP_02

NP_04 NP_05

.....

Here is the code which I am using to get the reciprocal best hit pair from this input file. For example, in above input file, YP_01\tNP_02 and NP_02\tYP_01, is reciprocal best hit pair. And NP_04\tNP_05 is not reciprocal hit pair as it lacks the reciprocal combination in the file i.e. NP_05\tNP_04.

open (DATA, $results) || die "couldn't open the file!"; my %unique = (); while( <DATA> ){ chomp; $unique{ join( "\t", sort split /\t/ ) } ++; } my @pairs = (); my @not_pairs = (); for my $item (sort keys %unique ){ if( $unique{$item} > 1 ){ push @pairs, $item; }else{ push @not_pairs, $item; } } print join( "\n", @pairs ), "\n";

This gives the following result file

YP_01 NP_02

YP_01 NP_03

NP_02 NP_03

The problem is now I need to modify the code such that for above result file it produces the following result file instead

YP_01 NP_02

YP_01 NP_03

3

The idea is if YP_01 is paired with NP_02, and NP_03 and if NP_02 is paired with NP_03, then I do not need the pair NP_02 and NP_03 printed in the output file as their info already there in combination with YP_01. But I do need to know whether that pair NP_02 and NP_03 existed or not and that is why I need towards the end a #. As shown above I have "3" here in the end representing that the three best hit pairs are present (YP_01 and NP_02; YP_01 and NP_03; NP_02 and NP_03). If NP_02 and NP_03 was not present, then the number should change to 2 showing that out of 3 only 2 comparison are there.

could anyone help me with this. please let me know if you need further clarification

Comment on help needed in modifying the code for counting possible combinations
Download Code
Re: help needed in modifying the code for counting possible combinations
by Anonymous Monk on Oct 30, 2009 at 19:15 UTC
    If you have four related points, what do you do?
    A B B C C D D A B D
    How do you tell if that last one was B-D or A-C if you write a 5 in its place?
[reply]
[d/l]
Re: help needed in modifying the code for counting possible combinations
by gman (Pilgrim) on Oct 30, 2009 at 20:16 UTC

    ok, maybe way off here but I'll give it a try

    #!/usr/bin/perl # use Data::Dumper; my $datafile = "test.dat"; my %finalQ = (); open(FH, "<$datafile") || die ; my %combined = (); while( <FH> ){ chomp; my ($key,$value) = sort (split(/\t/,$_)); $combined{$key}=$value; } my %count = (); my %reverse = reverse(%combined); foreach my $key (sort %combined) { print "$key => $combined{$key}\n"; if ( exists $reverse{$key} ) { $count{"$key\t$reverse{$key}"}+=2; print "$key => $reverse{$key} :: match\n"; } else { $count{"$key\t$combined{$key}"}++; } } print Dumper(%count);

    output

    $VAR1 = '       ';
    $VAR2 = 4;
    $VAR3 = 'NP_05  NP_04';
    $VAR4 = 2;
    $VAR5 = 'YP_01  NP_03';
    $VAR6 = 2;
    $VAR7 = 'NP_03  NP_02';
    $VAR8 = 4;
    $VAR9 = 'NP_02  NP_03';
    $VAR10 = 1;
    $VAR11 = 'NP_04 NP_05';
    $VAR12 = 1;
    
[reply]
[d/l]
Re: help needed in modifying the code for counting possible combinations
by GrandFather (Bishop) on Oct 30, 2009 at 22:34 UTC

    use strict; use warnings; my %pairs; while (<DATA>) { chomp; next if ! length; my @pair = sort split; ++$pairs{$pair[0]}{$pair[1]}; } my $totalHits = 0; my %seconds; for my $first (sort keys %pairs) { for my $second (sort keys %{$pairs{$first}}) { next if $pairs{$first}{$second} <= 1; ++$totalHits; my $previous = grep {exists $pairs{$_}{$first} && exists $pairs{$_}{$seco +nd}} keys %{$seconds{$first}}; ++$seconds{$second}{$first}; next if $previous; print "$first $second\n"; } } print "$totalHits\n"; __DATA__ NP_01 NP_02 NP_02 NP_01 NP_01 NP_03 NP_03 NP_01 NP_02 NP_03 NP_03 NP_02 NP_04 NP_05

    Prints:

    NP_01 NP_02 NP_01 NP_03 3

    Note that I changed YP_01 to NP_01 to avoid inconsistencies between your reported results and the actual results.


    True laziness is hard work
[reply]
[d/l]
[select]

      This is awesome! Thank you so much GrandFather. I tried with the following data:

      __DATA__ NP_01 NP_02 NP_02 NP_01 NP_01 NP_03 NP_03 NP_01 NP_02 NP_03 NP_03 NP_02 NP_04 NP_05 NP_06 NP_07 NP_07 NP_06 This prints: NP_01 NP_02 NP_01 NP_03 NP_06 NP_07 4

      I want the output to be like this instead:

      NP_01 NP_02 NP_01 NP_03 3 NP_06 NP_07 1

      3 for the presence of all three NP_01-NP_02, NP_01-NP_03, NP_02-NP_03 possible pairs in the file. Lets say, if NP_02-NP_03 combination was not present in the file then the number should become 2 showing that NP_02-NP_03 combo does not exist in the file. Any suggestion how can I get this from your code.

      Example in case when NP_02-NP_03 reciprocal pair does not exist in the file and the required output

      __DATA__ NP_01 NP_02 NP_02 NP_01 NP_01 NP_03 NP_03 NP_01 NP_04 NP_05 NP_06 NP_07 NP_07 NP_06 prints: NP_01 NP_02 NP_01 NP_03 2 [not present in NP_02 NP_03] NP_06 NP_07 1
[reply]
[d/l]
[select]

        That makes it more interesting. If you discover any more interesting cases however you'd better tell us what the application actually is and give us the bigger picture.

        use strict; use warnings; my %pairs; while (<DATA>) { chomp; next if ! length; my @pair = sort split; ++$pairs{$pair[0]}{$pair[1]}; } for my $first (sort keys %pairs) { my @hits; my $count = 0; my @implied; for my $second (keys %{$pairs{$first}}) { next if $pairs{$first}{$second} < 2; push @implied, $second; ++$count; if (exists $pairs{$second}{$first}) { delete $pairs{$second}{$first}; ++$count; } print "$first $second\n"; } next if ! $count; @implied = sort @implied; if (@implied == 2 && $pairs{$implied[0]}{$implied[1]}) { ++$count; delete $pairs{$implied[0]}{$implied[1]}; @implied = (); } print "$count"; print " [not present in @implied]" if @implied == 2; print "\n\n" } __DATA__ AP_01 AP_02 AP_02 AP_01 AP_01 AP_03 AP_03 AP_01 NP_01 NP_02 NP_02 NP_01 NP_01 NP_03 NP_03 NP_01 NP_02 NP_03 NP_03 NP_02 NP_04 NP_05 NP_06 NP_07 NP_07 NP_06

        Prints:

        AP_01 AP_02 AP_01 AP_03 2 [not present in AP_02 AP_03] NP_01 NP_03 NP_01 NP_02 3 NP_06 NP_07 1

        I added the AP_dd set to make it clearer which groups were which and to provide data to demonstrate the third case.


        True laziness is hard work
[reply]
[d/l]
[select]

Back to Seekers of Perl Wisdom


Login:
Password
remember me
What's my password?
Create A New User

Node Status
node history
Node Type: perlquestion [id://804197]
Approved by wfsp
Front-paged by wfsp
help
Community Ads
Chatterbox
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users
Others chanting in the Monastery: (7)
tirwhan
Moriarty
atcroft
herveus
Eyck
biohisham
gnosti
As of 2009-11-21 11:02 GMT
Sections
The Monastery Gates
Seekers of Perl Wisdom
Meditations
PerlMonks Discussion
Categorized Q&A
Tutorials
Obfuscated Code
Perl Poetry
Cool Uses for Perl
Perl News
Information
PerlMonks FAQ
Guide to the Monastery
What's New at PerlMonks
Voting/Experience System
Tutorials
Reviews
Library
Perl FAQs
Other Info Sources
Find Nodes
Nodes You Wrote
Super Search
List Nodes By Users
Newest Nodes
Recently Active Threads
Selected Best Nodes
Best Nodes
Worst Nodes
Saints in our Book
Leftovers
The St. Larry Wall Shrine
Offering Plate
Awards
Craft
Snippets Section
Code Catacombs
Quests
Editor Requests
Buy PerlMonks Gear
PerlMonks Merchandise
Planet Perl
Perlsphere
Use Perl
Perl.com
Perl 5 Wiki
Perl Jobs
Perl Mongers
Perl Directory
Perl documentation
CPAN
Random Node
Voting Booth

Future historians will find that the material characteristic of the current era is...

Aluminium
Plastic
Oil
Water
Carbon dioxide
Copper
Iron
Silicon
Salt
Uranium
Hydrogen
Other

Results (730 votes), past polls