PerlMonks  

Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.

by BhariD (Sexton)
on Oct 01, 2009 at 23:47 UTC (#798763)


in reply to Re: how to compare column 1 to column 2 and vice versa from multiple rows.
in thread how to compare column 1 to column 2 and vice versa from multiple rows.

Thank you so much!! Your suggestions really helped. I apologize for the formatting errors. I hope it's not too bad this time.

Can I ask you one more question? With this input file (below):
gene_a gene_b
gene_b gene_a

I get the following output:
gene_a gene_b

If the input file is something like this:
gene_a gene_b
gene_b gene_a
gene_c gene_a
gene_a gene_c
gene_c gene_b
gene_b gene_c

Then I want the program to output the following:
gene_a gene_b gene_c

instead of:
gene_a gene_b
gene_b gene_c
gene_c gene_a

The thing is, I am looking for pairs where column 0 of one row equals column 1 of another row and vice versa. This can happen for any combination of genes (as I showed above with three: a, b and c). Can you suggest how to handle this case? I would really appreciate it!

Thanks

BH


Re^3: how to compare column 1 to column 2 and vice versa from multiple rows.
by ccn (Vicar) on Oct 02, 2009 at 06:13 UTC

    It is not too late to insert <code> tags into your original post. You can update it at any time.

    As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

    #!/usr/bin/perl -lan
    # Usage: thisscript.pl genes.txt
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }

    And this:

    Linux version:
    perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

    Windows version:
    perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

    Here genes.txt is a file containing the gene rows.

    Feel free to ask if you need explanations of the algorithm and its implementation.
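The script above prints each gene name on its own line, in hash order, and pools every reciprocal gene into one set. If the goal is one line per group, as in the desired "gene_a gene_b gene_c" output, a minimal sketch follows. It assumes a group means genes connected through reciprocal pairs (a row "x y" whose reverse "y x" also appears). The name group_reciprocal is a hypothetical helper, not code from this thread, and the union-find bookkeeping is a swapped-in technique rather than ccn's method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# group_reciprocal() is a hypothetical helper, not code from the thread.
# It takes rows such as "gene_a gene_b" and returns one array ref per
# group of genes linked by reciprocal pairs ("a b" and "b a" both present).
sub group_reciprocal {
    my @rows = @_;
    my ( %seen, %parent );

    # Union-find "find": follow parent links up to the group's representative.
    my $find = sub {
        my ($x) = @_;
        $parent{$x} = $x unless exists $parent{$x};
        $x = $parent{$x} while $parent{$x} ne $x;
        return $x;
    };

    for my $row (@rows) {
        my @F = split ' ', $row;
        next unless @F == 2;    # assumption: ignore rows that are not two columns

        # If the reversed row was seen earlier, the pair is reciprocal: merge groups.
        if ( $seen{"$F[1] $F[0]"} ) {
            my ( $ra, $rb ) = ( $find->( $F[0] ), $find->( $F[1] ) );
            $parent{$rb} = $ra;
        }
        $seen{"$F[0] $F[1]"} = 1;    # remember this row in its original direction
    }

    # Collect members under their representative; members end up sorted by name.
    my %group;
    push @{ $group{ $find->($_) } }, $_ for sort keys %parent;

    # Order the groups by their first member so the output is stable.
    return sort { $a->[0] cmp $b->[0] } values %group;
}

# Demonstration on the six-row sample from the question.
my @rows = (
    'gene_a gene_b', 'gene_b gene_a', 'gene_c gene_a',
    'gene_a gene_c', 'gene_c gene_b', 'gene_b gene_c',
);
print "@$_\n" for group_reciprocal(@rows);
```

Unlike the sorted-key test, this sketch tracks the direction of each row, so a row that is simply repeated ("a b" twice) is not counted as a reciprocal pair. Genes that only appear in one-directional rows are left out.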

      Could you please tell me what each line is doing in this code?

      if ( $seen{ join ' ', sort @F }++ ) {
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      END { print for keys %uniq; }

      (Code thanks to ccn.)

      Also, before I execute this code, I fill @F from the file input, right?

      my @F = <DATA>;
        No, you don't fill @F yourself. The -a switch does it for you (see perldoc perlrun). The code above is a complete script. Just run it as shown in the usage line.

        Now the explanations:

        #!/usr/bin/perl -lan
        # Usage: thisscript.pl genes.txt

        # %seen is a hash where we store keys composed from the rows seen so far.
        # @F is an array of 2 elements: $F[0] is the first column of your file
        # and $F[1] is the second one (see the -a switch in perldoc perlrun).
        # So the key for a row is the concatenation of its sorted columns.
        if ( $seen{ join ' ', sort @F }++ ) {
            # %uniq keeps the first and second columns of rows whose pair was already seen
            $uniq{$F[0]}++;
            $uniq{$F[1]}++;
        }
        # The code above loops over each row of the file because of the -n switch
        # (see perldoc perlrun).

        END { # this block runs just before exit
            print for keys %uniq;
        }
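To make the role of the switches concrete, here is a rough sketch of the same logic with the work of -l, -a and -n written out by hand. It follows the behaviour described in perldoc perlrun but is not an exact expansion. The sub name reciprocal_genes, the in-memory sample file, the two-column check and the sorted return value are all assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# reciprocal_genes() is a hypothetical helper name, not part of ccn's script.
# It reads rows from a filehandle and returns the genes that appear in a
# pair whose columns were already seen in either order.
sub reciprocal_genes {
    my ($fh) = @_;
    my ( %seen, %uniq );

    while ( my $line = <$fh> ) {     # -n: loop over every input line
        chomp $line;                 # -l: strip the trailing newline
        my @F = split ' ', $line;    # -a: split the line on whitespace into @F
        next unless @F == 2;         # assumption: skip rows that are not two columns

        # "a b" and "b a" give the same key once the columns are sorted, so
        # the post-increment is true only from the second sighting of a pair.
        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$_}++ for @F;
        }
    }

    # Sorted for stable output; the original script prints in hash order.
    return sort keys %uniq;
}

# Demonstration on the six-row sample from the question, read from an
# in-memory file instead of genes.txt.
my $sample = join "\n",
    'gene_a gene_b', 'gene_b gene_a', 'gene_c gene_a',
    'gene_a gene_c', 'gene_c gene_b', 'gene_b gene_c', '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for reciprocal_genes($fh);    # -l also makes print append "\n"
close $fh;
```

With the switches, none of the loop, chomp or split lines need to be written, which is why the original script can use @F without ever assigning to it.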
