
Re^3: how to compare column 1 to column 2 and vice versa from multiple rows.

by ccn (Vicar)
on Oct 02, 2009 at 06:13 UTC (#798789)


in reply to Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.
in thread how to compare column 1 to column 2 and vice versa from multiple rows.

It is not too late to insert <code> tags into your original post. You can update it at any time.

As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

#!/usr/bin/perl -lan
# Usage: thisscript.pl genes.txt
if ( $seen{ join ' ', sort @F }++ ) {
    $uniq{$F[0]}++;
    $uniq{$F[1]}++;
}
END { print for keys %uniq; }

And this:

Linux version:
perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

Windows version:
perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

where genes.txt is the file containing the gene rows.
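For readers unfamiliar with the switches, here is a rough hand-written expansion of what the one-liner does. It is a sketch, not the exact code Perl generates: the sub name run_expanded and the sample rows are made up for illustration, and the in-memory filehandle only stands in for genes.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written expansion of the one-liner; run_expanded is a hypothetical name.
sub run_expanded {
    my ($fh) = @_;
    my ( %s, %u );
    while ( my $line = <$fh> ) {       # -n: loop over the input lines
        chomp $line;                   # -l: strip the trailing newline
        my @F = split ' ', $line;      # -a: split on whitespace into @F
        # Hash slice: creates one key in %u per field of the current row.
        @u{@F} = () if $s{ join '', sort @F }++;
    }
    # In the one-liner, "}{" closes the implicit loop, so whatever follows
    # it runs once, after the last line has been read.
    return keys %u;
}

# Illustrative input only, not data from the thread.
my $sample = "geneA geneB\ngeneC geneD\ngeneB geneA\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @genes = run_expanded($fh);
close $fh;
print "$_\n" for sort @genes;
```

One thing to be aware of: the one-liner joins the sorted columns with an empty string, so two different pairs such as "ab c" and "a bc" would produce the same key. Joining with a space, as the script version does, avoids that.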

Feel free to ask if you need explanations of the algorithm and its implementation.


Re^4: how to compare column 1 to column 2 and vice versa from multiple rows.
by BhariD (Sexton) on Oct 02, 2009 at 18:46 UTC
    Could you please tell me what each line in this code is doing?

    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }

    (Code thanks to ccn.)

    Also, before I execute this code, I fill @F from the file input, right?

    my @F = <DATA>;
      No, you don't fill @F yourself. The -a switch does it for you (see perldoc perlrun). The code above is a complete script. Just run it as shown in the usage line.
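To make the -a behaviour concrete, here is a minimal sketch of what happens to each input line. The helper name autosplit_line and the sample row are assumptions for illustration. By contrast, my @F = <DATA> would load every whole line of the input as one array element, which is not what the script expects.

```perl
use strict;
use warnings;

# autosplit_line is a hypothetical helper that mimics what -a does per line.
sub autosplit_line {
    my ($line) = @_;
    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so tabs, multiple spaces and the trailing newline are all handled.
    return split ' ', $line;
}

my @F = autosplit_line("geneA\tgeneB\n");    # illustrative row
print "first column:  $F[0]\n";
print "second column: $F[1]\n";
```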

      Now explanations:

      #!/usr/bin/perl -lan
      # Usage: thisscript.pl genes.txt

      # %seen is a hash where we store keys composed from the rows seen so far.
      # @F is an array of 2 elements: $F[0] is the first column of your file
      # and $F[1] is the second one (see the -a switch in perldoc perlrun).
      # So the key for a row is composed by joining its sorted columns.
      if ( $seen{ join ' ', sort @F }++ ) {
          # %uniq keeps the first and second columns of rows whose key was already seen
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      # The code above loops over each row of the file because of the -n switch
      # (see perldoc perlrun).

      END {
          # this block runs just before exit
          print for keys %uniq;
      }
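A small worked example may help. This is only a sketch: the sub name reciprocal_genes and the four rows are made up, and the sub takes the rows as a list instead of reading a file so that it can be run on its own.

```perl
use strict;
use warnings;

# reciprocal_genes is a hypothetical wrapper around the logic explained above.
sub reciprocal_genes {
    my @rows = @_;
    my ( %seen, %uniq );
    for my $row (@rows) {
        my @F   = split ' ', $row;
        my $key = join ' ', sort @F;    # "B A" and "A B" give the same key
        if ( $seen{$key}++ ) {          # true from the second occurrence on
            $uniq{ $F[0] }++;
            $uniq{ $F[1] }++;
        }
    }
    my @names = sort keys %uniq;        # sorted only to make the output stable
    return @names;
}

# Made-up rows, not data from the thread.
my @rows = ( 'A B', 'C D', 'B A', 'E F' );
print "$_\n" for reciprocal_genes(@rows);
# prints A then B, one per line
```

Because the key ignores column order, a row that is repeated verbatim is reported just like a reversed pair. The real script prints keys %uniq directly, so its output order is not guaranteed.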
