PerlMonks  

Re^3: how to compare column 1 to column 2 and vice versa from multiple rows.

by ccn (Vicar)
on Oct 02, 2009 at 06:13 UTC (#798789)


in reply to Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.
in thread how to compare column 1 to column 2 and vice versa from multiple rows.

It is not too late to insert <code> tags into your original post. You can update it at any time.

As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

#!/usr/bin/perl -lan
# Usage: thisscript.pl genes.txt
if ( $seen{ join ' ', sort @F }++ ) {
    $uniq{$F[0]}++;
    $uniq{$F[1]}++;
}
END { print for keys %uniq; }

Or the same thing as a one-liner:

Linux version:

perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

Windows version:

perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

where genes.txt is the file containing the gene rows.
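The one-liner differs from the script in two details. It records the genes with a hash slice, `@u{@F} = ()`, instead of two increments, and it uses `}{` to close the loop that -n wraps around the code, so the print runs once after the last row, much as the END block does. Here is a small sketch of the first point. The sub names and the sample row are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: record the two columns the way the script does.
sub keys_via_increment {
    my @F = @_;
    my %uniq;
    $uniq{ $F[0] }++;
    $uniq{ $F[1] }++;
    return sort keys %uniq;
}

# Hypothetical helper: record them the way the one-liner does.
sub keys_via_slice {
    my @F = @_;
    my %u;
    @u{@F} = ();    # hash slice: one key per element of @F, values are undef
    return sort keys %u;
}

my @row = ( 'geneB', 'geneA' );    # made-up sample row
print join( ',', keys_via_increment(@row) ), "\n";    # geneA,geneB
print join( ',', keys_via_slice(@row) ),     "\n";    # geneA,geneB
```

Only the keys matter for the final print, so both forms end up with the same list of gene names.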

Feel free to ask if you need explanations of the algorithm and its implementation.


Re^4: how to compare column 1 to column 2 and vice versa from multiple rows.
by BhariD (Sexton) on Oct 02, 2009 at 18:46 UTC
    Could you please tell me what each line is doing in this code?
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }
    code: Thanks to ccn

    Also, before I execute this code, I fill @F from the file input, right?

    my @F = <DATA>;
      No, you don't fill @F yourself. The -a switch does it for you; see perldoc perlrun. The code above is a complete script. Just run it as shown in the usage line.

      Now the explanations:

      #!/usr/bin/perl -lan
      # Usage: thisscript.pl genes.txt

      # %seen is a hash where we store keys composed from the rows seen so far.
      # @F is an array of 2 elements: $F[0] is the first column of your file
      # and $F[1] is the second one (see the -a switch in [doc://perlrun]).
      # So the key for a row is composed by joining its sorted columns.
      if ( $seen{ join ' ', sort @F }++ ) {
          # The %uniq hash keeps the first and second columns of every row
          # whose key has already been seen.
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      # The code above loops over each row of the file because of the -n switch
      # (see [doc://perlrun]).

      END {
          # This block runs just before exit.
          print for keys %uniq;
      }
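If the switches still feel like magic, it may help to see roughly the same logic with the loop, the chomp and the split written out by hand. This is only a sketch of what -n, -l and -a arrange for you, not the exact code perl generates. The sub name and the sample rows are made up, and the result is sorted so that the output order is stable:

```perl
use strict;
use warnings;

# Hypothetical sub: reads rows from any filehandle and returns the genes
# found in rows whose pair (in either order) appeared earlier.
sub unique_genes_from_handle {
    my ($fh) = @_;
    my ( %seen, %uniq );

    while ( my $line = <$fh> ) {      # -n: loop over every input line
        chomp $line;                  # -l: strip the trailing newline
        my @F = split ' ', $line;     # -a: split on whitespace into @F

        # Sorting the columns gives "A B" and "B A" the same key.
        # The post-increment is false the first time a key shows up.
        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{ $F[0] }++;
            $uniq{ $F[1] }++;
        }
    }
    return sort keys %uniq;           # stands in for the END block
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "geneA geneB\ngeneC geneD\ngeneB geneA\ngeneD geneC\ngeneE geneF\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for unique_genes_from_handle($fh);
close $fh;
```

With this sample the sub returns geneA, geneB, geneC and geneD. The geneE/geneF row is left out because its pair never shows up a second time.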
