Re^3: how to compare column 1 to column 2 and vice versa from multiple rows.

by ccn (Vicar)
on Oct 02, 2009 at 06:13 UTC (#798789)


in reply to Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.
in thread how to compare column 1 to column 2 and vice versa from multiple rows.

It is not too late to insert <code> tags into your original post. You are able to update it any time.

As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

    #!/usr/bin/perl -lan
    # Usage: thisscript.pl genes.txt
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }

And this:

Linux version:

    perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

Windows version:

    perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

Here genes.txt is the file containing the gene rows.
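For readers unfamiliar with the one-liner idioms, here is a long-form sketch of the same logic. `unique_names` is a hypothetical helper name and the gene names are made up for illustration. Note one deliberate difference: the sketch joins the sorted columns with a space, as the script version does, because joining with an empty string would give rows like "ab c" and "a bc" the same key.

```perl
use strict;
use warnings;

# Long-form sketch of the one-liner; unique_names is a hypothetical name.
sub unique_names {
    my @rows = @_;
    my (%s, %u);
    for my $row (@rows) {
        my @F = split ' ', $row;    # what -a does for each input line
        # Hash slice: creates one key in %u per element of @F (values are undef).
        # It only fires when the sorted pair has already been seen once.
        @u{@F} = () if $s{ join ' ', sort @F }++;
    }
    # In the one-liner, "}{" closes the implicit -n loop,
    # so the code after it runs once, after the last line.
    return sort keys %u;            # sorted only to make the order stable
}

print "$_\n" for unique_names("geneA geneB", "geneB geneA", "geneC geneD");
```

With these three sample rows, only geneA and geneB are printed, because theirs is the only pair that appears more than once (in either column order).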

Feel free to ask if you need an explanation of the algorithm and its implementation.

Re^4: how to compare column 1 to column 2 and vice versa from multiple rows.
by BhariD (Sexton) on Oct 02, 2009 at 18:46 UTC
    Could you please tell me what each line is doing in this code?

        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$F[0]}++;
            $uniq{$F[1]}++;
        }
        END { print for keys %uniq; }

    (Code: thanks to ccn.)

    Also, before I execute this code, I fill @F from the file input, right?

        my @F = <DATA>;
      No, you don't fill @F. The -a switch does it for you (see perldoc perlrun). The code above is a complete script. Just run it as shown in the usage line.
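      To make the switches concrete, here is a rough hand-written equivalent of what -l, -a and -n do for you. It is only a sketch: the sample rows are invented, and the real -n loop reads from the files named on the command line rather than from an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows standing in for the lines of genes.txt.
my @sample_rows = ("geneA geneB\n", "geneC geneD\n", "geneB geneA\n");

my (%seen, %uniq);
for my $line (@sample_rows) {      # -n: loop over every input line
    chomp $line;                   # -l: strip the trailing newline
    my @F = split ' ', $line;      # -a: split the line on whitespace into @F
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$_}++ for @F[0, 1];  # remember both columns of a repeated pair
    }
}

my @names = sort keys %uniq;       # sorted only to make the order stable
print "$_\n" for @names;
```

      So assigning to @F yourself, as in `my @F = <DATA>;`, would put whole lines into @F instead of the columns of the current line.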

      Now the explanation:

      #!/usr/bin/perl -lan
      # Usage: thisscript.pl genes.txt

      # %seen is a hash where we store keys composed from the rows seen so far.
      # @F is an array of 2 elements: $F[0] is the first column of your file
      # and $F[1] is the second one (see the -a switch in perldoc perlrun).
      # So the key for a row is the concatenation of its sorted columns.
      if ( $seen{ join ' ', sort @F }++ ) {
          # %uniq keeps the first and second columns of rows
          # whose pair has already been seen
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      # The code above loops over each row of the file
      # because of the -n switch (see perldoc perlrun).

      END { # this block runs just before exit
          print for keys %uniq;
      }
