 
PerlMonks  

Re: how to compare column 1 to column 2 and vice versa from multiple rows.

by ccn (Vicar)
on Sep 30, 2009 at 17:43 UTC ( #798414=note )


in reply to how to compare column 1 to column 2 and vice versa from multiple rows.

Something like this:

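    my %seen;
    while (my $line = <>) {
        chomp $line;
        # Sorting the columns makes the key order-independent:
        # "gene_a gene_b" and "gene_b gene_a" give the same key.
        my $key = join ' ', sort split ' ', $line;
        # chomp removed the newline, so put it back when printing
        print "$line\n" if $seen{$key}++;
    }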

Use a hash to check for duplicates. Compose the hash key so that rows containing the same columns, in any order, give the same key.
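
A minimal sketch of that key idea, assuming whitespace-separated columns. The helper name `row_key` and the sample rows are made up for illustration and are not part of the original reply. It shows that two rows with the same columns in a different order map to the same hash key, and that only the repeat is reported.

```perl
use strict;
use warnings;

# Hypothetical helper: build an order-independent key from one row.
sub row_key {
    my ($line) = @_;
    my @cols = split ' ', $line;    # split on runs of whitespace
    return join ' ', sort @cols;    # sorted, so column order does not matter
}

my %seen;
my @repeats;
for my $line ("gene_a gene_b", "gene_b gene_a", "gene_c gene_a") {
    # Post-increment returns the old count, so the test is true
    # only from the second time a key shows up.
    push @repeats, $line if $seen{ row_key($line) }++;
}

print "$_\n" for @repeats;    # prints: gene_b gene_a
```

Joining with a space rather than an empty string keeps column pairs such as ("ab", "c") and ("a", "bc") from collapsing into the same key.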

Update: the missing ++ has been added.


Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.
by BhariD (Sexton) on Oct 01, 2009 at 23:47 UTC

    Thank you so much!! Your suggestions really helped. I apologize for the formatting errors. I hope it's not too bad this time.

    Can I ask you one more question? With this input file (below):
    gene_a gene_b
    gene_b gene_a

    I get the following output:
    gene_a gene_b

    If the input file is something like this:
    gene_a gene_b
    gene_b gene_a
    gene_c gene_a
    gene_a gene_c
    gene_c gene_b
    gene_b gene_c

    Then I want the program to output the following:
    gene_a gene_b gene_c

    instead of:
    gene_a gene_b
    gene_b gene_c
    gene_c gene_a

    The thing is, I am looking for pairs for which column[0] of one row is equal to column[1] of another row and vice versa. This can happen for any number of genes (as I showed above with three: a, b and c). Can you provide a suggestion for this case? I would really appreciate it!

    Thanks

    BH

      It is not too late to insert <code> tags into your original post. You can update it at any time.

      As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

      #!/usr/bin/perl -lan
      # Usage: thisscript.pl genes.txt
      if ( $seen{ join ' ', sort @F }++ ) {
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      END { print for keys %uniq; }

      Or as a one-liner:

      Linux version:
      perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

      Windows version:
      perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

      Where genes.txt is a file containing gene rows
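
For readers unfamiliar with the switches: -n wraps the code in a loop over the input lines, -l strips the line ending on input and appends one on print, and -a splits each line on whitespace into @F. The following is a rough hand-written equivalent, offered as a sketch only. The inline sample rows (including the made-up gene_d/gene_e row) stand in for genes.txt, and the output is sorted here for stability, whereas the original prints in hash order.

```perl
use strict;
use warnings;

# Sample rows stand in for genes.txt (an assumption to keep this self-contained).
my @rows = (
    "gene_a gene_b",
    "gene_b gene_a",
    "gene_c gene_a",
    "gene_d gene_e",    # never reciprocated, so it should not be printed
    "gene_a gene_c",
);

my (%seen, %uniq);
for my $line (@rows) {             # what -n does: loop over every input line
    my @F = split ' ', $line;      # what -a does: autosplit the line into @F
    # True only when this pair, in either order, has been seen before
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$_}++ for @F[0, 1];
    }
}

# Corresponds to the END block: runs once, after all input is consumed.
my @genes = sort keys %uniq;
print "$_\n" for @genes;           # -l would append the newline automatically
```

With the switches in place, @F is filled automatically for each line, so the script never needs to assign to it by hand.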

      Feel free to ask if you need explanations of the algorithm and its implementation.

        Could you please tell me what each line in this code is doing?

        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$F[0]}++;
            $uniq{$F[1]}++;
        }
        END { print for keys %uniq; }

        code: Thanks to ccn

        Also, before I execute this code, I fill @F from the file input, right?

        my @F = <DATA>;
