PerlMonks  

Re: how to compare column 1 to column 2 and vice versa from multiple rows.

by ccn (Vicar)
on Sep 30, 2009 at 17:43 UTC ( #798414=note )


in reply to how to compare column 1 to column 2 and vice versa from multiple rows.

Something like this:
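my %seen;
while ( my $line = <> ) {
    chomp $line;
    # Sort the columns so that "a b" and "b a" yield the same key.
    # Joining with a space keeps "ab c" and "a bc" from colliding.
    my $key = join ' ', sort split /\s+/, $line;
    # chomp removed the newline, so put it back when printing.
    print "$line\n" if $seen{$key}++;
}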

Use a hash to check for duplicates. Compose the hash key in such a way that rows containing the same columns, in any order, produce the same key.

Update: added the missing ++.
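The same idea can be split into two small subs, which makes the key construction easier to see. This is only a sketch, assuming whitespace-separated columns; the names `row_key` and `repeated_rows` are made up for illustration and are not part of the snippet above.

```perl
use strict;
use warnings;

# Build an order-independent key for a row:
# "gene_b gene_a" and "gene_a gene_b" both become "gene_a gene_b".
sub row_key {
    my ($line) = @_;
    my @cols = split ' ', $line;
    return join ' ', sort @cols;
}

# Return the rows whose columns were already seen in some order.
sub repeated_rows {
    my @lines = @_;
    my %seen;
    return grep { $seen{ row_key($_) }++ } @lines;
}

my @rows = ( 'gene_a gene_b', 'gene_b gene_a', 'gene_c gene_d' );
print "$_\n" for repeated_rows(@rows);    # prints: gene_b gene_a
```

Note that it is the second occurrence of a pair that gets reported, because the post-increment returns 0 the first time a key is seen.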

Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.
by BhariD (Sexton) on Oct 01, 2009 at 23:47 UTC

    Thank you so much!! Your suggestions really helped. I apologize for the formatting errors. I hope it's not too bad this time.

    Can I ask you one more question? With this input file (below):
    gene_a gene_b
    gene_b gene_a

    I get the following output:
    gene_a gene_b

    If the input file is something like this:
    gene_a gene_b
    gene_b gene_a
    gene_c gene_a
    gene_a gene_c
    gene_c gene_b
    gene_b gene_c

    Then I want the program to output the following:
    gene_a gene_b gene_c

    instead of:
    gene_a gene_b
    gene_b gene_c
    gene_c gene_a

    The thing is, I am looking for pairs for which column 1 of one row equals column 2 of another row, and vice versa. This can happen for any combination of genes (as I showed above with three: a, b and c). Could you suggest how to handle this case? I would really appreciate it!

    Thanks

    BH

      It is not too late to insert <code> tags into your original post. You can update it at any time.

      As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

      #!/usr/bin/perl -lan
      # Usage: thisscript.pl genes.txt
      if ( $seen{ join ' ', sort @F }++ ) {
          $uniq{$F[0]}++;
          $uniq{$F[1]}++;
      }
      END { print for keys %uniq; }

      And this:

      Linux version:
      perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

      Windows version:
      perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

      Where genes.txt is a file containing gene rows
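The one-liners lean on two idioms. `@u{@F}=()` is a hash-slice assignment: it creates one key in `%u` per element of `@F` and leaves the values undefined, so the hash works as a set. The `}{` closes the implicit `while` loop that `-n` wraps around the code, so `print for keys %u` runs once after all input has been read. A small sketch of the slice idiom on its own follows; the sub name `add_names` is hypothetical.

```perl
use strict;
use warnings;

# Hash-slice assignment: create one key per name, values stay undef.
# $set is a reference to the hash used as a set.
sub add_names {
    my ( $set, @names ) = @_;
    @{$set}{@names} = ();
    return $set;
}

my %u;
add_names( \%u, 'gene_a', 'gene_b' );
add_names( \%u, 'gene_b', 'gene_c' );    # gene_b is already a key, no duplicate
print join( ' ', sort keys %u ), "\n";   # gene_a gene_b gene_c
```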

      Feel free to ask if you need explanations of the algorithm and its implementation.
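For reference, here is roughly what the `-lan` script does when the switches are written out by hand: `-n` loops over the input lines, `-l` strips the trailing newline (and adds one on `print`), and `-a` splits each line on whitespace into `@F`. This is a sketch, not a literal expansion. The sub name `reciprocal_names`, the blank-line check, and the sorted single-line output are assumptions added for illustration; the script above prints one name per line in hash order.

```perl
use strict;
use warnings;

# Collect the names that occur in reciprocal pairs (a b ... b a).
# Takes an open filehandle; returns the names sorted alphabetically.
sub reciprocal_names {
    my ($fh) = @_;
    my ( %seen, %uniq );
    while ( my $line = <$fh> ) {     # -n: loop over the input lines
        chomp $line;                 # -l: strip the trailing newline
        my @F = split ' ', $line;    # -a: autosplit on whitespace into @F
        next unless @F;              # skip blank lines (not in the original)
        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$_}++ for @F;      # remember every name in the repeated row
        }
    }
    return sort keys %uniq;
}

# Demo on an in-memory file holding the six rows from the question.
my $data = "gene_a gene_b\ngene_b gene_a\ngene_c gene_a\n"
         . "gene_a gene_c\ngene_c gene_b\ngene_b gene_c\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print join( ' ', reciprocal_names($fh) ), "\n";    # gene_a gene_b gene_c
close $fh;
```

So `@F` does not need to be filled by hand; with `-a` Perl sets it for every input line.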

        Could you please tell me what each line of this code is doing?

        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$F[0]}++;
            $uniq{$F[1]}++;
        }
        END { print for keys %uniq; }

        (Code thanks to ccn.)

        Also, before I execute this code, I fill @F from the input file, right?

        my @F = <DATA>;
