Re^2: how to compare column 1 to column 2 and vice versa from multiple rows.

Thank you so much!! Your suggestions really helped. I apologize for the formatting errors. I hope it's not too bad this time.

Can I ask one more question? With this input file:
gene_a gene_b
gene_b gene_a

I get the following output:
gene_a gene_b

If the input file is something like this:
gene_a gene_b
gene_b gene_a
gene_c gene_a
gene_a gene_c
gene_c gene_b
gene_b gene_c

Then I want the program to output the following:
gene_a gene_b gene_c

instead of:
gene_a gene_b
gene_b gene_c
gene_c gene_a

The thing is, I am looking for pairs of rows in which column 1 of one row equals column 2 of the other and vice versa. This can happen for any number of genes (as I showed above with three: a, b and c). Can you suggest how to handle this case? I would really appreciate it!

Thanks

Re^3: how to compare column 1 to column 2 and vice versa from multiple rows. by ccn (Vicar) on Oct 02, 2009 at 06:13 UTC
It is not too late to insert `<code>` tags into your original post; you can update it at any time.

As I understand it, you just want to output the unique gene names instead of the raw rows. Then try this:

    #!/usr/bin/perl -lan
    # Usage: thisscript.pl genes.txt
    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }

And this:

Linux version:

    perl -lane '@u{@F}=() if $s{join "", sort @F}++ }{ print for keys %u' genes.txt

Windows version:

    perl -lane "@u{@F}=() if $s{join '', sort @F}++ }{ print for keys %u" genes.txt

where `genes.txt` is a file containing the gene rows. Feel free to ask if you need explanations of the algorithm and its implementation.
Re^4: how to compare column 1 to column 2 and vice versa from multiple rows. by BhariD (Sexton) on Oct 02, 2009 at 18:46 UTC
Could you please tell me what each line is doing in this code?

    if ( $seen{ join ' ', sort @F }++ ) {
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    END { print for keys %uniq; }

(Code: thanks to ccn.)

Also, before I execute this code, I fill `@F` from the file input, right?

    my @F = <DATA>;
Re^5: how to compare column 1 to column 2 and vice versa from multiple rows. by ccn (Vicar) on Oct 02, 2009 at 20:44 UTC
No, you don't fill `@F`. The -a switch does it for you; see perldoc perlrun. The code above is a complete script. Just run it as shown in the usage line. Now the explanations:

    #!/usr/bin/perl -lan
    # Usage: thisscript.pl genes.txt

    # %seen is a hash where we store keys composed from the rows seen so far.
    # @F is an array of 2 elements: $F[0] is the first column of your file
    # and $F[1] is the second one (see the -a switch in perldoc perlrun).
    # So the key for a row is composed by joining its sorted columns.
    if ( $seen{ join ' ', sort @F }++ ) {
        # The %uniq hash keeps the first and second columns of any row
        # whose key has already been seen.
        $uniq{$F[0]}++;
        $uniq{$F[1]}++;
    }
    # The code above loops over each row of the file because of the -n switch
    # (see perldoc perlrun).

    END { # this block runs just before exit
        print for keys %uniq;
    }
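For readers who find the -n and -a switches opaque, here is a rough sketch of the same sorted-key idea written as an explicit loop. It is not the code posted in the thread: the helper name `reciprocal_genes` and the inline sample rows are assumptions made for illustration, and it prints the names sorted on a single line, which is the output format the original question asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes rows of "col1 col2"
# and returns the sorted names of genes that occur in a row whose
# sorted-column key has been seen before.
sub reciprocal_genes {
    my @rows = @_;
    my ( %seen, %uniq );
    for my $row (@rows) {
        my @F = split ' ', $row;    # roughly what -a does: split on whitespace
        next unless @F == 2;        # skip blank or malformed rows
        # Sorting makes "gene_a gene_b" and "gene_b gene_a" share one key.
        # The post-increment is false the first time a key appears and
        # true on every later occurrence.
        if ( $seen{ join ' ', sort @F }++ ) {
            $uniq{$_}++ for @F;
        }
    }
    return sort keys %uniq;
}

# Sample rows taken from the question above.
my @rows = (
    'gene_a gene_b', 'gene_b gene_a',
    'gene_c gene_a', 'gene_a gene_c',
    'gene_c gene_b', 'gene_b gene_c',
);
print join( ' ', reciprocal_genes(@rows) ), "\n";
```

One caveat worth keeping in mind: because the key is built from the sorted columns, a row that is simply repeated (for example `gene_a gene_b` appearing twice) is treated the same as a reciprocal pair. If that matters for your data, the key would need to record the original column order as well.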


	PerlMonks