This is related to a similar post yesterday about alignment data. Unfortunately this alignment data is in a different format, so that script will not work. Fortunately, the consensus (i.e. whether the letter at a given column is the same in every sequence) is shown at the bottom of each block by a series of stars, colons and periods.
I have a few problems:
- the alignment is broken over 3 blocks by line breaks, so my consensus appears to be 3 separate words
- I don't know how to read out all of the index positions at which an array element matches something, e.g. *
(I have read a similar post, but all of its examples find the index of a match that occurs only once.) Below is an example data file and my script.
CLUSTAL O(1.0.3) multiple sequence alignment
435590.150003364 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
226186.29348112 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
295405.53715441 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
272559.60683413 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
411901.149130658 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
411476.156111512 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
411479.156862051 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
449673.167697753 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
470145.189432667 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
483215.260624635 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
483218.217990640 -------MTKKRVKKNVEHGQAHIQSSFNNTIVTLTDA
+EGNALSWASAGGLGFRGSKKST
483216.217986519 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
484018.198273057 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIIAWSSAGKMGFRGSKKNT
483217.212662191 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
511680.292809380 MAKVTKKVTKKRVKKNVERGQAHIQSSFNNTIVTITDT
+EGNALSWASAGGLGFRGSRKST
537012.224520527 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
547042.224016865 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
469586.251841635 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
469590.229451850 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
457392.251944867 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
457394.254834140 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
457395.229454394 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
556258.229443407 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
556260.229435668 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
469587.263252878 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
469588.262357064 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
457391.263236304 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
585543.270273974 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
702446.294448298 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
702443.292632887 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
702444.292640606 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
702447.294446237 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
657309.locus_tag:BXY_18280 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
457424.EQ973217.G291 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
556259.GG663458.G80 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
471870.DS562360.G288 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANS
+EGQIISWSSAGKMGFRGSKKNT
:*** * .** *::***** **:::::
+**: ::*:*** :*****:*.*
435590.150003364 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
226186.29348112 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
295405.53715441 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
272559.60683413 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
411901.149130658 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
411476.156111512 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
411479.156862051 PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
449673.167697753 PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
470145.189432667 PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
483215.260624635 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
483218.217990640 PYAAQMAAETATKAALIHGLKSVDVMVKGPGSGREAAI
+RALSAAGLTVTSIKDVTPVPHN
483216.217986519 PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
484018.198273057 PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
483217.212662191 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
511680.292809380 PYAAQMAAETATKAALIHGLKSVDVMVKGPGSGREAAI
+RALQACGLEVTSIKDVTPVPHN
537012.224520527 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIVDVTPLPHN
547042.224016865 PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
469586.251841635 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
469590.229451850 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
457392.251944867 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
457394.254834140 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
457395.229454394 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
556258.229443407 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
556260.229435668 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
469587.263252878 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
469588.262357064 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
457391.263236304 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
585543.270273974 PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
702446.294448298 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTVHGAGIEVTEIIDVTPLPHN
702443.292632887 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
702444.292640606 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
702447.294446237 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
657309.locus_tag:BXY_18280 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
457424.EQ973217.G291 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
556259.GG663458.G80 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
471870.DS562360.G288 PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAI
+RTIHGAGIEVTEIIDVTPLPHN
********: .:* * **:.*.. *****.***:**
+*:: ..*: **.* ****:***
435590.150003364 GCRPPKRRRV
226186.29348112 GCRPPKRRRV
295405.53715441 GCRPPKRRRV
272559.60683413 GCRPPKRRRV
411901.149130658 GCRPPKRRRV
411476.156111512 GCRPPKRRRV
411479.156862051 GCRPPKRRRV
449673.167697753 GCRPPKRRRV
470145.189432667 GCRPPKRRRV
483215.260624635 GCRPPKRRRV
483218.217990640 GCRPPKRRRV
483216.217986519 GCRPPKRRRV
484018.198273057 GCRPPKRRRV
483217.212662191 GCRPPKRRRV
511680.292809380 GCRPPKRRRV
537012.224520527 GCRPPKRRRV
547042.224016865 GCRPPKRRRV
469586.251841635 GCRPPKRRRV
469590.229451850 GCRPPKRRRV
457392.251944867 GCRPPKRRRV
457394.254834140 GCRPPKRRRV
457395.229454394 GCRPPKRRRV
556258.229443407 GCRPPKRRRV
556260.229435668 GCRPPKRRRV
469587.263252878 GCRPPKRRRV
469588.262357064 GCRPPKRRRV
457391.263236304 GCRPPKRRRV
585543.270273974 GCRPPKRRRV
702446.294448298 GCRPPKRRRV
702443.292632887 GCRPPKRRRV
702444.292640606 GCRPPKRRRV
702447.294446237 GCRPPKRRRV
657309.locus_tag:BXY_18280 GCRPPKRRRV
457424.EQ973217.G291 GCRPPKRRRV
556259.GG663458.G80 GCRPPKRRRV
471870.DS562360.G288 GCRPPKRRRV
**********
use strict;
use warnings;
my $clu_align=$ARGV[0];
open(IN,"<",$clu_align) or die "can't open file $clu_align: $!"; # "<" opens for reading; ">" would truncate the input file
my @consensus;
while(my $line=<IN>){
chomp($line);
next if ($line =~ /^CLUSTAL/); # header row
next if ($line=~ /^$/); # a few blank rows
if ($line =~ /\*/){
push (@consensus,$line);
}
}
close(IN);
chomp(@consensus);
#foreach my $c (@consensus){ just for debugging
# print "consensus\n";
# print "$c\t";
# print "\n";
#}
my( @index )= grep { $consensus[$_] eq "*" } 0..$#consensus;
my $outfile="$clu_align.VarPositions";
open (OUT,">",$outfile) or die "can't open file $outfile: $!";
foreach my $i (@index) {
print OUT "$i\tis not conserved\n";
}
close(OUT);
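For what it's worth, here is a minimal sketch of one way both problems might be approached. It is not a tested solution for every CLUSTAL variant. It assumes that in the real file each consensus line starts with whitespace and is padded so that it lines up with the sequence column, and that it is the first non-sequence line after each block. The helper names `consensus_string` and `positions_of` are made up for illustration. The idea is to cut the consensus out of each block at the same offset and width as the sequences, join the pieces into one string, and then `grep` over the character indices of that string rather than over array elements.

```perl
use strict;
use warnings;

# Hypothetical helper: join the per-block consensus lines into one string.
# Assumes sequence lines look like "name<spaces>RESIDUES" and that the
# consensus line is the first non-sequence line after each block.
sub consensus_string {
    my @lines = @_;
    my ($offset, $width, $in_block) = (0, 0, 0);
    my $consensus = '';
    for my $line (@lines) {
        $line =~ s/[\r\n]+\z//;                 # strip any line ending
        next if $line =~ /^CLUSTAL/;            # header row
        if ($line =~ /^(\S+\s+)(\S+)/) {        # sequence line
            # remember where the residues start and how many there are
            ($offset, $width, $in_block) = (length $1, length $2, 1);
        }
        elsif ($in_block) {                     # consensus line for this block
            # pad so that trailing unconserved columns are not lost
            my $padded = $line . ' ' x ($offset + $width);
            $consensus .= substr($padded, $offset, $width);
            $in_block = 0;
        }
    }
    return $consensus;
}

# Hypothetical helper: all 0-based positions where $string has $char.
sub positions_of {
    my ($string, $char) = @_;
    return grep { substr($string, $_, 1) eq $char } 0 .. length($string) - 1;
}

# Tiny made-up alignment, two blocks, just to show the behaviour.
my @demo = (
    "CLUSTAL O(1.0.3) multiple sequence alignment",
    "",
    "seq1      MAKK",
    "seq2      MTKK",
    "          * **",
    "",
    "seq1      GC",
    "seq2      GA",
    "          * ",
);

my $consensus = consensus_string(@demo);
my @conserved = positions_of($consensus, '*');
my @variable  = grep { substr($consensus, $_, 1) ne '*' } 0 .. length($consensus) - 1;
print "conserved: @conserved\n";   # conserved: 0 2 3 4
print "variable: @variable\n";     # variable: 1 5
```

The positions are 0-based, so add 1 if alignment columns should be counted from 1. Since `*` marks a fully conserved column, the "not conserved" list is the one built with `ne '*'`. For a real file, the lines read from the filehandle would be passed in place of `@demo`.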