PerlMonks  


This is related to a similar post yesterday about alignment data. Unfortunately the alignment data here is in a different format, so that script will not work. Fortunately, the consensus (i.e. whether the letter is the same in every sequence at a given column) is shown at the bottom of each block by a series of stars and periods.

I have a few problems:
- The alignment is broken over 3 lines by line breaks, so my consensus appears to be 3 words.
- I don't know how to read out all of the index positions for which an array element matches something, e.g. "*". (I have read a similar post, but all of the examples are for finding the index of a match that only occurs once.)

Below is an example data file and my script.

CLUSTAL O(1.0.3) multiple sequence alignment

435590.150003364                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
226186.29348112                 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
295405.53715441                 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
272559.60683413                 MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
411901.149130658                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
411476.156111512                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
411479.156862051                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
449673.167697753                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
470145.189432667                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
483215.260624635                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
483218.217990640                -------MTKKRVKKNVEHGQAHIQSSFNNTIVTLTDAEGNALSWASAGGLGFRGSKKST
483216.217986519                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
484018.198273057                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIIAWSSAGKMGFRGSKKNT
483217.212662191                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
511680.292809380                MAKVTKKVTKKRVKKNVERGQAHIQSSFNNTIVTITDTEGNALSWASAGGLGFRGSRKST
537012.224520527                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
547042.224016865                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
469586.251841635                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
469590.229451850                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
457392.251944867                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
457394.254834140                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
457395.229454394                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
556258.229443407                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
556260.229435668                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
469587.263252878                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
469588.262357064                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
457391.263236304                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
585543.270273974                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
702446.294448298                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
702443.292632887                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
702444.292640606                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
702447.294446237                MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
657309.locus_tag:BXY_18280      MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
457424.EQ973217.G291            MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
556259.GG663458.G80             MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
471870.DS562360.G288            MAKKTV-AAKKRNVKVDANGQLHVHSSFNNIIVSLANSEGQIISWSSAGKMGFRGSKKNT
                                        :***  *   .** *::***** **:::::**: ::*:*** :*****:*.*

435590.150003364                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
226186.29348112                 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
295405.53715441                 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
272559.60683413                 PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
411901.149130658                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
411476.156111512                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
411479.156862051                PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
449673.167697753                PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
470145.189432667                PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
483215.260624635                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
483218.217990640                PYAAQMAAETATKAALIHGLKSVDVMVKGPGSGREAAIRALSAAGLTVTSIKDVTPVPHN
483216.217986519                PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
484018.198273057                PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
483217.212662191                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
511680.292809380                PYAAQMAAETATKAALIHGLKSVDVMVKGPGSGREAAIRALQACGLEVTSIKDVTPVPHN
537012.224520527                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIVDVTPLPHN
547042.224016865                PYAAQMAAQDCAKVAYDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
469586.251841635                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
469590.229451850                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
457392.251944867                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
457394.254834140                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
457395.229454394                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
556258.229443407                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
556260.229435668                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
469587.263252878                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
469588.262357064                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
457391.263236304                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
585543.270273974                PYAAQMAAQDCAKIAYDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
702446.294448298                PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTVHGAGIEVTEIIDVTPLPHN
702443.292632887                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
702444.292640606                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
702447.294446237                PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
657309.locus_tag:BXY_18280      PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
457424.EQ973217.G291            PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
556259.GG663458.G80             PYAAQMAAQDCAKIAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
471870.DS562360.G288            PYAAQMAAQDCAKVAFDLGLRKVKAYVKGPGNGRESAIRTIHGAGIEVTEIIDVTPLPHN
                                ********: .:* *   **:.*.. *****.***:***:: ..*: **.* ****:***

435590.150003364                GCRPPKRRRV
226186.29348112                 GCRPPKRRRV
295405.53715441                 GCRPPKRRRV
272559.60683413                 GCRPPKRRRV
411901.149130658                GCRPPKRRRV
411476.156111512                GCRPPKRRRV
411479.156862051                GCRPPKRRRV
449673.167697753                GCRPPKRRRV
470145.189432667                GCRPPKRRRV
483215.260624635                GCRPPKRRRV
483218.217990640                GCRPPKRRRV
483216.217986519                GCRPPKRRRV
484018.198273057                GCRPPKRRRV
483217.212662191                GCRPPKRRRV
511680.292809380                GCRPPKRRRV
537012.224520527                GCRPPKRRRV
547042.224016865                GCRPPKRRRV
469586.251841635                GCRPPKRRRV
469590.229451850                GCRPPKRRRV
457392.251944867                GCRPPKRRRV
457394.254834140                GCRPPKRRRV
457395.229454394                GCRPPKRRRV
556258.229443407                GCRPPKRRRV
556260.229435668                GCRPPKRRRV
469587.263252878                GCRPPKRRRV
469588.262357064                GCRPPKRRRV
457391.263236304                GCRPPKRRRV
585543.270273974                GCRPPKRRRV
702446.294448298                GCRPPKRRRV
702443.292632887                GCRPPKRRRV
702444.292640606                GCRPPKRRRV
702447.294446237                GCRPPKRRRV
657309.locus_tag:BXY_18280      GCRPPKRRRV
457424.EQ973217.G291            GCRPPKRRRV
556259.GG663458.G80             GCRPPKRRRV
471870.DS562360.G288            GCRPPKRRRV
                                **********

use strict;
use warnings;

my $clu_align = $ARGV[0];
# open the alignment file for reading
open(IN, "<", $clu_align) or die "can't open file $clu_align: $!";

my @consensus;
while (my $line = <IN>) {
    chomp($line);
    next if ($line =~ /^CLUSTAL/);   # header row
    next if ($line =~ /^$/);         # a few blank rows
    if ($line =~ /\*/) {             # consensus rows contain at least one star
        push(@consensus, $line);
    }
}
close(IN);
chomp(@consensus);

#foreach my $c (@consensus){   just for debugging
#    print "consensus\n";
#    print "$c\t";
#    print "\n";
#}

# each element of @consensus is a whole consensus row, not a single character
my (@index) = grep { $consensus[$_] eq "*" } 0..$#consensus;

my $outfile = "$clu_align.VarPositions";
open(OUT, ">", $outfile) or die "can't open file $outfile: $!";
foreach my $i (@index) {
    print OUT "$i\tis not conserved\n";
}
close(OUT);
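
One possible way to handle both problems, offered as a rough sketch rather than a definitive answer: glue the consensus row of every block into a single string, split that string into single characters, and then grep over the indexes instead of the values. The helper names read_consensus and star_positions are made up for this illustration, and the tiny two-sequence alignment is toy data, not the real file. The sketch assumes each consensus row directly follows the sequence rows of its block and that the ID column is padded with spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: join the consensus rows of all CLUSTAL blocks
# into one string, keeping blanks so columns stay aligned.
sub read_consensus {
    my ($fh) = @_;
    my ($consensus, $offset, $width) = ('', 0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^CLUSTAL/;          # header row
        if ($line =~ /^(\S+\s+)(\S+)/) {
            # Sequence row: note where the residues start and how many there are.
            ($offset, $width) = (length($1), length($2));
        }
        elsif ($width) {
            # First non-sequence row after a block is its consensus row.
            # Pad with spaces so stripped trailing blanks still count as columns.
            my $padded = $line . (' ' x ($offset + $width));
            $consensus .= substr($padded, $offset, $width);
            $width = 0;                       # wait for the next block
        }
    }
    $consensus .= ' ' x $width;               # last block had no consensus row
    return $consensus;
}

# Hypothetical helper: every 0-based index whose symbol is '*'.
sub star_positions {
    my ($consensus) = @_;
    my @symbols = split //, $consensus;
    return grep { $symbols[$_] eq '*' } 0 .. $#symbols;
}

# Toy data for illustration only.
my $toy = <<'END';
CLUSTAL O(1.0.3) multiple sequence alignment

seqA      MAKK
seqB      MTKK
          * **

seqA      GC
seqB      GV
          *
END

open my $fh, '<', \$toy or die "can't open in-memory data: $!";
my $consensus = read_consensus($fh);
close $fh;

my @conserved    = star_positions($consensus);
my %is_conserved = map { $_ => 1 } @conserved;
my @variable     = grep { !$is_conserved{$_} } 0 .. length($consensus) - 1;

print "$_\tis conserved\n"     for @conserved;
print "$_\tis not conserved\n" for @variable;
```

The grep runs over 0 .. $#symbols and tests $symbols[$_], so it returns every matching index rather than only the first. Positions are 0-based here; add 1 if alignment columns should be counted from 1. Note that a star marks a conserved column, so the variable positions are the indexes that are not stars.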

In reply to find index of specific array value that occurs multiple times by AWallBuilder
