supriyoch_2008 has asked for the
wisdom of the Perl Monks concerning the following question:
Hi PerlMonks,
I have a string like "ATCGGCGCCTAT" consisting of four 3-letter words (ATC, GGC, GCC, TAT). I would like to
count how many times each letter (A, T, G, C) occurs at the 1st, 2nd and 3rd positions of these four words. I am a beginner
in Perl programming. I would be glad if any Perl monk could help me by providing Perl code that counts the
letters, by kind, at the 1st, 2nd and 3rd positions of the 3-letter words. A separate scalar variable may be used for each
position and letter. For instance, for the 1st position the scalar variables may be $A1, $T1, $G1 and $C1.
Similarly, for the 2nd position the scalar variables may be $A2, $T2, $G2 and $C2, and for the 3rd position $A3, $T3, $G3 and $C3.
#!/usr/bin/perl -w
$a="ATCGGCGCCTAT";
code??
print" Letters at 1st position:\n
A1=$A1; T1=$T1; G1=$G1; C1=$C1; \n
Letters at 2nd position:\n
A2=$A2; T2=$T2; G2=$G2; C2=$C2; \n
Letters at 3rd position:\n
A3=$A3; T3=$T3; G3=$G3; C3=$C3; \n";
exit;
The result of counting from the string should look like:
Letters at 1st position:
A1=1; T1=1; G1=2; C1=0;
Letters at 2nd position:
A2=1; T2=1; G2=1; C2=1;
Letters at 3rd position:
A3=0; T3=1; G3=0; C3=3;
Re: How can I count the number and kinds of letters at 1st, 2nd and 3rd positions of 3-letter words in a string? by johngg (Abbot) on Apr 25, 2012 at 15:32 UTC |
Use a hash of hashes rather than individual scalar count variables. You could use unpack to break the string into words and a global regex match and capture for the individual positions.
knoppix@Microknoppix:~$ perl -Mstrict -MData::Dumper -wE '
> my $str = q{ATCGGCGCCTAT};
> my @words = unpack q{(a3)*}, $str;
> my %counts;
>
> foreach my $word ( @words )
> {
> my $posn;
> $counts{ q{position } . ++ $posn }->{ $1 } ++
> while $word =~ m{(.)}g;
> }
>
> print Data::Dumper->Dumpxs( [ \ %counts ], [ qw{ *counts } ] );'
%counts = (
'position 1' => {
'A' => 1,
'T' => 1,
'G' => 2
},
'position 3' => {
'T' => 1,
'C' => 3
},
'position 2' => {
'A' => 1,
'T' => 1,
'C' => 1,
'G' => 1
}
);
knoppix@Microknoppix:~$
I hope this is helpful.
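The Dumper output above can also be turned into the layout asked for in the original question. The following is a minimal sketch, not part of the post itself: report_line is a hypothetical helper name, and a letter that never occurs at a position (and so has no key in the inner hash) is reported as 0.

```perl
use strict;
use warnings;

# Rebuild the hash of hashes with the same unpack / global-match approach.
my $str = 'ATCGGCGCCTAT';
my %counts;
for my $word ( unpack '(a3)*', $str ) {
    my $posn = 0;
    $counts{ 'position ' . ++$posn }{$1}++ while $word =~ m{(.)}g;
}

# report_line (hypothetical helper): returns one "A1=1; T1=1; ..." line
# for the given position, using 0 for letters with no entry in the hash.
sub report_line {
    my ( $counts, $posn ) = @_;
    my $seen = $counts->{"position $posn"} || {};
    return join ' ',
        map { sprintf '%s%d=%d;', $_, $posn, $seen->{$_} || 0 } qw(A T G C);
}

print report_line( \%counts, $_ ), "\n" for 1 .. 3;
```

For the sample string this should print the same three count lines as the expected result in the question, without the "Letters at ... position:" headings.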
Re: How can I count the number and kinds of letters at 1st, 2nd and 3rd positions of 3-letter words in a string? by NetWallah (Monsignor) on Apr 25, 2012 at 15:41 UTC |
Build and print a Hash of Arrayrefs like this:
my ($i,%x);
$i=0;
$x{$_}[$i++%3]++ for split //, $a;
for my $k (sort keys %x){
my $aref = $x{$k};
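# one count per position; positions never seen for this letter print as 0
print "$k ", join( " ", map { $_ || 0 } @{$aref}[ 0 .. 2 ] ), "\n";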
}
(untested)
All great truths begin as blasphemies.
― George Bernard Shaw, writer, Nobel laureate (1856-1950)
Re: How can I count the number and kinds of letters at 1st, 2nd and 3rd positions of 3-letter words in a string? by davido (Bishop) on Apr 25, 2012 at 16:02 UTC |
I've noticed the trend in your posts:
When I see that many questions on virtually the same topics (bioinformatics, strings, regular expressions), with little change in level of understanding of the Perl tools between posts, I start to think one of two things is at play here. Either this individual is just letting us solve his problems for him (I'm happy to dismiss this possibility for now, as just me being grumpy), or this individual needs not just an answer to his immediate question, but help learning the tools better so that he'll be better equipped to solve such problems on his own. Perhaps you really haven't been shown where to find the best information on the topics you're inquiring about. That's our fault, so here goes:
There are a number of free sources of information on regular expressions: perlrequick, perlretut, perlre.
Sources of information on conditionals and looping: perlintro, perlsyn
Sources of information on the substr function: substr.
Books about Bioinformatics, and using Perl to solve Bioinformatics problems: Beginning Perl for Bioinformatics, Mastering Perl for Bioinformatics, Developing Bioinformatics Computer Skills.
And I shouldn't forget to mention these gems: Learning Perl, and Mastering Regular Expressions.
There are many other books out there, but I picked these ones to help you narrow down the haystack to a few needles that will be most beneficial to you right now.
I think it would be accurate to say that if you were to spend a few hours with perlintro, perlrequick, perlretut, and substr, you would save yourself many more hours of troubleshooting time in the long run... time well spent. Furthermore, if you were to pick one or two of the books from the list above and spend a week or two with them, as well as a few days with all of the POD I linked to, you could become one of the Monastery's stronger Bioinformatics fonts of wisdom.
| [reply] [d/l] |
|
That brings up a question I've been wondering about: Obviously we get a lot of questions about mining bioinformatics data, from people in that business who are trying to become Perl programmers in their spare time. (There's nothing wrong with that, of course, although when I started learning Perl, I started with "hello world," not "mine gigabytes of data for complex character patterns.")
So, with many bioinformatics folks doing it themselves, I figure there must be many more who would rather hire a programmer. I'd further guess that many wouldn't need a full-time person, just someone they can call to put together quick scripts. Is that what the pros here are seeing? Is there a large demand in the industry for Perl programmers? Would it make sense to study up on how the data works, to be able to promote oneself as a "bioinformatics data mining guy"?
| [reply] |
|
Perl has always been a language dedicated to getting things done. While many of us (myself included) enjoy the exploration of deeper topics, many who use it are more interested in the result than in the tool used to obtain the result. There's nothing wrong with that. But as you've identified, it might benefit some of those people to hire someone. Nevertheless, one of Perl's strengths is that it is within reach of the "weekend mechanics" of programming. If you need to rebuild a car's transmission you'll probably send that out to a mechanic. But if all you're doing is changing brake pads or even building a go-cart with a lawn mower engine, you might tackle that yourself just because you can. That's one of Perl's strengths; the weekend programmer, non-CS student, sysadmin, biologist, and sales manager can all accomplish a lot with the "baby Perl" subset.
As I attend Perl Mongers meetings, and as I work with clients, it's easy to forget that not everyone is building big web applications sitting on top of database abstractions and powerful frameworks. Not everyone has a release manager, version control, a QA department, unit testing requirements, and all those other things that are common in "the industry." Perl is used within the programming industry, but it's also heavily used just to get things done.
Whether there's money to be had seeking contracts in the bioinformatics industry, I have no idea. I've always thought (perhaps wrongly so) that many of our bioinformatics questions are coming from academia, which is not necessarily a pot of gold.
Re: How can I count the number and kinds of letters at 1st, 2nd and 3rd positions of 3-letter words in a string? by brx (Pilgrim) on Apr 25, 2012 at 17:52 UTC |
#!perl
use strict;
use warnings;
my $seq = "ATCGGCGCCTAT" ;
my (%first,%second,%third);
#perlre
my @trilet = $seq =~ /.../g;
#perlsyn LOOP
foreach my $letter ('A','T','G','C') { #init
$first{ $letter }=0;
$second{ $letter }=0;
$third{ $letter }=0;
}
foreach my $tri (@trilet) {
#perlfunc : substr
$first{ substr $tri,0,1 }++;
$second{ substr $tri,1,1 }++;
$third{ substr $tri,2,1 }++;
}
foreach my $letter ('A','T','G','C') {
print "$letter=$first{$letter}; ";
}
print "\n";
foreach my $letter ('A','T','G','C') {
print "$letter=$second{$letter}; ";
}
print "\n";
foreach my $letter ('A','T','G','C') {
print "$letter=$third{$letter}; ";
}
print "\n";
Re: How can I count the number and kinds of letters at 1st, 2nd and 3rd positions of 3-letter words in a string? by BillKSmith (Friar) on Apr 27, 2012 at 03:37 UTC |
One more way to parse the sequence. The initialization statement is only needed if zero-counts must be defined.
use strict;
use warnings;
use Readonly;
use Data::Dumper qw( Dumper );
Readonly::Scalar my $seq => "ATCGGCGCCTAT" ;
my %count;
@count{ qw(A1 A2 A3 T1 T2 T3 C1 C2 C3 G1 G2 G3 ) } = (0) x (3*4);
foreach my $i (0 .. length($seq)-1 ) {
my $pos = $i % 3 + 1;
my $base = substr $seq, $i, 1;
$count{$base.$pos}++;
}
$Data::Dumper::Sortkeys = 1;
print Dumper \%count;
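To see what the initialization statement changes, here is a small sketch that runs the same loop with and without pre-seeding the keys. The names count_by_position and $zero_fill are hypothetical and introduced only for this illustration.

```perl
use strict;
use warnings;

# count_by_position (hypothetical helper): counts "letter + position" keys
# such as A1 or C3. When $zero_fill is true, all 12 keys are created first.
sub count_by_position {
    my ( $seq, $zero_fill ) = @_;
    my %count;
    if ($zero_fill) {
        for my $base (qw(A C G T)) {
            $count{ $base . $_ } = 0 for 1 .. 3;
        }
    }
    for my $i ( 0 .. length($seq) - 1 ) {
        $count{ substr( $seq, $i, 1 ) . ( $i % 3 + 1 ) }++;
    }
    return \%count;
}

my $sparse = count_by_position('ATCGGCGCCTAT');
my $full   = count_by_position( 'ATCGGCGCCTAT', 1 );

printf "%d keys without initialization\n", scalar keys %$sparse;
printf "%d keys with initialization\n",    scalar keys %$full;
print 'C1 is ', ( exists $sparse->{C1} ? 'present' : 'absent' ),
    " when the keys are not initialized\n";
```

Without the initialization, combinations that never occur in the sequence (A3, G3 and C1 for the sample string) have no key at all, so any code that prints them has to supply the 0 itself.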
Or, if you really want individual scalar counts and do not mind global variables or symbolic references:
use strict;
use warnings;
my $seq = 'ATCGGCGCCTAT' ;
my %count;
our( $A1, $A2, $A3, $T1, $T2, $T3, $C1, $C2, $C3, $G1, $G2, $G3 )
= (0) x 12;
foreach my $i (0 .. length($seq)-1 ) {
my $pos = $i % 3 + 1;
my $base = substr $seq, $i, 1;
{no strict 'refs'; ${$base.$pos}++;}
}
# separate the twelve counts with spaces so the output is readable
print join( ' ', $A1, $A2, $A3, $T1, $T2, $T3, $C1, $C2, $C3, $G1, $G2, $G3 ), "\n";
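If separate scalars are wanted but symbolic references are not, one possible alternative (not something proposed in the post above) is a hash of references to lexical scalars. This is only a sketch; %slot is a hypothetical name.

```perl
use strict;
use warnings;

my $seq = 'ATCGGCGCCTAT';
my ( $A1, $A2, $A3, $T1, $T2, $T3, $C1, $C2, $C3, $G1, $G2, $G3 ) = (0) x 12;

# %slot (hypothetical name) maps each "letter + position" key to a
# reference to the matching lexical scalar, so strict stays fully enabled.
my %slot = (
    A1 => \$A1, A2 => \$A2, A3 => \$A3,
    T1 => \$T1, T2 => \$T2, T3 => \$T3,
    C1 => \$C1, C2 => \$C2, C3 => \$C3,
    G1 => \$G1, G2 => \$G2, G3 => \$G3,
);

for my $i ( 0 .. length($seq) - 1 ) {
    my $key = substr( $seq, $i, 1 ) . ( $i % 3 + 1 );
    ${ $slot{$key} }++ if exists $slot{$key};    # skip unexpected characters
}

print "A1=$A1; T1=$T1; G1=$G1; C1=$C1;\n";
print "A2=$A2; T2=$T2; G2=$G2; C2=$C2;\n";
print "A3=$A3; T3=$T3; G3=$G3; C3=$C3;\n";
```

The scalars stay lexical and no package variables are needed, at the cost of writing out the twelve-entry lookup table by hand.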