PerlMonks  

How can I get the values of codon usage using the Bio::CUA module?

by supriyoch_2008 (Monk)
on Nov 09, 2015 at 01:56 UTC ( [id://1147233]=perlquestion )

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I am interested in estimating a few codon usage parameters from a coding sequence using the Perl module Bio::CUA, retrieved from CPAN. I have written a script, s1.pl, but it fails to compile. The Bio::CUA distribution contains several other modules. I am at my wit's end trying to solve the issue. I need help from the PerlMonks so that I can get the codon usage values.

Here goes my script s1.pl:

# Code begins here
################################
# To estimate tAI,CAI & encp of cds:
################################
#!/usr/bin/perl
use warnings;
use strict;
use Bio::CUA; # perl module
my $cds="ATGCGTCTCTTCAAAACGCGCAAATCCACGGATACCTACAGCACACTAGCCGCGCAGCAA
CAGCAACAGCAGCAGCAGCAACAACAACATCAAGCGGAAGGCAGCAACATTTCCCACAGC
AGCAACAGCAGCAGCAACAAGAGTCACACACCGGCAACATGCAGCAACAGACTGAACAAG
AGCATTGTGAGCAGCACCAGCATATCGTCATCGCTGCCTGATCTGCATGACAAGTCGCCC
GTCATGATCCTCAGCTGCACCACCCTGGCCAGCAATGGAGCCACCGCCACGGCAGCGGTC
ACAGCAACAGCCACCGGCACAGCAGCAACATCTGGCGGCTCGCTGCAGCAGCAACAACAG
CAGCATCTGCAACACCAGCAGCAGCAGCAGCCGTTACGCACGGCCACGCCCACGTGTCTG
CTGAGTGGCCGTCAGACGCCATCGGCCATATCGGTGATGTCGCTCCAAGAGGCCACCAGT
CTGCACCGCCAGCAACAGCAGCCACACCAGCCACCCACCATCTACGTGCCGGTGCCTACG
AAACTTGGCAACAATGTCAACACTGGCAACAGCTCGGCCACTCTGCTGCTCAGCTATGGC
AGCACCAGCAGCATCGCCAACCTGCAACAGCAGCAGCAGCAGCATGCCGCCCAGTACCAG
CAGTATGTTGCACAGCGGCTGCACGCCGCTTCCAGCAGTTGTTTGTACGAGAAGGGGTCG
AATGCCAGCGGTGGGGCGAGCAGCAACAAAAGCAGTCTATCCCTGACCCCAAATGGTCAC
TTGCCCGACTACAAGTTGGTGACAGCGATGCCAGTTGTTGTCCTGGACGATGAACACAAA
TCCAATTCATTGCCGGCCACTGAAGCGAGTCGCAACAGCAACAGCAGCAGCAACATGAAC
GGCAGCAGCAACAGCAACAGCCTTGACGTCAGCAACAGCAACTCGCATTCGGGGAGCTCC
ACTTCTTTGGCCAGCACCACGAGAAATGTTTTCACCTGGGGCAAGCGCATGAGTCGCAAA
CTGGATTTGCTGAAGCGGAGTGACTCGCCCGCCGCCGCCCACAAATCGCATTCGGATTTG
AGGAGTCTGTTCCACTCGCCAACGCACCACAAGAGTGGATCCGGTGGATCCAGCGGACCC
AGTTCGGCGAAGGCATCGGCCTCACCCACTGGCGGCCATCAGAACAGCTCCGGCTCCACG
ACCAGCACCCTCAAGAAGTGCAAGTCGGGGCCCATCGAGACCATCAAGCAGCGACACCAG
CAGCAGCAGCAGCAGCAGCAATCGGTTCAGGATGTGGGCACGGGACAGAGCCAGAGTGCT
CAGTCCACGCCCACGCATCAGTTCCAGGCGGCCGCCCGCCCACAGAAAGCGCTAAAGAAC
TTCTTCCATAGGATCGGGTCCACCGGCATGCTGAACCATCGCTCCCACAATCTCCTTAAG
GCTTCGGAGGCGGCTCAACAGGCCACCCCGGCAGCCACCACATTGTATAGGAGCAGCTCC
ACTAGCCAGCTGTCCAGCAGCTCCTATGTGAAGTGCGACGATCCCACCGAGGGACTGAAT
CTTCAGAGGGAGCAGCGGGAACAGCGTCTTCCGCGGATCGCCAGCCTGAAGTCCAGTAGC
TGCGATGACATAGCCAAGGTGAGCAGTTGCCTGACGGCCAGCACAAGTAGTGGCAGTGCC
GCAGGCAGCTTGGGCTCTCCTCCAAGTAGTGCAGCAGCTGGTGGAGGCGGAACTGCAAAC
AGCGGCCAACACGATCCCTCGCGTCGTGGTGCATTTCCTTACGCCTTCCTGCGATCACGT
CTCTCCGTTTTGCCAGAGGAGAACCACGGAAATGTACCAGGACACCTGAAGCAACAAATA
CAGCGGCAACGGGAGCAGCACCAGCAGCATCAGAGGGATCTCCTCCAGCAGGAGCAGACA
TCGTCGCCCCTTCCCCAGCGCCGATCCCCCGAACAGGCGATGCTGAACAATGTGTCACGC
AACGACAGCATCACCTCCAAGGACTGGGAACCACTTTACCAAAGATTAAGTAGTTGTCTA
AGTTCAAACGAGTCCGGCTACGACAGCGATGGGGGTGCGACGGGAGCCCGACTGGGCAAT
AATCTGAGCATCTCCGGCGGAGATACCGAATCTATTGCCTCGGGCACACTCAAGCGTAAC
TCGCTCATCTCCCTCAGCTCCTCGGAGGGCGTTGGAATGGGCATGGGCATGAGTCTGGGA
CTGGGTGCCCCATCGACGAGGAACAGCAGCATCTGCAGTGCTCCCGTGTCGCTGGGTGGC
TATAACTACGACTATGAGACGGAGACGATACGGCGACGATTCAGGCAGGTTAAGCTGGAG
CGCAAGTGCCAAGAGGACTACATCGGAATTGTCCTGTCGCCCAAAACGGTGATGACCAAT
AGCAATGAGCAGCAGTACAGGTATCTCATCGTGGAACTGGAACCCTATGGCATGGCCCAA
AAGGATGGTCGCCTTCGCCTGGGTGACGAGATCGTCAACGTAAATGGAAAACACCTGCGA
GGCATTCAATCCTTTGCAGAGGTTCAGCGCCTGTTGAGCAGCTTTGTGGACAACTGTATC
GACCTGGTGATTGCTCACGATGAGGTGACGACGGTAACTGATTTCTACACCAAAATCCGT
ATCGATGGGATGAGCACGCAGCGCCATCGGCTGAGTTATGTGCAACGCACACAGAGCACA
GACAGTCTGAGCAGCATGCAGAGTCTGCAGCTGCAGCAGGAGAGGATTCAGGGTCACAAT
ACGGAACAGGAGCAGGAGGCCCAGGGCGAGGATCAGTGCGATGCGCGTTCAATGGCCAGC
GTCAGCACAATGCCCACTCCGATGCCGCTGATGCAGCATCGTCGGAGCTCCACGCCCAGG
CACTCACTGGACGTCGGTGCGCCGGAGCATGAGCTCCTCAGGAGGCGGGCGCGCAGCTCC
TCAGGTCAGCGCAGCTTGGCTCTAACACCGACCCCACTCTTTGCCAGCGGCAGCAGCAGT
TGCTCCTCCTCCCCTAACCACCGGTTGCTGGATAACGAGAACGACCCTGCTAACGACACC
GATTCCTATACGCCAGTGTATGCAAATCGGGCGGCAAGCGTGTGCGTGGCCTCCTCCCTG
GCGGACGATGAGAAGTGGCAGTTACTGGCCCGAAAGCGCTGCTCGGAGGGTTCCGCCCTA
TCCGCTACACCGAACCCGCAGCAATTTGGCCAGCGCACTCACTACGCCAGAAACTCCATC
AATCTGGCCAACTCGCATTACCGCTCGCTCCGATTTGCCCACTCGCGGCTGAGTTCGTCT
CGCCTTAGTCTGTTCATGCAGGCACCGCCTAACAGTCTAACCGTCGGAGAAGGAGTCGCT
AACACCCCATCCTCTACAGCTACCACAACCACTGATCTCACTAACCAGCAGCAACAGCAG
CAAAACCAGCAACAGACACACCAATCACTGTACATCAAGCACTCGCCAAAGAGCGTCTCA
TTGTTCTCGCCTAATCCCTATGTTAACGCCTCATCCTCACCAGCTTCGGCATCCACATCA
GCGGGTGCCGGCTCCTCCCTGGCACCGCCAGCCGCTGCCCTAATGCATCACAGGCCATCG
CTTCCGGTGGCCAAGCTAACAATACGCGACGAGGAAATGGCGGAGGTCATCCGTGCGTCG
ATGAGCGAGGGTAGTGGACGTTGCACCCCGAAGACTATAACCTTCTTTAAGGGACCTGGA
CTGAAATCGTTGGGCTTCAGCATAGTGGGAGGTCGAGATTCGCCAAAGGGCAACATGGGA
ATTTTTGTAAAGACCGTGTTTCCCTCAGGCCAGGCAGCCGATGATGGCACACTGCAAGCG
GGCGACGAGATTGTAGAGATCAATGGAAACTCTGTGCAGGGCATGAGTCATGCCGAAACC
ATAGGACTCTTTAAGAACGTAAGAGAGGGCACCATTGTGCTAAAAATCTTAAGAAGAAAA
TTACAGAAAGCTAAATCGATGGGTTGCTAG";
$cds=~ s/\s//g;
##########################################################
# The following code has been taken from example of module:
##########################################################
my $calc = Bio::CUA::CUB::Calculator->new(
    -codon_table => 1,
    -tAI_values => 'tai.out' # from Bio::CUA::CUB::Builder
);
# create an IO to a sequence file
my $io = Bio::CUA::SeqIO->new($cds);
# read each sequence as a Bio::CUA::Seq object from this io
while (my $seq = $io->next_seq)
{
    my $tai = $self->tai($seq);
    my $CAI = $self->cai($seq);
    my $encp =$self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);
    printf("%10s: %.7f\n", $seq->id, $tai);
    printf("%10s: %.7f\n", $seq->id, $CAI);
    printf("%10s: %.7f\n", $seq->id, $encp);
    my $tai_val=printf("%10s: %.7f\n", $seq->id, $tai);
    my $CAI_val=printf("%10s: %.7f\n", $seq->id, $CAI);
    my $encp_val=printf("%10s: %.7f\n", $seq->id, $encp);
    print "\n Results of Cds: tAI value=$tai_val; CAI value=$CAI_val; encp value=$encp_val\n";
}
exit;

I get the following results in cmd:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\x>cd desktop
C:\Users\x\Desktop>s1.pl
Global symbol "$self" requires explicit package name at C:\Users\x\Desktop\s1.pl line 97.
Global symbol "$self" requires explicit package name at C:\Users\x\Desktop\s1.pl line 98.
Global symbol "$self" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Global symbol "$minTotal" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Global symbol "$A" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Global symbol "$T" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Global symbol "$C" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Global symbol "$G" requires explicit package name at C:\Users\x\Desktop\s1.pl line 99.
Execution of C:\Users\x\Desktop\s1.pl aborted due to compilation errors.
C:\Users\x\Desktop>

Replies are listed 'Best First'.
Re: How can I get the values of codon usage using the Bio::CUA module?
by Athanasius (Archbishop) on Nov 09, 2015 at 04:12 UTC

    Hello supriyoch_2008,

    The changes proposed by u65 are a good start, but in addition I’ve had to (1) change:

    use Bio::CUA; # perl module

    to:

    use Bio::CUA::CUB::Calculator;
    use Bio::CUA::SeqIO;

    (2) create an empty file tai.out; (3) put the coding sequence in a file cds.dat and change:

    my $io = Bio::CUA::SeqIO->new($cds);

    to:

    my $io = Bio::CUA::SeqIO->new(-file => 'cds.dat');

    (4) change three occurrences of printf to sprintf:

    my $tai_val = sprintf "%10s: %.7f\n", $seq->id, $tai;
    my $CAI_val = sprintf "%10s: %.7f\n", $seq->id, $CAI;
    my $encp_val = sprintf "%10s: %.7f\n", $seq->id, $encp;
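To illustrate why change (4) matters: printf writes its formatted text to the output handle and returns a true value (1 on success), so the original assignments stored 1 rather than the formatted string, whereas sprintf returns the string and prints nothing. A minimal sketch in plain Perl, with a made-up id and value standing in for $seq->id and $tai:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $seq->id and $tai from the thread.
my ($id, $tai) = ('cds1', 0.3141593);

# printf prints the text and returns a true value on success, not the text.
my $printed = printf("%10s: %.7f\n", $id, $tai);

# sprintf returns the formatted string and prints nothing.
my $formatted = sprintf("%10s: %.7f\n", $id, $tai);

print "printf returned:  $printed\n";
print "sprintf returned: $formatted";
```

If the values are wanted without the id prefix, a format such as "%.7f" on its own may be the better choice for the final summary line.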

    But I’m still not getting meaningful output. :-( You need to specify for the PerlMonks exactly what output you expect to get. Also, you will need to address the following warnings:

    ------------- WARNING Bio::CUA::CUB::Calculator -------------
    MSG: CAI values for codons were not provided for this analyzer, so can not calculate CAI for sequences
    -------------------------------------------------------------
    ...
    ------------- WARNING Bio::CUA::CUB::Calculator -------------
    MSG: No default base composition for seq '', so no GC-corrected ENC
    -------------------------------------------------------------
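The second warning mentions base composition. For reference, the base composition of a sequence can be computed in plain Perl. This sketch is independent of Bio::CUA: base_composition is a hypothetical helper, the sequence is a shortened example, and whether (and in what form) such values should be passed to encp_r needs to be checked against the module's documentation.

```perl
use strict;
use warnings;

# Return the fraction of each base (A, T, C, G) in a sequence.
# Characters other than A, T, C, G (e.g. N) are ignored.
sub base_composition {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s//g;
    my %count = map { $_ => 0 } qw(A T C G);
    $count{$_}++ for grep { exists $count{$_} } split //, $seq;
    my $total = 0;
    $total += $_ for values %count;
    return map { $_ => ($total ? $count{$_} / $total : 0) } qw(A T C G);
}

my %comp = base_composition('ATGCGTCTCTTCAAAACGTAG');  # shortened example
printf "%s: %.3f\n", $_, $comp{$_} for qw(A T C G);
printf "GC content: %.3f\n", $comp{G} + $comp{C};
```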

    But I note that the documentation for Bio::CUA::SeqIO says that it is:

    a package to parse sequence file if module Bio::SeqIO is unavailable in the system.

    So perhaps you would be better off experimenting with the Bio::SeqIO module?
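Whichever SeqIO module is used, the sequence first has to be in a file, as in change (3). A minimal sketch for writing it out, assuming the reader expects FASTA format (check the module documentation to confirm); to_fasta and the id 'cds1' are made up for illustration, and the sequence is shortened:

```perl
use strict;
use warnings;

# Shortened example sequence; in practice this would be the full CDS from s1.pl.
my $cds = "ATGCGTCTCTTC
AAAACGTAG";
$cds =~ s/\s//g;               # remove line breaks and spaces

# A coding sequence should be a whole number of codons.
warn "Length is not a multiple of 3\n" if length($cds) % 3;

# Return a FASTA record: header line, then the sequence wrapped at $width columns.
sub to_fasta {
    my ($id, $seq, $width) = @_;
    $width ||= 60;
    my @lines = unpack("(A$width)*", $seq);
    return join("\n", ">$id", @lines) . "\n";
}

my $file = 'cds.dat';          # file name from change (3)
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} to_fasta('cds1', $cds);   # 'cds1' is a made-up sequence id
close $fh or die "Cannot close $file: $!";
```

Giving the record a non-empty id also means $seq->id has something to return when the results are printed.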

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi Athanasius,

      Thank you for your suggestions. I shall try with Bio::SeqIO module.

Re: How can I get the values of codon usage using the Bio::CUA module?
by u65 (Chaplain) on Nov 09, 2015 at 02:08 UTC

    I don't know what you've tried, but those errors are from the undeclared variables cited. If you have just copied the erroneous code, you must have missed the earlier part where they were declared.

    Update: Try replacing all instances of $self with $calc. Then change this line:

    my $encp =$self->encp_r($seq,[$minTotal,[$A,$T,$C,$G]]);

    to this:

    my $encp =$calc->encp_r($seq);

    All I did was remove the optional args from the encp_r method.
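For background: $self in module documentation is a placeholder for whatever object the method is called on, and under use strict any variable that was never declared with my triggers the "Global symbol ... requires explicit package name" error. A minimal sketch of the call pattern, using a hypothetical stand-in class (Demo::Calc is not part of Bio::CUA, and its encp_r only returns a dummy string):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a calculator class; NOT the real Bio::CUA API.
package Demo::Calc;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Inside a method, $self is the object the method was called on.
# $min_total is optional; the default of 5 is made up for this demo.
sub encp_r {
    my ($self, $seq, $min_total) = @_;
    $min_total //= 5;
    return sprintf "%s (minTotal=%d)", $seq, $min_total;
}

package main;

my $calc = Demo::Calc->new;                    # the caller names its own object
print $calc->encp_r('ATGAAATAG'), "\n";        # optional argument omitted
print $calc->encp_r('ATGAAATAG', 10), "\n";    # optional argument supplied

# Compiling code that uses an undeclared variable fails under strict.
my $ok = eval q{ use strict; my $x = $self->encp_r('ATG'); 1 };
print "Compile error: $@" unless $ok;
```

The last two lines reproduce the error from the question in miniature: $self exists only inside the method, so the calling script has to use the variable it actually declared ($calc).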

    Update 2: I just saw this reference (http://biorxiv.org/content/early/2015/11/03/019265) on today's Perl Weekly which looks like something of interest to Perl Bio folks like supriyoch_2008.

      Hi u65,

      Thank you very much for your suggestions. The reference cited by you is really very good and will be of great use to me in my future endeavor.

Re: How can I get the values of codon usage using the Bio::CUA module?
by GotToBTru (Prior) on Nov 09, 2015 at 04:05 UTC

    You've pieced together code from examples. I suggest you try with very simple things to start with. Perhaps with the Tutorial?

    Dum Spiro Spero

      Hi GotToBTru,

      Thank you for your suggestions. I shall read the tutorial.

        I have noticed several questions on here lately about working with DNA sequences. Do a super search on terms like DNA, Fasta, Sequence, etc. and you will find others who are using Perl for bioinformatics. You might learn something from their questions and answers or even help them yourself.

        Dum Spiro Spero
