
Re: intron length

by marto (Archbishop)
on Feb 27, 2013 at 09:09 UTC (#1020829)

in reply to intron length

You have not asked a question. What part are you having problems with?

Update: Thanks for ignoring the advice given, and for totally changing your post while ignoring formatting. If you want to make it easy for people to help you, read and understand the links I've given you.

Re^2: intron length
by MBobur (Initiate) on Feb 27, 2013 at 10:35 UTC

    Thank you for your response. I have a script that calculates the full sequence length for each entry in a FASTA file. I want to add code that calculates the intron length, where "intron length = full length - exon length". Exon length = Transcription length.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $fastaName = "example2.fasta";
    open my $fasta, '<', $fastaName or die "cannot open $fastaName: $!\n";

    my $sequence = "";
    my $sequencelength;

    while ( my $line = <$fasta> ) {
        chomp($line);
        if ( substr( $line, 0, 1 ) eq ">" ) {
            # Header line: report the length of the previous record, if any
            if ( length($sequence) > 0 ) {
                $sequencelength = length($sequence);
                print "Sequence length: $sequencelength\n";
                $sequence = "";
            }
            print "$line\n";
        }
        else {
            $sequence .= $line;
        }
    }

    # Calculate and print the sequence length for the last sequence
    $sequencelength = length($sequence);
    print "Sequence length: $sequencelength\n";
    close $fasta;

      "But I want add script which will calculate intron length, which is "intron length = full length - exon length". Exon length = Transcription length."

      Your code makes no mention of intron, exon or transcription. The FASTA format makes no mention of these either. I, like most people, am not a bioinformatician. You're either going to have to describe your problem better (see the links I previously gave, which describe this in detail, direct link) or wait for someone who is familiar with the terms you're using and is willing to help.

      It isn't clear what you are using here to determine the intron lengths. What was initially "posted" was output from Cufflinks, which has your gene/transcript information and FPKM scores for each. Your newly posted script reads in a FASTA file and determines the sequence length for each entry in that file. Easy enough. The introns aren't marked in a FASTA file, so I'm guessing that you'll use transcript information from the refFlat file from the UCSC genome browser, or from Ensembl, etc. If you want to know the length of all exons combined for a given transcript (ignoring any splicing variants, etc.), then you'll want to use the refFlat.txt file that can be downloaded from UCSC. It's easy to parse, and you can use the table browser to help figure out which values are in which columns (it notes where each exon begins and ends, for instance). You can import the data into a hash with the gene symbol or the accession number as the key and then calculate the exon/intron lengths for only the transcripts that you are interested in.
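      The refFlat approach described above might be sketched as follows. This is only a sketch under assumptions: it takes the refFlat columns to be geneName, name (accession), chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, with 0-based starts, so each length is simply end minus start. The sub names parse_refflat_line and load_refflat are hypothetical, and the sample record is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one refFlat line into a hash ref holding exon and intron totals.
# Assumed columns: geneName, name, chrom, strand, txStart, txEnd,
# cdsStart, cdsEnd, exonCount, exonStarts, exonEnds (tab-separated).
sub parse_refflat_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    my ( $gene, $acc, $tx_start, $tx_end, $starts, $ends ) = @f[ 0, 1, 4, 5, 9, 10 ];

    # exonStarts/exonEnds are comma-separated lists with a trailing comma;
    # split drops the empty trailing field.
    my @starts = split /,/, $starts;
    my @ends   = split /,/, $ends;

    my $exon_len = 0;
    $exon_len += $ends[$_] - $starts[$_] for 0 .. $#starts;

    my $tx_len = $tx_end - $tx_start;    # genomic span of the transcript
    return {
        gene          => $gene,
        accession     => $acc,
        exon_length   => $exon_len,
        intron_length => $tx_len - $exon_len,
    };
}

# Read refFlat records from a filehandle into a hash keyed by accession.
# If an accession occurs more than once, the last record wins.
sub load_refflat {
    my ($fh) = @_;
    my %by_acc;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my $t = parse_refflat_line($line);
        $by_acc{ $t->{accession} } = $t;
    }
    return \%by_acc;
}

unless (caller) {
    # Made-up record, read from an in-memory filehandle instead of refFlat.txt
    my $data = join( "\t",
        'GENE1', 'NM_000001', 'chr1', '+', 100, 1000, 150, 900, 3,
        '100,400,800,', '200,500,1000,' ) . "\n";
    open my $fh, '<', \$data or die "cannot open in-memory data: $!\n";
    my $transcripts = load_refflat($fh);
    close $fh;

    for my $acc ( sort keys %$transcripts ) {
        my $t = $transcripts->{$acc};
        printf "%s (%s): exon %d bp, intron %d bp\n",
            $acc, $t->{gene}, $t->{exon_length}, $t->{intron_length};
    }
}
```

      To run it against real data, open refFlat.txt instead of the in-memory string and look up only the accessions you care about.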

      In the future, I'd try to post a bit more information and be careful to format it better on the site. People here are willing to help, but are less likely to do so if it annoys them. I can look at what you posted and see exactly what you are doing because I work with this type of data all day long; others may not but still have invaluable input in writing your scripts properly, so try to help them get on board. Good luck!

        Ok, I missed parts at the bottom of the first post. Does your FASTA file have transcript sequences? I'm assuming then that you have transcript sequences that contain intronic regions that aren't spliced out? Or are these simply splicing variants and alternate exons? Either way, you can use regular expressions to grab the exon lengths in the Cufflinks file, and you can compare these to the sequence lengths from the FASTA file or to other transcript information files from the UCSC genome browser (or the GTF file that you are likely using from Illumina via the Cufflinks webpage to annotate transcripts), etc. Again, you can store the transcript information in a hash and use the gene symbol or accession number as the key so that you are comparing the correct things.
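        A rough sketch of that comparison, assuming the annotation is a standard nine-column GTF file (1-based inclusive coordinates, a transcript_id attribute in the last column) and that the FASTA headers use the same transcript IDs. The sub names and the sample data are hypothetical, and %full_length stands in for the lengths your FASTA script already computes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum exon lengths per transcript from GTF lines read from a filehandle.
sub exon_lengths_from_gtf {
    my ($fh) = @_;
    my %exon_len;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 9 && $f[2] eq 'exon';

        # Pull the transcript ID out of the attribute column with a regex
        my ($id) = $f[8] =~ /transcript_id "([^"]+)"/ or next;

        # GTF coordinates are 1-based and inclusive, hence the + 1
        $exon_len{$id} += $f[4] - $f[3] + 1;
    }
    return \%exon_len;
}

# intron length = full length - exon length, for IDs present in both hashes
sub intron_lengths {
    my ( $full_len, $exon_len ) = @_;
    my %intron;
    for my $id ( keys %$full_len ) {
        next unless exists $exon_len->{$id};
        $intron{$id} = $full_len->{$id} - $exon_len->{$id};
    }
    return \%intron;
}

unless (caller) {
    # Made-up exon records for one transcript, T1
    my @rows = (
        [ 'chr1', 'Cufflinks', 'exon', 101, 200,  '.', '+', '.', 'gene_id "G1"; transcript_id "T1";' ],
        [ 'chr1', 'Cufflinks', 'exon', 401, 500,  '.', '+', '.', 'gene_id "G1"; transcript_id "T1";' ],
        [ 'chr1', 'Cufflinks', 'exon', 801, 1000, '.', '+', '.', 'gene_id "G1"; transcript_id "T1";' ],
    );
    my $gtf = join "", map( join( "\t", @$_ ) . "\n", @rows );
    open my $fh, '<', \$gtf or die "cannot open in-memory data: $!\n";
    my $exon_len = exon_lengths_from_gtf($fh);
    close $fh;

    # Hypothetical full (unspliced) sequence lengths, keyed by transcript ID
    my %full_length = ( T1 => 900 );

    my $intron = intron_lengths( \%full_length, $exon_len );
    print "$_: intron length $intron->{$_}\n" for sort keys %$intron;
}
```

        Keying both hashes on the same ID is what keeps the subtraction honest; transcripts missing from either source are skipped rather than guessed at.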

        Sorry for missing that, I need more coffee it seems :).

