How can I get the correct results for substrings 2 and 3 in a do-until loop?

by supriyoch_2008 (Scribe)
on May 18, 2012 at 11:40 UTC
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I am a beginner in Perl programming. I want to estimate a few values from three substrings within a string. The string is $DNA1="GGCT CTGCGCGGNN"; I first remove the Ns and the whitespace. I have written the code below, but it keeps running in cmd and never terminates. The output text file on my desktop shows correct results for the 1st substring (GGCT, 4 bases), but it shows no results for substring 2 (CTGC) or substring 3 (GCGG). The substrings are non-overlapping and adjacent to each other. How can I correct the while loop in line 10, while (my $fm= substr ($DNA1,0,4)) {, so that I get the correct results for all 3 substrings? I have given the code, the correct results I got for the 1st substring, and my expected results for all the substrings. Can any perlmonk help me correct the mistake in the code?

My code is:

#!/usr/bin/perl
use strict;
use warnings;
my $DNA1 = "GGCT CTGCGCGGNN";
# Total base count
my $ total1=12;
# Remove N from sequence
$DNA1 =~ s/N//ig;
# Remove whitespace Line 5
$DNA1 =~ s/\s//g;
# In a loop, find every 4-base substring & then find its
# GC%, GC-skew & Purine Loading Index (PLI):
my $fm = 1.010;    # Line 9
do {
    while ( my $fm = substr( $DNA1, 0, 4 ) ) {
        my $A = 0;
        my $T = 0;
        my $G = 0;
        my $C = 0;
        while ( $fm =~ /A/ig ) { $A++ }
        while ( $fm =~ /T/ig ) { $T++ }
        while ( $fm =~ /G/ig ) { $G++ }
        while ( $fm =~ /C/ig ) { $C++ }
        my $tot1 = $A + $T + $G + $C;
        my $gc1 = $G - $C;
        my $gc2 = $G + $C;    # Line 16
        my $cent = 100;
        my $gccon2 = $gc2 / $tot1;
        my $gccon3 = $cent * $gccon2;
        my $gccon4 = sprintf( "%.2f", $gccon3 );
        my $gcskew = $gc1 / $gc2;
        my $GCSkew = sprintf( "%.4f", $gcskew );
        # To find Purine Loading Index (PLI):
        my $four = 4;
        my $at1 = $A - $T;
        my $x1 = ( $gc1 + $at1 ) / $tot1;
        my $thousand=1000;
        my $pli = $thousand * $x1;
        my $PLI = sprintf( "%.0f", $pli );
        # No. of sliding Windows:
        my $numberwin = $total1 / $four;
        my $NoWindows = sprintf( "%.0f", $numberwin );
        print " Purine Loading Index of each 1Kb Window=$PLI bases/4-base.\n";
        my $output = "GC-SkewResult .txt";
        unless ( open( RESULT, ">my $output" ) ) {
            print "Cannot open file\"my $output\".\n\n";
            exit;
        }
        print RESULT"\n RESULTS for substrings:\n GC-Skew values of substrings:\n $GCSkew\n\n Percent GC Content of substrings:\n $gccon4\n\n";
        close(RESULT);
    }
} until ( $fm =~ /^\s*$/ );
exit;

I have got the correct results for 1st substring: i.e.

RESULTS for substrings:
GC-Skew values of substrings:
0.3333

Percent GC Content of substrings:
75.00

My Expected Results are:

RESULTS for substrings:
GC-Skew values of substrings:
0.3333 1.0000 0.5000

Percent GC Content of substrings:
75.00 50.00 100.00

Re: Why am I getting the result only from 1st substring and not from substring 2 and 3 in using the do-until loop with while loop inside it?
by marto (Bishop) on May 18, 2012 at 11:53 UTC

    You seem to enjoy making life difficult for yourself. You post messy code making it difficult to read and follow program structure. Some posts include use strict; others don't. Be consistent. Here is the perltidy output for this script to get you started, with a couple of changes:

    #!/usr/bin/perl
    use strict;
    use warnings;
    $DNA1 = "GGCT CTGCGCGGNN";
    # Remove N from sequence
    $DNA1 =~ s/N//ig;
    # Remove whitespace Line 5
    $DNA1 =~ s/\s//g;
    # In a loop, find every 4-base substring & then find its
    # GC%, GC-skew & Purine Loading Index (PLI):
    $fm = 1.010;    # Line 9
    do {
        while ( my $fm = substr( $DNA1, 0, 4 ) ) {
            $A = 0;
            $T = 0;
            $G = 0;
            $C = 0;
            while ( $fm =~ /A/ig ) { $A++ }
            while ( $fm =~ /T/ig ) { $T++ }
            while ( $fm =~ /G/ig ) { $G++ }
            while ( $fm =~ /C/ig ) { $C++ }
            $tot1 = $A + $T + $G + $C;
            $gc1 = $G - $C;
            $gc2 = $G + $C;    # Line 16
            $cent = 100;
            $gccon2 = $gc2 / $tot1;
            $gccon3 = $cent * $gccon2;
            $gccon4 = sprintf( "%.2f", $gccon3 );
            $gcskew = $gc1 / $gc2;
            $GCSkew = sprintf( "%.4f", $gcskew );
            # To find Purine Loading Index (PLI):
            $four = 4;
            $at1 = $A - $T;
            $x1 = ( $gc1 + $at1 ) / $tot1;
            $pli = $thousand * $x1;
            $PLI = sprintf( "%.0f", $pli );
            # No. of sliding Windows:
            $numberwin = $total1 / $four;
            $NoWindows = sprintf( "%.0f", $numberwin );
            print " Purine Loading Index of each 1Kb Window=$PLI bases/4-base.\n";
            $output = "GC-SkewResult .txt";
            unless ( open( RESULT, ">$output" ) ) {
                print "Cannot open file\"$output\".\n\n";
                exit;
            }
            print RESULT"\n RESULTS for substrings:\n GC-Skew values of substrings:\n $GCSkew\n\n Percent GC Content of substrings:\n $gccon4\n\n";
            close(RESULT);
        }
    } until ( $fm =~ /^\s*$/ );
    exit;

    Poor formatting isn't going to make things any easier to fix. Define the required variables and actually work through the program yourself, adding debugging lines where required (start with simple print statements indicating the point your program is executing). See also Debugging and Optimization from the tutorials section.
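As an illustration of the "simple print statements" approach, here is a minimal sketch that traces what the loop condition from the question sees on each pass. The helper name trace_windows and the safety limit are assumptions added for the demonstration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): run the loop
# condition from the question and report the window seen on each pass.
sub trace_windows {
    my ( $dna, $limit ) = @_;
    my @seen;
    my $pass = 0;
    while ( my $fm = substr( $dna, 0, 4 ) ) {
        $pass++;
        print STDERR "pass $pass: window='$fm'\n";    # debugging line
        push @seen, $fm;
        last if $pass >= $limit;    # safety stop, added for this demo only
    }
    return @seen;
}

# Sequence from the question with Ns and whitespace already removed.
my @seen = trace_windows( "GGCTCTGCGCGG", 3 );
```

Every pass reports the same window, GGCT, because the offset passed to substr is always 0. The substring is never empty, so the condition never becomes false, which is consistent with the script running continuously.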

    Update: Also, your post titles are always too long, see How do I compose an effective node title?.

    Update 2: In future please clearly mark updates, please don't replace the entire post with a new one rendering existing replies out of context.

Re: Why am I getting the result only from 1st substring and not from substring 2 and 3 in using the do-until loop with while loop inside it?
by Eliya (Vicar) on May 18, 2012 at 12:04 UTC
    How can I correct the code in line 10 for while loop while (my $fm= substr ($DNA1,0,4)) {

    If I'm understanding you correctly, you want to update the offset in that substr:

    my $offs = 0;
    while (my $fm= substr($DNA1, $offs, 4)) {
        $offs += 4;
        ...
    }

    or

    for (my $offs=0; my $fm=substr($DNA1, $offs, 4); $offs+=4 ) { ... }

    That would give you the 3 substrings "GGCT", "CTGC" and "GCGG".
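The offset idea above can be sketched as a small self-contained script. This is only an illustration under some assumptions: the helper name window_stats is made up here, bases are counted with tr/// instead of the regex loops in the question, a trailing window shorter than the window size is skipped, and a window with no G or C gets a skew of 0 to avoid dividing by zero.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a cleaned sequence in adjacent,
# non-overlapping windows of $size bases and return one row per
# window: [ window, GC-skew, percent GC ].
sub window_stats {
    my ( $dna, $size ) = @_;
    my @rows;
    for ( my $offs = 0 ; $offs + $size <= length $dna ; $offs += $size ) {
        my $fm  = substr( $dna, $offs, $size );
        my $G   = ( $fm =~ tr/Gg// );          # tr with an empty replacement only counts
        my $C   = ( $fm =~ tr/Cc// );
        my $tot = ( $fm =~ tr/ACGTacgt// );
        my $gc  = $G + $C;
        # Assumption: report 0 when the window has no G or C at all.
        my $skew = $gc  ? ( $G - $C ) / $gc : 0;
        my $pct  = $tot ? 100 * $gc / $tot  : 0;
        push @rows, [ $fm, sprintf( "%.4f", $skew ), sprintf( "%.2f", $pct ) ];
    }
    return @rows;
}

my $DNA1 = "GGCT CTGCGCGGNN";
$DNA1 =~ s/[N\s]//ig;    # remove Ns and whitespace in one pass

for my $row ( window_stats( $DNA1, 4 ) ) {
    printf "%s  GC-skew=%s  GC%%=%s\n", @$row;
}
```

Run on the sequence from the question, this prints GC-skew values of 0.3333, -0.3333 and 0.5000 and GC contents of 75.00, 75.00 and 100.00 for GGCT, CTGC and GCGG. The middle window CTGC has one G and two Cs, so these figures differ from the expected values quoted in the question. Collecting the rows first also sidesteps a second issue in the original script: the result file is opened with ">" inside the loop, so each pass would overwrite the previous window's output.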

Re: Why am I getting the result only from 1st substring and not from substring 2 and 3 in using the do-until loop with while loop inside it?
by Anonymous Monk on May 18, 2012 at 19:46 UTC

    I am a beginner in perl programming. .... Can any perlmonk help me correct the mistake in code?

    Why should any perlmonk waste their time?

    You've gotten lots of advice on this piece of code already

    and you've managed to ignore most of the advice you've gotten, and ask pretty much the same question again.

    Do you think you should change your strategy?
