Request to correct the perl code for getting substrings

by supriyoch_2008 (Monk)
on Apr 19, 2012 at 11:04 UTC ( #965913=perlquestion )
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks,

I have written some Perl code that gives the wrong results. I need your help to sort out the problem.

From an array, I want to extract the substrings that start with either A or G, followed by TG, followed by any number of characters, followed by T, and ending with two characters that are each A or G, i.e. [AG]TG.*T[AG][AG]

    #!/usr/bin/perl -w
    # Finding substring starting with [AG]TG followed by any number of character but ending with T[AG][AG].
    # Getting the output of each substring, its length and locating its position from left to right
    # by number with starting position as zero
    @d = "CCATGNNNTAACCNNATGNNTAGCC";
    use 1.010;
    my $string ="@d";
    # Remove whitespace
    $string=~ s/\s//g;
    $x= () =$string=~ /[AG]TG.*T[AG][AG]/ig;
    say $_ for map{("%d('%S')",length,$_)}split/$x/,$string;
    my @a=$_ for map{("%d('%S')",length,$_)}split/$x/,$string;
    print "The sequences and their lengths are:\n @a \n";
    exit;

Result: I am getting the following wrong result from cmd:

Microsoft Windows Version 6.1.7600

Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\DR-SUPRIYO>cd desktop


Can't call method "say" without a package or object reference at C:\Users\DR-SUPRIYO\Desktop\ line 11.

Correct Result should be like this:

The sequences, their lengths and positions are:

ATGNNNTAA; 9; 2-10

ATGNNTAG; 8; 15-22

I hope perl monks will help me get the correct result.

Replies are listed 'Best First'.
Re: Request to correct the perl code for getting substrings
by ww (Archbishop) on Apr 19, 2012 at 11:16 UTC
    You'll be able to use say if you specify use VERSION; where you replace VERSION with the version of Perl you're using (perhaps 5.010, 5.012 or 5.014), or if you specify use feature qw/say/;

    If you don't know the version, try perl -v at your command prompt.

    Update: Originally, forgot to include 5.10.

      this can't be helping him out much:

      use 1.010;


      use 5.10.0;
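To make the point of the two replies above concrete, here is a minimal sketch, assuming Perl 5.10 or later. `use 1.010;` only asserts that the interpreter is at least version 1.010 and enables no features, so `say` is parsed as a method call; `use 5.010;` (or `use feature 'say';`) turns `say` on. The string is the one from the question; the variable `$len` is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # requires Perl 5.10+ and enables say (among other features)

my $string = "CCATGNNNTAACCNNATGNNTAGCC";
my $len    = length $string;

# say behaves like print, but appends the newline for us
say "length: $len";
```

On a perl where you would rather not pin a minimum version, `use feature 'say';` in place of `use 5.010;` should have the same effect for this script.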
Re: Request to correct the perl code for getting substrings
by eyepopslikeamosquito (Chancellor) on Apr 19, 2012 at 12:23 UTC

    I hope perl monks will help me get the correct result.
    I suggest you take the time to read and understand the answers to your original question.

    Rather than fire off another top level question so quickly, it would have been better for you to be a bit more patient, read the replies to your original question, and then post replies in that node asking for clarification or further refinements/improvements.

Re: Request to correct the perl code for getting substrings
by oldtomas (Novice) on Apr 19, 2012 at 19:01 UTC
    Hi, I don't know why you are assigning first to an array, like so:
        @d = "CCATGNNNTAACCNNATGNNTAGCC";
    (this assigns the scalar value "CCATG..." to $d[0]) just to flatten it later like so:
        my $string ="@d";
    There are a couple of other things in your code which are unclear to me. That said, the essential bits might be done this way:
        #!/usr/bin/perl -w
        my $string = "CCATGNNNTAACCNNATGNNTAGCC";
        while($string =~ /[AG]TG.*?[AG][AG]/g) {
            print "matched at ", pos($string) - length($&), ": ", $&, "\n";
        }
    • The /g flag at the end of the match operator makes the match "global". In scalar context (which the while() condition provides), each evaluation resumes the search where the previous match left off, until no more matches are left.
    • Once a match is found, the end position of the match is available in pos($string). The match itself is available in the special variable $&.
    • I flagged the "any" part in the regexp (i.e. the ".*") with a non-greedy modifier (that makes ".*?"). This will try the shortest possible matches. If that's not what you want, leave the "?" out.
    • This won't find overlapping matches. You could achieve that by resetting pos($string), e.g. to pos($string) - length($&) + 1 or so.
    I hope I understood your problem statement correctly; I'm a bit confused by your sample code, though.
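The last bullet, resetting pos() to pick up overlapping matches, can be sketched as follows. This is an illustration under stated assumptions rather than the poster's code: the helper name `overlapping_matches` and the input string are hypothetical, the pattern includes the T before the final [AG][AG] that the problem statement asks for (the pattern in the reply above leaves it out), and the match start is read from the built-in `@-` array with a capture group instead of `pos($string) - length($&)`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [start, text] pairs for every match, including overlapping ones.
sub overlapping_matches {
    my ($string) = @_;
    my @found;
    while ( $string =~ /([AG]TG.*?T[AG][AG])/g ) {
        my $start = $-[0];            # offset where this match began
        push @found, [ $start, $1 ];
        pos($string) = $start + 1;    # resume one character past the start
    }
    return @found;
}

# Hypothetical input in which a second match starts inside the first one.
for my $m ( overlapping_matches("ATGATGTAACCTGA") ) {
    print "matched at $m->[0]: $m->[1]\n";
}
```

Without the `pos($string) = ...` line, the search would resume after the end of the first match and the inner match starting at offset 3 would be skipped.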

      Don't use $&, it will slow down every regular expression in the program.

          my $string = "CCATGNNNTAACCNNATGNNTAGCC";
          while ( $string =~ /([AG]TG.*?T[AG][AG])/ig ) {
              print "matched at $-[0]: $1\n";
          }
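Building on the `$-[0]` idea, the "sequence; length; start-end" layout requested in the question could be produced along these lines. This is a sketch, not a definitive answer: the helper name `report_matches` is hypothetical, and it assumes non-overlapping, shortest-possible matches. `@-` and `@+` are Perl's built-in arrays of match start offsets and end offsets (the end offset is one past the last matched character).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one "sequence; length; start-end" line per non-overlapping match.
sub report_matches {
    my ($string) = @_;
    my @lines;
    while ( $string =~ /([AG]TG.*?T[AG][AG])/ig ) {
        my $start = $-[0];        # offset of the first matched character
        my $end   = $+[0] - 1;    # $+[0] is one past the last matched character
        push @lines, sprintf( "%s; %d; %d-%d", $1, length($1), $start, $end );
    }
    return @lines;
}

print "The sequences, their lengths and positions are:\n";
print "$_\n" for report_matches("CCATGNNNTAACCNNATGNNTAGCC");
```

For the string in the question this yields the two lines the original poster listed as the correct result, with positions counted from zero.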
        Don't use $&, it will slow down every regular expression in the program
        You're quite right, Sir ;-)
            #!/usr/bin/perl -w
            use Benchmark;

            my $string = "CCATGNNNTAACCNNATGNNTAGCC" x 10000;

            timethese(10, {
                'matchamp' => sub {matchamp($string)},
                'matchpar' => sub {matchpar($string)},
            });

            sub matchamp {
                my $string = shift;
                my $matchlen = 0;
                while($string =~ /[AG]TG.*?[AG][AG]/g) {
                    $matchlen += length($&); # Actually reference $& to keep Perl from cheating
                }
                return $matchlen;
            }

            sub matchpar {
                my $string = shift;
                my $matchlen = 0;
                while ( $string =~ /([AG]TG.*?T[AG][AG])/g ) {
                    $matchlen += length($1); # Reference $1, for fairness
                }
                return $matchlen;
            }

        Benchmark: timing 10 iterations of matchamp, matchpar...
          matchamp: 24 wallclock secs (23.48 usr + 0.00 sys = 23.48 CPU) @ 0.43/s (n=10)
          matchpar: 26 wallclock secs (25.70 usr + 0.00 sys = 25.70 CPU) @ 0.39/s (n=10)

        The "$&" leg seems a tad faster, right?


        Removing every trace of matchamp from the above program yields:

        Benchmark: timing 10 iterations of matchpar...
          matchpar: 1 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 18.52/s (n=10)

        That's a speedup by a factor of ~26 in wallclock time (about 48 going by the CPU times, 25.70 vs 0.54). I'm impressed.

        Thanks for pointing that out
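If the whole matched text is wanted without adding a capture group, one option is the /p modifier with `${^MATCH}`, both available since Perl 5.10 and documented as avoiding the program-wide cost of `$&`. The following is a sketch under that assumption; the helper name `match_length_total` is hypothetical. As far as I know, Perl 5.20 and later removed the `$&` penalty through copy-on-write, so timings like the ones above may not reproduce on a modern perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the lengths of all matches, reading the matched text from ${^MATCH}.
sub match_length_total {
    my ($string) = @_;
    my $total = 0;
    while ( $string =~ /[AG]TG.*?T[AG][AG]/gp ) {
        $total += length( ${^MATCH} );    # /p makes ${^MATCH} available
    }
    return $total;
}

print match_length_total( "CCATGNNNTAACCNNATGNNTAGCC" x 3 ), "\n";
```

The function could be dropped into the Benchmark script as a third entry to compare it against the `$&` and `$1` variants on a given perl.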
