http://www.perlmonks.org?node_id=965913

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: Request to correct the perl code for getting substrings
by ww (Archbishop) on Apr 19, 2012 at 11:16 UTC
    You'll be able to use say if you specify use (version); (replacing '(version)' with the version of Perl you're using, perhaps 5.010, 5.012, or 5.014) or use feature qw/say/;

    If you don't know the version, try perl -v at your command prompt.
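
As a minimal sketch of the two options above, assuming Perl 5.10 or later (the greeting text is purely illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 1: enable only the 'say' feature (available since Perl 5.10).
use feature qw/say/;

# Option 2 (alternative): 'use 5.010;' requires at least Perl 5.10 and
# enables that version's feature bundle, which includes 'say'.
# use 5.010;

# say() prints its arguments followed by a newline.
say "Hello, monks";
```

Either line is enough on its own; the version form also makes the script refuse to run on an older perl.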

    Update: Originally, forgot to include 5.10.

      this can't be helping him out much:

      use 1.010;

      Try

      use 5.10.0;
Re: Request to correct the perl code for getting substrings
by eyepopslikeamosquito (Archbishop) on Apr 19, 2012 at 12:23 UTC

    I hope perl monks will help me get the correct result.
    I suggest you take the time to read and understand the answers to your original question.

    Rather than fire off another top level question so quickly, it would have been better for you to be a bit more patient, read the replies to your original question, and then post replies in that node asking for clarification or further refinements/improvements.

Re: Request to correct the perl code for getting substrings
by oldtomas (Novice) on Apr 19, 2012 at 19:01 UTC
    Hi, I don't know why you are assigning first to an array, like so:
    @d = “CCATGNNNTAACCNNATGNNTAGCC”;
    (this assigns the scalar value "CCATG..." to $d[0]) just to flatten it later like so:
    my $string ="@d";
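
A small sketch of what that detour through the array does, assuming plain double quotes around the sequence (the variable $direct is only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assigning one scalar to an array creates a one-element list.
my @d = "CCATGNNNTAACCNNATGNNTAGCC";
print scalar(@d), " element: $d[0]\n";

# Interpolating "@d" joins the elements with $" (a single space by default),
# so with one element the result is just that element again.
my $string = "@d";
print "same string\n" if $string eq $d[0];

# Assigning the scalar directly gives the same value with no detour.
my $direct = "CCATGNNNTAACCNNATGNNTAGCC";
print "direct assignment matches\n" if $direct eq $string;
```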
    There are a couple of other things in your code which are unclear to me. That said, the essential bits might be done this way:
    #!/usr/bin/perl -w
    my $string = "CCATGNNNTAACCNNATGNNTAGCC";
    while($string =~ /[AG]TG.*?[AG][AG]/g) {
        print "matched at ", pos($string) - length($&), ": ", $&, "\n";
    }
    Notes:
    • The /g flag at the end of the match operator makes the match "global". In scalar context (which the while() condition provides), each evaluation resumes the search where the previous match ended, until no more matches are left.
    • Once a match is found, the end position of the match is available in pos($string). The match itself is available in the special variable $&.
    • I flagged the "any" part in the regexp (i.e. the ".*") with a non-greedy modifier (which makes it ".*?"). This will try the shortest possible matches first. If that's not what you want, leave the "?" out.
    • This won't find overlapping matches. You could achieve that by resetting pos($string), e.g. to pos($string) - length($&) + 1 or so.
    I hope I understood your problem statement correctly; I'm a bit confused by your sample code, though.
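
The last note above, on overlapping matches, could be sketched roughly as follows. This is only one way to do it: find_overlapping() is a hypothetical helper name, and it uses a capture group and $-[0] (the start offset of the last match) rather than $&.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [start offset, matched text] pairs, including overlapping matches.
# find_overlapping() is a hypothetical helper, not something from the thread.
sub find_overlapping {
    my ($string) = @_;
    my @hits;
    while ( $string =~ /([AG]TG.*?[AG][AG])/g ) {
        my $start = $-[0];            # offset where this match began
        push @hits, [ $start, $1 ];
        pos($string) = $start + 1;    # resume one character past that start
    }
    return @hits;
}

for my $hit ( find_overlapping("CCATGNNNTAACCNNATGNNTAGCC") ) {
    print "matched at $hit->[0]: $hit->[1]\n";
}
```

Because the search always resumes one character after the previous start, the loop still terminates, and a second match that begins inside the first one is no longer skipped.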

      Don't use $&, it will slow down every regular expression in the program.

      my $string = "CCATGNNNTAACCNNATGNNTAGCC";
      while ( $string =~ /([AG]TG.*?T[AG][AG])/ig ) {
          print "matched at $-[0]: $1\n";
      }
        Don't use $&, it will slow down every regular expression in the program
        You're quite right, Sir ;-)
        #!/usr/bin/perl -w
        use Benchmark;

        my $string = "CCATGNNNTAACCNNATGNNTAGCC" x 10000;

        timethese(10, {
            'matchamp' => sub {matchamp($string)},
            'matchpar' => sub {matchpar($string)},
        });

        sub matchamp {
            my $string = shift;
            my $matchlen = 0;
            while($string =~ /[AG]TG.*?[AG][AG]/g) {
                $matchlen += length($&); # Actually reference $& to keep Perl from cheating
            }
            return $matchlen;
        }

        sub matchpar {
            my $string = shift;
            my $matchlen = 0;
            while ( $string =~ /([AG]TG.*?T[AG][AG])/g ) {
                $matchlen += length($1); # Reference $1, for fairness
            }
            return $matchlen;
        }
        yields:
        Benchmark: timing 10 iterations of matchamp, matchpar...
          matchamp: 24 wallclock secs (23.48 usr + 0.00 sys = 23.48 CPU) @ 0.43/s (n=10)
          matchpar: 26 wallclock secs (25.70 usr + 0.00 sys = 25.70 CPU) @ 0.39/s (n=10)

        The "$&" leg seems a tad faster, right?

        BUT...

        removing every trace of matchamp in the above program yields:

        Benchmark: timing 10 iterations of matchpar...
          matchpar: 1 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 18.52/s (n=10)

        That's a speedup by a factor of ~26 in wallclock time (closer to 47 going by CPU time: 25.70 s vs. 0.54 s). I'm impressed.

        Thanks for pointing that out
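
For completeness, a hedged sketch of a third variant that avoids both $& and a capture group, assuming Perl 5.10 or later: the /p modifier makes the matched text available in ${^MATCH}. The name matchp() is hypothetical, chosen to sit beside matchamp and matchpar, and no timings are claimed for it here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} need Perl 5.10 or later

# Sum the lengths of all matches without touching $& or a capture group.
# matchp() is a hypothetical name, not part of the benchmark in the thread.
sub matchp {
    my ($string) = @_;
    my $matchlen = 0;
    while ( $string =~ /[AG]TG.*?T[AG][AG]/gp ) {
        $matchlen += length( ${^MATCH} );    # text of the current match
    }
    return $matchlen;
}

print matchp("CCATGNNNTAACCNNATGNNTAGCC"), "\n";
```

It could be added to the timethese() hash as a third entry to compare it against matchpar on a given perl.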