Re: Request to correct the perl code for getting substrings

Hi, I don't know why you are assigning first to an array, like so:

@d = "CCATGNNNTAACCNNATGNNTAGCC";

(this assigns the scalar value "CCATG..." to $d[0]) just to flatten it later like so:

my $string ="@d";
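
To make the point about the array concrete, here is a small sketch (the @two array is a made-up example, not from your code). It assumes the default list separator, a single space, is in effect:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assigning a single string to an array gives a one-element array.
my @d = "CCATGNNNTAACCNNATGNNTAGCC";

# Interpolating the array joins its elements with the list separator
# (a space by default), so with one element you get the string back.
my $string = "@d";

print "elements: ", scalar(@d), "\n";
print "round trip is a no-op\n" if $string eq $d[0];

# Hypothetical two-element array: here the interpolation does change things.
my @two = ("CCATG", "NNN");
my $joined = "@two";
print "joined: $joined\n";
```

So with a single element the detour through the array buys nothing, and with several elements it would sneak spaces into the sequence.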

There are a couple of other things in your code which are unclear to me. That said, the essential bits might be done this way:

#!/usr/bin/perl -w
my $string = "CCATGNNNTAACCNNATGNNTAGCC";
while($string =~ /[AG]TG.*?[AG][AG]/g) {
  print "matched at ", pos($string) - length($&), ": ", $&, "\n";
}

Notes:

The /g flag at the end of the match operator makes the match "global". In scalar context (which the while() condition provides), each evaluation resumes the search where the previous match ended, until no matches are left.
Once a match is found, its end position is available via pos($string), and the matched text itself is in the special variable $&.
I made the "any" part of the regexp (i.e. the ".*") non-greedy by appending "?" (giving ".*?"), so the shortest possible match is tried first. If that's not what you want, leave the "?" out.
This won't find overlapping matches. You could achieve that by resetting pos($string) after each match, e.g. to pos($string) - length($&) + 1 or so.
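
A rough sketch of the overlapping variant follows. The sample string "ATGATGNNAA" is made up for illustration, because the string from your post happens to contain no overlapping matches. The sketch takes the start offset from $-[0] and the text from a capture group instead of $&, which amounts to the same position arithmetic:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: the second ATG lies inside the first match.
my $string = "ATGATGNNAA";
my @hits;

while ($string =~ /([AG]TG.*?[AG][AG])/g) {
    my $start = $-[0];            # start offset of the whole match
    push @hits, [$start, $1];
    # Resume one character after the start of this match,
    # rather than at its end, so overlapping matches are found too.
    pos($string) = $start + 1;
}

printf "matched at %d: %s\n", @$_ for @hits;
```

Without the pos() assignment the loop reports only the match starting at offset 0; with it, the one starting at offset 3 shows up as well.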

I hope I understood your problem statement correctly; I'm a bit confused by your sample code, though.


Re^2: Request to correct the perl code for getting substrings by jwkrahn (Abbot) on Apr 19, 2012 at 22:17 UTC
Don't use $&, it will slow down every regular expression in the program.

my $string = "CCATGNNNTAACCNNATGNNTAGCC";
while ( $string =~ /([AG]TG.*?T[AG][AG])/ig ) {
    print "matched at $-[0]: $1\n";
}
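
For reference, a sketch of two ways to get at the match without $&, using the pattern from the first reply. It assumes Perl 5.10 or later, where the /p flag makes the matched text available in ${^MATCH}; the @found array and its hash keys are names chosen for this illustration only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "CCATGNNNTAACCNNATGNNTAGCC";
my @found;

while ($string =~ /[AG]TG.*?[AG][AG]/gp) {
    # $-[0] is the start offset of the match, $+[0] the end offset
    # (the same value pos($string) holds at this point).
    # ${^MATCH} holds the matched text because of the /p flag.
    push @found, { start => $-[0], end => $+[0], text => ${^MATCH} };
}

printf "matched at %d..%d: %s\n", @{$_}{qw(start end text)} for @found;
```

Taking substr($string, $-[0], $+[0] - $-[0]) would give the same text on Perls that predate /p.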
Re^3: Request to correct the perl code for getting substrings by oldtomas (Novice) on Apr 20, 2012 at 06:57 UTC
"Don't use $&, it will slow down every regular expression in the program"

You're quite right, Sir ;-)

#!/usr/bin/perl -w
use Benchmark;
my $string = "CCATGNNNTAACCNNATGNNTAGCC" x 10000;
timethese(10, {
  'matchamp' => sub {matchamp($string)},
  'matchpar' => sub {matchpar($string)},
});

sub matchamp {
  my $string = shift;
  my $matchlen = 0;
  while($string =~ /[AG]TG.?[AG][AG]/g) {
    $matchlen += length($&); # Actually reference $& to keep Perl from cheating
  }
  return $matchlen;
}

sub matchpar {
  my $string = shift;
  my $matchlen = 0;
  while ( $string =~ /([AG]TG.?T[AG][AG])/g ) {
    $matchlen += length($1); # Reference $1, for fairness
  }
  return $matchlen;
}

yields:

Benchmark: timing 10 iterations of matchamp, matchpar...
  matchamp: 24 wallclock secs (23.48 usr + 0.00 sys = 23.48 CPU) @ 0.43/s (n=10)
  matchpar: 26 wallclock secs (25.70 usr + 0.00 sys = 25.70 CPU) @ 0.39/s (n=10)

The "$&" leg seems a tad faster, right? BUT... removing every trace of matchamp in the above program yields:

Benchmark: timing 10 iterations of matchpar...
  matchpar: 1 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 18.52/s (n=10)

That's a speedup by a factor of ~26 in wallclock time. I'm impressed. Thanks for pointing that out.

