PerlMonks  

Re: Request to correct the perl code for getting substrings

by oldtomas (Novice)
on Apr 19, 2012 at 19:01 UTC


in reply to Request to correct the perl code for getting substrings

Hi, I don't know why you are assigning first to an array, like so:
@d = "CCATGNNNTAACCNNATGNNTAGCC";
(this assigns the scalar value "CCATG..." to $d[0]) just to flatten it later like so:
my $string ="@d";
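To make the detour visible, here is a minimal sketch of what those two statements do. The variable $direct is only there for comparison and is not part of the original code.

```perl
use strict;
use warnings;

# Assigning a single scalar to an array creates a one-element array.
my @d = "CCATGNNNTAACCNNATGNNTAGCC";
print "elements: ", scalar(@d), "\n";
print "first element: $d[0]\n";

# Interpolating the array joins its elements with $" (a space by default);
# with only one element this simply gives the string back.
my $string = "@d";

# Assigning the string directly gives the same value without the array.
my $direct = "CCATGNNNTAACCNNATGNNTAGCC";
print $string eq $direct ? "same string\n" : "different strings\n";
```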
There are a couple of other things in your code which are unclear to me. That said, the essential bits might be done this way:
#!/usr/bin/perl -w
my $string = "CCATGNNNTAACCNNATGNNTAGCC";
while($string =~ /[AG]TG.*?[AG][AG]/g) {
    print "matched at ", pos($string) - length($&), ": ", $&, "\n";
}
Notes:
  • The /g flag at the end of the match operator makes the match "global". In scalar context (which is what the condition inside the while() parentheses provides), each match attempt resumes where the previous one left off, until no more matches are left.
  • Once a match is found, the end position of the match is available in pos($string). The match itself is available in the special variable $&.
  • I flagged the "any" part in the regexp (i.e. the ".*") with a non-greedy modifier (that makes ".*?"). This will try the shortest possible matches. If that's not what you want, leave the "?" out.
  • This won't find overlapping matches. You could achieve that by resetting pos($string), e.g. to pos($string) - length($&) + 1 or so.
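The last note can be sketched roughly as follows. The sample string "ATGATGNNAA" is made up for illustration (the sequence from the question happens to contain no overlapping matches for this pattern), and the sketch uses a capture group with $-[0] instead of $& to get the start offset; @plain and @hits are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical sample in which one match starts inside another.
my $string = "ATGATGNNAA";

# List context with /g: all non-overlapping matches at once.
my @plain = $string =~ /([AG]TG.*?[AG][AG])/g;
print "non-overlapping matches: ", scalar(@plain), "\n";

# Scalar context with /g, rewinding pos() after every match.
my @hits;
while ($string =~ /([AG]TG.*?[AG][AG])/g) {
    my $start = $-[0];            # offset where this match began
    push @hits, [$start, $1];
    pos($string) = $start + 1;    # resume one character after the match start
}
printf "matched at %d: %s\n", @$_ for @hits;
```

Moving pos() forward by only one character lets the next attempt begin inside the previous match, which is what allows the second, overlapping hit to be reported.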
I hope I understood your problem statement correctly; I'm a bit confused by your sample code, though.

Re^2: Request to correct the perl code for getting substrings
by jwkrahn (Abbot) on Apr 19, 2012 at 22:17 UTC

    Don't use $&, it will slow down every regular expression in the program.

    my $string = "CCATGNNNTAACCNNATGNNTAGCC";
    while ( $string =~ /([AG]TG.*?T[AG][AG])/ig ) {
        print "matched at $-[0]: $1\n";
    }
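For readers unfamiliar with $-[0]: it comes from the special arrays @- and @+, which hold the start and end offsets of the last successful match (element 0 covers the whole match). The sketch below uses the same pattern; the names $start, $end, $whole and @found are illustrative only.

```perl
use strict;
use warnings;

my $string = "CCATGNNNTAACCNNATGNNTAGCC";
my @found;

while ($string =~ /([AG]TG.*?T[AG][AG])/ig) {
    # $-[0] is the offset of the first matched character,
    # $+[0] is the offset just past the last one.
    my ($start, $end) = ($-[0], $+[0]);

    # The matched text can be rebuilt from the offsets without $&.
    my $whole = substr($string, $start, $end - $start);

    push @found, [$start, $end, $whole];
    printf "matched at %d (ends before %d): %s\n", $start, $end, $whole;
}
```

After a successful /g match in scalar context, $+[0] is the same offset that pos($string) reports, so either can be used to compute the match length.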
      Don't use $&, it will slow down every regular expression in the program
      You're quite right, Sir ;-)
      #!/usr/bin/perl -w
      use Benchmark;

      my $string = "CCATGNNNTAACCNNATGNNTAGCC" x 10000;

      timethese(10, {
          'matchamp' => sub {matchamp($string)},
          'matchpar' => sub {matchpar($string)},
      });

      sub matchamp {
          my $string = shift;
          my $matchlen = 0;
          while($string =~ /[AG]TG.*?[AG][AG]/g) {
              $matchlen += length($&); # Actually reference $& to keep Perl from cheating
          }
          return $matchlen;
      }

      sub matchpar {
          my $string = shift;
          my $matchlen = 0;
          while ( $string =~ /([AG]TG.*?T[AG][AG])/g ) {
              $matchlen += length($1); # Reference $1, for fairness
          }
          return $matchlen;
      }
      yields:
      Benchmark: timing 10 iterations of matchamp, matchpar...
        matchamp: 24 wallclock secs (23.48 usr + 0.00 sys = 23.48 CPU) @ 0.43/s (n=10)
        matchpar: 26 wallclock secs (25.70 usr + 0.00 sys = 25.70 CPU) @ 0.39/s (n=10)

      The "$&" leg seems a tad faster, right?

      BUT...

      removing every trace of matchamp in the above program yields:

      Benchmark: timing 10 iterations of matchpar...
        matchpar: 1 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 18.52/s (n=10)

      That's a speedup by a factor of ~26 in wallclock time (26 s vs. 1 s), and closer to 48 going by CPU time (25.70 s vs. 0.54 s). I'm impressed.

      Thanks for pointing that out.
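If the $&-style interface is still wanted, one option on Perl 5.10 and later is the /p modifier together with ${^MATCH}, which perlvar documents as carrying the cost only for the matches that ask for it. The sketch below is not part of the thread and has not been benchmarked here; as far as the perlre and perlvar documentation go, from Perl 5.20 on the penalty for $& is gone and /p is effectively ignored.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} need Perl 5.10 or later

my $string   = "CCATGNNNTAACCNNATGNNTAGCC";
my $matchlen = 0;

# /p asks Perl to keep the matched text for this match only,
# making it available as ${^MATCH} instead of $&.
while ($string =~ /[AG]TG.*?[AG][AG]/gp) {
    $matchlen += length(${^MATCH});
    printf "matched at %d: %s\n",
        pos($string) - length(${^MATCH}), ${^MATCH};
}
print "total matched length: $matchlen\n";
```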
