Contributed by NotProud
on Sep 26, 2000 at 05:08 UTC
Description: Hm. I cannot get the part to the left of the match to show up. Any ideas?
"left center right" =~ /center/;
print "<br>Left: <$`>\n";
print "<br>Match: $&\n";
print "<br>Right: <$'>\n";
Thanks
Answer: How do I get what is to the left of my match? contributed by tye
Note that if you ever mention $`, $&, or $' anywhere in
your code, then every regular expression in that run of
Perl becomes noticeably slower. Their use is therefore
strongly discouraged in code that might be reused or where
performance is important. The reason is that a single
mention forces each regex match to make copies of those
strings every time, even though most of the copies will
never be used: Perl can't predict when it might need them,
so it must always make the copies.
However, Perl 5.6.0 added an alternative way to get this
type of information: the arrays @- and @+, which hold the
start and end offsets of the last successful match.
Here is a sample of how to use them:
( my $str= "left center right" ) =~ /center/;
# $-[0] and $+[0] are the start and end offsets of the whole match
print "\nLeft: <", substr( $str, 0, $-[0] ),
">\nMatch: <", substr( $str, $-[0], $+[0] - $-[0] ),
">\nRight: <", substr( $str, $+[0] ), ">\n";
__END__
This prints:
Left: <left >
Match: <center>
Right: < right>
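The @- and @+ sample above can be wrapped in a small helper. This is only a sketch: the name match_context is made up for illustration and is not part of Perl or of any module. It assumes only what perlvar documents, namely that index 0 of @- and @+ describes the whole match while higher indices belong to capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper (not a core function): returns the text before,
# inside, and after the first match of $re in $str, or an empty list
# when there is no match. Needs Perl 5.6.0 or later for @- and @+.
sub match_context {
    my ( $str, $re ) = @_;
    return unless $str =~ $re;

    # Index 0 always describes the whole match, even when the
    # pattern contains capture groups.
    my ( $start, $end ) = ( $-[0], $+[0] );
    return (
        substr( $str, 0, $start ),
        substr( $str, $start, $end - $start ),
        substr( $str, $end ),
    );
}

my ( $left, $match, $right )
    = match_context( "left center right", qr/c(ent)er/ );
print "Left: <$left>\nMatch: <$match>\nRight: <$right>\n";
```

Because the helper reads only index 0, adding capture groups to the pattern does not change which text is reported as the match.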
At the time of this writing, perlvar isn't recent enough
to mention @- and @+. But if your version of Perl is
recent enough to have @- and @+ (Perl 5.6.0 or later),
then your perlvar.pod will also include documentation on
them.
If you can't find perlvar.pod, enter the command
perldoc perlvar, or cd to your Perl lib directory, which
should contain a pod directory that holds that file.
These pod files contain some simple "mark-up" codes but
are designed to be easy for humans to read. (You can read
perlpod.pod for more information on the mark-up language.)
- tye
Answer: How do I get what is to the left of my match? contributed by fundflow
It works for me.
The following should give you the same result and
seems more readable:
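$_="left center right";
# .*? keeps the left part minimal so the first "center" is the match;
# /s lets . match newlines as well
m/(.*?)(center)(.*)/s;
print "$1:$2:$3\n";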
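One detail of the capture-group approach is worth checking when the search text can occur more than once. The sketch below, using a made-up sample string, compares a greedy and a non-greedy first group. Only the non-greedy form reports the text before the first occurrence, which is what $` and the @- offsets would give.

```perl
use strict;
use warnings;

# Sample string (invented for illustration) with two occurrences.
my $str = "left center middle center right";

# Greedy first group: .* grabs as much as it can, so the match
# settles on the LAST "center" in the string.
my ($greedy_left) = $str =~ m/(.*)(center)(.*)/s;

# Non-greedy first group: .*? grabs as little as it can, so the
# match settles on the FIRST "center".
my ($lazy_left) = $str =~ m/(.*?)(center)(.*)/s;

print "greedy: <$greedy_left>\n";
print "lazy:   <$lazy_left>\n";
```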
Answer: How do I get what is to the left of my match? contributed by lima1
The CPAN module Regexp::MatchContext was written for this task. The module's SYNOPSIS:
use Regexp::MatchContext -vars;
$str =~ m/(?p) \d+ /x;
print "Before: $PREMATCH\n";
print "Matched: $MATCH\n";
print "After: $POSTMATCH\n";
$MATCH = 2 * $MATCH; # substitute into original $str
Note that this and the previous solutions are significantly slower than using the match variables $`, $& and $'. However, as tye mentioned, using these variables anywhere will slow down EVERY other regular expression without capturing parentheses.
The following benchmark, which searches for a short DNA sequence (11 base pairs) in a 2000 bp DNA sequence, compares all four solutions:
              Rate     regex   context  at_minus matchvars
regex      17271/s        --      -22%      -66%      -84%
context    22239/s       29%        --      -56%      -79%
at_minus   50420/s      192%      127%        --      -53%
matchvars 107527/s      523%      384%      113%        --
Note that this benchmark uses the match variables and therefore slows down all four solutions. The results with the matchvars solution removed are:
             Rate     regex   context  at_minus
regex     17544/s        --      -24%      -69%
context   23112/s       32%        --      -60%
at_minus  57361/s      227%      148%        --
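None of the answers here use it, but Perl 5.10 and later also provide the /p match modifier, which makes the variables ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that one match. The sketch below assumes Perl 5.10 or later and was not part of the benchmark, so no claim is made about its speed relative to the four solutions measured above.

```perl
use strict;
use warnings;
use 5.010;    # /p and the ${^...} variables need Perl 5.10 or later

my $str = "left center right";
my ( $left, $match, $right );

# /p asks Perl to preserve the matched string for this match so that
# the three ${^...} variables can be read afterwards.
if ( $str =~ /center/p ) {
    ( $left, $match, $right )
        = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

print "Left: <$left>\nMatch: <$match>\nRight: <$right>\n";
```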
Appendix: Source code of the benchmark
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
use Regexp::MatchContext;
my $count = 300000;
# to test that all solutions produce the same output
my $VERBOSE = 0;
$count = 2 if $VERBOSE;
# test sequence, split across lines for readability and joined into one string
my $seq = join q{}, qw(
GGGTTGAAGTTTAGACCGCTCACAGTAGTTCTACCTATAGAAAAGATCATGAAAGAGGCGATC
AGAATGGTACTCGAATCCATTTACGATCCCGAGTTTCCAGACACATCGCATTTCCGCTCGGGTCAAGGC
TGCCACTCGGTCCTAAGACGGATCAAAGAAGAGTGGGGAATCTCTCGCTGGTTTTTAGAATTCGACATC
AGGAAGTGTTTTCACACCATCGACCGACATCGACTCATCCAAATTTTGAAGGAAGAGATCGACGATCCC
AAGTTCTTTTACTCCATTCAGAAAGTATTTTCCGCCGGACGACTCGTAGGAGTTGAGAGGGGCCCTTAC
TCCGTCCCACACAGTGTACTACTATCGGCCCTACCAGGCAACATCTACCTACACAAGCTCGATCAGGAG
ATAGGGAGGATCCGACAGAAGTACGAAATTCCGATTGTTCAGAGAGTCAGATCGGTTCTATTAAGGACA
GGTCGTCGTATTGATGACCAAGAAAACCCTGGAGAAGAAGCAAGCTTCAACGCTCCCCAAGACAACAGA
GCCATCATTGTGGGGAGCGTTAAGAGCATGCAACGCAAAGCGGCCTTTCATTCCCTTGTTTCGTCGTGG
CACACCCCCCCCACAAGCACCCTCCGGCTCAGGGGGGACCAGAAAAGGCCTTTCGTTTTCCCCCCTTCG
TCGGCCCTTGCCGTCTTCCTTAACAAGCCCTCGAGCCTTCTTTGCGCCGCCTTCCTCATAGAAGCCGCC
GGGTTGACCCCGAAGGCTGAATTCTATGGTGGAGAACGCTGTAATAATAATTGGGCCATGAGAGACCTT
CTTAAGTATTGCAAAAGAAAGGGCCTGCTGATAGAGCTGGGCGGGGAGGCGATACTAGTTATCAGGTCA
GAGAGAGGCCTGGCCCGTAAGCAGGCCCCCTTAAAAACCCATTACTTAATAAGGATTTGTTACGCGCGA
TATGCCGACGACTTACTACTGGGAATCGTGGGTGCCGTAGAGCTTCTCATAGAAATACAAAAACGTATC
GCCCATTTCCTACAATCTGGCCTGAACCTTTGGGTAGGCTCCGCAGGATCAACAACAATAGCTGCACGG
AGTACGGTAGAATTCCTTGGTACGGTCATTCGGGAAGTCCCTCCGAGGACGACTCCCATACAATTTTTG
CGAGAGCTGGAAAAGCGTCTACGGGTAAAGCACCGTATCCATATAACTGCTTGCCACCTACGCTCCGCC
ATCCATTCAAAGTTTAGGAACCTAGGTGATAGTATCCCGATCAAACAGCTGACGAAGGGGATGAGCAAA
ACAGGGAGTCTACAGGACGGGGTTCAACTAGCGGAGACTCTTGGAACAGCTGGAGTCAGAAGTCCCCAA
GTTAGCGTATTATGGGGGACCGTCAAGCACATCCGGCAAGGATCAAGGGGGATCTCGTTCTTGCATAGC
TCAGGTCGGAGCAACGCGTCATCGGACGTTCAACAGGTAGTCTCACGATCGGGCACTCATGCCCGTAAG
TTGTCATTGTATACTCCCCCGGGTCGGAAGGCGGCGGGGGAGGGAGGAGGACACTGGGCGGGATCTATC
AGCAGCGAATTCCCCATAAAGATAGAGGCACCTATAAAAAAGATACTCCGAAGGCTTCGGGATCGAGGT
ATCATTAGCCGAAGAAGACCCTGGCCAATCCACGTGGCCTGTTTGACGAACGTCAGCGACGAAGACATC
GTAAATTGGTCCGCGGGCATCGCGATAAGTCCTCTGTCCTACTACAGGTGCCGCGACAACCTTTATCAA
GTCCGAACGATTGTCGACCACCAGATTCGCTGGTCTGCAATATTCACCCTAGCCCACAAGCACAAATCC
TCGGCGCCGAATATAATCCTCAAGTACTCCAAAGACTCAAATATTGTAAATCAAGAAGGTGGCAAGATC
CTTGCAGAGTTCCCCAACAGCATAGAGCTTGGGAAGCTCGGACCCGGTCAAGACCTGAACAAGAAGGAA
CACTCAACTACTAGTCTAGTCTAG
);
cmpthese(
$count,
{ 'regex' => sub {
my ( $prematch, $match, $postmatch )
= $seq =~ m{(\A .*?) (CTGGCCCGTAA) (.*\z) }xms;
warn "$prematch $match $postmatch" if $VERBOSE;
},
'matchvars' => sub {
$seq =~ m{CTGGCCCGTAA}xms;
my ($prematch, $match, $postmatch) = ($`, $&, $');
warn "$prematch $match $postmatch" if $VERBOSE;
},
'context' => sub {
$seq =~ m{(?p)CTGGCCCGTAA}xms;
my ( $prematch, $match, $postmatch )
= ( PREMATCH(), MATCH(), POSTMATCH() );
warn "$prematch $match $postmatch" if $VERBOSE;
},
'at_minus' => sub {
$seq =~ m{CTGGCCCGTAA}xms;
# index 0 of @- and @+ holds the offsets of the whole match
my $prematch = substr( $seq, 0, $-[0] );
my $match = substr( $seq, $-[0], $+[0] - $-[0] );
my $postmatch = substr( $seq, $+[0] );
warn "$prematch $match $postmatch" if $VERBOSE;
},
}
);