PerlMonks  

Using index when the substring is a regular expression

by rbi (Monk)
on Mar 12, 2001 at 20:59 UTC

rbi has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I would like to locate the position where one of several possible substrings occurs in a string.
Say I have strings like:
some_letters-01-some_other_letters
some_other_letters-03-some_other_letters
...
and I want my mask to catch:
/-(..)-/
It would be like using index STR, SUBSTR, where SUBSTR is not a plain string but a regular expression acting as a mask.
How should this be done?
Thanks in advance,
Roberto

Replies are listed 'Best First'.
Re: Using index when the substring is a regular expression
by stephen (Priest) on Mar 12, 2001 at 21:22 UTC
    From the perlvar manpage:

    @-

    $-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.

    Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with substr $_, $-[n], $+[n] - $-[n] if $-[n] is defined, and $+ coincides with substr $_, $-[$#-], $+[$#-] - $-[$#-]. One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with @+.

    This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The nth element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on. You can use $#- to determine how many subgroups were in the last successful match. Compare with the @+ variable.

    After a match against some variable $var:

        $` is the same as substr($var, 0, $-[0])
        $& is the same as substr($var, $-[0], $+[0] - $-[0])
        $' is the same as substr($var, $+[0])
        $1 is the same as substr($var, $-[1], $+[1] - $-[1])
        $2 is the same as substr($var, $-[2], $+[2] - $-[2])
        $3 is the same as substr($var, $-[3], $+[3] - $-[3])
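A small sketch of these equivalences (my own illustration, not part of the perlvar text; the variable names are made up). It matches one of Roberto's sample strings and rebuilds $`, $1 and $' purely from @- and @+ with substr, so $` and $' themselves never appear. It assumes Perl 5.6 or later, where @- and @+ exist.

```perl
use strict;
use warnings;

my $var = "some_letters-01-some_other_letters";

my ($prematch, $digits, $postmatch);
if ($var =~ /-(\d\d)-/) {
    # Offsets of the whole match (index 0) and of capture group 1
    my ($match_start, $match_end) = ($-[0], $+[0]);
    my ($grp_start,   $grp_end)   = ($-[1], $+[1]);

    # Rebuild $`, $1 and $' from the offsets alone
    $prematch  = substr($var, 0, $match_start);
    $digits    = substr($var, $grp_start, $grp_end - $grp_start);
    $postmatch = substr($var, $match_end);

    print "match starts at $match_start, group 1 starts at $grp_start\n";
    print "digits: $digits (same as \$1: $1)\n";
}
```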

    Therefore, you can do something like this:

    my $q = <<End_Text;
    some_letters-01-some_other_letters
    some_other_letters-03-some_other_letters
    End_Text
    while ( $q =~ m{\w+ \-(\d\d)\- \w* }gx ) {
        my $start_index = $-[1];    # offset where capture group 1 (the digits) begins
        print "Mask start index is ", $start_index, "\n";
    }
    I don't know whether this method incurs the same overhead as $`, $& and $'. Does anybody know?

    Update: The previous code made loud banging sounds when tested; this version fixes it. It works on 5.6; I'm not sure whether it will work on earlier Perls.

    stephen
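If you want something that really behaves like index STR, SUBSTR, POSITION with a pattern in place of SUBSTR, @- makes it a few lines. The sketch below is hypothetical: regex_index is not a Perl builtin or a CPAN function, just a name made up for this example, and it assumes Perl 5.6+ for @-. Like index, it returns -1 when there is no match.

```perl
use strict;
use warnings;

# regex_index: HYPOTHETICAL helper, not a builtin. Works like
# index(STR, SUBSTR, POSITION) but takes a qr// pattern instead of SUBSTR.
# Returns the offset where the match starts, or -1 if nothing matches.
sub regex_index {
    my ($str, $re, $from) = @_;
    $from = 0 unless defined $from;
    pos($str) = $from;                   # m//g in scalar context resumes at pos()
    return $str =~ /$re/g ? $-[0] : -1;  # $-[0] is the start of the whole match
}

my $line  = "some_letters-01-some_other_letters";
my $first = regex_index($line, qr/-\d\d-/);      # finds "-01-"
my $later = regex_index($line, qr/-\d\d-/, 13);  # search begins past that dash
print "first: $first, from 13: $later\n";
```

To get the start of the digits rather than of the leading dash, put parentheses around \d\d and return $-[1] instead, as in stephen's loop.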

Re: Using index when the substring is a regular expression
by Corion (Patriarch) on Mar 12, 2001 at 21:13 UTC

    Take a look at the pos() function; it returns the offset just past the end of the last m//g match. But it really looks to me like you don't know about the magic variables $1 to $9, which capture the contents of your parentheses...

    First, some code that does what you ask for:

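    my $s = "Test-01-xxx";
    if ($s =~ /-(\d\d)-/g) {
        # pos() is the offset just past the match ("-01-" ends at 8);
        # back up over "01-" (3 chars) to reach the start of the digits
        print "Pos: ", pos($s) - 3, "\n";
    }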
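Corion's snippet extends naturally to several matches: in a while loop each m//g match resumes where the previous one stopped, and pos() always reports the offset just past the current match. This sketch (variable names are illustrative) runs that loop over both of Roberto's sample strings joined with a space and, like the snippet above, backs up 3 characters to land on the digits. That subtraction only works because the mask has a fixed width of two digits plus a dash.

```perl
use strict;
use warnings;

my $text = "some_letters-01-some_other_letters some_other_letters-03-some_other_letters";

my @digit_starts;
while ($text =~ /-(\d\d)-/g) {
    # pos() is just past the closing '-'; "NN-" is 3 chars, so back up 3
    my $start = pos($text) - 3;
    push @digit_starts, $start;
    print "found '", substr($text, $start, 2), "' at offset $start\n";
}
```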

    And here's some code that does what you need:

    my $s = "Test-01-xxx";
    if ($s =~ /(.*?)-(\d\d)-(.*)/) {
        print "The string was split into:\n";
        print "Left part   : $1\n";
        print "Number part : $2\n";
        print "Right part  : $3\n";
    }
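The same split can also be had without naming $1 to $3, either by assigning the captures in list context or with split, which returns whatever a capture group in the separator matched as extra fields. A sketch under those assumptions (the variable names are made up):

```perl
use strict;
use warnings;

my $s = "Test-01-xxx";

# List context: a successful match returns ($1, $2, $3)
my ($left, $number, $right) = $s =~ /^(.*?)-(\d\d)-(.*)$/
    or die "no -NN- mask in '$s'\n";

# split keeps what the capture group matched, so the digits come back too;
# the limit of 2 stops it from splitting the right-hand part again
my @parts = split /-(\d\d)-/, $s, 2;

print "left=$left number=$number right=$right\n";
print "split: @parts\n";
```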

    Update: On rereading your post, I see that you want the start of the RE match, so there is no way around solution number two. Here's the necessary merge of methods one and two. One method is to use the $& variable, which will slow your program down, because Perl will then keep track of every match in $& (but maybe perl does that anyway; see here for more info about $&). The other method is to introduce even more parentheses:

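    my $s = "Test-01-xxx";
    # Method 1: uses $&, which may slow down every regex in the program
    if ($s =~ /-(\d\d)-/g) {
        print "Pos: ", pos($s) - length($&), "\n";
    }
    pos($s) = undef;    # reset, or the next /g match resumes at offset 8 and fails
    # Method 2: extra parentheses capture the whole match in $1
    if ($s =~ /(-(\d\d)-)/g) {
        print "Pos: ", pos($s) - length($1), "\n";
    }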
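For comparison with stephen's reply, here is a sketch (names are illustrative; Perl 5.6+ assumed for @-) showing that method 2 and @- agree on where the match starts, and that @- also gives the start of the inner digits group directly, with no arithmetic on lengths:

```perl
use strict;
use warnings;

my $s = "Test-01-xxx";
my ($via_parens, $via_minus, $digits_at);
if ($s =~ /(-(\d\d)-)/g) {
    $via_parens = pos($s) - length($1);  # method 2: end of match minus its length
    $via_minus  = $-[0];                 # start of the whole match, from @-
    $digits_at  = $-[2];                 # start of the inner (\d\d) group
}
print "method 2: $via_parens, \$-[0]: $via_minus, digits: $digits_at\n";
```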

    Update: See stephen's reply for info about @-; there's something new to learn about Perl every day :-)

Node Type: perlquestion [id://63849]
Approved by root