PerlMonks  

position of first matching regex

by techtaskers.com (Novice)
on May 16, 2014 at 23:27 UTC ( #1086404=perlquestion )

techtaskers.com has asked for the wisdom of the Perl Monks concerning the following question:

How do I get the location of the first matching occurrence of a regex in a string? I need an expression that will take the regex /[0-9]/ and find its first occurrence in a string such as "_835A_Photo_descriptor.png". The point being that sometimes the first character of the file name is a number, sometimes it is not.

index is not good because I need to match a single digit, not a particular digit, thus regex.

I've tried pos, but that does not work, because of the m/xxx/g requisite, and besides, it does not give the beginning location, only where it last looked.

I'm trying @-, per perlvar.pod, but it does not seem to work.

I need this positional value, to evaluate other parts of the string.

    foreach $filename (@curdirlist) {
        # print "$filename\n";    #sane
        if ( $filename =~ /([0-9])/ ) {
            ($startloc = @-) && ($alphaloc = $startloc + 4);
        }
        print "$startloc ";
    }

All $startloc s are printing 2, which is never correct given the sample set. It should be 0 or 1 only. (?) Most files start with a digit, some do not.

I stopped learning python because in my opinion, python does not natively support regular expressions, and I got tired of seeing how python works in MSDOS, which I do not use. Most of what I've done in the past has been with bash, sed and awk. But now I need more! More power, imagine: NOW!!!

So I'm learning perl. But I feel really stupid, like I'm missing the most obvious thing, on this problem.

David

Replies are listed 'Best First'.
Re: position of first matching regex
by GrandFather (Saint) on May 16, 2014 at 23:52 UTC

    The special variable @- is an array that holds, for the last successful match, the offsets from the start of the string to the start of the overall match and of each capture group. Consider:

    use strict;
    use warnings;

    for my $str ('1 2 3', 'a b c', ' a3 b2 c1') {
        next if $str !~ /\w?([0-9])/;

        my $mpos = $-[0];
        my $cpos = $-[1];

        print "Found $1 in '$str' at index $cpos. Overall match started at $mpos\n";
    }

    $-[0] accesses the first element of the array - the start of the entire match. $-[1] gives the index of the start of the first capture. The script prints:

    Found 1 in '1 2 3' at index 0. Overall match started at 0
    Found 3 in ' a3 b2 c1' at index 2. Overall match started at 1

    See the perlvar regular expression variables section. You should take a look at perlretut and perlre too.
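Applied to the file name from the question, the same idea can be wrapped in a small helper. This is only a sketch: the sub name first_match_pos is made up for illustration, and returning undef on no match is an assumption about what the caller wants. Note that reading $-[0] needs neither the /g modifier nor a capture group.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the offset at which
# $re first matches in $str, or undef if it does not match at all.
sub first_match_pos {
    my ($str, $re) = @_;
    return $str =~ $re ? $-[0] : undef;
}

for my $name ('_835A_Photo_descriptor.png',
              '835A_Photo_descriptor.png',
              'no_digits.png') {
    my $pos = first_match_pos($name, qr/[0-9]/);
    print defined $pos
        ? "'$name': first digit at offset $pos\n"
        : "'$name': no digit found\n";
}
```

The first two names should report offsets 1 and 0 respectively. The defined test matters here, because offset 0 is a valid position but is false in boolean context.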

    Perl is the programming world's equivalent of English

      Yup! There's one of those obvious things I feared: Duh??? ``@'' is an "array" sigil. Wow. I love learning, and it is usually quite a laugh too. Like this time.

      Nice code! I'm still used to c, just learning perl, and spontaneous variable declarations seem soooo immoral! Just habit. Also, the next if $str !~ /\w?([0-9])/; type of code is still new to me.

      I am assuming that the $- values are keying on the positional declaration within the re. And unlike pos, it does not require m/.../g ?

      Changed this to:

      if ( $filename =~ /([0-9])/ ) {
          ($startloc = $-[0]);
          ($alphaloc = $startloc + 4);
      }
      print "$startloc$alphaloc ";

      The && version failed to set $alphaloc if $startloc was zero. (More hilarity!) A case of trying to be clever, instead of being clever.
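The failure comes from && short-circuiting: an assignment yields the value assigned, and 0 is false in Perl, so the right-hand side never runs when the digit sits at offset 0. A minimal sketch of the effect follows. The sub names chained and separate are made up for illustration, and the + 4 is taken from the code above.

```perl
use strict;
use warnings;

# Chained form: the second assignment only runs if the first one
# assigned a true value, so offset 0 leaves $alphaloc undefined.
sub chained {
    my ($offset) = @_;
    my ($startloc, $alphaloc);
    ($startloc = $offset) && ($alphaloc = $startloc + 4);
    return ($startloc, $alphaloc);
}

# Separate statements: both assignments always run.
sub separate {
    my ($offset) = @_;
    my $startloc = $offset;
    my $alphaloc = $startloc + 4;
    return ($startloc, $alphaloc);
}

for my $offset (0, 1) {
    my (undef, $chained)  = chained($offset);
    my (undef, $separate) = separate($offset);
    printf "offset %d: chained gives %s, separate gives %d\n",
        $offset, defined $chained ? $chained : 'undef', $separate;
}
```

For offset 0 the chained form leaves $alphaloc undefined, while the separate statements give 4.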

      Which now works. Perhaps before I die, I'll remember that perl's arrays are scalar references. Perhaps not.

      Thank you for your help. This now works, and I can get this done.


      David

        Why print if the match fails?

        Note that my code includes strictures (use strict; use warnings;). That will warn you of such unhealthy coding practises. I strongly recommend you use strictures, especially if you are just starting out with Perl.

        The () around your two assignment statements don't add any value and make understanding the intent of your code harder.

        @array is not a scalar reference. It is the array.

        Perl is the programming world's equivalent of English
Re: position of first matching regex
by AnomalousMonk (Bishop) on May 17, 2014 at 02:25 UTC

    Note also the general point that an expression like  $startloc = @- evaluates an array in scalar context and thus gives you the number of elements in the array, not the value of any particular element. See Context in perldata, and the tutorials Context tutorial, and Arrays: A Tutorial/Reference, in particular the section "Get count of elements".
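A small sketch of the difference, using the file name from the question. The sub name offsets_demo is made up for illustration. With one capture group in the regex, @- holds two elements after a successful match, which would explain why the original code always printed 2.

```perl
use strict;
use warnings;

# Hypothetical demo sub: returns (element count, $-[0], first element of @-).
sub offsets_demo {
    my ($str) = @_;
    return unless $str =~ /([0-9])/;

    my $count   = @-;      # scalar context: number of elements in @-
    my $offset  = $-[0];   # element access: start of the overall match
    my ($first) = @-;      # list assignment: copies the first element

    return ($count, $offset, $first);
}

my ($count, $offset, $first) = offsets_demo('_835A_Photo_descriptor.png');
print "count=$count offset=$offset first=$first\n";
```

For this string the count is 2, while both $-[0] and the list-assigned value are 1.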

      Yup. That is an excellent goof on my part. I should be good at goofs, I've practiced all my life.

      My two lessons here are that: (@- != $-[0]) && $-[0] requires /...(...).../ re positional variables.

      Thank you for your reply.


      David
        ... $-[0] requires /...(...).../ re positional variables.

        You show a capturing group in your example and refer to "positional variables", so I thought I'd mention that  $-[0] is the offset of the start of the overall match and does not depend at all on capture groups.

        c:\@Work\Perl>perl -wMstrict -le
        "my $s = 'xxxabcyyy';
         ;;
         print qq{overall match beginning at offset $-[0] in '$s'}
           if $s =~ m{ abc }xms;
        "
        overall match beginning at offset 3 in 'xxxabcyyy'

        (And BTW: I've been practicing goofs for quite a while myself.)

Re: position of first matching regex
by InfiniteSilence (Curate) on May 17, 2014 at 01:43 UTC

    Just set a variable once and only once with the position the first time it matches.

    perl -e '$shoo=q!my name is prince!; while($shoo=~m/i/g){$rx = $+[0] unless $rx } print $rx;'
    Makes
    9
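Note that $+[0] is the offset just past the end of the match, which is why the one-liner prints 9 although the first "i" sits at offset 8. A hedged variant that records both the start and the end of the first match is sketched here. The sub name first_match_span is made up for illustration, and it uses a defined test so that a match starting at offset 0 is not mistaken for "not set yet".

```perl
use strict;
use warnings;

# Hypothetical helper: loop over all matches as in the one-liner, but keep
# only the first match's start ($-[0]) and end ($+[0]) offsets.
sub first_match_span {
    my ($str, $re) = @_;
    my ($start, $end);
    while ($str =~ /$re/g) {
        next if defined $start;          # already recorded the first match
        ($start, $end) = ($-[0], $+[0]);
    }
    return ($start, $end);
}

my ($start, $end) = first_match_span('my name is prince', qr/i/);
print "first match starts at $start and ends before $end\n";
```

For this string the start is 8 and the end is 9. The loop mirrors the one-liner, but a single match without /g would give the same two offsets.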

    But like GrandFather said, read perlvar.

    Celebrate Intellectual Diversity

Re: position of first matching regex
by wjw (Priest) on May 16, 2014 at 23:35 UTC
    If you know that your number will be in position 0 or position 1, and that is all you care about, use the '^' anchor to determine whether the first character is a number or not. If it is, you will get a match; if it is not, then you know the first character is a non-number...

    if ($filename =~ /^[0-9]/) { .... } (untested)

    Hope that is helpful....

    ...the majority is always wrong, and always the last to know about it...
    Insanity: Doing the same thing over and over again and expecting different results...

      Yes that would work, given the assumptions. What I hope to accomplish is to code a procedure that is less assumptive and more informing. I can run further contingencies based on the location metric.

      I also need to key onto another part of the file name, a letter indexed serialization of the file. These are tutorial slides, and it is important to account for that too.

      Thank you for your reply.

      David
