PerlMonks  

regular expression

by Rahul Gupta (Sexton)
on Aug 03, 2012 at 14:31 UTC ( [id://985249]=perlquestion )

Rahul Gupta has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I have this string

Sanity_001_Sanity_001_0001_00_15

I want this

Sanity_001_0001_00_15

and I tried this code

m/.*_?(.*)/

but it did not work for me.

Please help me.

Replies are listed 'Best First'.
Re: regular expression
by choroba (Cardinal) on Aug 03, 2012 at 14:51 UTC
    You can use this regex:
    /(.*)(\1.*)/
    The desired string is stored in $2.
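
    A minimal sketch of that regex applied to the string from the question. The helper name strip_repeated_prefix is made up for illustration and is not part of the reply above; the sketch assumes a single-line string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text from the
# second copy of a repeated prefix onward.
sub strip_repeated_prefix {
    my ($str) = @_;
    # (.*) grabs a prefix, \1 requires the same text to follow immediately,
    # and the second group captures that repeat plus the rest of the line.
    # With no repeated prefix, group 1 ends up empty and $2 is the whole line.
    return $str =~ /(.*)(\1.*)/ ? $2 : undef;
}

print strip_repeated_prefix('Sanity_001_Sanity_001_0001_00_15'), "\n";
```

    Run as is, this should print Sanity_001_0001_00_15.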
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      While this will work for this string, we're obviously dealing with a Perl novice here, and this solution is difficult to understand at best for someone learning the language.

      If you're ever uncertain as to why your regex is not working, put a capture '()' around each element of the regex and dump them out. You'll see exactly which regex elements are matching which pieces of your string. This will give you a lot of insight into what might be going wrong. Also be sure to check that the match itself returns true. You might not be matching at all.
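
      As a rough illustration of that advice, here is the OP's pattern with each element wrapped in its own group. The helper name capture_groups is hypothetical, and the sketch relies on the match operator returning its captures in list context.

```perl
use strict;
use warnings;

# Hypothetical helper: the OP's pattern /.*_?(.*)/ with every element
# wrapped in its own capture group. Returns the captures in list context,
# or an empty list if the match fails.
sub capture_groups {
    my ($str) = @_;
    return $str =~ /(.*)(_?)(.*)/;
}

my @groups = capture_groups('Sanity_001_Sanity_001_0001_00_15');
die "no match\n" unless @groups;    # check that the match succeeded at all
printf "group %d: '%s'\n", $_ + 1, $groups[$_] for 0 .. $#groups;
```

      Group 1 should come back holding the entire string, with groups 2 and 3 both empty, which shows the leading .* has swallowed everything.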

      As pointed out by many previous posters, I recommend the OP research greedy and non-greedy matches. Also read up on anchoring regexes with '^' and '$'.

      Woah!

      That works. I expected it would fail because the first group would match everything and then there would be nothing for the 2nd group to match against.

      Here's the code I used to test:

      #! /usr/bin/perl -w -T
      use strict;
      my $str = 'foo_bar_foo_bar_12345';
      print "$str\n";
      $str =~ /(.*)(\1.*)/ || die "Failed!\n";
      print "$2\n";

      Can anyone explain why that is?

        From Backtracking:

        For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part—that's why it's called backtracking.

        The regex engine begins as you say, by matching everything with the first .*, but when the whole match fails it backtracks one character and tries again. Eventually, it has backtracked to the point at which $1 contains foo_bar_ (including the trailing underscore) and the second group can be tried against foo_bar_12345. The regex engine then verifies that this text satisfies \1.*, so $2 contains foo_bar_12345, the entire match succeeds, and the regex engine stops looking and returns.
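
        The backtracking order can be imitated by hand. This is only a simplified model, assuming a single-line string and a match starting at position 0; it is not how the engine is actually implemented, and the helper name find_repeat is invented for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: tries prefixes from longest to shortest, the same
# order in which the greedy (.*) gives characters back, and returns the
# first (prefix, rest) pair for which the prefix is immediately repeated.
sub find_repeat {
    my ($str) = @_;
    for my $len (reverse(0 .. length($str))) {
        my $prefix = substr($str, 0, $len);
        my $rest   = substr($str, $len);
        # \1 can succeed only if the rest of the string starts with the prefix
        return ($prefix, $rest) if index($rest, $prefix) == 0;
    }
    return;    # not reached: the empty prefix always qualifies
}

my ($one, $two) = find_repeat('foo_bar_foo_bar_12345');
print "\$1 = '$one'\n\$2 = '$two'\n";
```

        For the test string this should report $1 = 'foo_bar_' and $2 = 'foo_bar_12345', the same values the real regex leaves in $1 and $2.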

        HTH,

        Athanasius <°(((><contra mundum

      Thanks. It worked for me :-)
Re: regular expression
by Narveson (Chaplain) on Aug 03, 2012 at 14:45 UTC

    Your regular expression /.*_?(.*)/ starts with .*, which matches any string of any length. The _?(.*) matches the empty string that remains after your first piece has matched everything. So you captured the empty string.

    When you review the manual, pay special attention to the meaning of the quantifiers * and ?.

Re: regular expression
by ww (Archbishop) on Aug 03, 2012 at 14:51 UTC
    The initial deathstar (.*) is greedy, qv.

    IOW, there's nothing left for your capture after the leading .* has matched everything in your string.

    Read, instead, about look-aheads or try capturing the first Sanity_001_ and the balance of the string (to a second capture clause), and printing/saving only $2 -- either can help you, but learning about look-aheads (and the general category, of which it's a part, "look arounds") will help you greatly, even in the short term.
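
    Both suggestions might look something like the sketch below. The sub names are hypothetical, the first option hard-codes the literal prefix from the question, and the second assumes the repeated prefix ends in an underscore.

```perl
use strict;
use warnings;

# Option 1 (hypothetical helper): capture the literal first prefix and the
# balance of the string separately, then keep only the balance ($2).
sub balance_after_prefix {
    my ($s) = @_;
    return $s =~ /^(Sanity_001_)(.*)$/ ? $2 : undef;
}

# Option 2 (hypothetical helper): remove a leading chunk ending in "_" only
# when the same text follows immediately. The look-ahead (?=\1) checks for
# the repeat without consuming it, so the second copy stays in the string.
sub drop_repeated_prefix {
    my ($s) = @_;
    (my $copy = $s) =~ s/^(.+_)(?=\1)//;
    return $copy;
}

my $str = 'Sanity_001_Sanity_001_0001_00_15';
print balance_after_prefix($str), "\n";
print drop_repeated_prefix($str), "\n";
```

    Both lines should print Sanity_001_0001_00_15. The look-ahead version leaves a string without a repeated prefix unchanged.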
