PerlMonks  

Question on Regex grouping

by ajguitarmaniac (Sexton)
on Dec 21, 2010 at 05:29 UTC (#878154)
ajguitarmaniac has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks! Here's my question. I'm writing a regex within an if condition, and this is how it goes:

foreach $arr (@arr) {
    if (($arr =~ /(\w*)(abc)(\d{5})/) && ($arr =~ / (\w*)(def)(\d{8})/)) {

From the regexes above, I'm looking to capture the value of the 8-digit number found by the second regex. How do I go about this? I should mention that both regexes are applied to the same string (which is really long).

Re: Question on Regex grouping
by Anonyrnous Monk (Hermit) on Dec 21, 2010 at 05:52 UTC
    I'm looking to capture the value of the 8 digit number found in the second regex
    foreach $arr ("foo abc12345 def12345678 bar") {
        if (($arr =~ /(\w*)(abc)(\d{5})/) && ($arr =~ / (\w*)(def)(\d{8})/)) {
            print $3;   # "12345678"
        }
    }

      Thanks Anonyrnous Monk! Does this mean that, irrespective of the number of regexes I write within the if statement, $3 would point to the 3rd group of the last regex? Out of curiosity, what do I have to do if I need to print the 5-digit number from the first regex as well? Please bear with my questions if they sound silly. I'm new to Perl and I really want to learn! Thanks :-)

        http://search.cpan.org/~jesse/perl-5.12.2/pod/perlre.pod#Capture_buffers
        The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match

        Out of curiosity, what do I have to do if I need to print the 5-digit number from the first regex as well?

        Store it, or print it, before performing another match operation.
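The "store it" option could look roughly like this. It is only a sketch using the sample string from the earlier reply; the variable names $five and $eight are made up for illustration.

```perl
use strict;
use warnings;

my $str = "foo abc12345 def12345678 bar";
my ($five, $eight);

# Copy $3 right after the first match; the next successful match overwrites it.
if ($str =~ /(\w*)(abc)(\d{5})/) {
    $five = $3;
    if ($str =~ / (\w*)(def)(\d{8})/) {
        $eight = $3;
    }
}

print "five=$five eight=$eight\n" if defined $five && defined $eight;
```

Nesting the two tests instead of joining them with && leaves room to copy $3 between the matches.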

        Captures are numbered per match (per regex), and you have two matches here. The one with the 8-digit capture is evaluated last, so that is the one $3 refers to.

        If you wanted to extract the 5-digit number from the first regex, you could simply reverse the order of the &&-combined tests:

        if (($arr =~ / (\w*)(def)(\d{8})/) && ($arr =~ /(\w*)(abc)(\d{5})/)) {

        Now, $3 is the 5-digit number.

        If you wanted to keep all captures, you could assign them to variables, e.g.:

        if (( my ($c1, $c2, $c3) = $arr =~ /(\w*)(abc)(\d{5})/) && ( my ($c4, $c5, $c6) = $arr =~ / (\w*)(def)(\d{8})/)) {
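For reference, here is that condition inside a complete loop. This is a sketch, not the only way to write it; the sample data and the @found array are assumptions added for illustration.

```perl
use strict;
use warnings;

my @lines = ("foo abc12345 def12345678 bar", "no match here");
my @found;

for my $line (@lines) {
    # A match in list context returns its captures, or an empty list on
    # failure, so each list assignment is true only when its regex matched.
    if (   (my ($c1, $c2, $c3) = $line =~ /(\w*)(abc)(\d{5})/)
        && (my ($c4, $c5, $c6) = $line =~ / (\w*)(def)(\d{8})/) ) {
        push @found, [$c3, $c6];
        print "5-digit: $c3, 8-digit: $c6\n";
    }
}
```

The second sample line is skipped because the first assignment receives an empty list, which counts as false.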
Re: Question on Regex grouping
by Marshall (Prior) on Dec 21, 2010 at 08:36 UTC
    There are many ways to go about this.

    First, do not put parens '()' around things that you have no interest in using later. "Capturing" these things consumes time and resources and to no effect.

    In general, I avoid using $1, $5 etc. Use Perl list slice instead. Assign directly to a variable like the code below shows. As you write more and more Perl code this $1, $2 stuff will appear less and less often.

    This term m/abc\d{5}/ is a pre-condition - I have no problem at all with writing code that says: "forget this line if that pre-condition is not satisfied" and that is what the code below does.

    Trying to compress things into a single statement gives Perl a bad name as a "write only" language and that reputation is undeserved! I am a big fan of both C and of Perl. It is easy to write obscure stuff in both languages, but you don't have to!

    #!/usr/bin/perl -w
    use strict;

    my @x = (
        'smomedef12345',
        'anabc12345 and there is some def12345678',
        'qwerabc12345def55',
        'def87654321abc54321',
    );

    foreach (@x) {
        next unless (m/abc\d{5}/);          #pre-condition to look further

        (my $string8) = m/def(\d{8})/;      #puts $string8 in list context
        #$string8 = (m/def(\d{8})/)[0];     #alternate way with list slice

        if ( !defined($string8) ) {
            $string8 = 'undefined';   #Perl 5.10 has a special way to do this
            #Probably here just do "next;"
            # because an undefined value means the
            # regex above did not match!
        }
        print "var def=$string8\n";
    }
    __END__
    prints: ..note that first item is silently skipped!
    var def=12345678
    var def=undefined
    var def=87654321   #note that this works even though
                       #the pre-condition of abc\d{5}
                       #occurs later in the line! Wow!
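The "special way" mentioned in the comment is presumably the defined-or operator //, which Perl gained in 5.10. A minimal sketch, assuming that is what was meant and reusing two of the sample strings (the @out array is only there to collect the results):

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

my @out;
for ('qwerabc12345def55', 'def87654321abc54321') {
    next unless /abc\d{5}/;
    # The list slice yields undef when there is no match; // swaps in a default.
    my $string8 = (/def(\d{8})/)[0] // 'undefined';
    push @out, "var def=$string8";
}
print "$_\n" for @out;
```

This replaces the explicit defined() test, at the cost of requiring a newer perl.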
    Update: ok a more obtuse solution:
    my @x = (
        'smomedef12345',
        'anabc12345 and there is some def12345678',
        'qwerabc12345def55',
        'def87654321abc54321',
    );
    print map { /abc\d{5}/ and /def(\d{8})/ ? "def=$1\n" : () } @x;
    __END__
    prints:
    def=12345678
    def=87654321
    Does essentially the same thing but in a much more obtuse way.
    I think the first code is better for a lot of reasons.

      Thanks Marshall!

      First, do not put parens '()' around things that you have no interest in using later. "Capturing" these things consumes time and resources and to no effect.
      Most of the cost is paid by the first parenthesis, that is, there's a significant cost difference between not using capturing parens at all, and using capturing parens. Additional parens don't contribute that much.
      In general, I avoid using $1, $5 etc. Use Perl list slice instead.
      Careful here. Using a list slice (which I find quite ugly), or assigning the match to a list, puts the match in list context, which will change the behaviour if /g is present.
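A small sketch of that context difference, using a made-up sample string. In scalar context a /g match finds one occurrence per evaluation and remembers its position; in list context it runs over the whole string.

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";

# Scalar context: /g matches once per evaluation and remembers its position.
my $ok = $str =~ /(\d)/g;
my $digit_scalar = $1;            # first digit only
my $pos_scalar   = pos($str);     # the next /g match would resume here

pos($str) = undef;                # start over

# List context: /g walks the whole string and returns every capture.
my ($first, @rest) = $str =~ /(\d)/g;
my $pos_list = pos($str);         # undef, the search ran to the end

print "scalar: $digit_scalar (pos $pos_scalar)\n";
print "list:   $first, then @rest\n";
```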

      But more importantly, in certain cases, when using list slices, you do not know whether there was a match or not:

      my $a = rand() < .5 ? "f" : "g";
      my $b = rand() < .5 ? "p" : "o";
      my $c = ("foo" =~ /($a)*$b/)[0];
      Did it match, or didn't it? If $c is defined, it matched. But what if $c isn't? If $a eq "g", and $b eq "o", there is a match, but $c is undefined.
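The ambiguity can be reproduced without rand() by fixing the two cases. This is a sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# /(g)*o/ matches "foo" (at the first "o"), but the group never participates.
my $c_match   = ("foo" =~ /(g)*o/)[0];
# /(f)*p/ does not match "foo" at all.
my $c_nomatch = ("foo" =~ /(f)*p/)[0];

# Both values are undef, so the slice result alone cannot tell the cases apart.
# Testing the match itself removes the ambiguity.
my $matched = "foo" =~ /(g)*o/ ? 1 : 0;
my $missed  = "foo" =~ /(f)*p/ ? 1 : 0;

printf "match case:    c is %s, matched=%d\n",
    defined $c_match ? "defined" : "undef", $matched;
printf "no-match case: c is %s, matched=%d\n",
    defined $c_nomatch ? "defined" : "undef", $missed;
```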
        "Additional parens don't contribute that much".

        Fair enough, there is overhead in doing it at all. I am saying "don't overdo it".

        List slice, hash slice, etc. are some of the coolest features in Perl! You are completely correct that list slice does not "play well with match global", because the number of things that can be returned is variable and therefore there is no way to specify a subset or range of indices that are of interest.

        The classic example of list slice is splitting a line when you want fields 127, [3..5], 93 and 8 of that line. I do work with DB lines like that; it is actually quite common. List slice allows me to assign those 6 things directly into variables that mean something within the program. I usually assign the variables on the left ($x, $y, $z, ...) in the order that the following code will use them, and adjust the slice accordingly.
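A scaled-down sketch of that split-and-slice pattern. The record, its field layout, and the variable names are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record with ten comma-separated fields.
my $line = "id42,alice,x,y,z,2010-12-21,u,v,w,ok";

# Pick fields 9, 1 and 5, listed in the order the following code wants them.
my ($status, $name, $date) = (split /,/, $line)[9, 1, 5];

# Ranges work inside the slice too.
my @middle = (split /,/, $line)[2 .. 4];

print "$name $date $status | @middle\n";
```

The slice order, not the field order in the record, decides which value lands in which variable.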

        If you are saying that "do not use list slice when doing a match global", I would absolutely agree with that. And I do not think that I have recommended that.

        In your code, my $c = ("foo" =~ /($a)*$b/)[0]; is an improper use of list slice.

        Properly used, list slice is beautiful.

        People who dislike list slicing should avoid scripting languages, especially Perl. It's FALSE that you don't know whether there was a match with m//g, because the created list is simply empty. Also, the GOATSE operator ( =()= ) recreates the right context if that's an issue. Take this code:

            $x = "a123b345c7865d87";
            @L = ($x =~ /[a-z]/g)[1,3];
            print "@L";                  ## Prints b d
            @X = ($x =~ /#/g)[1,3];
            # defined(@X) is a fatal error since Perl 5.22; an empty array is simply false
            print(@X ? "YES" : "NO");    ## Prints NO

        ... therefore, JavaFan, your assertions are FALSE and FALSE.

        TenThouPerlStudents
