PerlMonks  

Quick question on matching

by licking9Volts (Pilgrim)
on May 21, 2002 at 16:35 UTC (#168182=perlquestion)
licking9Volts has asked for the wisdom of the Perl Monks concerning the following question:

I have a line in my program that looks for an occurrence of any of three phrases. Here's what I have so far:
if (/\bAPI|APIN|UWI\s*\./) { $myvar = $something; }
Is there a way to know exactly which phrase matched and assign it to variable? Example:
if (/\bAPI|APIN|UWI\s*\./) { somesub($matchedpart); }
or would I have to:
if (/\bAPI|APIN|UWI\s*\./) { $matchedpart =~ /\bAPI|APIN|UWI\s*\./; somesub($matchedpart); }
I only started using Perl a couple of weeks ago, so any advice would be greatly welcomed.

Re: Quick question on matching
by kappa (Chaplain) on May 21, 2002 at 16:45 UTC
    Exactly. Use a bracketing construct like this:
    if (/\b(API|APIN|UWI)\s*\./) { somesub($1); }
    Read perlre on saving parentheses.

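    Update: note that my regexp isn't equivalent to yours; instead, it matches what you probably wanted it to match. The difference comes from the vertical bar | construct, which splits the entire pattern rather than just the words next to it. Your original regexp therefore matches one of three separate alternatives: \bAPI, or APIN, or UWI\s*\.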
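A small script can make that precedence difference visible. It is only a sketch: the sample lines are made up for illustration, and the helper name compare is hypothetical. For each line it reports whether the original pattern matches and, for the grouped pattern, what ended up in $1.

```perl
use strict;
use warnings;

# Original pattern: | splits the WHOLE regex, so it means
#   \bAPI   OR   APIN   OR   UWI\s*\.
my $original = qr/\bAPI|APIN|UWI\s*\./;

# Grouped pattern: the alternation is confined to the parentheses,
# so \b and \s*\. apply to every alternative.
my $grouped = qr/\b(API|APIN|UWI)\s*\./;

# Hypothetical helper: 'yes'/'no' for the original pattern;
# for the grouped one, 'no' or "yes (<contents of $1>)".
sub compare {
    my ($line) = @_;
    my $orig = $line =~ $original ? 'yes' : 'no';
    my $grp  = $line =~ $grouped  ? "yes ($1)" : 'no';
    return ($orig, $grp);
}

# Hypothetical sample lines, chosen to show where the two differ.
for my $line ('API.', 'CHAPIN', 'UWI .', 'API 123') {
    my ($orig, $grp) = compare($line);
    printf "%-8s original: %-4s grouped: %s\n", $line, $orig, $grp;
}
```

"CHAPIN" and "API 123" are matched only by the original pattern: the first because APIN is allowed anywhere, the second because \bAPI needs nothing after it.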

Re: Quick question on matching
by c-era (Curate) on May 21, 2002 at 16:45 UTC
    You can use a capturing group: place parens around the part you want, and use $1 to access that part of the match. If you have more than one set of parens, you would use $2 to access the second one, $3 to access the third, etc.
    if (/\b(API|APIN|UWI)\s*\./) { somesub($1); }
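To illustrate the numbering: each capturing group is numbered by the position of its opening parenthesis, counting from the left. In this sketch the input line and the second group (\S+) are invented for the example; they assume the keyword is followed by a dot and then a value.

```perl
use strict;
use warnings;

# Hypothetical input: a keyword, a dot, then a value.
my $line = 'API . 42-123-45678';

my ($keyword, $value);
# Two capturing groups: $1 gets the keyword, $2 gets the value.
if ($line =~ /\b(API|APIN|UWI)\s*\.\s*(\S+)/) {
    ($keyword, $value) = ($1, $2);
    print "keyword: $keyword\n";   # API
    print "value:   $value\n";     # 42-123-45678
}
```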
Re: Quick question on matching
by graff (Chancellor) on May 21, 2002 at 16:50 UTC
    You need to use parentheses in the matching expression. This will do two things for you: make the "|" work the way you want AND give you the matched string as $1:
    if ( /\b(API|APIN|UWI)\s*\./ ) {
        $myvar = $1;   # $1 is set to whatever was matched inside the parens
    }
    # (if there were more than 1 set of parens, additional
    # paren'ed strings would be set to $2, $3...)
    As you originally wrote the expression, it would match "API" following a word boundary, or "APIN" anywhere on a line (e.g. in CHAPIN), or "UWI\s*\." anywhere on a line.

    Read the perlre documentation -- it's long, but very worthwhile.
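If all you need is the matched word in a variable, a common Perl idiom is to evaluate the match in list context: a successful match then returns the captured groups, and a failed one returns an empty list, which leaves the variable undef. A minimal sketch; somesub below is a hypothetical stand-in for the sub in the question, and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the questioner's somesub().
sub somesub { my ($word) = @_; return "handled $word"; }

my @results;
for ('UWI .', 'CHAPIN', 'APIN.') {
    # List context: $matched receives what $1 would hold,
    # or stays undef if the match fails.
    my ($matched) = /\b(API|APIN|UWI)\s*\./;
    push @results, defined $matched ? somesub($matched) : 'no match';
}
print "$_\n" for @results;
```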

Re: Quick question on matching
by VSarkiss (Monsignor) on May 21, 2002 at 16:50 UTC

    Yes, you need to use parentheses, like this:

    if (/\b(API|APIN|UWI)\s*\./) { somesub($1); }
    The parentheses group parts of the regular expression, in addition to "capturing" parts of it that you can use in subsequent code. Both functions are important here: I'm guessing that you actually wanted to match just the letters, not the surrounding whitespace. In other words, I'm guessing that you didn't want what your original expression was doing: matching one of
    • \bAPI
    • APIN
    • UWI\s*\.
    That happens because the alternation character | binds very loosely: unless parentheses limit its scope, it splits the whole pattern into alternatives.

    More information at perlre.

    HTH
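The grouping-versus-capturing point can be checked by putting the parentheses in two places. Wrapping the whole original alternation keeps its old meaning, and $1 then holds everything the chosen branch matched, whitespace and dot included; wrapping only the keywords keeps \s*\. outside the group. The input string here is hypothetical.

```perl
use strict;
use warnings;

my $line = 'UWI  .';   # hypothetical input: keyword, two spaces, dot

# Parens around the whole original alternation: same meaning as the
# original regex, so $1 includes the trailing whitespace and dot.
my ($whole) = $line =~ /(\bAPI|APIN|UWI\s*\.)/;

# Parens around just the keywords: \s*\. is outside the group,
# so $1 holds only the letters.
my ($letters) = $line =~ /\b(API|APIN|UWI)\s*\./;

print "whole:   '$whole'\n";     # 'UWI  .'
print "letters: '$letters'\n";   # 'UWI'
```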

Re: Quick question on matching
by ph0enix (Friar) on May 21, 2002 at 16:52 UTC

    Do you want something like this?

    if (/\b(API|APIN|UWI)\s*\./) {
        do_something() if $1 eq 'API';
        do_another()   if $1 eq 'APIN';
        do_else()      if $1 eq 'UWI';
    }

    Whatever is matched inside the brackets (parentheses) is remembered for later use: the text matched by the first () is stored in the variable $1.
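Because $1 is always one of the three keywords, one alternative to a chain of ifs is a hash that maps each keyword to a code reference (a "dispatch table"). This is a sketch of that swapped-in technique, not what the reply above uses; the %handler hash, the handlers themselves and the sample lines are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical handlers, one per keyword; each just records what it saw.
my @log;
my %handler = (
    API  => sub { push @log, "API handler: $_[0]" },
    APIN => sub { push @log, "APIN handler: $_[0]" },
    UWI  => sub { push @log, "UWI handler: $_[0]" },
);

for my $line ('API.', 'APIN .', 'UWI.', 'CHAPIN') {
    if ($line =~ /\b(API|APIN|UWI)\s*\./) {
        # $1 is one of the hash keys, so look up and call its handler.
        $handler{$1}->($line);
    }
}
print "$_\n" for @log;
```

Adding a new keyword then means adding one hash entry and extending the alternation, rather than adding another if.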

Re: Quick question on matching
by arunhorne (Pilgrim) on May 21, 2002 at 16:55 UTC

    You could try something like the following. It's heavily commented, since you say you've just started out in Perl.

    use strict;
    use warnings;

    # Put some phrases in an array for us to test
    my @phrases = (" APIblahblah", "blah APINblah", "blahUWI .");

    # Match on each phrase
    foreach my $phrase (@phrases) {
        # Do a match - note the brackets around the entire expression.
        # Brackets are used as a memory.
        # See a regular expression guide for more info.
        if ($phrase =~ m/(\bAPI|APIN|UWI\s*\.)/) {
            # If it matched we are here... $1 holds the memory of what
            # matched inside the brackets. Similarly, if there was a
            # 2nd set of parentheses, $2 would contain their match, etc.
            print "Phrase: $phrase, matched on: $1\n";
            # Rather than printing it here you could call your sub...
            # e.g.
            # somesub($1);
        }
    }

    Do note that the phrase "blah APINblah" will match on API, not APIN, whereas the phrase "blahAPINblah" will only match on APIN. This is due to the word boundary: \b belongs only to the API alternative, and in "blahAPINblah" there is no word boundary before the A. You might want to take this into account in your regex.

    Good luck, Arun
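Part of what arun describes also comes down to the order of the alternatives: at the first position where a match is possible, the engine takes the first alternative that lets the whole pattern succeed, not the longest one. A minimal sketch comparing two orderings on the phrase "blah APINblah" (the trailing \s*\. is left off here on purpose):

```perl
use strict;
use warnings;

my $phrase = 'blah APINblah';

# Alternatives are tried left to right; the first one that succeeds wins.
my ($short_first) = $phrase =~ /\b(API|APIN|UWI)/;
my ($long_first)  = $phrase =~ /\b(APIN|API|UWI)/;

print "API listed first:  $short_first\n";   # API
print "APIN listed first: $long_first\n";    # APIN
```

With a trailing \s*\., as in the grouped pattern from the earlier replies, the engine backtracks into APIN whenever API alone cannot be followed by the dot, so the ordering matters mostly when nothing after the group forces the longer word.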

Re: Quick question on matching
by licking9Volts (Pilgrim) on May 21, 2002 at 18:55 UTC
    Ah! Thank you everyone for your help. It was exactly what I needed. Using parentheses never even occurred to me. I have Learning Perl and it seems to hint at using 1, 2, 3, etc. as remembered patterns, but even then, I didn't notice it until after I read everyone's response here. Hopefully I'll be able to pick up Mastering Regular Expressions this week and work on my regex building. Thanks again.

Node Type: perlquestion [id://168182]
Approved by VSarkiss