PerlMonks  

regex clarification

by Anonymous Monk
on Mar 05, 2005 at 10:12 UTC ( [id://436876]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

For the following input I am getting the output shown below, but I expected only Jackie. What is the reason for this?

Input:

David Veterinarian
Jackie Orthopedist
Karen Veterinarian

while (<>) {
    push(@array, $&) if m/^\w+(?!\s+Vet)/;
}
print("@array\n");

Output: Davi Jackie Kare

Replies are listed 'Best First'.
Re: regex clarification
by Enlil (Parson) on Mar 05, 2005 at 10:33 UTC
    What is happening is that the regex engine tries really hard to succeed. It greedily grabs as much as it can, and then gives characters back if that lets the overall match succeed.

    On the first line it matches the start of the line and then David, but then it notices that David is followed by whitespace and then Vet, so the negative lookahead fails. So it backs up a letter and tries again. Now it has matched Davi, which is followed by "d" rather than by whitespace and Vet, so the match succeeds and you put Davi into the array. With Jackie the lookahead does not fail, so it does not back up any letters and just puts Jackie into the array. Finally, on the last line it tries Karen, finds that the lookahead fails, and backs up a letter to Kare, which succeeds for the same reason that Davi succeeded above. And voilà, you get the results you see.
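    The backtracking described above can be reproduced with a small script. This is only a sketch: the sub name greedy_match is made up for illustration, and it uses a capture group in place of $& so the result is easy to return from a sub.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the original pattern to one line and
# return what \w+ ended up matching, or undef if the match failed.
sub greedy_match {
    my ($line) = @_;
    return $line =~ /^(\w+)(?!\s+Vet)/ ? $1 : undef;
}

for my $line ('David Veterinarian', 'Jackie Orthopedist', 'Karen Veterinarian') {
    my $got = greedy_match($line);
    # \w+ gives back one letter on the Veterinarian lines,
    # so this reports Davi, Jackie and Kare.
    print "$line => ", (defined $got ? $got : '(no match)'), "\n";
}
```

    Running it shows Davi, Jackie and Kare, the same as the original output.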

    FWIW, here is one way to do what you intended in the first place with a regex:

    while (<>) {
        push(@array, $&) if m/^(?>\w+)(?!\s+Vet)/;
    }
    print("@array\n");
    Though that is only one way, and probably not the most efficient.
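    As a rough comparison, here is the atomic group next to a possessive quantifier, which does the same job with a shorter spelling on Perl 5.10 or later. The sub names are invented for this sketch, and capture groups stand in for $&.

```perl
use strict;
use warnings;

# Atomic group: once \w+ has matched the whole word, it gives nothing back.
sub atomic_match {
    my ($line) = @_;
    return $line =~ /^((?>\w+))(?!\s+Vet)/ ? $1 : undef;
}

# Possessive quantifier (needs Perl 5.10 or later): same no-backtracking effect.
sub possessive_match {
    my ($line) = @_;
    return $line =~ /^(\w++)(?!\s+Vet)/ ? $1 : undef;
}

for my $line ('David Veterinarian', 'Jackie Orthopedist', 'Karen Veterinarian') {
    my $a = atomic_match($line);
    my $p = possessive_match($line);
    print "$line => atomic: ", (defined $a ? $a : '(no match)'),
          ", possessive: ", (defined $p ? $p : '(no match)'), "\n";
}
```

    Both forms match only on the Jackie line, because the word can no longer be shortened to dodge the lookahead.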

    -enlil

Re: regex clarification
by gopalr (Priest) on Mar 05, 2005 at 11:15 UTC

    You can also use

    push(@array, $1) if (/(^\w+)/ && $' !~ /\s+Vet/)
Re: regex clarification
by Thelonious (Scribe) on Mar 05, 2005 at 10:52 UTC
    Of course there are going to be many ways to do this. Here's one:

    while (<>) {
        push(@array, $1) if m/^(\w+)\s+(?!Vet)\S/;
    }
    print @array;

    What this has going for it is that it doesn't use $& (the use of which is a considerable performance hit). Also, it doesn't use (?>pattern), which perlre describes as "considered highly experimental, and may be changed or deleted without notice." See perldoc perlre (of course) for more details.
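    If you want the whole matched text without paying for $&, one option on Perl 5.10 or later is the /p modifier with ${^MATCH}. The sketch below is an assumption-laden variation, not anything posted in this thread: the sub name is invented, and it uses \b after \w+ (instead of an atomic group) to stop the word from being shortened.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word unless it is followed by
# whitespace and "Vet". /p makes ${^MATCH} available for this match
# only, so the program-wide cost associated with $& is avoided.
sub first_word_unless_vet {
    my ($line) = @_;
    # \b forces \w+ to stop at the end of the word, so backing up
    # one letter can no longer satisfy the lookahead.
    return $line =~ /^\w+\b(?!\s+Vet)/p ? ${^MATCH} : undef;
}

my @array;
for my $line ('David Veterinarian', 'Jackie Orthopedist', 'Karen Veterinarian') {
    my $word = first_word_unless_vet($line);
    push @array, $word if defined $word;
}
print "@array\n";   # Jackie
```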

      Thelonious, for some reason that "not space" at the end of your regex seems weird to me.

      I like this solution, because it's an easy way to rule out lines that contain something (i.e., "vet") anywhere after the first word, which I think is what the OP really wants to do.

      while (<DATA>) {
          # match the first word
          # in lines that don't contain "vet" anywhere
          push(@array, $1) if m/^(?!\w+\b.*vet)(\w+)/i;
      }
      print("@array\n");
      __DATA__
      David Veterinarian
      Jackie Orthopedist
      Karen Veterinarian
      Vetch Orthopedist
      Vetch Veterinarian
Re: regex clarification
by holli (Abbot) on Mar 05, 2005 at 13:03 UTC
    TIMTOWTDI!

    while (<DATA>) {
        # skip lines without two words, so $1 and $2 are never stale
        m/^(\w+)\s+(\w+)/ or next;
        push (@array, $1) unless $2 eq "Veterinarian";
    }
    print("@array\n");
    __DATA__
    David Veterinarian
    Jackie Orthopedist
    Karen Veterinarian

    or even

    while (<DATA>) {
        @_ = split /\s+/;
        push (@array, $_[0]) unless $_[1] eq "Veterinarian";
    }
    print("@array\n");
    __DATA__
    David Veterinarian
    Jackie Orthopedist
    Karen Veterinarian


    holli, /regexed monk/
Re: regex clarification
by TedPride (Priest) on Mar 05, 2005 at 19:16 UTC
    You don't even need a regex for this, btw:

    while (<DATA>) {
        push(@array, substr($_,0,index($_,' '))) if index($_,'Vet') == -1;
    }
    print join("\n",@array);
    __DATA__
    David Veterinarian
    Jackie Orthopedist
    Karen Veterinarian

Node Type: perlquestion [id://436876]
Approved by virtualsue