
regex search valid only if registers n and n+1 are equal?

by Voronich (Hermit)
on Jun 30, 2006 at 19:30 UTC (#558665=perlquestion)
Voronich has asked for the wisdom of the Perl Monks concerning the following question:

Help me obi-several, you're my only hope.

I'm not confident I phrased that right in the title. But here goes. (I know this is out there, but I don't have any hooks on what to call it to search.)

I have 2 strings:
  • "foo 1000 bar 1000"
  • "foo 1000 bar 500"

and I'm starting with something like:

/foo (\d+) bar (???)/

My goal is to match the first string but not the second. I couldn't care less what the values are, merely whether or not they are equal.

I seem to recall it being possible (and perhaps fundamental) to match against "the string that matched grouping N".

But for the life of me I can't figure out how to do it, or what kind of keywords to search for to dig this up.

(please feel invited to edit this for clarity.)

UPDATE: Damnit. My fault. There's an additional restriction. The regex itself is to be included in a fairly obtuse dispatch mechanism so the test needs to be constrained entirely to the matching of the expression. (does that make sense?)

Here's another wrinkle for the willing: Given that I can match easily enough with backrefs those two numbers, anyone know how to, again entirely within the scope of the regex match, return a POSITIVE match if the two numbers (both of \d+ format) do NOT match? (Yes, this is a practical independent real-world case. Ah, the joys of parsing log files in realtime.)

Thanks very much for your help (this article pushes me up to Monk! *SQUEE*)


Replies are listed 'Best First'.
Re: regex search valid only if registers n and n+1 are equal?
by Hue-Bond (Priest) on Jun 30, 2006 at 19:36 UTC

    Use backreferences, like this:

    while (<DATA>) {
        print "match on $1\n" if /^\S+ (\d+) \S+ \1$/;
    }
    __DATA__
    foo 1000 bar 1000
    foo 1000 bar 500

    David Serrano

      yeesh. "backreference" is the word that unlocked the universe here. Thanks :)
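      The second wrinkle in the question (a positive match when the two numbers do NOT agree, still entirely inside the pattern) is not answered in the thread. One possible approach, offered as a sketch rather than a tested production pattern, is a negative lookahead wrapped around the backreference. The variable name `$differs` and the sample lines are made up for illustration, and the comparison is on the digit strings, so "0500" and "500" count as different.

```perl
use strict;
use warnings;

# Matches only when the number after "bar" is not the same digit string as
# the number after "foo". (?!\1(?!\d)) rejects a second number that is
# exactly \1; the inner (?!\d) keeps "1000" from vetoing "10005".
my $differs = qr/foo (\d+) bar (?!\1(?!\d))(\d+)/;

for my $line ("foo 1000 bar 1000", "foo 1000 bar 500", "foo 1000 bar 10005") {
    if ($line =~ $differs) {
        print "$line: differs ($1 vs $2)\n";
    }
    else {
        print "$line: no match\n";
    }
}
```

      Because the whole test lives in the compiled pattern, it can be dropped into anything that only asks "did the regex match?".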
Re: regex search valid only if registers n and n+1 are equal?
by TedPride (Priest) on Jul 01, 2006 at 01:08 UTC
    Why make things overly complicated?
    while (<DATA>) {
        next if !m/foo (\d+) bar (\d+)/ || $1 != $2;
        print;
    }
    __DATA__
    foo 1000 bar 1000
    foo 1000 bar 500
      That can also be written using positive logic, as follows:
      while (<DATA>) {
          print if m/foo (\d+) bar (\d+)/ && $1 == $2;
      }
      __DATA__
      foo 1000 bar 1000
      foo 1000 bar 500

        Hello De Morgan's laws.

        Yes, I think you already knew what it is. This is for the other members of the audience.
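        Note that comparing `$1` and `$2` after the match falls outside the restriction in the question's UPDATE, where the test has to be carried by the pattern alone. The dispatch mechanism is not shown in the thread, so the following is only a guess at its general shape: a hypothetical table of compiled patterns and handlers (`@dispatch` and `dispatch_line` are invented names), in which the equality check is expressed with a backreference.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each entry pairs a compiled pattern with a
# handler. The dispatcher only knows "matched or not", so the equality
# test has to live inside the pattern itself (via \1).
my @dispatch = (
    [ qr/^foo (\d+) bar \1$/, sub { "equal: $_[0]" } ],
    [ qr/^foo \d+ bar \d+$/,  sub { "other foo/bar line" } ],
);

sub dispatch_line {
    my ($line) = @_;
    for my $entry (@dispatch) {
        my ($re, $handler) = @$entry;
        # In list context a successful match returns its captures
        # (or the list (1) when the pattern has no groups).
        if (my @captures = $line =~ $re) {
            return $handler->(@captures);
        }
    }
    return;    # no pattern matched
}

for my $line ("foo 1000 bar 1000", "foo 1000 bar 500", "baz") {
    my $result = dispatch_line($line);
    $result = "unhandled" unless defined $result;
    print "$result\n";
}
```

        Entries are tried in order, so the more specific pattern goes first.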

Re: regex search valid only if registers n and n+1 are equal?
by Miguel (Friar) on Jul 01, 2006 at 12:23 UTC
    Just another way:
    UPDATE: Another 3 ways:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        # 1
        # my @arr = split;
        # print if $arr[1] != $arr[3];

        # 2
        print if sub { $_[1] != $_[3] }->(split);
    }

    # 3
    # while (@_ = <DATA> =~ /(\w+)\s+/g) {print "@_\n" if $_[1] != $_[3]}

    __DATA__
    foo 1000 bar 1000
    foo 1000 bar 500
    foo 1000 bar 500
Re: regex search valid only if registers n and n+1 are equal?
by gam3 (Curate) on Jul 01, 2006 at 15:01 UTC
    And here is a benchmark for the 3 solutions.

    It seems that using the regular expression natively is very fast. This is followed by the regular expression and OR logic. The split is slowest.

               Rate   split  simple backref
    split    4329/s      --    -31%    -66%
    simple   6313/s     46%      --    -50%
    backref 12658/s    192%    101%      --
    UPDATE: As Sam points out, I had the efficiency of the different techniques completely backwards. I also noticed a bug in the simple technique. Fixing that sped it up by 50%.
    -- gam3
    A picture is worth a thousand words, but takes 200K.
      The backreference is much slower as you would expect.

      Really? Looks like the backreference is faster in your test.
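      The benchmark code itself is not shown in the thread. A minimal sketch of how such a comparison might be set up with the core Benchmark module follows. The sample lines, the `%approach` hash, and the exact bodies of the three subs are assumptions modelled on the replies above, and no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @lines = ("foo 1000 bar 1000", "foo 1000 bar 500");

# Each sub returns how many lines carry two equal numbers, so the three
# approaches can be checked against each other as well as timed.
my %approach = (
    split   => sub { scalar grep { my @f = split; $f[1] == $f[3] } @lines },
    simple  => sub { scalar grep { /foo (\d+) bar (\d+)/ && $1 == $2 } @lines },
    backref => sub { scalar grep { /^\S+ (\d+) \S+ \1$/ } @lines },
);

# A negative count tells Benchmark to run each sub for at least that
# many CPU seconds; cmpthese then prints a rate comparison table.
cmpthese(-1, \%approach);
```

      Checking that all three subs return the same count before trusting the table guards against the kind of bug mentioned in the UPDATE.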


Re: regex search valid only if registers n and n+1 are equal?
by sh1tn (Priest) on Jun 30, 2006 at 23:19 UTC
    Another possible solution can be:
    while (<DATA>) { print "match on $1 and no 2nd$2\n" if /\S+ (\d+) \S+ (?:\d+)/; }

      I get two prints (as expected) for the provided data. It should only print the first line. Perhaps you didn't understand the question?

Node Type: perlquestion [id://558665]
Approved by Hue-Bond
Front-paged by tye