PerlMonks  

RE: Death to Dot Star!

by merlyn (Sage)
on Jul 27, 2000 at 16:45 UTC (#24661)


in reply to Death to Dot Star!

Aha! One of the classic mistakes was made in this code:

$myvar =~ /"          # First quote
    (                 # Capture text to $1
        (?:           # Non-backreferencing parentheses
            [^?"]     # Anything that's not a question mark or quote
            |         # or
            \?[^"]    # A question mark not followed by a quote (to allow embedded question marks)
        )*            # Zero or more
    )                 # End capture
    \?"/x;            # Followed by a question mark and quote
Try this with
$myvar = q{ abc"def??"ghi?"jkl };
And you'll see that it captures ghi, not def?. The problem is that the "question mark NOT followed by a quote" alternative, \?[^"], consumes two characters, so on ??" it can eat up the question mark that you need to begin your closing delimiter.

The proper way to tackle this is to "inch-along"...

$myvar = q{ abc"def??"ghi?"jkl };
print "matched <$1>" if $myvar =~ /"   # First quote
    (                 # Capture text to $1
        (?:           # Non-backreferencing parentheses
            (?!\?")   # not question quote?
            .         # ok to inch along
        )*            # Zero or more
    )                 # End capture
    \?"/sx;           # Followed by a question mark and quote
which properly prints:
matched <def?>

I was tackling this kind of thing a lot when people kept posing the "how do I match a C comment?" question back in the early, pre-Ilya-RE days. I got pretty good at breaking just about any regex that claimed to match a comment, by undoing any assumption it made.
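
As an illustration of the same inch-along idea applied to the C comment question, here is a minimal sketch. It is not code from this thread: the sample string and the names $source, $c_comment and @bodies are made up for the example, and it ignores comment markers that appear inside string literals.

```perl
use strict;
use warnings;

# Hypothetical sample input; the second comment contains stray stars
# and a slash to show that only the two-character closer ends it.
my $source = 'int x; /* first */ int y; /* a * b / c **/ int z;';

# Inch-along pattern: take one character at a time, but only while
# the closing delimiter does not start at the current position.
my $c_comment = qr{
    /\*             # opening delimiter
    (               # capture the comment body
        (?:
            (?!\*/) # not at the closing delimiter
            .       # so it is safe to take one character
        )*
    )
    \*/             # closing delimiter
}sx;

# In list context with /g, the match returns every captured body.
my @bodies = $source =~ /$c_comment/g;
print "comment <$_>\n" for @bodies;
```

On this input the two bodies are " first " and " a * b / c *". The last star of the second body is kept because it is not followed by a slash.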

-- Randal L. Schwartz, Perl hacker


(Ovid) RE(2): Death to Dot Star!
by Ovid (Cardinal) on Jul 27, 2000 at 19:37 UTC
    Drat, drat drat! And I was on a roll :) Nice work.

    Rather than simply testing for a question mark followed by a character that is not a quote (\?[^"]), I should have tested for a question mark with a negative look-ahead (\?(?!")) for a quote. This appears to work:

    $myvar =~ /"((?:[^?"]|\?(?!"))*)\?"/;
    Unfortunately, Benchmark shows that it's not quite as fast as merlyn's version.
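
The benchmark script itself is not shown in the thread, so the following is only a sketch of how such a comparison could be set up with the core Benchmark module. The labels inch_along and lookahead and the choice of test string are assumptions, and timings on one short string may not carry over to real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $myvar = q{ abc"def??"ghi?"jkl };

my $inch_along = qr/"((?:(?!\?").)*)\?"/s;       # merlyn's version
my $lookahead  = qr/"((?:[^?"]|\?(?!"))*)\?"/;   # Ovid's corrected version

# Sanity check: both should capture the same text on this input.
my ($inch_capture) = $myvar =~ $inch_along;
my ($look_capture) = $myvar =~ $lookahead;
print "inch-along <$inch_capture>, lookahead <$look_capture>\n";

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    inch_along => sub { my ($m) = $myvar =~ $inch_along },
    lookahead  => sub { my ($m) = $myvar =~ $lookahead },
});
```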

    For those unfamiliar with lookaheads, they allow you to test for text without "bumping along" the regex. In other words, \?[^"] will check for a question mark followed by a non-quote character, but further matching of the regex continues after the non-quote character. \?(?!") allows you to check for a question mark not followed by a quote, but continues matching after the question mark.

    Note: There is a subtle difference between the negated character class and the negative lookahead. The negated character class generally requires a character after the question mark (in the above example), while the negative lookahead just makes sure that a quote doesn't follow the question mark and doesn't require a character.
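
Both points can be checked with a small sketch. The strings 'a?b' and 'why?' are made up for the illustration; pos() reports where the next /g match would resume.

```perl
use strict;
use warnings;

my $text = 'a?b';

# Negated character class: the character after the question mark is consumed.
my $class_ok  = $text =~ /\?[^"]/g;
my $pos_class = pos($text);      # 3: matching resumes after the "b"

pos($text) = 0;                  # reset before the second test

# Negative lookahead: only the question mark is consumed.
my $look_ok  = $text =~ /\?(?!")/g;
my $pos_look = pos($text);       # 2: matching resumes at the "b"

print "class resumes at $pos_class, lookahead resumes at $pos_look\n";

# At the very end of a string there is no character left for [^"] to take.
my $tail = 'why?';
my $class_hits = $tail =~ /\?[^"]/  ? 'yes' : 'no';   # no
my $look_hits  = $tail =~ /\?(?!")/ ? 'yes' : 'no';   # yes
print "end of string: class $class_hits, lookahead $look_hits\n";
```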

    Cheers,
    Ovid

WARNING merlyn wrote BAD CODE
by Ovid (Cardinal) on Jul 27, 2000 at 22:55 UTC
    Okay, the title is kind of a joke. It's just a good-natured tweak at merlyn for the brouhaha over his WARNING t0mas wrote BAD CODE node that generated so much flak. No offense intended :)

    merlyn's code was bugging me, but I couldn't quite put my finger on it. My problem was that the dot metacharacter is so indiscriminating that it will match anything. However, I simply assumed that if merlyn posted the code, it must work. His code is great if you're checking for C-style comments that begin and end in something like /* comment here */ or "? comment here ?". But if you read my post, that's not what we were checking for:

      What happens if you were trying to extract questions in quotes without the trailing question mark?
    I mentioned embedded question marks (my idea was that we might have more than one question in a quote), but I never mentioned embedded quotes. I just wanted one set of quotes and my original post bears that out. Here's merlyn's code and my correction:
    #!/usr/bin/perl -w
    $myvar = q{ abc"def"g"hi?"jkl };

    # This regex is from merlyn
    print "matched <$1>\n" if $myvar =~ /"   # First quote
        (                 # Capture text to $1
            (?:           # Non-backreferencing parentheses
                (?!\?")   # not question quote?
                .         # ok to inch along
            )*            # Zero or more
        )                 # End capture
        \?"/sx;           # Followed by a question mark and quote

    # This regex is from Ovid
    print "matched <$1>\n" if $myvar =~ /"   # First quote
        (                 # Capture text to $1
            (?:           # Non-backreferencing parentheses
                [^?"]     # Not a question mark or quote
                |         # or
                \?(?!")   # A question mark not followed by a quote
            )*            # Zero or more
        )                 # End capture
        \?"/sx;           # Followed by a question mark and quote
    The first regex will print matched <def"g"hi>. The second will print matched <hi>.
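
To see all three regexes from this thread side by side, here is a small sketch that runs each one against both test strings. The labels and the %captured hash are made up for the illustration; the patterns and inputs are the ones posted above, written without /x.

```perl
use strict;
use warnings;

my @patterns = (
    [ 'class, original'  => qr/"((?:[^?"]|\?[^"])*)\?"/  ],   # Ovid's first attempt
    [ 'inch-along'       => qr/"((?:(?!\?").)*)\?"/s     ],   # merlyn's version
    [ 'class, lookahead' => qr/"((?:[^?"]|\?(?!"))*)\?"/ ],   # Ovid's correction
);
my @inputs = (
    q{ abc"def??"ghi?"jkl },
    q{ abc"def"g"hi?"jkl },
);

my %captured;
for my $input (@inputs) {
    for my $entry (@patterns) {
        my ($label, $re) = @$entry;
        my ($text) = $input =~ $re;      # first capture group, or undef
        $captured{$input}{$label} = $text;
        printf "%-22s %-18s <%s>\n",
            $input, $label, defined $text ? $text : 'no match';
    }
}
```

On the first string the captures are ghi, def? and def?; on the second they are hi, def"g"hi and hi. Which result is "right" for the second string depends on whether embedded quotes are allowed, which is the point of disagreement here.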

    No disrespect is intended towards Randal as he was right in pointing out that my first regex was broken.

    Cheers,
    Ovid

        I misunderstood the goal, but I knew that yours wouldn't solve the goal. :)

        Typical Randal. Hey, Randal, have you ever thought about just acknowledging someone else's effort without taking a swipe at the bloke afterwards? I mean, really, how many times do we have to put up with you acting like the big dog and pretending that no one else has anything else to offer?

        I've read through some of your posts here and I have never seen you compliment anyone. On this one, you could have just admitted you were wrong and complimented "ovid" on his code, or just admitted that you were wrong. Period. No, your ego wouldn't allow that. You had to cut him down.

        If you even remotely care about why you have problems with people, perhaps you should wonder why you cannot be bothered to acknowledge anyone. You know why the witch hunt at Intel and your felony conviction happened? It is not because you were wrong. It looks like you were trying to act in the company's best interest. It's because you are so arrogant that people want to take you down a peg.

        Try admitting that someone besides Randal has something worthwhile to contribute.

        Oh, and that boycott thing is typical Randal BS. Grow up.
