http://www.perlmonks.org?node_id=1061796


in reply to Re^4: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain ppixregexplain.pl _desc.pl )
in thread Parsing and translating Perl Regexes

OK, thanks. The object hierarchy may be what it is now, for good or ill. But to the extent I understand your code it is not only a documentation of PPIx::Regexp bugs, but a neat piece of work in its own right.

I can't guarantee timing, since all sorts of things have come up in the last couple days, but:

Which of the bugs can I fix without breaking your code? Lacking any preference from you I'll probably start with the \g10 thing, which I hope to be reasonably straightforward.

Do I need to reserve some method names for your use? Like xplain()? I'm not sure how to document it, but I'm willing at least to avoid a few that you designate.

Can you be a little clearer on the modifier propagation thing? Or at least point me to the relevant part of your postings? I remember worrying about this during the Perl::Critic integration, but it may be that I was so focused on dealing with 'use re "/xms";' that I missed something in the regex itself.

Tom Wyant

Re^6: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain ppixregexplain.pl _desc.pl )
by Anonymous Monk on Nov 09, 2013 at 09:34 UTC

    OK, thanks. The object hierarchy may be what it is now, for good or ill.

    I understand. Naming things is hard, and it's not like I have better ideas; I still have puns (chits) in my code. You saved me so many months of work that I'm just surprised you didn't go the extra mile :D I think I'm worth the effort, don't you? :P

    But to the extent I understand your code it is not only a documentation of PPIx::Regexp bugs, but a neat piece of work in its own right.

    :D

    Which of the bugs can I fix without breaking your code?

    Now I think you can fix all of them without breaking anything :) That is, if they are bugs, and bugs you'd consider fixing; I think I tended to use "bug" as a euphemism for "now I have to think some more" ...

    Do I need to reserve some method names for your use? Like xplain()? I'm not sure how to document it, but I'm willing at least to avoid a few that you designate.

    Not sure that you actually have to reserve them, but an xplain prefix sounds OK ... I'm very unsure about the whole what/should/go/where/OOP deal; some would say don't play with other people's namespaces :)

    Can you be a little clearer on the modifier propagation thing?

    Hmmm, I'll try:
    /(?i)foo/ is the same as /foo/i, but the "foo" node doesn't know it's case-insensitive
    /foo/aad says semantics are "d", but "aa" and "d" are mutually exclusive (regexerror++)
    /\w/aia says semantics are "a", but "aa" and "a" are different

    So I guess it's more of a wishlist. I tried to propagate parent modifiers to children using ->modifiers, but ->modifiers discarded some information (like errors), so I ended up doing my own exploding :) I also guess the xplicit propagation might not fit with the purpose of tokens, so PPIx::Regexp wasn't doing propagation of modifiers, so there's not much for you to do regarding propagation :)
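The three cases above can be checked against the regex engine itself. This is only a sketch in core Perl (no PPIx::Regexp involved), assuming Perl 5.14 or later for the /a modifier; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# (?i) inside the pattern and /i after it give the same match behavior.
my $inline   = 'FOO' =~ /(?i)foo/ ? 1 : 0;
my $trailing = 'FOO' =~ /foo/i    ? 1 : 0;

# /a and /d are mutually exclusive, so this pattern does not compile.
# String eval keeps the compile error from killing the whole script.
my $aad_compiles = defined( eval q{ qr/foo/aad } ) ? 1 : 0;

# /aia means /aa plus /i: under /aa an ASCII "k" must not match
# KELVIN SIGN (U+212A), while under a single /a with /i it does.
my $kelvin = "\x{212A}";
my $ai     = $kelvin =~ /k/ai  ? 1 : 0;
my $aia    = $kelvin =~ /k/aia ? 1 : 0;

print "inline=$inline trailing=$trailing\n";
print "aad_compiles=$aad_compiles ai=$ai aia=$aia\n";
```

A parser that wants to report the semantics "in effect" would have to reproduce these distinctions, including the order-independent doubling of "a".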

    Thanks

      Thanks for the clarification.

      It was the intent that the actual modifiers in effect be propagated so that you actually know what is in effect at any point in the Regex. Whether I did it right is another question. Or whether I did it in an obscure way and then didn't document worth a hoot. At any rate, I think the knowledge that the token for the "f" in /(?i)foo/ is not case-sensitive should be available somewhere.

      On the other hand, I am not sure how seriously I am going to take /foo/aad, since it does not compile (at least under 5.18.1). On the gripping hand, there are already representations of invalid code, so maybe the "d" could become an invalid token. The disgusting thing is that it looks like I actually programmed semantics for this case, and that's definitely wrong.

      As for munging around with other people's namespaces, I believe it is generally frowned upon. But I have also done it when desperate. It appears you have a genuine need to attach extra functionality to the PPIx::Regexp classes. And the strict O-O way requires you to subclass all however-many-there-are of them, and you STILL have to go through and rebless everything the parser spits out -- or I have to figure out how to make it use your classes as an option. The fact of the matter is that Perl does Aspect-Oriented programming right out of the box, so we may as well recognize the fact.
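For readers unfamiliar with the technique, here is a minimal sketch of adding a method to a class from outside, with no subclassing and no reblessing. Toy::Token is a hypothetical stand-in, not a PPIx::Regexp class, and the xplain body is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser class we do not own.
package Toy::Token;

sub new {
    my ( $class, $content ) = @_;
    return bless { content => $content }, $class;
}

sub content { return $_[0]{content} }

package main;

# Define a sub directly in the foreign package's namespace.
# Every existing and future Toy::Token object picks it up.
sub Toy::Token::xplain {
    my ($self) = @_;
    return sprintf 'literal "%s"', $self->content;
}

my $tok  = Toy::Token->new('foo');
my $desc = $tok->xplain;
print "$desc\n";
```

The cost is the usual one: if the class owner later adds a method with the same name, one definition silently replaces the other, which is why agreeing on a reserved prefix matters.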

      What I'm currently thinking about is reserving to myself all subroutine names that begin with ASCII a-w, plus all that begin with one or two underscores, plus all the all-uppercase ones like DESTROY (which I actually use), AUTOLOAD (which I don't (yet)) and so on. Anything else would be fair game. If you plan to release your code as a CPAN module, I might need to document what parts of the name space you are using (and therefore break your anonymity to some hopefully-minimal extent), in case someone else wants to try the same thing.
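The proposed rule is easy to express as a check. The following is a sketch only; is_reserved is a hypothetical helper, and reading "all-uppercase" as "uppercase letters, digits and underscores" is my assumption rather than anything stated above.

```perl
use strict;
use warnings;

# Returns 1 if a subroutine name falls in the range the module author
# proposes to keep, 0 if it would be fair game for add-on code.
sub is_reserved {
    my ($name) = @_;
    return 1 if $name =~ /\A[a-w]/;              # starts with ASCII a-w
    return 1 if $name =~ /\A_{1,2}(?!_)/;        # one or two leading underscores
    return 1 if $name =~ /\A[A-Z][A-Z0-9_]*\z/;  # all-uppercase, e.g. DESTROY
    return 0;
}

my %verdict = map { $_ => is_reserved($_) }
    qw( modifiers _private __internal DESTROY AUTOLOAD xplain Xplain );

print "$_ => $verdict{$_}\n" for sort keys %verdict;
```

Under this reading, xplain (and anything else starting with x, y or z) stays available to outside code.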

      Yes, I thought about having PPIx::Regexp actually explain what the tokens were, but I had no pressing need. The problem I was trying to solve was that I was helping out with Perl::Critic, and they were using a different regex parser, which was weird, unmaintained, and started throwing warnings about Perl 5.12 (or maybe 5.14).

      Tom Wyant

        Status update:

        The "\g10" thing was an out-and-out bug, caused by blindly reblessing backreferences over and above the number of capture groups present. Only things of the form \10 should be so reblessed.
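The difference between the two forms shows up in core Perl without any parser involved. A small sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;

# With only one capture group, \10 is read as an octal escape (chr 8).
my $octal = "a\x08" =~ /(a)\10/ ? 1 : 0;

# \g10 is always a backreference, so with one group it cannot compile.
# String eval traps the compile-time error.
my $g10_compiles = defined( eval q{ qr/(a)\g10/ } ) ? 1 : 0;

# With ten capture groups, both forms refer back to group 10 ("j").
my $old_style = 'abcdefghijj' =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10/  ? 1 : 0;
my $new_style = 'abcdefghijj' =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\g10/ ? 1 : 0;

print "octal=$octal g10_compiles=$g10_compiles ",
      "old_style=$old_style new_style=$new_style\n";
```

So only the bare \10 form depends on how many groups exist, which matches the fix described above: reinterpreting as octal applies to \10, never to \g10.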

        The failure to recognize /foo/aia as equivalent to /foo/aai is also a bug.

        The thing with recognizing /foo/ad as /foo/d is more problematic, partly because my design goal was never to distinguish valid regexes from invalid ones, but only to parse valid ones "correctly". The practical problems are that I can't do anything about the error at the point I might detect it, since the code at that point needs to consider also stuff fed in from (e.g.) "use re '/x';". So for the moment nothing is going to be done about those.

        On the other hand, I have come up with a method on PPIx::Regexp::Element (i.e. inherited by all PPIx::Regexp objects) that will tell whether a given modifier is asserted. I'm sure there are all sorts of edge cases that I have not considered, but in "/(?-i:foo)/i" it correctly says /i is _not_ asserted on the "f".
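What such a method has to mirror is the engine's own scoping. This sketch uses only core Perl and extends the example pattern with a trailing "bar" (my addition) so that the part still under /i is visible too:

```perl
use strict;
use warnings;

# Outer /i applies to "bar"; the (?-i:...) group switches it off for "foo".
my @inputs  = qw( fooBAR FOObar foobar );
my @matched = map { /(?-i:foo)bar/i ? 1 : 0 } @inputs;

for my $i ( 0 .. $#inputs ) {
    print "$inputs[$i] => $matched[$i]\n";
}
```

Only the input with an uppercase "FOO" fails, since /i is not in effect inside the group.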

        Unfortunately I am looking at a very busy week, and probably will not get anything published until very late in the week at the earliest.

        Tom Wyant