http://www.perlmonks.org?node_id=1061170

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I'm interested in translating Perl regexes to other languages, especially JavaScript and eLISP.

I want to do this in the most reliable way, i.e. including warnings when a feature is not translatable.

Anyone aware of a working approach? I found many online analyzers which are easily fooled...especially with many experimental features in Perl RE.

Otherwise I'd like to use the re pragma in debug mode, but unfortunately it isn't meant to be used for interactive introspection, or am I missing something?

Plz check the following workaround and suggest better ways to do it:

    use strict;
    use warnings;

    my $log = parse_regex( q#(\\.|["']|x)# );
    print $log;

    sub parse_regex {
        my ($regex) = @_;

        #--- redirect STDERR
        open my $olderr, ">&STDERR";
        close STDERR;
        open STDERR, ">", \ my $parselog;

        # --- compile regex
        eval q{
            use re 'debug';
            qr/$regex/;
        };

        # --- restore STDERR
        close STDERR;
        open STDERR, ">&", $olderr;
        # warn "STDERR Restored! =)\n";

        return $parselog;
    }

Output:

    /usr/bin/perl -w /home/lanx/B/PL/PM/parsere.pl
    Compiling REx "(\.|[%"']|x)"
    Final program:
    1: OPEN1 (3)
    3: BRANCH (6)
    4: EXACT <.> (21)
    6: BRANCH (18)
    7: ANYOF["'] (21)
    18: BRANCH (FAIL)
    19: EXACT <x> (21)
    21: CLOSE1 (23)
    23: END (0)
    minlen 1
    Freeing REx: "(\.|[%"']|x)"

    Compilation finished at Mon Nov 4 19:10:27

Of course this is only the first step, I still need to parse the output for RE-opcodes ...
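For that next step, here is a minimal sketch of pulling the opcodes out of the captured log. It assumes every program line has the shape shown in the dump above (node number, opcode, optional argument, then the next node or FAIL in parentheses). The name parse_program and the hash keys are made up for illustration, and the dump format may well differ between Perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the "Final program" lines of a
# `use re 'debug'` dump into a list of hashes.
sub parse_program {
    my ($dump) = @_;
    my @nodes;
    for my $line ( split /\n/, $dump ) {
        # Assumed line shape: "  4:  EXACT <.> (21)" or "18: BRANCH (FAIL)".
        # Anchoring on the trailing "(N)" keeps unescaped arguments
        # such as "<>>" or "<(>" in one piece.
        next unless $line =~ /^\s*(\d+):\s*([A-Z][A-Z0-9_]*)(.*?)\s*\((\d+|FAIL)\)\s*$/;
        my ( $id, $op, $arg, $next ) = ( $1, $2, $3, $4 );
        $arg =~ s/^\s+//;    # drop the space between opcode and argument
        push @nodes, { id => $id, op => $op, arg => $arg, next => $next };
    }
    return @nodes;
}

# Usage with a few lines copied from the dump above
my $sample = join "\n",
    'Final program:',
    '1: OPEN1 (3)',
    '4: EXACT <.> (21)',
    '18: BRANCH (FAIL)',
    'minlen 1';

printf "%3d %-8s %-6s -> %s\n", @{$_}{qw(id op arg next)}
    for parse_program($sample);
```

Lines that do not start with a node number (the "Compiling REx", "minlen" and "Freeing REx" lines) are simply skipped.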

Suggestions for simplifications welcome.

Cheers Rolf

( addicted to the Perl Programming Language)

Re: Parsing and translating Perl Regexes
by kennethk (Abbot) on Nov 04, 2013 at 19:26 UTC

    w.r.t the warning redirect, I've been preferring a local clobber of the __WARN__ handler recently. It's less intimidating to the unwashed, and expires naturally.

        use strict;
        use warnings;

        my $log = parse_regex( q#(\\.|["']|x)# );
        print $log;

        sub parse_regex {
            my ($regex) = @_;

            # Catch and stash warnings
            my $parselog = '';
            local $SIG{__WARN__} = sub {
                $parselog .= $_ for @_;
            };

            # --- compile regex
            eval q{
                use re 'debug';
                qr/$regex/;
            };

            return $parselog;
        }

    Best of luck; strikes me as an interesting challenge. Is use re 'debug'; output stable enough for automated parsing?

    Update: Huh, the debug actually bypasses the warning handler and prints directly to STDERR. Now starting penance.
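The difference between the two channels can be checked with a small sketch. It installs a __WARN__ handler and also points STDERR at an in-memory buffer, then records which of the two receives each message. The helper name capture_both is made up for illustration and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code and return what arrived through warn()
# and what was written straight to the STDERR filehandle.
sub capture_both {
    my ($code) = @_;
    my ( $warned, $stderr ) = ( '', '' );

    # Save the real STDERR, then reopen it on an in-memory buffer
    open my $olderr, '>&', \*STDERR or die "cannot dup STDERR: $!";
    close STDERR;
    open STDERR, '>', \$stderr or die "cannot redirect STDERR: $!";

    {
        # The handler only sees messages that go through warn()
        local $SIG{__WARN__} = sub { $warned .= join '', @_ };
        $code->();
    }

    # Restore the real STDERR
    close STDERR;
    open STDERR, '>&', $olderr or die "cannot restore STDERR: $!";

    return ( $warned, $stderr );
}

my ( $warned, $stderr ) = capture_both( sub {
    warn "from warn\n";
    print STDERR "from print\n";
} );

print "handler got: $warned";
print "STDERR got:  $stderr";
```

Only the warn() message reaches the handler. Anything written to the filehandle directly, as the re debug output is, needs the redirect.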


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Thx,

      I didn't redirect warn() b/c I didn't expect it to be used by the underlying C code.

      And indeed your code doesn't work! (unfortunately)

      $log will be undefined ... the output you are getting is just directly written to STDERR.

      Just try to change it before it's written, like lowercasing with lc().

      > Is use re debug; output stable enough for automated parsing?

      We'll see. Some stuff isn't escaped (like EXACT <>> (42) to match ">"), so hopefully the regexes can still match it by the surrounding context.

      Do I have a better choice? (well, that's why I started this thread =)

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        And indeed your code doesn't work!
        Yeah, I caught that a little late. As penance I tried to track down a hook, but couldn't find one. And so I offer up a solution using a guard. It's more intimidating, but at least it auto-expires and allows fine-grained localization.
            use strict;
            use warnings;

            my $log = parse_regex( q#(\\.|["']|x)# );
            print $log;
            warn "And STDERR is back\n";

            sub parse_regex {
                my ($regex) = @_;

                # Catch and stash warnings
                my $guard = stderr_eater(my $parselog);

                # --- compile regex
                eval q{
                    use re 'debug';
                    qr/$regex/;
                };

                return $parselog;
            }

            sub stderr_eater {
                package STDERR::Eater;
                open my $guard, '>&', STDERR;
                close STDERR;
                open STDERR, ">", \$_[0];
                return bless \$guard;

                sub DESTROY {
                    close STDERR;
                    open STDERR, ">&", $_[0];
                }
            }

        My first thought was YAPE::Regex. Actually my first thought was to roll my own, but that's more of a reflection on my own poor choices. It looks like there's some reasonable robustness there, but I was unsure whether it falls into the "online analyzers which are easily fooled" category. It'll at least take care of the tokenizing for you.


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain Regexp::Debugger )
by Anonymous Monk on Nov 05, 2013 at 01:09 UTC

      Would you be willing to provide the particulars on the problems you have with PPIx::Regexp's modifier propagation so I can improve it? Either an RT ticket or electronic mail to wyant at cpan dot org will do the trick.

      Thanks,

      Tom Wyant

        :) But if I do that I'll have to update my program, and I'm lazy and it's hard to juggle :)

        Ok, if you're willing to accept this as a roundabout way to report an issue: it's a work in progress that stalled a few months ago, that started organically as a single subroutine walking the PPIx::Regexp tree and grew from there, slowly as I am learning my way around, into its current state, still in need of refactoring ...

        I'll post the full code in two followups, but here is an excerpt from ppixregexplain.pl of what I thought were bugs and that received a "TODO.*BUG" note. If there are any inaccuracies, thinkos or typos, you have been warned :)

        In furtherance of blind copying, the corresponding entries from my "test suite" (it tests my eyeball interface)

        And one more

        In the code below the modifiers propagation code is in the following definitions (you can copy/paste each line to find the sub definition)
