PerlMonks  

Parsing and translating Perl Regexes

by LanX (Canon)
on Nov 04, 2013 at 18:21 UTC (#1061170)
LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I'm interested in translating Perl regexes to other languages, especially JavaScript and eLISP.

I want to do this in the most reliable way possible, i.e. including warnings whenever a feature is not translatable.

Is anyone aware of a working approach? I found many online analyzers, but they are easily fooled, especially by the many experimental features in Perl regexes.

Otherwise I'd like to use the re pragma in debug mode, but unfortunately it's not meant to be used for interactive introspection, or am I missing something?

Please check the following workaround and suggest better ways to do it:

```perl
use strict;
use warnings;

my $log = parse_regex( q#(\\.|["']|x)# );
print $log;

sub parse_regex {
    my ($regex) = @_;

    #--- redirect STDERR
    open my $olderr, ">&STDERR";
    close STDERR;
    open STDERR, ">", \ my $parselog;

    # --- compile regex
    eval q{
        use re 'debug';
        qr/$regex/;
    };

    # --- restore STDERR
    close STDERR;
    open STDERR, ">&", $olderr;
    # warn "STDERR Restored! =)\n";

    return $parselog;
}
```

Output:

```
/usr/bin/perl -w /home/lanx/B/PL/PM/parsere.pl
Compiling REx "(\.|[%"']|x)"
Final program:
   1: OPEN1 (3)
   3:   BRANCH (6)
   4:     EXACT <.> (21)
   6:   BRANCH (18)
   7:     ANYOF["'] (21)
  18:   BRANCH (FAIL)
  19:     EXACT <x> (21)
  21: CLOSE1 (23)
  23: END (0)
minlen 1
Freeing REx: "(\.|[%"']|x)"

Compilation finished at Mon Nov 4 19:10:27
```

Of course this is only the first step; I still need to parse the output for the regex opcodes ...
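A minimal sketch of that parsing step, assuming the "Final program:" layout shown above (offset, opcode, optional argument, then the next-node pointer in parentheses) stays the same across perl versions, which is not guaranteed. The sub name parse_program and its record fields are made up for illustration; the sample dump is the one printed above.

```perl
use strict;
use warnings;

# Hypothetical helper: split the "Final program:" part of a use re 'debug'
# dump into one record per opcode. The assumed line layout is
#   <offset>: <OPCODE><optional argument> (<next>)
# which may differ between perl versions.
sub parse_program {
    my ($dump) = @_;
    my (@ops, $in_program);
    for my $line (split /\n/, $dump) {
        if ($line =~ /^Final program:/) { $in_program = 1; next }
        next unless $in_program;
        # Arguments such as <>> are not escaped, so anchor on the trailing
        # "(next)" field instead of on the argument's own characters.
        if ($line =~ /^\s*(\d+):\s*([A-Z][A-Z0-9_]*)(.*?)\s*\((\w+)\)\s*$/) {
            my ($offset, $op, $arg, $next) = ($1, $2, $3, $4);
            $arg =~ s/^\s+//;
            push @ops, { offset => $offset, op => $op, arg => $arg, next => $next };
        }
        elsif (@ops) {
            last;    # first non-opcode line (e.g. "minlen 1") ends the listing
        }
    }
    return @ops;
}

my $dump = <<'DUMP';
Compiling REx "(\.|[%"']|x)"
Final program:
   1: OPEN1 (3)
   3:   BRANCH (6)
   4:     EXACT <.> (21)
   6:   BRANCH (18)
   7:     ANYOF["'] (21)
  18:   BRANCH (FAIL)
  19:     EXACT <x> (21)
  21: CLOSE1 (23)
  23: END (0)
minlen 1
DUMP

printf "%3d %-7s %-6s -> %s\n", @{$_}{qw(offset op arg next)} for parse_program($dump);
```

Each record keeps the next-node pointer, so the BRANCH chain (6, 18, FAIL) can later be walked to rebuild the alternation for the target language.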

Suggestions for simplifications welcome.
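One possible simplification, not suggested in the thread: let a separate perl process compile the pattern under -Mre=debug and read its merged output through IPC::Open3, so the calling program never closes or reopens its own STDERR. This is a sketch under assumptions: the helper name debug_dump_via_child is hypothetical, and passing the pattern as a command-line argument assumes it reaches the child unchanged on your platform.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);

# Hypothetical helper (an alternative, not the thread's method): compile the
# pattern in a child perl started with -Mre=debug and return everything the
# child printed. Passing undef as the error handle makes open3 put the
# child's STDERR on the same handle as its STDOUT.
sub debug_dump_via_child {
    my ($regex) = @_;
    my $pid = open3(my $to_child, my $from_child, undef,
                    $^X, '-Mre=debug', '-e', 'my $re = qr/$ARGV[0]/', $regex);
    close $to_child;                            # the child reads nothing
    my $dump = do { local $/; <$from_child> };  # slurp the whole dump
    waitpid $pid, 0;
    return $dump;
}

print debug_dump_via_child(q{(\.|["']|x)});
```

The cost is one extra process per pattern; in exchange, the parent's STDERR is never touched, so there is nothing to restore if compilation dies.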

Cheers Rolf

( addicted to the Perl Programming Language)

Re: Parsing and translating Perl Regexes
by kennethk (Monsignor) on Nov 04, 2013 at 19:26 UTC

    W.r.t. the warning redirect, I've recently taken to locally clobbering the __WARN__ handler instead. It's less intimidating to the unwashed, and it expires naturally.

    ```perl
    use strict;
    use warnings;

    my $log = parse_regex( q#(\\.|["']|x)# );
    print $log;

    sub parse_regex {
        my ($regex) = @_;

        # Catch and stash warnings
        my $parselog = '';
        local $SIG{__WARN__} = sub { $parselog .= $_ for @_; };

        # --- compile regex
        eval q{
            use re 'debug';
            qr/$regex/;
        };

        return $parselog;
    }
    ```

    Best of luck; this strikes me as an interesting challenge. Is the output of use re 'debug'; stable enough for automated parsing?

    Update: Huh, the debug output actually bypasses the warning handler and is printed directly to STDERR. Now starting penance.
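For ordinary warnings the localized handler does work, and the sketch below shows the "expires naturally" part: the handler collects warn() output and is gone once the sub returns. The name capture_warnings is hypothetical; as the update notes, this still will not capture the use re 'debug' dump.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect everything passed to
# warn() while $code runs. The localized handler is restored automatically
# when the sub returns. It does not see text that perl's C code writes
# straight to the stderr stream, such as the use re 'debug' dump.
sub capture_warnings {
    my ($code) = @_;
    my $log = '';
    local $SIG{__WARN__} = sub { $log .= join '', @_ };
    $code->();
    return $log;
}

my $captured = capture_warnings(sub { warn "first\n"; warn "second\n" });
print "captured: $captured";
print defined $SIG{__WARN__} ? "handler still set\n" : "handler restored\n";
```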


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Thx,

      I didn't redirect warn() because I didn't expect it to be used by the underlying C code.

      And indeed your code doesn't work! (unfortunately)

      $log will be undefined ... the output you are getting is just directly written to STDERR.

      Just try to modify the text inside the handler before it's written, e.g. lowercase it with lc(): the printed output stays unchanged.

      > Is use re debug; output stable enough for automated parsing?

      We'll see. Some things aren't escaped (like EXACT <>> (42) to match ">"), so hopefully my parsing regexes can still match them by their surrounding context.

      Do I have a better choice? (Well, that's why I started this thread =)
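A sketch of matching by surrounding context, assuming an EXACT line always ends with the next-node pointer in parentheses: a greedy capture between "<" and the last ">" before that pointer recovers literals that themselves contain ">". The helper names and the sample offset 40 are hypothetical, and the JavaScript escaping covers only the usual regex metacharacters plus the "/" delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the literal of an EXACT-family line, or undef.
# ">" inside the literal is not escaped in the dump (EXACT <>> (42)), so the
# greedy (.*) runs up to the last ">" that is followed by the "(next)"
# pointer at the end of the line.
sub exact_literal {
    my ($line) = @_;
    return $line =~ /^\s*\d+:\s*EXACT\w*\s*<(.*)>\s*\(\w+\)\s*$/ ? $1 : undef;
}

# Hypothetical helper: backslash-escape the characters that are special in
# a JavaScript regex literal /.../, including the "/" delimiter.
sub js_escape_literal {
    my ($text) = @_;
    $text =~ s/([\\^\$.*+?()\[\]{}|\/])/\\$1/g;
    return $text;
}

for my $line ('  4:     EXACT <.> (21)', ' 40:     EXACT <>> (42)') {
    printf "%-26s => /%s/\n", $line, js_escape_literal(exact_literal($line));
}
```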

      Cheers Rolf

        And indeed your code doesn't work!
        Yeah, I caught that a little late. As penance I tried to track down a hook, but couldn't find one. And so I offer up a solution using a guard. It's more intimidating, but at least it auto-expires and allows fine-grained localization.
        ```perl
        use strict;
        use warnings;

        my $log = parse_regex( q#(\\.|["']|x)# );
        print $log;
        warn "And STDERR is back\n";

        sub parse_regex {
            my ($regex) = @_;

            # Catch and stash warnings
            my $guard = stderr_eater(my $parselog);

            # --- compile regex
            eval q{
                use re 'debug';
                qr/$regex/;
            };

            return $parselog;
        }

        sub stderr_eater {
            package STDERR::Eater;
            open my $guard, '>&', STDERR;
            close STDERR;
            open STDERR, ">", \$_[0];
            return bless \$guard;

            sub DESTROY {
                close STDERR;
                open STDERR, ">&", $_[0];
            }
        }
        ```

        My first thought was YAPE::Regex. Actually, my first thought was to roll my own, but that's more a reflection of my own poor choices. YAPE::Regex looks reasonably robust, but I was unsure whether it falls into your category of "online analyzers which are easily fooled". It will at least take care of the tokenizing for you.

Re: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain Regexp::Debugger )
by Anonymous Monk on Nov 05, 2013 at 01:09 UTC

      Would you be willing to provide the particulars on the problems you have with PPIx::Regexp's modifier propagation so I can improve it? Either an RT ticket or electronic mail to wyant at cpan dot org will do the trick.

      Thanks,

      Tom Wyant

        :) But if I do that I'll have to update my program, and I'm lazy and it's hard to juggle :)

        OK, if you're willing to accept this roundabout way of reporting an issue: it's a work in progress that stalled a few months ago. It started organically as a single subroutine walking the PPIx::Regexp tree and grew from there, slowly, as I learned my way around, into its current state, which is still in need of refactoring ...

        I'll post the full code in two follow-ups, but here is an excerpt from ppixregexplain.pl covering what I thought were bugs, the ones marked with a "TODO.*BUG" note. If there are any inaccuracies, thinkos or typos, you have been warned :)

        In furtherance of blind copying, the corresponding entries from my "test suite" (it tests my eyeball interface):

        And one more

        In the code below, the modifier-propagation code is in the following definitions (you can copy/paste each line to find the sub definition):

Re: Parsing and translating Perl Regexes
by sundialsvc4 (Abbot) on Nov 05, 2013 at 01:42 UTC

    Pardon me for perhaps being stupid here ... but ... what exclusive right does «the Perl language» (tah, dahhh...!) have to “the same interpretation of RE-syntax that «the Perl language» does?”

    Perl is (just ...) “a programming language,” and, as such, its underlying implementation of “regular expressions” most-certainly is not unique.   In fact, a great many (other...) languages “rather go out of their way” to implement “Perl-compatible” versions of it.

    So ... prithee ... “down-vote away,” I still really want to know ... what exactly is The Problem Here?

    If you (merely ...) “want access to a ‘Perl-compatible™ Regex’ from ...” any of the aforesaid languages, then you certainly can do so without resorting to “Perl-language machinations!”

      ... what exactly is The Problem Here? ...

      Too many italics

      Despite its name, PCRE is ... not compatible. I recommend reading Friedl's book Mastering Regular Expressions.

      JavaScript's RegEx is based on Perl 4, so plenty of Perl features must be excluded.
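To produce the warnings asked for in the question, a translator has to flag such constructs. Below is a deliberately naive sketch (the table and js_incompatibilities are hypothetical names) that scans a pattern for a few Perl features missing from ES5-era JavaScript, the standard current at the time of this thread. Being a plain text scan, it is exactly the kind of analyzer that is easily fooled, so a real tool would work from a proper tokenizer.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive checker: report Perl regex constructs that
# ES5-era JavaScript regexes lack. Later JavaScript editions added some of
# them (e.g. lookbehind and named groups in ES2018), and a text scan is
# easily fooled, e.g. by "(?<=" inside a character class.
my @UNSUPPORTED_IN_JS = (
    [ qr/\(\?<[=!]/,                                 'lookbehind' ],
    [ qr/\(\?(?:P?<[A-Za-z_]\w*>|'[A-Za-z_]\w*')/,   'named capture group' ],
    [ qr/\(\?>/,                                     'atomic group' ],
    [ qr/\(\?\^?(?:[a-z]+(?:-[a-z]*)?|-[a-z]+)[:)]/, 'inline modifiers' ],
    [ qr/[*+?}]\+/,                                  'possessive quantifier' ],
    [ qr/\\[AzZ]/,                                   '\A/\z/\Z anchor' ],
    [ qr/\(\?(?:R|[+-]?\d+|&\w+)\)/,                 'recursion' ],
    [ qr/\(\?\??\{/,                                 'embedded code' ],
    [ qr/\(\?#/,                                     'comment group' ],
    [ qr/\[:\^?[a-z]+:\]/,                           'POSIX character class' ],
    [ qr/\\K/,                                       '\K' ],
);

# Return the names of all flagged features (a count in scalar context).
sub js_incompatibilities {
    my ($perl_re) = @_;
    return map { $_->[1] } grep { $perl_re =~ $_->[0] } @UNSUPPORTED_IN_JS;
}

for my $re (q{(\.|["']|x)}, '(?<=foo)bar', '(?i)a++\z') {
    my @problems = js_incompatibilities($re);
    printf "%-14s %s\n", $re, @problems ? join(', ', @problems) : 'no known problem';
}
```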

      eLISP RegEx predates Perl (i.e. old POSIX style), and many escapes are inverted (e.g. Perl's ( is \( in eLISP and vice versa). Since regexes are always written as strings, every backslash must additionally be escaped for the string syntax, leading to so-called "slasheritis".
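A sketch of those two eLISP transformations for a tiny subset of Perl syntax: swap the escaping of ( ) | (and turn Perl's escaped braces into plain literals), then double every backslash and quote for the string literal. The function names are hypothetical, and anything outside the subset (character classes, quantifier braces, \d and friends, (?...) extensions) makes it die rather than guess, in the spirit of warning about untranslatable features.

```perl
use strict;
use warnings;

# Hypothetical translator for a tiny Perl-regex subset into Emacs Lisp regex
# syntax. Caveat: ^ and $ match at line boundaries in Emacs (like Perl's /m).
sub perl_to_elisp_regex {
    my ($re) = @_;
    my $out = '';
    pos($re) = 0;
    while (pos($re) < length $re) {
        if ($re =~ /\G\\([()|{}])/gc) {
            $out .= $1;          # Perl's \( \) \| \{ \} are plain literals in Emacs
        }
        elsif ($re =~ /\G(\\[.*+?\[^\$\\])/gc) {
            $out .= $1;          # escapes that mean the same in both
        }
        elsif ($re =~ /\G\(\?/gc) {
            die "untranslatable: (?...) extension\n";
        }
        elsif ($re =~ /\G([()|])/gc) {
            $out .= "\\$1";      # Perl's ( ) | operators are \( \) \| in Emacs
        }
        elsif ($re =~ /\G([\[{\\])/gc) {
            die "untranslatable in this sketch: '$1'\n";
        }
        elsif ($re =~ /\G(.)/gcs) {
            $out .= $1;          # plain characters and . * + ? ^ $
        }
    }
    return $out;
}

# Wrap a translated regex in an Emacs Lisp string literal: every backslash
# and double quote is escaped once more ("slasheritis").
sub elisp_string_literal {
    my ($s) = @_;
    $s =~ s/([\\"])/\\$1/g;
    return qq{"$s"};
}

print elisp_string_literal(perl_to_elisp_regex('(foo|bar)\.txt')), "\n";
# prints "\\(foo\\|bar\\)\\.txt"
```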

      Hope the last two paragraphs were still readable for you.

      Cheers Rolf

        What people call REGEX comes in three main dialects:

        • POSIX basic
        • POSIX extended
        • Perl

        The above also have dialect/implementation variants.

        And there are the different regex notations used in other sciences such as linguistics, which, AFAIR, Friedl does not cover in his book.

        Helmut Wollmersdorfer

Node Type: perlquestion [id://1061170]
Approved by 2teez
Front-paged by Corion