[emacs] converting perl regex into elisp regex

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: [emacs] converting perl regex into elisp regex by doom (Deacon) on Sep 18, 2009 at 01:59 UTC
The short answer is "no": I can't claim to have gone very far with this problem. I can talk about the nature of the problem a little, though: the trouble with elisp regexps is magnified by the way elisp strings work. There are two separate design decisions that interact very badly with each other.

The regexp design decision: emacs uses an older style of regexps where these are not special characters: "(", "|", ")". If you want them to have what would be their usual meaning in perl regexps, you need to escape them, i.e. `"\("`, `"\|"`, `"\)"`.

The string design decision: just as with perl, the backslash is used to escape characters to get a special meaning, e.g. `"\t"` means a tab. Strings are delimited by double-quotes, so if you want a double-quote in a string, you escape it: `"\""`. And if you want a backslash to just be a backslash, then you double it: `"\\"`.

And in elisp, regexps are stored as strings. So if you want to capture some text within double quotes, the regexp might be `"\(.*?\)"`, but the string would be `"\"\\(.*?\\)\""`.

And you can't even follow a simple rule like "double-up all the backwhacks when you put a regexp in a string", because that fails with something like the aforementioned tab code. This is a tab: `"\t"`, but `"\\t"` is a backslash, followed by a "t".
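To make the two layers concrete, here is a small sketch that can be evaluated in `*scratch*` or `M-x ielm`. The helper name `my-quoted-text` and the sample text are made up for illustration; only `string-match`, `match-string` and `length` are standard Emacs Lisp.

```elisp
;; Regexp layer: the pattern Emacs should see is   "\(.*?\)"
;; String layer: every backslash and every double quote in that
;; pattern needs one more backslash in the Lisp source.
(defun my-quoted-text (text)
  "Return the first double-quoted substring of TEXT, or nil.
Hypothetical helper, for illustration only."
  (when (string-match "\"\\(.*?\\)\"" text)
    (match-string 1 text)))

(my-quoted-text "say \"hello\" now")   ; evaluates to "hello"

;; The tab case: "\t" is a one-character string (a tab),
;; while "\\t" is two characters, a backslash and a "t".
(length "\t")    ; 1
(length "\\t")   ; 2
```

The pattern is written once, but it has to survive two readers: the Lisp reader first, then the regexp engine. That is why blindly doubling every backslash does not work for sequences such as `\t` that the Lisp reader is meant to interpret.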
Re^2: [emacs] converting perl regex into elisp regex by doom (Deacon) on Sep 18, 2009 at 02:04 UTC
Some emacs code to start with:

    (defun perlish-fix-regexps (regexp)
      "Simple translation of a perlish REGEXP to an emacs one."
      (let ( (new-pattern regexp) )
        (setq new-pattern (replace-regexp-in-string "(" "\\\\(" new-pattern))
        (setq new-pattern (replace-regexp-in-string ")" "\\\\)" new-pattern))
        (setq new-pattern (replace-regexp-in-string "|" "\\\\|" new-pattern))
        (setq new-pattern (replace-regexp-in-string "\\\\\"" "\"" new-pattern))
        new-pattern))

    (perlish-fix-regexps "(.*?)")
    (perlish-fix-regexps "(ha|ho)")
    (perlish-fix-regexps "\"[ \t]*(.*?)[ \t]*\"")

    (defun perlish-match (string pattern)
      "Apply the perlish PATTERN to STRING, returns capture from first group of parens."
      (let ( (emacs-pattern (perlish-fix-regexps pattern))
             (found "") )
        (if (string-match emacs-pattern string)
            (setq found (match-string 1 string))
          )
        ))

    (perlish-match "ha, ha, ho, ho!" "(ha|ho)" )
    (perlish-match " \" quote \" " "\"[ \t]*(.*?)[ \t]*\"")
Re^2: [emacs] converting perl regex into elisp regex by LanX (Saint) on Sep 18, 2009 at 02:53 UTC
Hi Joe! I know most of this and just experimented a little bit. It's a hack, but I think it's a good start 8) Of course substituting \\ with \0 is only a temporary solution ...

    $\="\n";

    #--- flags
    my $flag_interactive; # true => no extra escaping of backslashes

    my $RE='\w*(a|b|c)\d\(';
    $RE='\d{2,3}';

    print $RE;

    #--- hide pairs of backslashes
    $RE=~s#\\\\#\0#g;

    #--- toggle escaping of 'backslash constructs'
    my $bsc='(){}|';
    $RE=~s#[$bsc]#\\$&#g;  # escape them once
    $RE=~s#\\\\##g;        # and erase double-escaping

    #--- replace character classes
    my %charclass=(
      w => 'word' , # TODO: emacs22 already knows \w ???
      d => 'digit',
      s => 'space'
    );

    my $kc=join "|",keys %charclass;
    $RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;

    #--- unhide pairs of backslashes
    $RE=~s#\0#\\\\#g;

    #--- escape backslashes for elisp string
    $RE=~s#\\#\\\\#g unless $flag_interactive;

    print $RE;

Do you see any problems?

Cheers Rolf
Re^3: [emacs] converting perl regex into elisp regex by LanX (Saint) on Sep 18, 2009 at 03:30 UTC
This version translates your example well. IMHO there are only the two mentioned TODOs left to be covered.

    /usr/bin/perl -w /tmp/plre2el.pl
    Perlcode: "(.*?)"
    Elispcode: \"\\(.*?\\)\"

    use strict;
    use warnings;

    # version 0.2

    $\="\n";

    #--- flags
    my $flag_interactive; # true => no extra escaping of backslashes

    my $RE='\w*(a|b|c)\d\(';
    $RE='\d{2,3}';
    $RE='"(.*?)"';

    print "Perlcode: $RE";

    #--- hide pairs of backslashes
    $RE=~s#\\\\#\0#g;
    # TODO check for suitable long "hidesequence" instead of a simple \0

    #--- TODO normalisation of needless escaping
    # e.g. from /\"/ to /"/, since it's no difference in perl but might confuse elisp

    #--- toggle escaping of 'backslash constructs'
    my $bsc='(){}|';
    $RE=~s#[$bsc]#\\$&#g;  # escape them once
    $RE=~s#\\\\##g;        # and erase double-escaping

    #--- replace character classes
    my %charclass=(
      w => 'word' , # TODO: emacs22 already knows \w ???
      d => 'digit',
      s => 'space'
    );

    my $kc=join "|",keys %charclass;
    $RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;

    #--- unhide pairs of backslashes
    $RE=~s#\0#\\\\#g;

    #--- escaping for elisp string
    unless ($flag_interactive){
      $RE=~s#\\#\\\\#g;  # ... backslashes
      $RE=~s#"#\\"#g;    # ... quotes
    }

    print "Elispcode: $RE";

Cheers Rolf

Please note: xemacs knows "Raw Strings" where escaping is not necessary, but I doubt that normal Perl RE syntax can be used from within Gnu Emacs Lisp because of the quoting problem, so the conversion can in general only be done with perl!

UPDATE: TODO: Just noticed that I still need a special treatment for escape sequences like \t and \n. I'll add this tomorrow.

Last version: re_pl2el.pl (2 kB)
Re^4: [emacs] converting perl regex into elisp regex by u65 (Chaplain) on Jul 27, 2016 at 15:08 UTC
Re^5: [emacs] converting perl regex into elisp regex by LanX (Saint) on Jul 27, 2016 at 15:40 UTC
Re: [emacs] converting perl regex into elisp regex by Anonymous Monk on Sep 18, 2009 at 01:10 UTC
Have you looked at YAPE::Regex?
Re^2: [emacs] converting perl regex into elisp regex by LanX (Saint) on Sep 18, 2009 at 03:58 UTC
Well, the first thing I found and checked was Regexp::Parser, which actually is the successor of YAPE::Regex by the same author. But it looks quite complicated ...

... OTOH if my own transformation with some regexes works, chances are good that it can easily be ported to elisp (or any other language). 8)

Cheers Rolf
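As a rough illustration of what such a port could look like, here is a sketch of the toggle step alone in Emacs Lisp. The function name `my-toggle-group-escapes` is made up, and the technique differs from the Perl scripts in this thread: instead of hiding `\\` pairs behind a placeholder, it matches each backslash together with the character it escapes. It does not translate `\d`, `\w` or `\s`, and it does not add the extra escaping needed for a string literal in Lisp source.

```elisp
;; Hypothetical sketch, not the code from this thread: one pass that
;; swaps the escaped and unescaped forms of ( ) { } | in a Perl-style
;; regexp.  The result is the regexp as Emacs should see it.
(defun my-toggle-group-escapes (re)
  "Toggle backslash escaping of ( ) { } | in the regexp string RE."
  (replace-regexp-in-string
   ;; Either a backslash plus the character it escapes, or one bare
   ;; special character.  Consuming \\ as a unit keeps an escaped
   ;; backslash from being mistaken for the start of \( .
   "\\\\.\\|[(){}|]"
   (lambda (m)
     (cond
      ((= (length m) 1) (concat "\\" m))            ; (  becomes \(
      ((string-match-p "[(){}|]" (substring m 1))   ; \( becomes (
       (substring m 1))
      (t m)))                                       ; \d, \\ etc. stay
   re t t))

(my-toggle-group-escapes "(ha|ho)")   ; evaluates to "\\(ha\\|ho\\)"
```

One known gap, shared with the regex-based Perl approach: special characters inside a bracket expression such as `[(]` are escaped as well, although a backslash is not an escape character inside brackets in Emacs regexps. A parser-based solution would be needed to handle that case properly.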
Re^3: [emacs] converting perl regex into elisp regex by ikegami (Patriarch) on Sep 18, 2009 at 04:07 UTC
"which actually is the successor of YAPE::Regex"

Why do you say that? ~~Neither module references the other~~, and YAPE::Regex was updated two years after Regexp::Parser. Whether one is a replacement for the other or not, that means that only YAPE::Regex has any chance of being able to parse Perl regex patterns (as they exist today).
Re^4: [emacs] converting perl regex into elisp regex by LanX (Saint) on Sep 18, 2009 at 04:12 UTC
Re^5: [emacs] converting perl regex into elisp regex by ikegami (Patriarch) on Sep 18, 2009 at 04:14 UTC

