Challenge: Transforming markups

by LanX (Canon)
on Dec 06, 2013 at 21:22 UTC (#1066059=perlquestion)
LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I need to transform or disable various (non-recursive!) markup syntax elements, in this case converting org-mode to kwiki.

Now links which are freely included in text are tricky...

for instance http://www.gmx.de or CamelCaseLink are valid links for kwiki ...

... and [[http://gmx.de][BlubBlub]] is a named link in org-mode.

As you can see, named links may contain matches for http:// links or CamelCase words, which mustn't be processed again. And plain links can occasionally contain CamelCase words too.

One approach is to OR the regexes with priority to the more complex ones, such that their matches are only processed once:

 s/ ($named|$http|$camel) / transform() /gxe

Now in the substitution part it's tricky to know which regex matched; that's why I use named captures.

The following code demonstrates what I'm doing!

(please note that for simplicity of the example my only transformation is to return the name of the match.)

I have the impression that I'm reinventing the wheel, and fiddling with %- doesn't seem stable...

So please show me easier approaches... :)

    use warnings;
    use strict;

    #= ordered hash
    my @patterns=(
        named   => '\[ \[ (?<link>.*?) \] \[ (?<name>.*?) \] \]',
        http    => 'https?://[a-zA-Z./]+',
        camel   => '[A-Z][a-z]+[A-Z][a-z]+',
        blank   => '\s+',
        unknown => '.+?',
    );

    my %patterns=@patterns;

    #= build regex pattern
    my @regexes;
    my @names;

    while ( my ($name,$regex) = splice @patterns,0,2) {
        push @names, $name;
        push @regexes,"(?<$name>$regex)";
    }

    my $regex = '(' . join ("\n|", map {"\n\t$_"} @regexes) . "\n)";

    print "Pattern:\n$regex\n\n";

    # = return which pattern matched
    sub transform {
        my @matching= grep { $-{$_}[0] } @names ;
        return "@matching\t=>\t $1\n";
    }

    #= apply regex
    while (<DATA>){
        chomp;
        print "Line:\n$_\n\n";
        s/$regex/transform()/gex;
        print "Result:\n$_\n\n";
    }

    __DATA__
    http://www.gmx.de CamelCase/WikiLink [[http://gmx.de][BlubBlub]]

OUTPUT:

    Pattern:
    (
        (?<named>\[ \[ (?<link>.*?) \] \[ (?<name>.*?) \] \])
    |
        (?<http>https?://[a-zA-Z./]+)
    |
        (?<camel>[A-Z][a-z]+[A-Z][a-z]+)
    |
        (?<blank>\s+)
    |
        (?<unknown>.+?)
    )

    Line:
    http://www.gmx.de CamelCase/WikiLink [[http://gmx.de][BlubBlub]]

    Result:
    http    =>   http://www.gmx.de
    blank   =>
    camel   =>   CamelCase
    unknown =>   /
    camel   =>   WikiLink
    blank   =>
    named   =>   [[http://gmx.de][BlubBlub]]

Cheers Rolf

( addicted to the Perl Programming Language)
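One possible way to make the "which alternative matched" test less fragile is to look the name up in %+ with defined and dispatch to a handler per rule. The following is only a sketch under assumptions: the @rules layout, the convert() helper and the placeholder output format are invented for illustration and are not real kwiki syntax.

```perl
use strict;
use warnings;

# Ordered rules: [ name, pattern, handler ]. Earlier rules win.
# The handler output is a placeholder format, NOT real kwiki markup.
my @rules = (
    [ named => '\[\[ (?<link>.*?) \]\[ (?<name>.*?) \]\]',
      sub { sprintf '<named %s | %s>', $+{link}, $+{name} } ],
    [ http  => 'https?://[a-zA-Z./]+',
      sub { sprintf '<http %s>', $+{http} } ],
    [ camel => '[A-Z][a-z]+[A-Z][a-z]+',
      sub { sprintf '<camel %s>', $+{camel} } ],
);

my @names   = map { $_->[0] } @rules;
my %handler = map { ( $_->[0], $_->[2] ) } @rules;

# One named group per rule, joined in priority order.
my $alt   = join '|', map { "(?<$_->[0]>$_->[1])" } @rules;
my $regex = qr/$alt/x;

# Called from the replacement part, so %+ still refers to the current match.
sub transform {
    my ($hit) = grep { defined $+{$_} } @names;
    return $handler{$hit}->();
}

# Hypothetical helper: apply all rules to one line of text.
sub convert {
    my ($text) = @_;
    $text =~ s/$regex/transform()/ge;
    return $text;
}

print convert('http://www.gmx.de CamelCase/WikiLink [[http://gmx.de][BlubBlub]]'), "\n";
```

Text that matches no rule is simply left alone here, so the blank and unknown catch-all patterns are not needed in this variant.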

Re: Challenge: Transforming markups
by Anonymous Monk on Dec 06, 2013 at 22:17 UTC
    s/($named)|($http)|($camel)/$1?transform_named():$2?transform_http():transform_camel()/gxe

    But that starts to get awkward if you have a lot of patterns to deal with.

    Putting your patterns in a hash might be a bad idea, because it randomizes the order, and you might end up with the unknown pattern first.

      Argh

      s/($named)|($http)|($camel)/$1?transform_named($1):$2?transform_http($2):transform_camel($3)/gxe
        OK, but the problem here is nested capture groups.

        You can't rely on $2 or $3 being correct if $named is something like [[ ($http) ][ ($title) ]]

        Cheers Rolf

        ( addicted to the Perl Programming Language)
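To illustrate the point about nested groups, here is a small sketch (the sub names camel_by_number and camel_by_name are made up for this example). Once $named contains two groups of its own, the outer groups are renumbered, while named captures keep their meaning.

```perl
use strict;
use warnings;

my $http  = 'https?://[a-zA-Z./]+';
my $camel = '[A-Z][a-z]+[A-Z][a-z]+';
my $named = '\[\[(.*?)\]\[(.*?)\]\]';    # brings two capture groups of its own

# Numbered groups are now: 1=named, 2=link, 3=title, 4=http, 5=camel.
# $3 is therefore no longer the camel match.
sub camel_by_number {
    my ($text) = @_;
    return $text =~ m{($named)|($http)|($camel)} ? $3 : undef;
}

# Named groups do not depend on how many groups precede them.
sub camel_by_name {
    my ($text) = @_;
    return $text =~ m{(?<named>$named)|(?<http>$http)|(?<camel>$camel)}
        ? $+{camel}
        : undef;
}

printf "by number: %s\n", camel_by_number('CamelCase') // 'undef';
printf "by name:   %s\n", camel_by_name('CamelCase')   // 'undef';
```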

      > Putting your patterns in a hash might be a bad idea,

      look at the code again, it's an array that preserves the order.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        In that case, it seems the hash isn't used.
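For what it's worth, the order and the lookup can be kept side by side without consuming the array via splice. This is a minimal sketch, assuming a shortened pattern list; %by_name and $alternation are names chosen for the example only.

```perl
use strict;
use warnings;

# Flat name => pattern list; an array, so the order is preserved.
my @patterns = (
    http    => 'https?://[a-zA-Z./]+',
    camel   => '[A-Z][a-z]+[A-Z][a-z]+',
    unknown => '.+?',
);

# Names in declaration order: every even index of the flat list.
my @names = @patterns[ grep { $_ % 2 == 0 } 0 .. $#patterns ];

# Lookup by name; the key order of the hash itself is unspecified,
# so it is only ever read through @names.
my %by_name = @patterns;

my $alternation = join '|', map { "(?<$_>$by_name{$_})" } @names;
print "$alternation\n";
```

With this layout the catch-all unknown pattern stays last regardless of hash ordering, and @patterns is still intact afterwards.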
Re: Challenge: Transforming markups
by pobocks (Chaplain) on Dec 07, 2013 at 19:08 UTC
    If you're looking to export org-mode specifically, it seems to me like it might be easier (and have more reuse value) as an org-mode exporter. It would probably be a bit more verbose, but it would also be more robust, and error messaging would be more likely to give useful output.
    for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";
      Thanks for pointing me to org-mode exporter, forgot about that! =)

      Unfortunately, in this case I also have to parse kwiki constructs.

      And I need the same approach to convert markup in numerous other cases...

      Cheers Rolf

      ( addicted to the Perl Programming Language)

Re: Challenge: Transforming markups
by bigdogs (Novice) on Dec 09, 2013 at 16:01 UTC
    Nobody mentioned Parse::RecDescent yet, so here I am! This looks like a good opportunity to learn about it.
      > Nobody mentioned Parse::RecDescent yet, so here I am!

      Many people mentioned Parse::RecDescent, Marpa::R2 and Perl6::Rules in the past, but I suppose it's overkill if I have no nested ("descending") syntax to deal with!(?)

      > This looks like a good opportunity to learn about it.

      Great, do you wanna show me some sample code for my use case and explain why it's better that way?

      Cheers Rolf

      ( addicted to the Perl Programming Language)
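For flat, non-nested syntax, a hand-written scanner built on \G and the /gc modifiers is one lighter-weight alternative to a parser generator. The sketch below is not Parse::RecDescent code and is not something any poster in this thread proposed; tokenize() and the token layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Split a line into [ type, captures... ] tokens, trying rules in priority order.
# \G anchors each attempt at the current position; /c keeps that position
# when an attempt fails, so the next elsif starts from the same place.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while ( pos($text) < length $text ) {
        if    ( $text =~ m{\G \[\[ (.*?) \]\[ (.*?) \]\] }gcx ) { push @tokens, [ named   => $1, $2 ] }
        elsif ( $text =~ m{\G (https?://[a-zA-Z./]+) }gcx )     { push @tokens, [ http    => $1 ] }
        elsif ( $text =~ m{\G ([A-Z][a-z]+[A-Z][a-z]+) }gcx )   { push @tokens, [ camel   => $1 ] }
        elsif ( $text =~ m{\G (\s+) }gcx )                      { push @tokens, [ blank   => $1 ] }
        elsif ( $text =~ m{\G (.) }gcxs )                       { push @tokens, [ unknown => $1 ] }
    }
    return @tokens;
}

my @tokens = tokenize('http://www.gmx.de CamelCase/WikiLink [[http://gmx.de][BlubBlub]]');
print join( ' ', map { $_->[0] } @tokens ), "\n";
```

Each branch has its own plain numbered captures, so the renumbering problem of one big alternation does not arise; the trade-off is that the rules live in code rather than in a data table.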
