
ORG to POD translator

by LanX (Chancellor)
on Apr 07, 2011 at 15:45 UTC (#898097=CUFP)

WARNING: the following code is a hack and doesn't meet many quality standards.²


Emacs's org-mode is very convenient for organizing and transforming documents.

So when I was asked to produce two articles for the German Perl magazine $foo, I started sketching and outlining ideas in org-mode. But then I needed to produce a special POD format for printing.

It was easier to hack the following script to just translate the markups I needed.

There are already ORG parsers on CPAN¹, but I needed a lightweight solution that allows mixing in some POD markup and DWIM-adds some other markup to facilitate the creative process. It's far from complete or error-free, but it fits my needs.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw/Dumper/;

my $outfile = my $infile = $ARGV[0];
$outfile =~ s/\.org$/.pod/ if defined $outfile;

my $f_in;

if ($infile) {
    print "* Processing $infile => $outfile\n";
    open $f_in, "<", $infile
        or die "Cannot read $infile: $!";
} else {
    $outfile = "/tmp/test.pod";
    print "* Processing __DATA__ => $outfile\n";
    $f_in = \*DATA;
}

open my $f_out, ">", $outfile
    or die "Cannot write $outfile: $!";

my $OUT;

my $r_begin_src = '^\s*#\+BEGIN_SRC (\w*)\s*$';
my $r_end_src   = '^\s*#\+END_SRC\s*$';

print $f_out "\n=encoding utf8\n\n";

#--- First pass
while (<$f_in>) {

    #--- Codeblock?
    if ( /$r_begin_src/ .. /$r_end_src/ ) {
        s/($r_end_src|$r_begin_src)/\n/;
        $_ = " $_";                                    # add indentation
        # braces are escaped: newer perls reject an unescaped literal "{" here
        s#\s*Listing\{(\w+)\}#listing_grep($1)#gie;
    } else {
        #--- Heading?
        if ( s/^(\*+)(\s.*)/"\n=head".length($1)."$2\n"/e ) {
            #--- Heading == "__DATA__" ?
            last if $2 =~ /^\s*__DATA__\s*$/;          # stop parsing
        } else {
            #--- Textbody
            s/^\s+(\S)/$1/;                            # delete indentation
            s/^\s+$/\n/;                               # delete empty lines
            convert_markup();
        }
    }
    $OUT .= $_;
}

listing_dump();

#--- Second pass
$OUT =~ s#Listing\{(\w+)\}#listing_ref($1)#gie;

#--- Output
print $f_out $OUT;

##--- Process pod-file
#my $do=` $outfile`;

exit;

# ----------------------------------------
sub convert_markup {
    my $in = $_;
    my $out;
    my $notPOD = 0;                                    # flipflop

    #--- ignore POD markups
    # split with a capturing group returns text and POD chunks alternately
    for ( split /([CBIEZ]<.*?>)/, $in ) {
        if ( $notPOD ^= 1 ) {                          # odd => not Pod
            #--- translate
            s#/(.+?)/#I<$1>#g;                         # / -> Italic
            s#\*(.+?)\*#B<$1>#g;                       # * -> Bold
            # s#_(.+?)_#I<$1>#g;                       # _ -> Underline

            #--- add markup DWIM
            s#(?<!<)(\w+(::\w+)+)(?!>)# L<$1> #g;      # -> L<Mod::ule>
            s#([\$\%\@\&]\w+)#C<$1>#g;                 # -> C<$var>
        }
        $out .= $_;
    }
    $_ = $out;
}

# ----------------------------------------
# name and reference listings by name
# "LISTING{label}" in code -> incremented number
# "LISTING{label}" in text -> reference code listing
# TODO:
# * support org reference markup instead
#   like <<label>> or (ref: label)
# * more checks for possible typos in labels

my %listing_nr;    # Number hash
my $listing_c;     # Counter

sub listing_dump {                                     # check Listing hash
    print Dumper \%listing_nr;
}

sub listing_grep {
    my $name = uc(shift);                              # case-insensitive
    $listing_nr{$name} = ++$listing_c;
    $name = qq{Listing $listing_c};
    $name = " " x ( 40 - length($name) ) . $name;      # align right
    return $name;
}

sub listing_ref {
    my $name = uc(shift);                              # case-insensitive
    $listing_c = $listing_nr{$name};
    warn "Listing $name unknown!" unless defined $listing_c;
    $name = qq{Listing $listing_c};
    return $name;
}

__DATA__
* Example Org for testing
** heading 2
Text in C<Path/path/path> might be /Italic/ or *Bold* and org-markup nested in POD-markup is I<*ignored*>.
#+BEGIN_SRC Perl
print("huhu") while(1);
LISTING{huhu}
#+END_SRC
bla bla ...
and the code in LISTING{huhu} prints "huhu"
* __DATA__
this text will not be processed anymore

Cheers Rolf


1) which I found too late, like Org::Parser

2) DOGMA: Release_early,_release_often

Replies are listed 'Best First'.
Re: ORG to POD translator
by MidLifeXis (Monsignor) on Apr 07, 2011 at 16:35 UTC

    First, don't take these in any way as criticisms, but more as opportunities to make it process a larger subset of the org files out there.

    I really like that it is simple and does not have prerequisites, so it is able to parse simple org files without a large footprint. That being said, my org files are very seldom simple, and I rely heavily on data within drawers, properties (node, tree, and global), among other things. Additionally, you have started the ball rolling with some actual code.

    • See lisp/org-exp.el, around line 2167 (barring any major updates) for a better pattern for the BEGIN_SRC line. It should allow an optional ':' after the BEGIN_SRC, and it is case insensitive.
    • A link to Org::Parser is probably in order :-)
    • Placing this code on GitHub or a similar site might bring in more hands to help.
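The first suggestion in the list above (an optional ':' after BEGIN_SRC, matched case-insensitively) could look roughly like the sketch below. It is not the pattern from org-exp.el, only an approximation of the two properties mentioned; the helper name src_block_language is hypothetical, and ignoring any header arguments after the language name is an assumption.

```perl
use strict;
use warnings;

# Assumed variant of the script's $r_begin_src:
# optional ':' after BEGIN_SRC, optional language name, case-insensitive.
# Anything after the language name (header arguments) is ignored.
my $r_begin_src = qr/^\s*#\+BEGIN_SRC:?(?:\s+(\S+).*|\s*)$/i;

# Hypothetical helper: returns the language name ('' if none is given),
# or undef when the line does not open a source block.
sub src_block_language {
    my ($line) = @_;
    return undef unless $line =~ $r_begin_src;
    return defined $1 ? $1 : '';
}

for my $line ( '#+BEGIN_SRC Perl', '  #+begin_src: perl', '#+BEGIN_SRC', '#+BEGIN_SRCX' ) {
    my $lang = src_block_language($line);
    printf "%-22s => %s\n", $line, defined $lang ? "block ($lang)" : 'no match';
}
```

In the original script the pattern is stored as a string and used inside a flip-flop, so a qr// object like this one could be dropped in without changing the surrounding loop.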

    That is all that I have at the moment. Hopefully I will be able to take a closer look at it later. Thanks for starting this moving.


      Well, I wouldn't mind if _you_ put it on GitHub! :)

      (this already took more time than I wanted to spend)

      I once started a thread "[emacs] converting perl regex into elisp regex", which could help with understanding the original format definition and with parsing the lisp files for updates.

      Those other features you want shouldn't be too difficult to parse. The question is: what kind of POD should they produce?

      And what is the interface supposed to look like?

      If you have an agenda please define it.

      Cheers Rolf

        The closest thing that I would have for an agenda would be to have a defined format for the org file itself. I would like to see the org community adopt a format for the org file, and a set of core functions (dealing with manipulating files, nodes, and properties) that behave in a defined fashion. Beyond that, it is basically up to the interface / library how it behaves based on the content of the file.

        Update: It looks like someone else may have some similar thoughts.


        I see in rereading this that my comment about the source for the emacs regexp was misunderstood. I did not mean to imply that all of the other block types contained in that regexp should be parsed, I was only commenting on the possible formats that a "BEGIN_SRC" block could take.

        The link to Org::Parser was a reference to the unlinked reference (now linked in a footnote), not an indicator that you should use it in this script.

        The suggestion for github was to make it more readily able to have patches applied to it.


Re: ORG to POD translator
by educated_foo (Vicar) on Apr 07, 2011 at 17:37 UTC

    OT: Just as only perl can parse Perl, only org-mode can parse Org -- and Org is always changing. For simple documents, a simple script like this is great; for true Org translation, the way to go would be to export HTML or DocBook and work from there. IMHO, Org::Parser is doing it wrong: it's no longer simple, yet it will never truly parse Org.

      >only org-mode can parse Org -- and Org is always changing.

      You're so right! I was thinking about migrating the regexes from org.el and org-exp.el when I noticed that emphasis markup is allowed to spread over 2 lines (actually the number of lines is customizable in "Org Emphasis Regexp Components")

      /abc
      def/

      This wasn't clear to me, since font-lock in Emacs doesn't always get it right.

      But it's exported to

      <i>abc def</i>

      So a clean parser would at least need to read the customizations from emacs!

      Cheers Rolf
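To illustrate the point about emphasis spanning lines, here is a minimal sketch that applies the markup translation to a whole paragraph instead of line by line. It is not org-mode's actual regexp: the limit $MAX_NEWLINES is hard-coded here as an assumption (in Emacs it comes from the user's customization), the delimiter rules are simplified, and the helper names emphasis_regex and emphasize are hypothetical.

```perl
use strict;
use warnings;

# Assumption: number of newlines allowed inside one emphasis span.
# A faithful parser would have to read this from the Emacs customization.
my $MAX_NEWLINES = 1;

# Hypothetical helper: build a pattern for one delimiter character.
sub emphasis_regex {
    my ($delim) = @_;
    my $d = quotemeta $delim;
    return qr{
        (?<![\w<]) $d (?=[^\s$d])      # opening delimiter, then a non-space character
        ( (?: [^$d\n]* \n ){0,$MAX_NEWLINES} [^$d\n]* )   # body with a bounded number of newlines
        (?<=\S) $d (?!\w)              # closing delimiter right after a non-space character
    }x;
}

# Hypothetical helper: translate /italic/ and *bold* in a paragraph to POD.
sub emphasize {
    my ($text) = @_;
    my $italic = emphasis_regex('/');
    my $bold   = emphasis_regex('*');
    $text =~ s/$italic/I<$1>/g;
    $text =~ s/$bold/B<$1>/g;
    return $text;
}

print emphasize("/abc\ndef/ and *bold*\n");
```

Because the original script reads its input one line at a time, using something like this would mean collecting the text body into paragraphs first.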

Node Type: CUFP [id://898097]
Approved by Corion
Front-paged by Arunbear