ORG to POD translator

WARNING: the following code is a hack and doesn't meet many quality standards.²

Emacs's org-mode is very convenient for organizing and transforming documents.

So when I was asked to produce two articles for the German Perl magazine $foo, I started sketching and outlining ideas in org-mode. But then I needed to deliver them in a special POD format for printing.

It was easier to hack the following script to just translate the markups I needed.

There are already Org parsers on CPAN¹, but I needed a lightweight solution that lets me mix in some POD markup and DWIM-adds some other markup to ease the creative process. It's far from complete or error-free, but it fits my needs.
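
The core trick, leaving hand-written POD untouched while translating org emphasis, can be sketched in isolation. The helper name org_to_pod_inline is hypothetical and only covers italic and bold; the script's convert_markup does the same with a few more DWIM rules.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): translate org emphasis
# to POD, but leave text already inside C<>, B<>, I<>, E<>, Z<> alone.
sub org_to_pod_inline {
    my ($line) = @_;
    my $out    = '';
    my $is_pod = 0;    # split with a capture group yields: text, POD, text, POD, ...

    for my $chunk ( split /([CBIEZ]<.*?>)/, $line ) {
        if ( !$is_pod ) {
            $chunk =~ s{/(.+?)/}{I<$1>}g;      # /italic/ -> I<italic>
            $chunk =~ s{\*(.+?)\*}{B<$1>}g;    # *bold*   -> B<bold>
        }
        $out .= $chunk;
        $is_pod = !$is_pod;                    # next chunk is of the other kind
    }
    return $out;
}

print org_to_pod_inline('C<a/b/c> is /italic/, *bold*, I<*kept*>'), "\n";
# prints: C<a/b/c> is I<italic>, B<bold>, I<*kept*>
```

Because the split pattern contains a capture group, the matched POD sequences are returned between the plain-text pieces, so a simple toggle is enough to tell them apart.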

#!/usr/bin/perl
 
use strict;
use warnings;
use Data::Dumper qw/Dumper/;
 
 
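my $infile = $ARGV[0];
my $outfile;

my $f_in;
if ($infile) {
  ($outfile = $infile) =~ s/\.org$//;   # strip extension if present
  $outfile .= ".pod";                   # never overwrite the input file
  print "* Processing $infile => $outfile\n";
  open $f_in, "<", $infile
    or die "Cannot read $infile: $!";

} else {
  $outfile="/tmp/test.pod";
  print "* Processing __DATA__ => $outfile\n";
  $f_in=\*DATA;
}

open my $f_out, ">", $outfile
  or die "Cannot write $outfile: $!";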
 
 
my $OUT = "";                           # accumulated output
 
my $r_begin_src='^\s*#\+BEGIN_SRC (\w*)\s*$';
my $r_end_src = '^\s*#\+END_SRC\s*$';
 
print $f_out "\n=encoding utf8\n\n";
 
#--- First pass
while(<$f_in>) {
  #--- Codeblock?
  if ( /$r_begin_src/ .. /$r_end_src/) {
    s/($r_end_src|$r_begin_src)/\n/;
    $_=" $_";                           # add indentation
    s#\s*Listing\{(\w+)\}#listing_grep($1)#gie;   # braces escaped for newer perls
    
  } else {
    #--- Heading?
    if (s/^(\*+)(\s.*)/"\n=head".length($1)."$2\n"/e ) {
      #--- Heading == "__DATA__" ?
      last if $2 =~ /^\s*__DATA__\s*$/; #  stop parsing
 
    } else {
      #--- Textbody
      s/^\s+(\S)/$1/;                   # delete indentation
      s/^\s+$/\n/;                      # delete empty lines
      convert_markup();
    }
  }
  $OUT.=$_;
}
 
listing_dump();
 
#--- Second pass
$OUT =~   s#Listing\{(\w+)\}#listing_ref($1)#gie;
 
#--- Output
print $f_out $OUT;
 
##--- Process pod-file
#my $do=`make.pl $outfile`; 
 
exit;
 
 
# ----------------------------------------
sub convert_markup {
  my $in=$_;
  my $out;
  my $notPOD=0;                         # flipflop
 
  #--- ignore POD markups
  for (split /([CBIEZ]<.*?>)/,$in){      
    if ( $notPOD ^= 1 ) {               # odd => not Pod
      #--- translate
      s#/(.+?)/#I<$1>#g;                    # / -> Italic
      s#\*(.+?)\*#B<$1>#g;                  # * -> Bold
#      s#_(.+?)_#I<$1>#g;                   # _ -> Underline
 
      #--- add markup DWIM    
      s#(?<!<)(\w+(::\w+)+)(?!>)# L<$1> #g; # -> L<Mod::ule>
      s#([\$\%\@\&]\w+)#C<$1>#g;            # -> C<$var>
    }
    $out.=$_;
  }
  $_=$out;
}       
 
 
 
 
# ----------------------------------------
# number and reference listings by label
 
# "LISTING{label}" in code -> incremented number
# "LISTING{label}" in text -> reference code listing
 
#  TODO:
#   * support org reference markup instead
#     like <<label>>  or (ref: label) 
#   * more checks for possible typos in labels
 
 
my %listing_nr;                        # Number hash
my $listing_c;                         # Counter
 
sub listing_dump {
  # check Listing hash
  print Dumper \%listing_nr;
}
 
sub listing_grep {
  my $name=uc(shift);                   # case-insensitive
  $listing_nr{$name} = ++$listing_c;  
  $name=qq{Listing $listing_c};
  $name= " "x(40-length($name)) .$name; # align right 
  return $name;
}
 
sub listing_ref {
  my $name=uc(shift);                   # case-insensitive
  my $nr = $listing_nr{$name};          # don't clobber the global counter
  unless (defined $nr) {
    warn "Listing $name unknown!\n";
    return qq{Listing ??};              # placeholder for an unknown label
  }
  return qq{Listing $nr};
}
 
__DATA__
*  Example Org for testing
** heading 2
   Text in C<Path/path/path> might be /Italic/ or *Bold* and
   org-markup nested in POD-markup is I<*ignored*>.
   
  #+BEGIN_SRC Perl
  print("huhu") while(1);
 
 
  LISTING{huhu}
  #+END_SRC 
 
   bla bla ... and the code in LISTING{huhu} prints "huhu"
 
* __DATA__
this text will not be processed anymore

Cheers Rolf

Footnotes:

1) such as Org::Parser, which I found too late

2) DOGMA: Release early, release often

Re: ORG to POD translator by MidLifeXis (Monsignor) on Apr 07, 2011 at 16:35 UTC
First, don't take these in any way as criticisms, but more as opportunities to make it process a larger subset of the org files out there.

I really like that it is simple and does not have prerequisites, so it is able to parse simple org files without a large footprint. That being said, my org files are very seldom simple, and I rely heavily on data within drawers, properties (node, tree, and global), among other things. Additionally, you have started the ball rolling with some actual code.

See lisp/org-exp.el, around line 2167 (barring any major updates) for a better pattern for the BEGIN_SRC line. It should allow an optional ':' after the BEGIN_SRC, and it is case insensitive.

A link to Org::Parser is probably in order :-)

Placing this code on something like github or the like might help to have more hands to help.

That is all that I have at the moment. Hopefully I will be able to take a closer look at it later. Thanks for starting this moving.

--MidLifeXis
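
The BEGIN_SRC suggestion above could be sketched roughly as follows. This is not the pattern from org-exp.el; it only encodes the two properties named in the reply (an optional ':' and case-insensitivity) and, as a further assumption, tolerates a missing language and trailing header arguments. The helper name src_block_lang is hypothetical.

```perl
use strict;
use warnings;

# Assumed, more tolerant patterns; NOT copied from org-exp.el.
my $r_begin_src = qr/^\s*#\+BEGIN_SRC:?(?:\s+(\S+).*)?\s*$/i;
my $r_end_src   = qr/^\s*#\+END_SRC\s*$/i;

# Returns the language name ('' if none is given) for a BEGIN_SRC line,
# or undef if the line does not open a source block.
sub src_block_lang {
    my ($line) = @_;
    return undef unless $line =~ $r_begin_src;
    return defined $1 ? $1 : '';
}

for my $line ( '#+BEGIN_SRC Perl', '  #+begin_src: perl -n', '#+BEGIN_SRC' ) {
    my $lang = src_block_lang($line);
    print "[$line] => ", ( defined $lang ? "lang '$lang'" : 'no match' ), "\n";
}
```

Compiling the patterns with qr// instead of keeping them as strings also lets the /i flag travel with the pattern when it is used in the flip-flop condition.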
Re^2: ORG to POD translator by LanX (Saint) on Apr 07, 2011 at 17:08 UTC
Well, I wouldn't mind if _you_ put it on GitHub! :) (This already took more time than I wanted to spend.)

I once started a thread, "[emacs] converting perl regex into elisp regex", that could help with understanding the original format definition and with parsing the lisp files for updates.

Those other features you want shouldn't be too difficult to parse; the question is what kind of POD they should produce, and what the interface is supposed to look like. If you have an agenda, please define it.

Cheers Rolf
Re^3: ORG to POD translator by MidLifeXis (Monsignor) on Apr 07, 2011 at 22:33 UTC
The closest thing that I would have for an agenda would be to have a defined format for the org file itself. I would like to see the org community adopt a format for the org file, and a set of core functions (dealing with manipulating files, nodes, and properties) that behave in a defined fashion. Beyond that, it is basically up to the interface / library how it behaves based on the content of the file.

Update: It looks like someone else may have some similar thoughts.

--MidLifeXis
Re^4: ORG to POD translator by LanX (Saint) on Apr 07, 2011 at 22:42 UTC
Re^5: ORG to POD translator by MidLifeXis (Monsignor) on Apr 07, 2011 at 22:59 UTC
Re^3: ORG to POD translator by MidLifeXis (Monsignor) on Apr 07, 2011 at 23:12 UTC
I see in rereading this that my comment about the source for the emacs regexp was misunderstood. I did not mean to imply that all of the other block types contained in that regexp should be parsed, I was only commenting on the possible formats that a "BEGIN_SRC" block could take.

The link to Org::Parser was a reference to the unlinked reference (now linked in a footnote), not an indicator that you should use it in this script.

The suggestion for github was to make it more readily able to have patches applied to it.

--MidLifeXis
Re^4: ORG to POD translator by LanX (Saint) on Apr 08, 2011 at 01:15 UTC
Re: ORG to POD translator by educated_foo (Vicar) on Apr 07, 2011 at 17:37 UTC
Cool!

OT: Just as only perl can parse Perl, only org-mode can parse Org -- and Org is always changing. For simple documents, a simple script like this is great; for true Org translation, the way to go would be to export HTML or DocBook and work from there. IMHO, Org::Parser is doing it wrong: it's no longer simple, yet it will never truly parse Org.
Re^2: ORG to POD translator by LanX (Saint) on Apr 12, 2011 at 13:08 UTC
> only org-mode can parse Org -- and Org is always changing.

You're so right!

I was pondering migrating the regexes from org.el and org-exp.el when I noticed that emphasis is allowed to span 2 lines (actually, the number of lines is customizable in "Org Emphasis Regexp Components"):

`/abc def/`

This wasn't clear to me, since the font-lock in Emacs doesn't always get it right. But it's exported to

`<i>abc def</i>`

So a clean parser would at least need to read the customizations from Emacs!

Cheers Rolf
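
For the line-by-line script, emphasis that spans a line break would mean working on whole paragraphs instead. A minimal sketch, assuming the limit is one line break by default (as described above) and ignoring org's rules about which characters may surround the markers; the helper name italic_across_lines is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: convert /italic/ markers that may contain up to
# $max_newlines line breaks (assumed default: 1, i.e. two lines).
sub italic_across_lines {
    my ( $paragraph, $max_newlines ) = @_;
    $max_newlines = 1 unless defined $max_newlines;

    # Body: a run without '/' or newline, followed by at most
    # $max_newlines continuation lines, each introduced by a newline.
    my $body = qr![^/\n]+(?:\n[^/\n]+){0,${max_newlines}}!;

    $paragraph =~ s{/($body)/}{I<$1>}g;
    return $paragraph;
}

print italic_across_lines("an /emphasis\nspanning/ two lines\n");
```

Reading the input in paragraph mode (local $/ = '') would be one way to feed such a helper, and the limit would ideally come from the user's Emacs customization rather than being hard-coded.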
