Chris Monahan aka Maze
ForeverWatcher@googlemail.com
Description:
This guesses the semantic structure of a plain-text document: it strips the line endings and guesses where the paragraph breaks and headers should be.
Good for processing Gutenberg 'plain vanilla ASCII' texts.
This is version 3 of txt2docbook, the obfuscated-beyond-repair one.
#!/usr/bin/perl
use warnings;
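# Low-precedence "or" makes die fire when open fails, not when $ARGV[0] is false.
open(FH, '<', $ARGV[0]) or die("Cannot open input file: $!");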
print '<?xml version="1.0" encoding="UTF-8"?>'
    . '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" '
    . '"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">';
print "\n<article>\n";
# $marker is 1 once a blank line has been seen; the first block is treated as a header.
($marker, $isheader) = (0, 1);
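while (<FH>) {
    # First blank line after text: flush the collected text as a title or paragraph.
    if ($_ eq "\n" && $marker == 0) {
        if (defined $line && length $line) {
            print "\t<title>$line</title>\n" if $isheader;
            print "\t<para>$line</para>\n" unless $isheader;
        }
        $line = undef;
        $isheader = undef;
        $marker++;
        next;
    }
    # A second consecutive blank line means the next block is a header.
    $isheader = 1 if ($_ eq "\n" && $marker == 1);
    $marker = 0 if $_ ne "\n";
    chomp;
    # Join wrapped lines with a single space so words do not run together.
    $line = (defined $line && length $line) ? "$line $_" : $_;
}
# Flush the last block when the file does not end with a blank line.
if (defined $line && length $line) {
    print $isheader ? "\t<title>$line</title>\n" : "\t<para>$line</para>\n";
}
close FH;
print "\n</article>";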
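
For anyone who wants to poke at the guessing logic on its own, here is a rough sketch of the same blank-line heuristic as a standalone function: one blank line ends a block, and two or more blank lines mark the next block as a header. The names classify_blocks and xml_escape are made up for this illustration and are not part of txt2docbook. The escaping step is an addition too; the script above prints text as-is, so treat it as an assumption about what valid output would need.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in txt2docbook): split text into [type, text] blocks.
# One blank line ends a block; two or more in a row make the next block a title.
sub classify_blocks {
    my ($text) = @_;
    my @blocks;
    # The first block is treated as a header, as in the script.
    my ($buffer, $blank_run, $is_header) = ('', 0, 1);
    for my $row (split /\n/, $text, -1) {
        if ($row eq '') {
            if (length($buffer)) {
                push @blocks, [ $is_header ? 'title' : 'para', $buffer ];
                ($buffer, $is_header) = ('', 0);
            }
            $blank_run++;
            $is_header = 1 if $blank_run >= 2;
            next;
        }
        $blank_run = 0;
        # Join wrapped lines with a single space.
        $buffer = length($buffer) ? "$buffer $row" : $row;
    }
    # Flush whatever is left when the text does not end with a blank line.
    push @blocks, [ $is_header ? 'title' : 'para', $buffer ] if length($buffer);
    return @blocks;
}

# Hypothetical helper: escape the characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

my $sample = "A Title\n\nFirst line\nsecond line\n\n\nChapter 1\n\nFish & chips\n";
for my $block (classify_blocks($sample)) {
    my ($tag, $text) = @$block;
    printf "<%s>%s</%s>\n", $tag, xml_escape($text), $tag;
}
```

Keeping the classification separate from the printing makes it easier to try the heuristic on small strings before running it over a whole book.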