txt2docbook 3

by Maze (Sexton)
on Apr 16, 2007 at 19:08 UTC ( [id://610409]=sourcecode )
Category: Text Processing
Author/Contact Info: Chris Monahan aka Maze, ForeverWatcher@googlemail.com
Description: This guesses the semantic structure of a plain-text document: it strips the line endings and guesses where the paragraph breaks and headers should be. Good for processing Gutenberg 'plain vanilla ASCII' texts. This is version 3 of txt2docbook, the obfuscated-beyond-repair one.
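#!/usr/bin/perl
use strict;
use warnings;

# Usage: txt2docbook.pl input.txt > output.xml
my $file = shift @ARGV;
die "Usage: $0 file.txt\n" unless defined $file;
open(my $fh, '<', $file) or die "Cannot open $file: $!";

print '<?xml version="1.0" encoding="UTF-8"?>';
print '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">';
print "\n<article>\n";

my $marker   = 0;    # 1 while inside a run of blank lines
my $isheader = 1;    # true if the block being collected is a header
my $line;            # text of the current block, joined onto one line

# Print the collected block as a title or a paragraph.
sub flush_block {
    return unless defined $line;
    print $isheader ? "\t<title>$line</title>\n" : "\t<para>$line</para>\n";
}

while (<$fh>) {
    if ($_ eq "\n" && $marker == 0) {    # the first blank line ends the block
        flush_block();
        $line     = undef;
        $isheader = 0;
        $marker   = 1;
        next;
    }
    # A second consecutive blank line marks the next block as a header.
    $isheader = 1 if $_ eq "\n" && $marker == 1;
    next if $_ eq "\n";
    $marker = 0;
    chomp;
    # Join wrapped lines with a space so words at line ends do not run together.
    $line = defined $line ? "$line $_" : $_;
}
flush_block();    # emit the last block if the file does not end with a blank line
close $fh;
print "\n</article>";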
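
The script interpolates the collected text straight into the markup, so an '&' or '<' in the input would produce malformed XML. The following is a minimal sketch of one way to guard against that. The helper name xml_escape is hypothetical and not part of the original script, and it only handles the three characters that are unsafe in element content.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): escape the characters
# that are unsafe inside XML element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so the entities below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Example: wrap a collected block the same way the script does.
my $block = 'Tom & Jerry <1940>';
print "\t<para>", xml_escape($block), "</para>\n";
```

In the script above, this would mean passing $line through xml_escape before it is printed inside the title or para element.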
