PerlMonks  

Re^2: docx to html conversion.

by huchister (Acolyte)
on Feb 27, 2013 at 20:46 UTC (#1020958)


in reply to Re: docx to html conversion.
in thread docx to html conversion.

I tried unoconv again to convert docx to HTML. The raw output was not really what I wanted, but with a couple of tweaks I was able to get usable HTML out of it.
If anybody needs a reference, below is the example code / unix line I worked with.

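use HTML::TreeBuilder;
use HTML::Entities qw(decode_entities);

# convert with unoconv; the result is read back from $upload_dir below
`unoconv --stdout -f html "$docxfileloc" > "$htmfile"`;
my $t = HTML::TreeBuilder->new_from_file("$upload_dir/$htmfile");
my $body = $t->look_down(_tag => q{body});
my @content = $body->detach_content;    # grab the children of <body>
# serialize without the <body>, </body> tags; text nodes are plain strings, not objects
my $html = join '', map { ref $_ ? $_->as_HTML : $_ } @content;
$html = decode_entities($html);         # decode special characters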

One note: if it's possible, use abiword for docx -> html. Its output is better than unoconv's; I just couldn't use it because of a version compatibility issue.
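For anyone who can go the abiword route, here is a minimal sketch of how the call might be wrapped in Perl. It was not part of the setup described above. It assumes an abiword build with the docx (OpenXML) import plugin installed and uses abiword's `--to` / `--to-name` command-line options; the sub names and the file paths are hypothetical.

```perl
use strict;
use warnings;

# Build the argument list for an abiword command-line conversion.
# $docx and $html are hypothetical paths supplied by the caller.
sub abiword_html_command {
    my ($docx, $html) = @_;
    return ('abiword', '--to=html', "--to-name=$html", $docx);
}

# Run the conversion. The list form of system() bypasses the shell,
# so file names containing spaces or quotes need no escaping.
sub docx_to_html_with_abiword {
    my ($docx, $html) = @_;
    my @cmd = abiword_html_command($docx, $html);
    system(@cmd) == 0
        or die "abiword failed: "
             . ($? == -1 ? $! : "exit status " . ($? >> 8)) . "\n";
    return $html;
}
```

The resulting file could then be fed to HTML::TreeBuilder in the same way as the unoconv output, e.g. `docx_to_html_with_abiword($docxfileloc, "$upload_dir/$htmfile")`.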

Thank you for your replies; I hope my solution helps others.

