Re: Convert word(.doc) file to html file
by Corion (Patriarch) on Nov 26, 2003 at 11:12 UTC
Under Windows, Perl can automate other programs through the Win32::OLE module. That way you can control MS Word from Perl just as you would from Visual Basic for Applications, through the Office Object Model.
The easiest way to get a (Visual Basic) stub of what you want to do is:
- Practice what you want to do with the program manually
- Switch on the macro recorder
- Do what you want to automate one final time
- Stop the macro recorder
- Look at the recorded macros
- Convert the recorded Visual Basic macro to Perl
Conversion of the recorded Visual Basic macro to Perl is a fairly mechanical process, and what you need to know beyond that should be explained in the Win32::OLE documentation.
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Convert word(.doc) file to html file
by davis (Vicar) on Nov 26, 2003 at 11:29 UTC
It's not Perl, and the results are not perfect, but wvWare has a program called wvHtml which does exactly what you want.
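If you want to drive wvHtml from Perl (say, over a whole directory of files), here is a minimal sketch. It assumes wvHtml is on the PATH and is invoked with the input .doc and the output .html as its two arguments; check the manual page of your wv version. The helper names `wvhtml_command` and `convert_with_wvhtml` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers around wvWare's wvHtml. Assumes wvHtml is on the
# PATH and is invoked as: wvHtml input.doc output.html
sub wvhtml_command {
    my ($doc, $html) = @_;
    return ('wvHtml', $doc, $html);
}

sub convert_with_wvhtml {
    my ($doc, $html) = @_;
    my @cmd = wvhtml_command($doc, $html);
    # List-form system() bypasses the shell, so spaces in names are safe
    system(@cmd) == 0
        or die "wvHtml failed for $doc (wait status $?)\n";
    return $html;
}
```

Usage would be something like `convert_with_wvhtml('My Report.doc', 'My Report.html')`. Building the command as a list instead of one interpolated string avoids quoting problems with filenames that contain spaces.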
cheers
davis
It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
WORD -> wvText -> custom Perl -> HTML::FromText -> HTML Tidy
The results are still up over at the Nashville Film Festival 2003 films page.
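The "custom Perl" step in a pipeline like that will vary with the documents. As a rough stand-in (not the code used above), here is a core-Perl sketch that turns the plain text wvText produces into minimal HTML paragraphs, assuming blank lines separate paragraphs. The function names are hypothetical, and HTML::FromText does considerably more than this.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "custom Perl" step: turn the plain text
# that wvText produces into minimal HTML. Blank lines separate paragraphs;
# single newlines inside a paragraph become spaces.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not double-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

sub text_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;                       # normalise DOS/Mac line endings
    my @paras = grep { /\S/ } split /\n\s*\n/, $text;
    return join "\n", map {
        my $p = $_;
        $p =~ s/^\s+|\s+$//g;                    # trim the paragraph
        $p =~ s/\s*\n\s*/ /g;                    # unwrap hard line breaks
        '<p>' . escape_html($p) . '</p>';
    } @paras;
}
```

For example, `print text_to_html(do { local $/; <STDIN> })` would read the whole text file from standard input and print the paragraphs; the output can then go through HTML Tidy as in the pipeline.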
Re: Convert word(.doc) file to html file
by falic (Beadle) on Nov 26, 2003 at 13:22 UTC
Use Win32::OLE; it should let you open a Word document and save it as an HTML file.
It should be easy enough, along the lines of:
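use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';   # imports wdFormatHTML and friends
my ($File, $HTMLFile) = @ARGV;             # full paths work best with Word
my $Word = Win32::OLE->new('Word.Application', 'Quit')   # 'Quit' runs when $Word goes away
    or die "Cannot start Word: " . Win32::OLE->LastError;
my $Doc = $Word->Documents->Open($File)
    or die "Cannot open $File: " . Win32::OLE->LastError;
$Doc->SaveAs({ FileName => $HTMLFile, FileFormat => wdFormatHTML });
$Doc->Close();
undef $Word;                               # triggers Word's Quit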
where $File is the path of the .doc file and $HTMLFile is the path of the .html file to write; absolute paths are safest.
I assume, then, that this can be automated if you supply all the filenames etc. in advance?
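Supplying the filenames up front could look roughly like this sketch: work out every input/output pair first, then let the Win32::OLE code run over the list unattended. The helper `plan_conversions` is hypothetical, and the choice of absolute paths rests on the assumption that Word may resolve relative names against its own default folder rather than the script's working directory.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: given a list of .doc paths, work out the absolute
# input and output names up front so the whole batch can run unattended.
# Absolute paths are used on the assumption that Word may resolve relative
# names against its own folder, not the Perl script's working directory.
sub plan_conversions {
    my (@docs) = @_;
    my @jobs;
    for my $doc (@docs) {
        next unless $doc =~ /\.doc\z/i;          # skip anything that is not a .doc
        my $in  = File::Spec->rel2abs($doc);
        (my $out = $in) =~ s/\.doc\z/.html/i;    # report.doc -> report.html
        push @jobs, [ $in, $out ];
    }
    return @jobs;
}
```

Each `[$in, $out]` pair can then go through the Open/SaveAs/Close calls from the Win32::OLE reply. Starting Word once before the loop, rather than once per file, saves a lot of start-up time on large batches.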
Re: Convert word(.doc) file to html file
by wine (Scribe) on Nov 26, 2003 at 11:32 UTC
Re: Convert word(.doc) file to html file
by EvdB (Deacon) on Nov 26, 2003 at 11:14 UTC
Word files are in a proprietary format. Your best bet is to open the file in Word and save it as HTML from there. If you need to do this to many files, perhaps Word can be scripted on your platform?
--tidiness is the memory loss of environmental mnemonics
Word will convert .doc files to HTML easily, but there's a drawback.
Word has a tendency to throw a lot of junk into HTML files. I've seen webpages at my workplace that specifically ask that nobody use Word to edit them, because the author doesn't want to deal with the mess Word makes.
Re: Convert word(.doc) file to html file
by thraxil (Prior) on Nov 26, 2003 at 17:06 UTC
Have Word save as HTML, as others have suggested; then you'll probably want to run the result through something like htmltidy to clean up the horrible markup that Word produces.
Also, if you ever switch to OpenOffice, you can convert those documents to XHTML pretty easily.
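Before (or instead of) the htmltidy pass, a few of the most common Word artifacts can be stripped with plain regexes. This is a rough sketch for simple documents only: regexes are not an HTML parser, the list of artifacts is an assumption based on typical "Save as HTML" output, and `strip_word_cruft` is a hypothetical name. HTML Tidy's own `word-2000` option is meant for exactly this job and is the more robust choice.

```perl
use strict;
use warnings;

# Rough, regex-based removal of a few common Word "Save as HTML"
# artifacts. Regexes are not an HTML parser: this is only a sketch for
# simple documents; a real tool such as HTML Tidy is more robust.
sub strip_word_cruft {
    my ($html) = @_;
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    $html =~ s/<!--\[if[^\]]*\]>.*?<!\[endif\]-->//gs;
    # Office namespace elements such as <o:p></o:p>
    $html =~ s{</?o:p\b[^>]*>}{}gi;
    # class="MsoNormal", class=MsoBodyText, ...
    $html =~ s/\s+class=("?)Mso\w*\1//gi;
    # Drop mso-* declarations inside style attributes
    $html =~ s/mso-[\w-]+:[^;"]*;?//gi;
    # Remove style attributes that are now empty
    $html =~ s/\s+style=(["'])\s*\1//gi;
    return $html;
}
```

For instance, `<p class=MsoNormal style="mso-bidi-font-size:10.0pt">Hi<o:p></o:p></p>` comes out as `<p>Hi</p>`, while ordinary markup such as `<p class="intro">` is left alone.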
Re: Convert word(.doc) file to html file
by warthurton (Sexton) on Nov 26, 2003 at 21:52 UTC