html2doc server connection dies early

by IdleResonance (Acolyte)
on Oct 23, 2005 at 02:16 UTC (#502273=perlquestion)
IdleResonance has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a server in Perl to run on a Windows system for POSIX clients. The server accepts HTML from incoming connections and returns a Word document as the response. It requires a Windows system with Office and Services for UNIX (Perl w/OLE), as well as socat or a similar bi-directional socket tool on the clients. The server works about half the time. The rest of the time, the connection dies early without an error on either side. The server's console output is the same whether or not the connection dies early.

On the Windows system:
C:\Documents and Settings\Administrator>\SFU\Perl\bin\Perl
Server startup in 9 seconds on tcp/7422
New connection
3 1024-byte input buffers processed
Open temporary c:\SFU\tmp\s8o.0.html
Write temporary c:\SFU\tmp\s8o.0.html.doc
24 1024-byte output buffers processed
Done with connection
New connection
3 1024-byte input buffers processed
Open temporary c:\SFU\tmp\s8o.10.html
Write temporary c:\SFU\tmp\s8o.10.html.doc
24 1024-byte output buffers processed
Done with connection

On the POSIX system:
$ cat input.html | socat STDIO TCP4:windows:7422 > output.doc
$ file output.doc
output.doc: Microsoft Office Document
$ cat input.html | socat STDIO TCP4:windows:7422 > output.doc
$ file output.doc
output.doc: empty

use IO::Socket;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Word';
use Win32::OLE::Variant;
use IO::Handle;
use POSIX;

## This is a server that accepts HTML documents on port
## 7422, returning a Word document. As far as I know, it
## can only be run with ActiveState ActivePerl on Windows
## with Microsoft Word installed. Also, if using SFU,
## must run from C:\SFU\Perl\bin\Perl.exe, not the
## POSIX version at C:\SFU\usr\local\bin\perl. The POSIX
## version has no support for Win32::OLE. If anyone hacks
## OLE support into the POSIX version, let me know at:
## <>.

my $server_port = 7422;
my $current_time = time();

my $word;
eval {$word = Win32::OLE->GetActiveObject('Word.Application')};
die "Word not installed" if $@;
unless (defined $word) {
    $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit; })
        or die "Cannot start Word";
}
Win32::OLE->Option(Warn => 3);

$server = IO::Socket::INET->new(LocalPort => $server_port,
                                Proto     => 'tcp',
                                Type      => SOCK_STREAM,
                                Reuse     => 1,
                                Listen    => 5)
    or die "Could not open server on tcp/$server_port: $@\n";

print STDERR "Server startup in ", time() - $current_time, " seconds on tcp/$server_port\n";

while ($client = $server->accept()) {
    print STDERR "New connection\n";

    ###TODO: GET REAL TEMPORARY FILENAME (this is hit-and-miss)
    my $file = "c:\\SFU\\tmp" . POSIX::tmpnam() . "0.html";

    ###RECEIVE HTML FROM CLIENT
    open(IFILE, ">$file") or next; #TODO: log error
    my $i = 0;
    my $buf = "";
    while (read($client, $buf, 1024, 0) > 0) {
        print IFILE $buf;
        $i++;
    }
    print STDERR "$i 1024-byte input buffers processed\n";
    undef $i;
    undef $buf;
    close(IFILE);

    print STDERR "Open temporary $file\n"; #TODO: if verbose
    my $doc = $word->{'Documents'}->Open("$file") or next; #TODO: log error
    print STDERR "Write temporary $file.doc\n"; #TODO: if verbose
    $doc->SaveAs("$file.doc", { 'FileFormat' => wdFormatDocument });
    $doc->Close();
    undef $doc;

    ###SEND DOC BACK TO CLIENT
    open(OFILE, "<$file.doc") or die "Could not open Office Document"; #TODO: log error
    binmode(OFILE);
    my $buf = "";
    my $i = 0;
    while (read(OFILE, $buf, 1024, 0) > 0) {
        print $client $buf;
        $i++;
    }
    print STDERR "$i 1024-byte output buffers processed\n";
    undef $i;
    undef $buf;
    close(OFILE);

    ###REMOVE TEMPORARY HTML & DOC
    unlink("$file") or print STDERR "ERROR: could not delete $file\n";
    unlink("$file.doc") or print STDERR "ERROR: could not delete $file.doc\n";

    $client->close;
    print STDERR "Done with connection\n";
}

undef $word;
close($server);

Replies are listed 'Best First'.
Re: html2doc server connection dies early
by pg (Canon) on Oct 23, 2005 at 02:36 UTC

    It could well be an illusion when you said that the connection died early. I noticed that you didn't binmode() the socket, $client. When you read the content of the Word document, you did binmode() the filehandle, but then you print the same content to a socket without binmode(). That is not correct.

    My guess is that the content gets cut short because there was no binmode(), which then made you think that the connection died.
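
A minimal sketch of the point above, assuming the goal is a byte-for-byte copy from a file to an already-open handle. The helper name send_file_raw is hypothetical and not part of the posted server; the handle passed in could be the $client socket or any other filehandle. It sets binmode() on both sides so that neither the read nor the write goes through newline translation.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file to an already-open handle (a socket or
# any other filehandle) byte for byte. Returns the number of bytes read
# from the file and passed to print.
sub send_file_raw {
    my ($path, $out) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    binmode($in);     # no newline translation on the file side
    binmode($out);    # no newline translation on the output side
    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, 1024);
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;                      # end of file
        print {$out} $buf or die "Write error: $!";
        $total += $n;
    }
    close($in) or die "Cannot close $path: $!";
    return $total;
}
```

In the posted loop this would stand in for the open/binmode/read/print sequence under "SEND DOC BACK TO CLIENT", called as send_file_raw("$file.doc", $client). Whether it changes the symptom is a separate question.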

      If this is the case, then why does it work on some connections? Also, I am fairly certain that it is dying early because I run Windows under VMware, so I can look at the output of both at the same time. When socat creates an empty file instead of a Word document, it ends while the server is still printing the messages about opening and writing the temporary files. In either case, the same output is printed on the console. Anyway, I just tried adding binmode($client) to the top of the while loop; it worked twice and produced an empty file twice. Thanks anyway.

        Generic, unsubstantiated possibility:
        The system may (or, may not) time out after identical wall_clock durations... and the time it takes for your connection to the source to complete its work may well vary.
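
One way to test the timing idea above is to measure how long each phase of a request takes and compare that with whatever timeout the client applies. The following is only a sketch: the helper name timed is hypothetical, and it uses the core Time::HiRes module for sub-second resolution. On the client side it may be worth checking socat's timeout after end of input (the -t option, which its man page gives as 0.5 seconds by default), since the posted transcript shows socat exiting before the server finishes.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a code block, log its wall-clock duration to
# STDERR, and return the duration followed by the block's return values.
sub timed {
    my ($label, $code) = @_;
    my $start   = time();
    my @result  = $code->();
    my $elapsed = time() - $start;
    printf STDERR "%s took %.3f seconds\n", $label, $elapsed;
    return ($elapsed, @result);
}
```

Wrapping the Open/SaveAs/Close sequence in timed('Word conversion', sub { ... }) would show whether the slow requests are the ones that end up as empty files on the client.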

Re: html2doc server connection dies early
by sgifford (Prior) on Oct 23, 2005 at 02:43 UTC
    It looks to me like asking Word to open the document is failing, here:
    my $doc = $word->{'Documents'}->Open("$file") or next; #TODO: log error
    The debug output pretty much stops after that. I can't explain why it still prints Done with connection, though, unless the code you're actually running is slightly different than what you posted.

    I would add the error logging mentioned in the TODO, and also some error checking after your close statements (at least on Unix, some errors aren't reported until the file is closed).
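
As a rough illustration of checking close as well as open, here is a sketch with a hypothetical helper, write_checked, that reports which step failed. It is not taken from the posted server; it only shows the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: write $data to $path and die with a message naming
# the failing step (open, print, or close). Returns the length of $data.
sub write_checked {
    my ($path, $data) = @_;
    open(my $fh, '>', $path) or die "open $path failed: $!\n";
    binmode($fh);
    print {$fh} $data        or die "print to $path failed: $!\n";
    close($fh)               or die "close $path failed: $!\n";   # buffered writes can fail here
    return length $data;
}
```

Inside the accept loop, a call like this could be wrapped in eval so that a failure is logged and the server moves on to the next connection instead of exiting. For the OLE calls, Win32::OLE->LastError() is the documented way to retrieve the most recent error, which could be logged in the same place.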

      I am certain this is not where it is failing for me. The console always shows output generated past this line. I just copied in the file directly. I'll try to add some more logging though. Thanks.
Re: html2doc server connection dies early
by CountZero (Bishop) on Oct 23, 2005 at 15:08 UTC
    I don't have a set-up similar to yours, so I cannot test your code and can only give some general pointers.

    • Add use strict; and use warnings; and change your code so it runs without complaints under these pragmas.
    • Use indirect filehandles in the three-argument form of open: open(my $IFILE, ">", $file); rather than open(IFILE, ">$file"). These filehandles close by themselves when they fall out of scope. Of course there is nothing wrong with explicitly closing a filehandle once you are done with it.
    • You declare lexical variables (which is Good), but then undef them only to redeclare them a few lines later. Probably you want to make sure that you start with fresh variables, but OTOH you initialize these lexical variables when you declare them! You could dispense with the undef and the second declaration and just keep the initialisations.
    • Test all open and close functions for errors. As a general rule, test everything which depends on an event outside your own script (this includes all OLE-calls!).
    • Check the file-length immediately before you start sending it to the client. If it is still zero-length, there was something wrong with the writing of the temporary word-file. Perhaps it was not yet flushed to the disk. Wait a moment and check again. If the file remains at zero length, the OLE-call to write the Word-file did not work.
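
The last point in the list above could be sketched as follows. The helper name wait_for_nonempty and its parameters are hypothetical; the sketch runs under strict and warnings and uses the core Time::HiRes module so the delay can be a fraction of a second.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper: poll until $path exists with a non-zero size.
# Returns the size in bytes, or 0 if the file is still missing or empty
# after $tries checks spaced $delay seconds apart.
sub wait_for_nonempty {
    my ($path, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        my $size = -s $path;          # undef if missing, 0 if empty
        return $size if $size;
        sleep($delay) if $attempt < $tries;
    }
    return 0;
}
```

Called just before the document is sent back, for example as wait_for_nonempty("$file.doc", 5, 0.2), a return value of 0 would point at the SaveAs step rather than at the network side.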


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      • will do
      • will do
      • With the undefs, I just didn't want a possibility of a carryover buffer of HTML sent back to the client on a zero-length file, but I should just re-initialize them.
      • will do
      • I haven't had any zero-length files on the server though. As far as the file length is concerned, you can see in the server output that between 23 and 24K are read from the document and supposedly sent back to the client.

Node Type: perlquestion [id://502273]
Approved by monkfan