PerlMonks  

Mixing sysread() with <FILEHANDLE>?

by wanna_code_perl (Pilgrim)
on May 26, 2008 at 18:30 UTC (#688544=perlquestion)
wanna_code_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl experts. I'm writing a Perl client/server program that operates over a remote shell (SSH) with a simple line-based text protocol. However, I also need to transmit blobs of binary data.

The client uses open2(*Reader, *Writer, "ssh ... my_server") to get a Reader and a Writer filehandle to the remote server.

The server simply reads from <STDIN> and writes to STDOUT.

This setup works fine for the text-based parts of the protocol.

I am not sure of the best way to do the binary transfer from client to server. I tried using syswrite() on the client and sysread() on the server, basically like this:

Client: Sends text "----- Chunk of length $len -----\n"

Server: Receives the above line OK, parses it correctly, and calls sysread(STDIN, $buf, $len);

Client: Sends exactly $len bytes with syswrite(Writer, $data, $len)

Server: sysread() blocks indefinitely

If I send an extra "\n" after my chunk of data, sysread() unblocks, but I am concerned that this may cause other problems if my binary data contains newlines.

I realise it is ill-advised to mix sysread()/syswrite() with other (buffered) I/O functions, and that this is part of the problem. Any suggestions on how to make this work, or for a better design?

Thank you!

Re: Mixing sysread() with <FILEHANDLE>?
by ikegami (Pope) on May 26, 2008 at 18:54 UTC

    read is compatible with <>, but neither is compatible with sysread.
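    A minimal sketch of why the server in the question hangs, assuming a temporary file stands in for the SSH pipe: <> asks the OS for a whole buffer (typically several KB), so the payload that follows the header line lands in Perl's buffer, where sysread() cannot see it. On a regular file sysread() then reports end-of-file; on a pipe it would wait for bytes that have already arrived, which would also explain why an extra "\n" unblocks it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Header line plus an 8-byte payload, written to a temp file that stands in for the pipe.
my $payload = "\x00\x01\n\x02BIN!";
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "----- Chunk of length 8 -----\n", $payload;
close $tmp or die "close: $!";

# Mixed: <> fills Perl's buffer with the header AND the payload,
# so sysread, which asks the OS directly, finds nothing left (EOF on a file).
open my $fh, '<:raw', $path or die "open: $!";
my $header = <$fh>;
my $got_sys = sysread($fh, my $sysbuf, length($payload));
close $fh;

# Consistent: <> followed by read, which takes bytes from the same buffer.
open $fh, '<:raw', $path or die "open: $!";
$header = <$fh>;
my $got_read = read($fh, my $buf, length($payload));
close $fh;

print "sysread after <>: $got_sys byte(s); read after <>: $got_read byte(s)\n";
```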

    Without careful use of flushing, print and syswrite aren't compatible. Use only one or the other to avoid problems.

    You may need to play with the layers if you wish to change the LF ⇔ CRLF conversion settings or the encoding/decoding settings.

    Finally and most importantly, you are Suffering from Buffering.
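    Putting these points together, a minimal sketch of the exchange from the question using only buffered I/O on both ends: print() plus an explicit flush on the sending side, <> for the header and read() for the payload on the receiving side. The names send_chunk and recv_chunk are hypothetical, the header format is the one from the question, and an in-memory scalar stands in for the SSH pipe so the sketch runs on its own.

```perl
use strict;
use warnings;
use IO::Handle;   # provides ->flush on lexical filehandles

# Sender: header line and payload both go through print, then one explicit flush.
sub send_chunk {
    my ($out, $data) = @_;
    my $len = length $data;          # a byte count, assuming $data holds bytes, not wide characters
    print {$out} "----- Chunk of length $len -----\n", $data or die "print: $!";
    $out->flush or die "flush: $!";  # don't leave the chunk sitting in Perl's buffer
    return $len;
}

# Receiver: <> for the header line, read() for the payload; both use the same PerlIO buffer.
sub recv_chunk {
    my ($in) = @_;
    my $header = <$in>;
    return unless defined $header;   # the other side closed the connection
    my ($len) = $header =~ /^----- Chunk of length (\d+) -----$/
        or die "unexpected header: $header";
    my $buf = '';
    while (length($buf) < $len) {    # loop in case read() returns a short count
        my $n = read($in, $buf, $len - length($buf), length($buf));
        die "read: $!" unless defined $n;
        die 'EOF after ' . length($buf) . " of $len bytes" if $n == 0;
    }
    return $buf;
}

# Usage sketch: an in-memory scalar stands in for the SSH pipe.
my $wire = '';
open my $w, '>', \$wire or die "open: $!";
send_chunk($w, "\x00\xff\nraw\nbytes");   # embedded newlines are harmless
send_chunk($w, 'second');
close $w;

open my $r, '<', \$wire or die "open: $!";
my $first  = recv_chunk($r);
my $second = recv_chunk($r);
my $third  = recv_chunk($r);              # undef: no more chunks
```

    In the real programs you would also call binmode on the handles (STDIN and STDOUT on the server, Reader and Writer on the client) so that no CRLF or encoding layer alters the payload and the byte count in the header matches what goes over the wire.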

Re: Mixing sysread() with <FILEHANDLE>?
by pc88mxer (Vicar) on May 26, 2008 at 18:58 UTC
    I would make sure you are not mixing buffered and unbuffered I/O on the same file handle. That is, the client should not use both print and syswrite, and the server should not read from STDIN using readline (or equivalently <STDIN>) and sysread - it will just cause you a lot of problems.

    To solve your binary data problem, how about implementing a simple encoding scheme?

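    sub write_binary {
        my ($fh, $data) = @_;
        print $fh unpack("H*", $data), "\n";   # two hex digits per byte, one line per blob
    }

    sub read_binary {
        my ($fh) = @_;
        my $line = <$fh>;
        return undef unless defined $line;     # end of stream
        chomp $line;                           # strip the newline before decoding
        return pack("H*", $line);
    }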
    The point is that interaction with a network server is done via sending and receiving messages, and so it's natural to have subroutines or methods to perform those functions:
    my $server = ...;
    $server->send_message("Hello");
    ...
    $msg = $server->receive_message();
    ...
    Encapsulating the interaction this way lets you change later how message sending is performed (use of syswrite or print, choice of encoding, etc.). Moreover, you'll need an encoding layer anyway if you ever want to transmit anything more complex than a simple octet string, such as two data values, an array of values, or even Unicode code points.
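    One way to get that flexibility, sketched as a hypothetical Channel class (not a CPAN module): the constructor picks a line-safe encoding, hex as in the code above or Base64 via the core MIME::Base64 module, and callers only ever see send_message() and receive_message(), so the encoding can change without touching them.

```perl
package Channel;   # hypothetical class name, not a CPAN module
use strict;
use warnings;
use IO::Handle;
use MIME::Base64 qw(encode_base64 decode_base64);

# Each line-safe encoding is an [encode, decode] pair; callers never see which is used.
my %CODEC = (
    hex    => [ sub { unpack 'H*', $_[0] },       sub { pack 'H*', $_[0] } ],
    base64 => [ sub { encode_base64($_[0], '') }, sub { decode_base64($_[0]) } ],
);

sub new {
    my ($class, %args) = @_;
    my $enc = $args{encoding} || 'hex';
    die "unknown encoding '$enc'" unless $CODEC{$enc};
    return bless { in => $args{in}, out => $args{out}, codec => $CODEC{$enc} }, $class;
}

# One message per line: the encoded form never contains a newline.
sub send_message {
    my ($self, $data) = @_;
    my $out = $self->{out};
    print {$out} $self->{codec}[0]->($data), "\n" or die "print: $!";
    $out->flush or die "flush: $!";
}

sub receive_message {
    my ($self) = @_;
    my $line = readline($self->{in});
    return unless defined $line;       # end of stream
    chomp $line;
    return $self->{codec}[1]->($line);
}

package main;

# Usage sketch: the same calls work whichever encoding the channel was built with.
my %received;
for my $enc (qw(hex base64)) {
    my $wire = '';
    open my $w, '>', \$wire or die "open: $!";
    Channel->new(out => $w, encoding => $enc)->send_message("any\x00bytes\n");
    close $w;
    open my $r, '<', \$wire or die "open: $!";
    $received{$enc} = Channel->new(in => $r, encoding => $enc)->receive_message;
}
```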

      Thank you, this is very helpful.

      I would indeed prefer not to have to use syswrite. Encoding with pack makes sense, but is there a more space-efficient way to transfer the data portably? Even with uuencoding (pack("u*")), there is a 1/3 increase in required bandwidth. (True, SSH compression would mostly negate that if it is enabled, but I cannot guarantee that.)
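      To put numbers on the overhead, a small sketch comparing the line-safe encodings that ship with Perl on an arbitrary 3000-byte blob: hex (unpack "H*") doubles the size, while Base64 (MIME::Base64) and uuencode (pack "u") both produce 4 characters per 3 bytes, with uuencode adding a length character and a newline per 45-byte line. Each of these costs at least about a third more bandwidth, which is the argument for sending raw bytes after a length header.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# 3000 arbitrary bytes standing in for one blob.
my $blob = join '', map { chr($_ % 256) } 0 .. 2999;

my %size = (
    raw      => length($blob),
    hex      => length(unpack('H*', $blob)),       # 2 characters per byte
    uuencode => length(pack('u', $blob)),          # 4 per 3 bytes + length char and "\n" per 45-byte line
    base64   => length(encode_base64($blob, '')),  # 4 per 3 bytes, no line breaks
);

for my $name (qw(raw hex uuencode base64)) {
    printf "%-8s %5d bytes (+%.0f%%)\n", $name, $size{$name}, 100 * ($size{$name} / $size{raw} - 1);
}
```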

      I need to transmit blobs of raw binary data, up to several gigabytes in a single session, so I do need to optimize for size. For my application, it's the raw bits that are important and need to be preserved. I will never be transmitting any higher-order structure.
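      For multi-gigabyte blobs, the receiving side should not slurp the whole payload into one scalar. A minimal sketch, assuming the length has already been parsed from a header line: the hypothetical copy_exact() moves exactly $len bytes from one handle to another in 64 KiB pieces (an arbitrary size) using read(), so it stays compatible with the <> calls used for the text protocol and leaves the next protocol line unread.

```perl
use strict;
use warnings;

# Move exactly $len bytes from $in to $out without holding the whole blob in memory.
sub copy_exact {
    my ($in, $out, $len, $bufsize) = @_;
    $bufsize ||= 64 * 1024;                    # 64 KiB per read; an arbitrary choice
    my $left = $len;
    while ($left > 0) {
        my $want = $left < $bufsize ? $left : $bufsize;
        my $n = read($in, my $buf, $want);     # buffered read, safe to mix with <>
        die "read: $!" unless defined $n;
        die "EOF with $left of $len bytes still expected" if $n == 0;
        print {$out} $buf or die "print: $!";
        $left -= $n;
    }
    return $len;
}

# Usage sketch: a 200_000-byte blob followed by the next protocol line.
my $src = ('x' x 200_000) . "NEXT LINE\n";
open my $in, '<', \$src or die "open: $!";
my $dst = '';
open my $out, '>', \$dst or die "open: $!";
copy_exact($in, $out, 200_000);
close $out;
my $after = <$in>;                             # the text protocol resumes here
```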

      Thank you for the advice about encapsulation as well. I have already done that. Now I just need a transmission method that works. :)

Re: Mixing sysread() with <FILEHANDLE>?
by sasdrtx (Friar) on May 26, 2008 at 23:12 UTC

    This sounds a lot like you're trying to re-invent FTP (SFTP to be more precise). If using FTP isn't workable for your scenario, you could take a hint from how it opens up a separate session to transmit files.

    Passive-mode FTP is a good model: one side opens a socket on a random port, then tells the other side which port it is. Connect once per blob, and you need not worry about misinterpreting data.
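    The "random port" step can be sketched with the core IO::Socket::INET module: asking for port 0 lets the OS pick a free port, which sockport() then reports so it can be announced over the existing connection. The DATA-PORT message is a hypothetical format, and binding to 127.0.0.1 is only for illustration; a real server would bind an address the client can reach.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on port 0 so the OS picks a free ("random") port for the data connection.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',   # illustration only; use an address the peer can reach
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen: $@";

# Announce the chosen port over the existing control connection (STDOUT here).
my $announce = sprintf "DATA-PORT %d\n", $listener->sockport;
print STDOUT $announce;

# The peer would now connect to that port, send one blob, and close;
# $listener->accept would return that data connection.
```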


    sas

      Truly, I don't want to re-invent FTP. The key here is that the only channel I can rely on is SSH, and opening a separate data channel for each binary object would add far too much overhead.

      Objects vary in size from a few bytes to about 1GB, and there could easily be thousands in a single session.

      I guess I could open one extra data channel when the original connection is initiated, and then use syswrite/sysread on that, approximately like this:

      Control channel:

      C: STORE Name=<name> Content-Length=<length> MD5=<hash>
      S: OK, go ahead

      Then the client transmits the object in a series of syswrite() calls over the separate SSH connection. The separate process on the server would use sysread() only.
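      A possible shape for that control line, sketched with the core Digest::MD5 module: store_header() builds the STORE line, parse_store_header() splits it into fields, and verify_blob() checks a received payload against the advertised length and MD5. All three names are hypothetical, as is the blob name in the usage lines, and the parser assumes <name> contains no whitespace.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build the control line announcing a blob.
sub store_header {
    my ($name, $data) = @_;
    return sprintf "STORE Name=%s Content-Length=%d MD5=%s\n",
        $name, length($data), md5_hex($data);
}

# Split a STORE line into its key=value fields (assumes no whitespace inside values).
sub parse_store_header {
    my ($line) = @_;
    my ($cmd, @pairs) = split ' ', $line;
    die "not a STORE command: $line" unless defined $cmd && $cmd eq 'STORE';
    my %field = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } @pairs;
    return \%field;
}

# Check a received payload against the advertised length and MD5.
sub verify_blob {
    my ($field, $data) = @_;
    return length($data) == $field->{'Content-Length'}
        && md5_hex($data) eq $field->{MD5};
}

# Usage sketch with a hypothetical blob name.
my $blob   = "\x00binary\nblob\xff";
my $line   = store_header('report.bin', $blob);
my $fields = parse_store_header($line);
print STDOUT $line;
print STDOUT verify_blob($fields, $blob) ? "blob verified\n" : "blob corrupted\n";
```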

      However, I would prefer to in-line it in one channel, to avoid the complexity of the extra connection and extra processes/threads. HTTP transmits a mixture of text and binary data pretty readily through a single socket.

      I would definitely not use the FTP model. It just brings in too many operational issues (firewalling, security, etc.) Moreover, you have to make a separate TCP connection for each 'blob', and that will easily eliminate any speed-up gained by not having to encode the data.

        Security and firewalls are a concern for all TCP/IP communication; FTP isn't secure, but that's not relevant here. Given the use of SSH, SFTP could be used, which should add no security concerns.

        As for using FTP as a model, the key point is having a different session for the binary data. As for performance, I don't think you can assume that a new session per blob is going to be a major factor. It depends on the average size of those blobs. Anyway, as the original poster noted, a single separate data session would likely serve his purpose.


        sas

Node Type: perlquestion [id://688544]
Approved by almut