Re: Socket Programming
by etcshadow (Priest) on Jan 07, 2004 at 06:26 UTC
The internet, as defined by IP (Internet Protocol), is actually a crazy, complicated, unreliable thing. The basic way that it works is by carrying bundles of data (called packets) around. Basically, a packet is a sort of digital postcard. It has a from address and a to address, and room to write in some "payload" (the information you want carried from here to there).
Now, these packets are NOT guaranteed to arrive where they're going. They are NOT guaranteed to arrive in order if and when they get where they are going. They are not even guaranteed to contain the same data on arrival as they did upon sending. Holy crap!
TCP is a technology built ON TOP of IP. TCP stands for Transmission Control Protocol. TCP presents an abstraction that looks like a pipe. That is, you send a stream of data in, and you get the same stream of data out. This is actually performed through a complicated mechanism called the TCP stack, which I won't explain in detail... but basically it consists of: breaking your data stream up into little bundles of data, numbering those bundles, and putting a "checksum" on each bundle (so that the receiver can verify that it didn't get smudged in transit). Then each side keeps track of which numbered bundles have been sent and which have been received, retransmits any that got lost en route, ignores any duplicates, and so on and so on. The packets that TCP passes over IP correspond to these bundles of data plus the control data that goes with them: the checksum, the sequence number, and the "port" number (which is used to keep track of one TCP session out of possibly several going on at the same time on a particular computer). There are also some extra packets that are used to do things like initiate the conversation, shut it down, and acknowledge receipt of packets or ask for retransmits.
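To make the numbering-and-checksum idea concrete, here is a toy sketch in Python. It is NOT how a real TCP stack is written: the Segment class, the function names, the 4-byte chunk size and the use of CRC-32 are all assumptions made up for the illustration (real TCP uses a 16-bit checksum and numbers bytes rather than chunks). It only shows why numbered, checksummed bundles let the receiver cope with reordering, duplicates, damage and loss.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # position of this bundle in the stream
    payload: bytes  # the bundle of data itself
    checksum: int   # CRC-32 of the payload (a stand-in for TCP's checksum)


def split_stream(data: bytes, size: int = 4) -> list[Segment]:
    """Break a byte stream into numbered, checksummed bundles."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Segment(n, c, zlib.crc32(c)) for n, c in enumerate(chunks)]


def reassemble(received: list[Segment], total: int) -> tuple[bytes, list[int]]:
    """Return the in-order data available so far, plus the numbers still missing."""
    good: dict[int, bytes] = {}
    for seg in received:
        if zlib.crc32(seg.payload) != seg.checksum:
            continue  # smudged in transit: drop it and wait for a retransmit
        good.setdefault(seg.seq, seg.payload)  # a duplicate changes nothing
    missing = [n for n in range(total) if n not in good]
    stream = bytearray()
    for n in range(total):
        if n not in good:
            break  # never hand data past a gap to the application
        stream += good[n]
    return bytes(stream), missing
```

The list of missing numbers is what the receiver would, in effect, ask the sender to retransmit. Once those bundles show up, the full stream comes out in order no matter how they arrived.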
Anyway, what this means to you when using a TCP socket is that it behaves, to your program, in almost exactly the same manner as a pipe would. So, for example, whether data is sent immediately through the pipe/socket or sent later (after spending some time in a buffer) depends on whether the socket's autoflush property is turned on. By default, Perl sockets have autoflush turned on (since version 1.18 of IO::Socket).
I would urge you not to worry about packets when dealing with TCP... that is the beauty of it: somebody else did all of the worrying about packets, so you don't have to. You worry about a stream, which is a much easier concept to program to. The truth is that it is possible for "hello" to be split into two or even five packets (although it is most likely that it would all go in one). You haven't got much real control over that, and you shouldn't worry about it. In fact, even if the socket is auto-flushing, it's possible (but unlikely) that "hello" and "world" could end up sharing the same packet, because the TCP stack is sending data out slower than your program is sticking data in.
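If you want to see the "it's a stream, not a series of messages" point for yourself, here is a small Python sketch. It uses socket.socketpair(), which gives you a connected pair of local stream sockets rather than a real TCP connection over a network, so treat it as an approximation; the function name send_and_collect is made up for the example. What carries over to TCP is that the reader gets bytes back in whatever chunks the system feels like, and only the overall byte order is guaranteed.

```python
import socket


def send_and_collect(messages: list[bytes]) -> list[bytes]:
    """Send each message separately, then return the chunks recv() handed back."""
    left, right = socket.socketpair()  # connected pair of local stream sockets
    with left, right:
        for msg in messages:
            left.sendall(msg)
        left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
        chunks = []
        while True:
            chunk = right.recv(4096)
            if not chunk:  # an empty read means the other side is done
                break
            chunks.append(chunk)
    return chunks


chunks = send_and_collect([b"hello", b"world"])
print(chunks)            # how the bytes were chunked is up to the system
print(b"".join(chunks))  # the joined stream is what you can rely on
```

The number of chunks you get back is not something to build on. If your program needs to know where "hello" ends and "world" begins, it has to say so in the data itself.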
What it comes down to is: don't worry about the packets. Instead, frame whatever conversation you are carrying out over TCP so that it is clear which side of the conversation is doing what and when. Think about having a conversation on a walkie-talkie: at any given time one of you is talking and one of you is listening. If you both try to talk at the same time, or if both of you think the other guy is the one that should talk next... it all goes to hell. This, for example, is roughly what HTTP looks like:
you to server: hi there, please send me /index.html. *over*
Of course, you have to be careful to treat the magic words *over* and *over-and-out* specially... much like how you have to treat the quote character specially when inside of a quoted string (so you don't confuse *meaning* a quote, from *using* a quote for the purpose of marking the beginning and end).
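Here is one way the "magic word plus escaping" idea could look, sketched in Python. The choices are all assumptions for the example: a newline plays the role of *over*, a backslash is the escape character, and encode, unescape and Decoder are made-up names. Real protocols (HTTP included) have their own rules; this just shows the principle.

```python
ESC = b"\\"  # escape byte (an arbitrary choice for this sketch)
END = b"\n"  # end-of-message marker, playing the role of *over*


def encode(message: bytes) -> bytes:
    """Escape the special bytes in the body, then append the marker."""
    body = message.replace(ESC, ESC + ESC).replace(END, ESC + b"n")
    return body + END


def unescape(body: bytes) -> bytes:
    """Undo the escaping done by encode() on one complete message body."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 1] == ESC and i + 1 < len(body):
            nxt = body[i + 1:i + 2]
            out += END if nxt == b"n" else nxt  # "\n" pair means a real newline
            i += 2
        else:
            out += body[i:i + 1]
            i += 1
    return bytes(out)


class Decoder:
    """Accumulates raw stream data and hands back complete messages."""

    def __init__(self) -> None:
        self.buffer = b""

    def feed(self, data: bytes) -> list[bytes]:
        self.buffer += data
        # Everything before the last marker is complete; the rest stays buffered.
        *complete, self.buffer = self.buffer.split(END)
        return [unescape(body) for body in complete]
```

Because every newline inside a message is escaped before sending, a raw newline on the wire can only ever mean "end of message". The receiver just keeps feeding whatever recv() gives it into the Decoder, and it doesn't matter how the stream happened to be chopped up along the way.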