
Re: FTP and checksum

by Elian (Parson)
on Oct 03, 2003 at 13:18 UTC ( #296217=note )

in reply to FTP and checksum

As to bullet #2... there's no way to check the correctness of the file without a remote checksum to verify against. If you have some control of the destination server, however, you can issue a SITE command to calculate that checksum and compare it to your local version. (Which is a very good idea, and I'm glad you're considering it--TCP doesn't guarantee end-to-end error-free transfer.)
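A sketch of that upload-then-verify workflow with Net::FTP and Digest::MD5. The host, credentials, file names, and the XMD5 command are all assumptions for illustration -- checksum SITE commands (XMD5, XCRC, SITE CHKSUM) are nonstandard extensions and vary by server, so check what yours actually supports:

```perl
use strict;
use warnings;
use Net::FTP;
use Digest::MD5;

# Compute the MD5 hex digest of a local file.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Only attempt the transfer if a host is configured, since the server,
# login, and XMD5 support here are assumptions, not a known setup.
if ( my $host = $ENV{FTP_HOST} ) {
    my $ftp = Net::FTP->new($host) or die "connect: $@";
    $ftp->login( $ENV{FTP_USER}, $ENV{FTP_PASS} )
        or die "login: " . $ftp->message;
    $ftp->binary;
    $ftp->put( 'local.dat', 'remote.dat' ) or die "put: " . $ftp->message;

    # Ask the server for its checksum of the uploaded copy
    # (XMD5 is server-dependent; SITE CHKSUM on some servers).
    $ftp->quot( 'XMD5', 'remote.dat' );
    my ($remote) = $ftp->message =~ /\b([0-9a-fA-F]{32})\b/;
    die "server gave no checksum" unless defined $remote;

    my $local = md5_of_file('local.dat');
    die "checksum mismatch: $remote vs $local" unless lc($remote) eq $local;
    print "transfer verified\n";
    $ftp->quit;
}
```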

Re: Re: FTP and checksum
by Rhys (Pilgrim) on Oct 03, 2003 at 13:37 UTC
    Ummm... no. Sequence numbers and packet checksumming in TCP *do* guarantee error-free transmission.


      No, in fact they do not guarantee error-free transmission. Set aside the possibility of multibit errors that leave the checksum matching over bad data (IP checksums aren't particularly rigorous--deliberately, for speed reasons--and some error patterns produce corrupt data that still matches the checksum). Beyond that, the TCP checksum is effectively a per-hop checksum, as routers may, and some do, recalculate and reset it when sending packets on to their next destination.

      Checksums are generally done as packets come into a router, on over-the-wire data, to validate the packet, and will note some (but not all, by any means) errors. Packets then hit router memory, and if the checksum is regenerated it's done against the in-memory copy. If this in-memory copy is corrupt, for example because you have a bad RAM cell, transient power issues, or just cosmic rays, the checksum will be generated against this now-corrupt data and there will be no way to detect, as part of the transmission, that the data has gone bad. ECC and parity memory, if the router has it, will catch some, but again not all, instances of this.
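      To make the weakness concrete, here is a small Perl sketch of the RFC 1071 Internet checksum (the one the IP family uses); the message text and the swapped offsets are invented for illustration. Because the checksum is just a folded sum of 16-bit words, swapping two aligned words changes the data but not the checksum:

```perl
use strict;
use warnings;

# RFC 1071 Internet checksum over a byte string.
sub inet_checksum {
    my ($data) = @_;
    $data .= "\0" if length($data) % 2;    # pad odd length with a zero byte
    my $sum = 0;
    for my $word ( unpack 'n*', $data ) {    # 16-bit big-endian words
        $sum += $word;
        $sum = ( $sum & 0xffff ) + ( $sum >> 16 );    # fold carry back in
    }
    return ~$sum & 0xffff;
}

# Corrupt a message by swapping two aligned 16-bit words (offsets 4 and
# 14, both even). The sum is commutative, so the damage is invisible.
my $good = "PAY ACCT 1111 THEN ACCT 2222";
my $bad  = $good;
substr( $bad, 4,  2 ) = substr( $good, 14, 2 );
substr( $bad, 14, 2 ) = substr( $good, 4,  2 );

printf "good %04x, bad %04x, data differs: %s\n",
    inet_checksum($good), inet_checksum($bad),
    ( $good ne $bad ? "yes" : "no" );
```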

      This isn't theoretical. I know of cases where this has happened, and the only thing that caught the fact that the data was being corrupted in transit by a router with bad memory was that DECnet does do end-to-end checksumming of transferred files, and it was yelling about bad transmissions that the TCP streams never noticed.

      If the data is important enough to go to some effort to validate the destination copy, then there's also the non-zero possibility of some sort of man-in-the-middle data alteration.

      You can certainly argue that failures or attacks such as this are really, really unlikely. On the other hand, do you want a financial institution trusting that it won't happen when moving transactions against your bank account?

        The last part is indeed part of my argument. Not only are there checksums at multiple levels in most networks (Ethernet, TCP, perhaps others, depending upon the transmission method), but most modern networks have extremely low error rates, particularly if the admin has taken any care at all on the important links.

        Secondly, the TCP checksum is NOT calculated on a hop-by-hop basis. The TCP data should NEVER be modified on its way across the network (modern QoS implementations notwithstanding). It is the layer 2 headers and checksums that are stripped and rebuilt at every hop; IP (layer 3), TCP (layer 4), and everything above that should not change from end to end.

        Thirdly, the layer 2 checksums are going to be checked at every switch across the network, and the layer 3 checksums at every router. Both of these AND the layer 4 TCP checksums are going to be verified at the receiving FTP server. While you argue that the possibility of a multibit error that would create the same checksum is non-zero, I would argue that the possibility of a multibit error that would leave ALL THREE checksums the same AND still cause the data to be accepted at the far end (i.e. the TCP sequence numbers, IP addresses, MAC addresses, etc. weren't part of what changed) is either zero, or so remote that it isn't worth discussing. At that level, the user is more likely to get struck by lightning, thus removing his or her concern about file integrity.

        In your example of the financial institution, I would expect the net admin to pay attention to the output of /sbin/ifconfig ethX and the statistics on the switches, looking for bad packets. I would also expect care to be taken that common things like duplex mismatches are not allowed to occur.

        Even further, other verification methods, such as MD5, are not proof against the kind of multibit errors you're talking about, and even 'diff' does not (by default) do a byte-by-byte comparison between two files.

        Lastly, in the case of a financial institution, I would really expect them to be using SCP instead of FTP in the first place.

        If the network is considered unreliable, fixing the network is preferable to adding extra layers of data checking. If there is no way to consider the network reliable - perhaps for political or philosophical reasons - just transfer the file twice and do a byte-by-byte diff. If they don't match exactly, throw a warning and don't attempt another transfer until the issue is resolved.
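        That byte-by-byte comparison is short to write in Perl. A sketch (core Perl's File::Compare module offers essentially the same check; the explicit loop just makes the guarantee obvious):

```perl
use strict;
use warnings;

# Compare two files byte by byte -- the check 'diff' doesn't do by
# default. Returns 1 if the files are identical, 0 otherwise.
sub files_identical {
    my ( $path_a, $path_b ) = @_;
    return 0 if -s $path_a != -s $path_b;    # cheap size check first
    open my $fh_a, '<:raw', $path_a or die "open $path_a: $!";
    open my $fh_b, '<:raw', $path_b or die "open $path_b: $!";
    my ( $buf_a, $buf_b );
    while ( read( $fh_a, $buf_a, 65536 ) ) {
        read( $fh_b, $buf_b, 65536 );        # same size, so reads align
        return 0 if $buf_a ne $buf_b;
    }
    return 1;
}
```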

