
Turing completeness (TC) deals with computation purely in terms of functions. Anything involving I/O is pretty much outside Turing's view, and you'll be doing a good deal of I/O in a distributed application like this. So just being TC isn't good enough.

Just how much I/O you'll be doing depends on your application. There are some problems that could get sufficient bandwidth by having an intern load data off a floppy. Others are going to need high-speed fiber optic connections in order to keep up. Some problems are going to be just plain slower when distributed than when run on a single machine.

In any case, you could certainly do this with Perl. Would it be useful? If the application's bottleneck is I/O, then Perl would probably be a viable choice. However, good candidates for distributed systems are usually not I/O-bound. They're CPU-bound, like "take this DES-encrypted message and try decrypting it with keys x through x + 2**y, and let me know if any of them break the message". For something like that, you want a good number-cruncher language like C or FORTRAN.
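To make the "keys x through x + 2**y" idea concrete, here is a minimal C sketch of how such a key range could be cut into per-worker chunks. Everything named here (`toy_encrypt`, `key_range`, `chunk_for_worker`, `search_range`) is made up for illustration, and `toy_encrypt` is only a stand-in mixing function so the sketch compiles without a crypto library. It is not DES.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a real trial decryption (NOT DES):
   a toy 64-bit mix that is one-to-one in the key for a fixed plaintext. */
static uint64_t toy_encrypt(uint64_t key, uint64_t plaintext) {
    uint64_t v = plaintext ^ key;
    v *= 0x9E3779B97F4A7C15ULL;      /* odd multiplier */
    return v ^ (v >> 29);
}

/* One unit of work: the keys in [start, start + count). */
typedef struct {
    uint64_t start;
    uint64_t count;
} key_range;

/* Split [x, x + 2**y) into n_workers contiguous chunks and return chunk i.
   The first (2**y % n_workers) chunks get one extra key. */
static key_range chunk_for_worker(uint64_t x, unsigned y,
                                  unsigned n_workers, unsigned i) {
    assert(y < 64 && n_workers > 0 && i < n_workers);
    uint64_t total = (uint64_t)1 << y;
    uint64_t base  = total / n_workers;
    uint64_t extra = total % n_workers;
    key_range r;
    r.start = x + i * base + (i < extra ? i : extra);
    r.count = base + (i < extra ? 1 : 0);
    return r;
}

/* What each worker would run on its chunk (known-plaintext check).
   Returns 1 and stores the key in *found if one matches, else 0. */
static int search_range(key_range r, uint64_t plaintext,
                        uint64_t ciphertext, uint64_t *found) {
    for (uint64_t j = 0; j < r.count; j++) {
        uint64_t key = r.start + j;
        if (toy_encrypt(key, plaintext) == ciphertext) {
            *found = key;
            return 1;
        }
    }
    return 0;
}
```

Each worker only needs three numbers (`x`, `y`, and its own index) plus the message, and it sends back at most one key. That tiny amount of I/O next to a long inner loop is what makes this kind of job CPU-bound.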

"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re^3: Google's MapReduce
by Anonymous Monk on Oct 27, 2004 at 23:58 UTC
    If the application's bottleneck is I/O, then Perl would probably be a viable choice.

    That depends on what kind of I/O we are talking about. There are various kinds of I/O, of which I will mention three:

    1. Network I/O. The least interesting category. If your application is bound by network I/O, there isn't much you can do except upgrade your network.
    2. Disk I/O. An interesting category, and one that brings us into the realm of SANs, Fibre Channel, and multiple controllers. You're right that Perl might be a good fit for those applications.
    3. CPU-Memory I/O. Perl would absolutely suck for that kind of application, as Perl is very memory hungry and gives the programmer very little control over what is stored where. It uses a gazillion pointers, storing stuff all over the place, which results in a low cache hit ratio.
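As a rough illustration of the third point, here is a small C sketch that stores the same numbers two ways: packed into one contiguous array, and scattered across individually allocated nodes linked by pointers. It is not a model of Perl's actual internals, only of the general layout difference being described, and the names (`node`, `build_list`, and so on) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Contiguous layout: neighbouring elements share cache lines,
   so a linear scan touches memory in order. */
static long sum_array(const long *values, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Pointer-chasing layout: every element is its own heap allocation,
   and the program has no say in where the allocator puts it. */
typedef struct node {
    long value;
    struct node *next;
} node;

/* Build a list holding the same values, in the same order. */
static node *build_list(const long *values, size_t n) {
    assert(values != NULL || n == 0);
    node *head = NULL;
    for (size_t i = n; i > 0; i--) {
        node *nd = malloc(sizeof *nd);
        if (nd == NULL)
            abort();                 /* sketch only: no recovery path */
        nd->value = values[i - 1];
        nd->next = head;
        head = nd;
    }
    return head;
}

/* Same sum, but each step must load a pointer before it can
   find the next value. */
static long sum_list(const node *head) {
    long total = 0;
    for (const node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

static void free_list(node *head) {
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

Both functions compute the same result. How much slower the list version runs depends on the allocator, the data size, and the hardware, so measure it on your own machine rather than trusting a number from a forum post.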