I want to pick up on the serialisation point.
In my work, lots of tasks are of the form "read, transform, transform, transform, write" on quite large datasets. The read and write are always I/O bound, but the transforms can be cpu bound. I'd like to run the subtasks in parallel, to speed up execution, but it isn't worth forking processes for the transformations - the cost of serialisation/deserialisation is too high.
What would help is a threading model like fork, except instead of standard i/o channels the threads/processes would be able to expose native datastructures for direct read and/or write by their siblings.
Is there any facility like this in existence?
Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
Want more info? How to link or
or How to display code and escape characters
are good places to start.