PerlMonks
Re: Converting a parallel-serial shell script

by kubrat (Scribe)
on Sep 22, 2008 at 13:46 UTC ( #713019=note )


in reply to Converting a parallel-serial shell script

I would advise against using threads to solve your problem. There is nothing to gain from threads here, since the problem you describe is disk bound rather than computationally intensive. Forks, on the other hand, are a bit simpler to use and have fewer potential gotchas.

You could use SysV semaphores to serialize the bulk-loading part of the code. They should be available on any POSIX-compliant system, including Windows, although I have never tried SysV semaphores on Windows myself.
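For illustration only, a minimal sketch of serializing a critical section across forked workers with SysV semaphores, using the IPC::SysV and IPC::Semaphore modules that ship with Perl. The bulk-load routine and the worker count are placeholders, not the OP's actual code; IPC_PRIVATE works here because the children inherit the semaphore across fork() — unrelated processes would instead agree on a key via ftok().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE S_IRUSR S_IWUSR IPC_CREAT SEM_UNDO);
use IPC::Semaphore;

# One counting semaphore; initial value 1 means "unlocked".
my $sem = IPC::Semaphore->new(IPC_PRIVATE, 1, S_IRUSR | S_IWUSR | IPC_CREAT)
    or die "semget failed: $!";
$sem->setval(0, 1);

my @pids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Acquire: block until the semaphore is positive, then decrement.
        # SEM_UNDO releases the lock automatically if the child dies.
        $sem->op(0, -1, SEM_UNDO);
        do_bulk_load($n);              # placeholder serialized section
        $sem->op(0, 1, SEM_UNDO);      # release for the next waiter
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;

sub do_bulk_load { my ($n) = @_; print "bulk load $n done by $$\n" }
```

The children run in parallel but the section guarded by the semaphore executes one worker at a time.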

To be honest, I am not convinced that you need semaphores at all. What if somebody or something else starts a bulk upload while your script is running? Instead, I would have every forked process attempt the bulk load and, if it fails, sleep for 5 seconds and try again, repeating until the bulk load succeeds.
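The retry idea might look like this — a sketch, where the file list, the 5-second interval and try_bulk_load() are illustrative assumptions rather than the OP's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files = ('a.dat', 'b.dat', 'c.dat');   # placeholder work items

my @pids;
for my $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Keep trying until the bulk load goes through; another loader
        # holding the table simply makes this attempt fail and retry.
        until (try_bulk_load($file)) {
            sleep 5;
        }
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;    # reap all the children

# Placeholder: a real version would invoke the DB's bulk loader and
# return true on success, false when the loader is busy.
sub try_bulk_load { my ($file) = @_; return 1 }
```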


Re^2: Converting a parallel-serial shell script
by BrowserUk (Pope) on Sep 22, 2008 at 14:02 UTC

    Perhaps you would enlighten us by posting a forking solution to the OP's problem that's simpler than this?

    And for bonus points, you could try making it scalable, portable and efficient as well?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I think you have failed to see the motivation behind my post. I just wanted to share my thoughts and experience with this type of problem, which is why I haven't given a code example. That is probably my fault for the way I expressed myself, but Corion seems to have understood it correctly.

      Your solution proves me wrong. It is neat and elegant and I really like it. But I still think that I make good points when considering the problem of parallelization in more general terms.

      Finally, you could perhaps shed some light on how what I am talking about is not scalable - after all, you can fork as many processes as you need. Portable? I am not sure how portable fork and semaphores really are. Though fork() works for me on Windows with ActivePerl, it appears to use threads behind the scenes - so does that mean you get the speed benefits of threads without the disadvantage of having to be careful with shared data? Efficient? I don't think there will be a noticeable difference between a forking and a threading implementation.

        Finally, you could perhaps shed some light on how what I am talking about is not scalable ... Portable? .... Efficient?

        I'm not for one moment going to suggest that a forking solution, where (native) forks are available, couldn't be just as scalable and efficient. Or even more so. But until you have implemented such a solution, you will not be aware of how hard it can be to make it so. And you won't truly appreciate the simplicity of the threaded version until you see the complexity of the forked version.

        And when you post your solution, we can try running both on *nix and Windows and compare them against those 3 criteria.


Re^2: Converting a parallel-serial shell script
by Corion (Pope) on Sep 22, 2008 at 14:11 UTC

    Thanks for your advice. I know that my problem is not disk bound, as running four processes instead of one reduces the overall runtime to 25% of the serial runtime. I'm aware that forks are in principle simpler to use if you don't have to pass information around, but in my case I have information to pass back to the master process.
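    For what it's worth, the "passing information back to the master" plumbing that makes forks less convenient can be sketched with plain pipes; the job list and the computed result below are placeholders, not Corion's actual workload:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @jobs = (1 .. 4);
my @results;

for my $job (@jobs) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                        # child: work, report, exit
        close $reader;
        print {$writer} $job * 10, "\n";    # placeholder "result"
        close $writer;
        exit 0;
    }
    close $writer;                          # parent: collect the answer
    chomp(my $answer = <$reader>);
    push @results, $answer;
    close $reader;
    waitpid $pid, 0;
}

print "@results\n";    # prints "10 20 30 40"
```

    With threads the shared state makes this a one-liner; with forks every result has to be marshalled through a pipe, socket or file, which is the extra complexity being discussed above.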

    As I'm the only user with administrative privileges on the database, I will be the only one doing bulk uploads, and hence I want the bulk uploads to be serialized in the fashion I proposed. Having the conversion retry at fixed (or even random) intervals creates the risk of flooding the machine with indefinitely retrying programs, which I dislike.

      If you don't mind me asking, what type of DB are you running? Is it MySQL, Oracle, ...?
