PerlMonks  

Re^10: Thread Design help

by perlCrazy (Monk)
on Sep 11, 2010 at 10:05 UTC (#859733)


in reply to Re^9: Thread Design help
in thread Thread Design help

Thanks for the quick response. Please see my comments:
1. Connect: if it is a data server, then DBI; if it is a host, then ssh or tcp.
2. Process: remotely on the server if it is a host;
if it is a database, then execute a few SQL queries.
3. How long will that processing run?
>> For a few servers it might take an hour; for a few it will take less.
4. Get the data: How much data?
>> The data will vary from data server to data server, depending on the activity on the server. It can be in the KB range, but not more than 1-2 MB. Since we are planning to run very frequently, we can handle the data properly.
5. Write to file:
Just read & write as is, or read, process locally and write?
>> Read, process locally and then write.
6. Connect again maybe after 1/2 hour: Exactly half an hour? Or as quickly as possible after all other servers have been serviced?
>> This we will decide; depending on the nature of the data server, we can decide the interval time.
7. What determines the frequency of reconnection? How important is the timing? Must it be done to the second, or is 'best endeavours' good enough?
>> It depends on the interval time, or we can decide the best way. The idea is to collect data every 10 minutes or 30 minutes from each server.
8. Should run for many servers (maybe 1000s):
You don't yet know? How many 1000s?
>> This will keep growing in future. Initially we are targeting 1000.
Thanks


Re^11: Thread Design help
by BrowserUk (Pope) on Sep 11, 2010 at 10:34 UTC
    for a few servers it might take an hour, ... the idea is to collect data every 10 minutes or 30 minutes,

    So, you are going to collect data that takes an hour to query, every 10 or 30 minutes?

    It depends on the interval time, or we can decide the best way.

    What is "the interval time"? Like, a value you choose to program? In which case, how is that different from "or we can decide the best way."?

    Reading between the lines, what I think you are saying is; "as often as possible"? If so, that is good, because it is very easy to program "as often as possible".

    But if it is imperative that server X be serviced every 10 minutes; and server Y every 30 minutes; and server Z every 19.27071 minutes; things get much, much more complicated.

    the idea is to collect data every 10 minutes or 30 minutes from each server

    "the idea"? This sounds like a "suck thumb and wave finger in air" metric. And that is no basis upon which to make design decisions.

    So far, there is so much contradiction in your 'spec' that it is pretty much impossible to make any real assessment of whether a single, multi-threaded program is a suitable way of tackling the problem. You'll need to be a lot clearer in specifying the actual requirements, rather than speculative "it would be nice ifs".

    Another question: why are you considering threads rather than processes?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks again, and sorry for not being very clear in my reply.

      We want to collect information from each server every 30 minutes. For the time being, we will not consider servers which take >= 1 hour.
      We will have one group which will contain 30 data servers; processing each server sequentially might not be a good idea. If we use threads, we can process them in parallel; execution will be faster and it will be easy to manage.
      Why not consider processes?
      Forking many processes (maybe 100s) at one time and monitoring them will be complex. It will also consume lots of memory. But if you think forking is a better option than threads, we can consider that option as well.

        Given your very vague specifications, I would very strongly consider using completely separate programs. Both fork and threads will consume many resources and make debugging your program very hard.

        As you need to perform very different tasks (connect via ssh, connect via DBI, ...), putting all the code for these different tasks into one program makes little sense.

        Have one central program that starts the specific programs as separate children. Consider maybe Parallel::Jobs, or a simple open "$child |", to run your child processes in parallel.
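The "central program plus separate children" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the job names `alpha` and `beta`, the helper `run_children`, and the one-line child commands (run through `$^X`, the current perl) are hypothetical stand-ins for the real per-task collector programs. It uses the list form of a piped `open` rather than the `"$child |"` string form, so no shell is involved.

```perl
use strict;
use warnings;

# Hypothetical job list: label => command, given as a list.
# Real jobs would be the separate per-task programs (ssh collector, DBI collector, ...).
my %jobs = (
    alpha => [ $^X, '-e', 'print "alpha done\n"' ],
    beta  => [ $^X, '-e', 'sleep 1; print "beta done\n"' ],
);

sub run_children {
    my %jobs = @_;
    my %fh;

    # Start every child first, so that they all run concurrently.
    for my $name ( sort keys %jobs ) {
        open $fh{ $name }, '-|', @{ $jobs{ $name } }
            or die "Cannot start $name: $!";
    }

    # Then collect each child's output and exit status in turn.
    my %result;
    for my $name ( sort keys %fh ) {
        my $h     = $fh{ $name };
        my @lines = <$h>;
        chomp @lines;
        close $h;    # waits for the child and sets $?
        $result{ $name } = { output => \@lines, exit => $? >> 8 };
    }
    return \%result;
}

my $result = run_children( %jobs );
print "$_: @{ $result->{ $_ }{ output } } (exit $result->{ $_ }{ exit })\n"
    for sort keys %$result;
```

Because each child is a separate program, it can be run and debugged on its own from the command line before it is ever started by the central program.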

        But before thinking about how to do things in parallel, I really, really urge you to first get things working in a serial fashion.

        Here's a very basic skeleton that will serve as a basis for you to tweak to do the job. It uses a prioritising subclass of Thread::Queue to 'schedule' the repeat jobs:

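        #! perl -slw
        use strict;
        use threads;
        use Thread::Queue;
        use LWP::Simple;

        {
            ## A Thread::Queue subclass that keeps its items ordered by due time.
            ## It treats the queue object as a shared array (lock @$Q), so it
            ## depends on Thread::Queue's internal representation.
            package T::Q::O;
            use Data::Dump qw[ pp ];
            require Thread::Queue;
            use threads;
            use threads::shared;

            our @ISA = 'Thread::Queue';

            sub enqueue {
                local $^W;    ## items are joined strings compared numerically
                my $Q = shift;
                lock @$Q;
                for( @_ ) {
                    push @$Q, $_;
                    my $n = $#$Q;
                    ## move the new item forward while it is due earlier than its predecessor
                    @{ $Q }[ $n, $n-1 ] = @{ $Q }[ $n-1, $n ], --$n
                        while $n-1 and $Q->[ $n ] < $Q->[ $n - 1 ];
                }
                cond_signal( @$Q );
            }

            sub dump {
                my $Q = shift;
                lock @$Q;
                pp $Q;
                cond_signal( @$Q );
            }
        }

        ## Placeholder fetchers: each just downloads a URL to 'nul' (the Windows null device).
        sub dbiFetch {
            my @args = @_;
            return getstore $args[ 0 ], 'nul';
        }

        sub sshFetch {
            my @args = @_;
            return getstore $args[ 0 ], 'nul';
        }

        sub tcpFetch {
            my @args = @_;
            return getstore $args[ 0 ], 'nul';
        }

        sub worker {
            no strict 'refs';    ## $sub is called by name
            my $Q = shift;
            while( my $work = $Q->dequeue ) {
                my( $time, $interval, $sub, @args ) = split $;, $work;
                sleep 1 while time() < $time;    ## wait until the job is due
        #        printf "now: %.f time:$time int:$interval sub:$sub [@args]", +time();
                my $result = $sub->( @args );
        #        print $result;
                ## reschedule the same job one interval after its previous due time
                $Q->enqueue( join $;, $time + $interval, $interval, $sub, @args );
            }
        }

        our $W //= 10;    ## number of worker threads; override with -W=n (via -s)

        my $Q = new T::Q::O;

        my @workers = map async( \&worker, $Q ), 1 .. $W;

        ## One job per input line: due now, with a random interval and fetcher.
        while( <> ) {
            chomp;
            $Q->enqueue(
                join $;, time(), (map $_*60, 1,2,3 )[ rand 3 ],
                    ( qw[dbiFetch sshFetch tcpFetch ] )[ rand 3 ], $_
            );
        }

        1 while sleep 1;    ## keep the main thread alive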
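The scheduling in the skeleton rests on one idea: each new item is pushed onto the end of the queue and then swapped towards the front until it sits behind everything that is due earlier. A minimal sketch of that ordered insert on its own, without threads, might look like this. It is a variation rather than the code above: the helper name `insert_by_due` is hypothetical, items are `[ $due, $label ]` array refs instead of `$;`-joined strings, and a new item is allowed to move all the way to the front.

```perl
use strict;
use warnings;

# Insert $item = [ $due, $label ] into @$queue, keeping the queue
# sorted by due time (earliest first).
sub insert_by_due {
    my( $queue, $item ) = @_;
    push @$queue, $item;
    my $n = $#$queue;

    # Swap the new item towards the front while it is due earlier
    # than its predecessor.
    while( $n > 0 and $queue->[ $n ][ 0 ] < $queue->[ $n - 1 ][ 0 ] ) {
        @{ $queue }[ $n, $n - 1 ] = @{ $queue }[ $n - 1, $n ];
        --$n;
    }
    return $queue;
}

my @queue;
insert_by_due( \@queue, $_ )
    for [ 300, 'c' ], [ 100, 'a' ], [ 200, 'b' ], [ 50, 'z' ];

print join( ' ', map "$_->[0]:$_->[1]", @queue ), "\n";
# prints "50:z 100:a 200:b 300:c"
```

With the queue kept in due-time order, a worker that takes the first item always gets the job that is due soonest, which is what lets a plain dequeue act as a scheduler.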

