
Re^4: Perl threading stability?

by guice (Scribe)
on Jul 25, 2005 at 21:05 UTC (#477967)


in reply to Re^3: Perl threading stability?
in thread Perl threading stability?

When thinking about threading, I always viewed it through the eyes of user threading, as in Java. Granted, Java is memory intensive, but it is still efficient at the code level.

Please describe your application in some detail. ithreads can be used to accommodate many uses, but it does require that you understand both their advantages and limitations.

Right now the application is built using forks. Each fork opens a DBI connection of its own and reads in data from a text file for the system it is currently updating within the database (the data to update is fetched remotely by another script).

The restriction I currently have, and am trying to fix, is data sharing. Right now when I get an error, I dump it to a temp file and then read the temp file back in at the end of processing. This is possibly one of the most efficient ways of doing it given the size of the load I have, but I would still rather find something a bit less problematic.
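
For comparison, here is a minimal sketch of how forked children could report errors to the parent through a pipe instead of a temp file. It is not the script described above: the host list and the simulated failure are made up for illustration, and it assumes each error fits on one short line (short writes to a pipe are not interleaved).

```perl
use strict;
use warnings;
use POSIX ();

my @hosts = qw(web01 db01 app01);    # hypothetical host list

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my @pids;
for my $host (@hosts) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {               # child process
        close $reader;
        # Simulated failure for one host; a real child would do its DBI work here.
        syswrite $writer, "$host: update failed\n" if $host eq 'db01';
        close $writer;
        POSIX::_exit(0);             # leave without running the parent's cleanup code
    }
    push @pids, $pid;
}

close $writer;                       # the parent keeps only the read end
my @errors = <$reader>;              # returns at EOF, once every child has closed its end
close $reader;
waitpid $_, 0 for @pids;

print "error: $_" for @errors;
```

The parent only sees end-of-file after every child has closed its copy of the write end, so no separate "all done" signal is needed.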

One example is hashes storing data such as server hostnames when a hostname changes. I have a DB-loaded hash containing all "active" systems. I need the ability to change that hash in the case of a hostname change.

I also want to build an XML file containing human-entered description data for each system (right now it uses bunches of little name=val strings--one file per server). With 300+ systems, while a single file might be too much, it's definitely too much for threads. In either case, when a hostname, etc., changes on a system, I need that change updated within the XML hash as well, for dumping at the end of the script execution.

-- philip
We put the 'K' in kwality!


Re^5: Perl threading stability?
by BrowserUk (Pope) on Jul 25, 2005 at 21:36 UTC
    When thinking threading, I always viewed it in the eyes of user threading like in Java.

    The main limitation of Java-style user threads is that there is no true concurrency. No matter how many CPUs or cores the machine has, a Java app will only utilise one of them at a time, however many user threads it spawns. Therefore, you can never scale the app by moving it to a multi-CPU system.

    That same limitation is what allows Java threads to be very lightweight and efficient. As only one thread is ever truly running at any given moment, many of the considerations for locking required by apps utilising kernel threads simply do not arise.

    It's a swings and roundabouts argument. What you gain in one place, you lose in another.


    Your 'spec' is still pretty sketchy, but if I'm reading between the lines correctly, I would say that what you want to do is eminently doable with iThreads.

    To be sure, I would need further details including the number of systems you are talking to concurrently; the nature of the data you are sharing etc.

    Not only does it sound doable, it sounds pretty straightforward.
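
As a rough illustration of the kind of thing being discussed (not the poster's application), the sketch below uses a shared hash for the "active" systems and a Thread::Queue for collecting errors in place of a temp file. The host names, the `update_host` routine, the simulated failure and the simulated rename are all hypothetical, and it assumes a perl built with ithreads support.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical data; the real script would load this hash from the database.
my %active :shared;
%active = ( web01 => 'up', db01 => 'up', app01 => 'up' );

my $work   = Thread::Queue->new( sort keys %active );    # hosts to process
my $errors = Thread::Queue->new;                         # stands in for the temp file

# Hypothetical per-host update; the real one would use its own DBI handle.
sub update_host {
    my ($host) = @_;
    die "no data file for $host\n" if $host eq 'db01';   # simulated failure
    if ( $host eq 'app01' ) {                            # simulated hostname change
        lock(%active);                                   # held until the block ends
        $active{app02} = delete $active{app01};
    }
    return;
}

sub worker {
    # dequeue_nb returns undef once the work queue is empty
    while ( defined( my $host = $work->dequeue_nb ) ) {
        eval { update_host($host); 1 } or $errors->enqueue("$host: $@");
    }
    return;
}

my @pool = map { threads->create( \&worker ) } 1 .. 2;
$_->join for @pool;

my @failed;
while ( defined( my $e = $errors->dequeue_nb ) ) { push @failed, $e }
print "error: $_" for @failed;
print join( ',', sort keys %active ), "\n";
```

The rename made inside a worker is visible to the main thread after the join because `%active` is shared; an ordinary hash would have been copied into each thread and the change lost.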


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Java will use multiple CPUs for multiple threads depending on the platform and the thread library. The JVM may run multiple Java threads on top of a single kernel thread, but all the thread libraries I know about will do multiple kernel threads. My impression was that recent JVMs on Linux run directly on the pthread library and each thread is a separate kernel thread. The kernel threads on Linux are efficient enough that implementing threads in user space does not help.

        Indeed, it looks as though things have moved on--on Linux at least--from my last experience of using Java.

        About Linux Threads

        One major difference between developing on Linux from other Unix operating systems is the system threads library. In Java 2 releases prior to 1.3, the Java virtual machine uses its own threads library, known as green threads, to implement threads in the Java platform. The advantage here is that green threads minimize the Java virtual machine's exposure to differences in the Linux threads library and makes the port easier to complete. The downside is that using green threads means system threads on Linux are not taken advantage of and so the Java virtual machine is not scalable when additional CPUs are added.

        In Java 2 Release 1.3, the Hotspot virtual machine uses system threads to implement Java threads. Because Linux threads are implemented as a cloned process, each Java thread shows up in the process table if you run the ps command. This is normal behavior on Linux.

        ...

        In the above listing, the process ID 11712 shown in the left-most PID column is the invoked Java virtual machine. The other processes that show process ID 11712 in the PPID column have process ID 11712 as their parent process. These children to process ID 11712 are Java threads implemented by the Linux system threads library. Each Linux thread is created as a process clone operation, which leaves the scheduling of threads to be a task of the process scheduler.

        By comparison, on Solaris the Java threads are mapped onto user threads, which in turn are run on Lightweight processes (LWP). On Windows the threads are created inside the process itself. For this reason, creating a large number of Java threads on Solaris and Windows today is faster than on Linux. This means you might need to adjust programs that rely on platform-specific timing to take a little longer on startup when they run on Linux.

        I'd question your conclusion regarding the efficiency of pthreads, but as I do not have any great experience of them I can only base my judgements upon generic information. Having user threads map 1:1 to kernel threads seems like an ideal situation, but history shows that it has nearly as many drawbacks as advantages.

        Update: More generic information:

        Linux Thread Optimizations

        The first thing Java developers notice when running their application on Linux is that the ps command, used to display the list of processes, appears to show multiple copies of Java runtime environment running even though only one Java application was started.

        This is due to the implementation of the system threads library on Linux. Linux threads are implemented as a cloned process, that means each Java thread appears as a new Linux process. The advantage of this approach is that the threads implementation is simpler and stable, however the downside is that this also affects the performance of even a moderately threaded Java application on Linux.

        ...

        Update2: And yet another demonstration that one to one user/kernel thread mapping is invariably bad for performance.


      Not only does it sound doable, it sounds pretty straightforward.

      I'm pretty sure it is pretty straightforward. My only concern is taking up too many resources. The current script (using a max of 5 fork()'d children) already takes ~4-5 hours to run... Although some things aren't at their most efficient, I just don't want it to take longer.

      From what I'm seeing, Perl threads are really CPU/memory intensive compared with a plain fork(). That's a lot to ask just for the ability to share data, my primary concern. Not to mention Perl's history of being flaky with threads (although a post below states that it has stabilized now).

      In the future I might look at writing this in Java, but at this time it's just not feasible due to the current data imports (a standard Dumper dump of variables into a file). But that's not for at least 6 months, until I can get the XML inputs and data use stabilized.

      If you want to chat more about specs, feel free to email me at gp-at-gpcentre.net. I don't really want to clutter this post up with "my problem" per se; I'd rather use it to discuss Perl threading in general.

      -- philip
      We put the 'K' in kwality!

        My only concern is taking up too many resources.

        With appropriate care, spawning a thread can take as little as 300-400k. It's just a case of ensuring that each thread only loads what it needs. Your main concern (reading between the lines still) is that sharing a large hash between many threads will cause large amounts of duplication--and you're right, if you allow the default behaviour to take its course, it will--but there are some fairly simple techniques for getting around that in most cases.

        It really comes down to sharing with a given thread only what that thread requires. Whilst your overall application may need to retain a large volume of data in memory, for most applications each thread only needs access to some small part of the overall dataset at any given time.
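
One way to read that advice, sketched under assumptions: start the worker threads before the large dataset is loaded, so they never clone it, and hand each worker only the records it needs through a queue. The `%systems` hash, the tab-separated job format and the "work" done per record (measuring the description length) are invented for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new;
my $results = Thread::Queue->new;

# Start the workers first, while the interpreter is still small: a new
# ithread gets its own copy of the non-shared data that exists at creation.
my @pool = map {
    threads->create( sub {
        # A blocking dequeue; an undef item is used as the end marker.
        while ( defined( my $job = $jobs->dequeue ) ) {
            my ( $host, $desc ) = split /\t/, $job, 2;
            $results->enqueue( "$host\t" . length($desc) );
        }
    } );
} 1 .. 3;

# Only now build the large structure; it lives in the main thread alone.
my %systems = map { ( "host$_" => "description of system $_" ) } 1 .. 300;

# Each job carries just the one record a worker needs.
$jobs->enqueue( map { "$_\t$systems{$_}" } sort keys %systems );
$jobs->enqueue( (undef) x @pool );    # one end marker per worker
$_->join for @pool;

my %len;
while ( defined( my $r = $results->dequeue_nb ) ) {
    my ( $host, $n ) = split /\t/, $r;
    $len{$host} = $n;
}
printf "%d systems processed\n", scalar keys %len;
```

The trade-off is that every record is copied once into the queue, but no thread ever holds a private copy of the whole hash.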

        As for the spec, I'm really interested in seeing how iThreads can be used. I'll drop you a note and we can move on from there.


