PerlMonks  

Looking for alternative for IPC::Shareable (or increase size)

by DomX (Novice)
on Aug 05, 2020 at 20:55 UTC ([id://11120361])

DomX has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks!
I'm having a hard time today. I've almost finished my program (over 9000 lines), but I have one problem:

Communication between my worker (the forked unit) and the master (the GUI unit) has a size issue. The worker fills a shared array; the master reads its contents into a non-shared array and then clears the shared array. This fails with:
Length of shared data exceeds shared segment size at ./myprogram.pl line 1210.
The buffer size is declared as "65536" (I don't know the unit; I think it's bytes). It fails on a single array element whenever the string it contains is longer than 65536.

So some of the data items (just a few, but losing even one is one too many!), all plain strings, are longer than 65536 characters.
Because threads/threads::shared are deprecated ("Prior to Perl 5.8, 5005threads was available through the Thread.pm API. This threading model has been deprecated, and was removed as of Perl 5.10.0." https://perldoc.perl.org/threads.html), I've started using IPC::Shareable. For small tasks it's really perfect, but overly long strings are hell here!
While the master's array still has room to be filled, the IPC::Shareable array overflows with just one element.

What I want to know:
- Is there a way to increase the size?
- Is there an alternative that lets me share (somewhat long) text data between processes without writing to disk? (Writing to SQLite would slow this step down enormously.)

If nobody knows of something like this, my only remaining idea is an SQLite table "transfer" with just one column, "data", where each row holds one string of my array. Many forks would write to it (slow...), and the master would read and clear the table, just to prepare the data to be saved in the correct table and columns...

Re: Looking for alternative for IPC::Shareable (or increase size)
by hippo (Bishop) on Aug 05, 2020 at 21:55 UTC
    threads/threads::shared is deprecated ("Prior to Perl 5.8, 5005threads was available through the Thread.pm API. This threading model has been deprecated, and was removed as of Perl 5.10.0." https://perldoc.perl.org/threads.html)

    The passage you have quoted explicitly refers to 5005threads. This is different from threads, which is not deprecated, merely discouraged.


    🦛

Re: Looking for alternative for IPC::Shareable (or increase size)
by ikegami (Patriarch) on Aug 06, 2020 at 04:30 UTC

    Threads are not deprecated. Threads are fully supported.

    The docs say that threads are discouraged. When pressed for an explanation, the reason given for that message was that ...threads are hard and the people in some IRC channel don't want to answer questions about threads?

    Ok, that might not be exactly right, but the answer I got back was as clear as mud. Read it for yourself.

    What's relevant to me is this:

    As one of several people who have maintained the various threads modules over the years, I regret that I missed being part of the original discussion that led to the inclusion of the 'discouraged' message.

    The fact is that threads work, they are maintained, and they currently do not have any bugs preventing their use. I acknowledge that not all Perl modules are thread-safe, but there is sufficient documentation to that effect in the POD.

    There's also a performance penalty to adding thread support to a Perl, but that penalty applies to whether you use threads or not, and it's pretty common for the system perl to have thread support, so you're probably already suffering from that penalty.
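Since threads are in fact supported, here is a minimal sketch of the same worker/master exchange using the core threads and Thread::Queue modules instead of IPC::Shareable. The three workers and the 100_000-character payloads are made up for illustration; a queued item is not bound by a fixed shared-segment size, though each one is copied into shared storage when enqueued. The sketch assumes a perl built with ithreads and exits quietly otherwise.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # ithreads are a perl build-time option; stop quietly if this perl lacks them.
    unless ($Config{useithreads}) {
        print "this perl was built without ithreads\n";
        exit 0;
    }
}

use threads;
use Thread::Queue;

my $results = Thread::Queue->new;    # thread-safe FIFO shared by all threads

my @workers;
for my $n (1 .. 3) {
    push @workers, threads->create(sub {
        # Hypothetical stand-in for real work: a result far longer than 65536 chars.
        $results->enqueue("worker $n: " . ('x' x 100_000));
        return;
    });
}

$_->join for @workers;               # wait until every worker has finished

# Drain the queue into an ordinary (unshared) array, as the master does.
my @storage;
while (defined(my $item = $results->dequeue_nb)) {
    push @storage, $item;
}

printf "%d results, %d chars each\n", scalar @storage, length $storage[0];
```

The master never has to size anything up front; it only joins the workers and drains the queue.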

      Dear ikegami,

      So it sounds like my first steps in IPC, which I made with threads, were already on the right track? That was some years ago, and I think switching back to that approach would mean rebuilding nearly the whole program... I'll consider it if nothing else works out satisfactorily.

      Perl ithreads are not the actual lightweight OS threads that share the same common name (from what I've seen, nobody says "ithreads" when referring to Perl "threads"), and many people look at them merely because of this unfortunate naming choice. Maybe the OP wants to look at Coro.

      https://metacpan.org/pod/Coro#WINDOWS-PROCESS-EMULATION

        They're a bit expensive to start up, and there's a cost to sharing data between them. Neither of those things are issues most of the time.

        Coro is a co-operative multitasking system. Possibly useful, but not exactly threads.

Re: Looking for alternative for IPC::Shareable (or increase size)
by jcb (Parson) on Aug 05, 2020 at 22:35 UTC

    I have always used pipe, Storable, fork, and Tk's fileevent mechanism for passing results back to a GUI task from various worker tasks. The tricky bit is that you need two "uplink" pipes: one in nonblocking mode that you can give to Tk and read short text reports, and one in blocking mode that you can use with Storable to pass results back from workers. Pipes work even within the same process, so you should be able to easily do something similar with threads. (With forked children and Tk, you must also take care that the children call CORE::exit rather than Tk::exit, which will cause the parent to exit. If you value your sanity, the forked children do not touch the UI.)
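A minimal sketch of the pipe + fork + Storable approach described above, stripped of Tk: one child, one blocking "return" pipe, and Storable's store_fd/fd_retrieve to move a whole array back to the parent. The result strings are hypothetical stand-ins sized well past 65536 characters; the pipe imposes no per-message limit because the parent keeps reading while the child writes.

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);

# One "return" pipe: the child writes a Storable image, the parent reads it.
pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {                       # child
    close $reader;
    # Stand-in for real work: results far larger than 65536 characters.
    my @result = ('x' x 84_217, 'y' x 200_000);
    store_fd(\@result, $writer) or die "store_fd failed";
    close $writer or die "close: $!";  # flushes the buffered Storable data
    exit 0;                            # under Tk, use CORE::exit here instead
}

# parent
close $writer;
my $received = fd_retrieve($reader);   # blocks until the whole image has arrived
close $reader;
waitpid($pid, 0);

my @storage = @$received;
print join(' ', map { length($_) } @storage), "\n";
```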

      Dear jcb,

      Storable looks pretty good, BUT: its locking mechanism is unsatisfactory. If more than one child is going to manipulate the array, the following could happen: child A calls lock_retrieve on the array, changes the array (!), and calls lock_store. At the point marked (!), another child B could do the same... This means I'd have to build my own locking mechanism. (Yes, there can be more than one child in my application. ^^' )
      The other way I see is combining it with the hint from "Anonymous Monk": generate a separate namespace for each child, and tell the parent only the name of that space when done... The actual problem here would be how many stores I may create before I reach some storage limit. (Multiple IPC::Shareables are limited, too...)

      Anyway: it sounds faster than using a database, so I will surely consider it. (Especially since it seems to be implemented in Perl already.)

        The solution I use is for the child processes to feed array updates back to the master process, which applies them to the array. This has worked for me because the worker processes I have needed thus far have always had fully-defined tasks at the moment they are forked — the only communication needed is to report their results back to the master process.

        I have used this technique to maintain Tk GUI responsiveness while issuing network queries in the background, but the data to scrape is known when the process forks and the child only needs to parse a reply and pass the "important bits" back up.

      So, I tested Storable and the result is unsatisfactory! :'-(
      It also stops at 2^16 characters. What's more, it doesn't even tell you about the lost data the way IPC::Shareable does...
      If I'm going to split the data anyway, I don't need another module. Anyway, thank you very much! I'm going to use a special solution for this special case: the database.

        That is very strange and I have not had that problem.

        This sounds like you are not properly handling the stored objects. I use a single status pipe, where a child reports some identifier and an indication that a binary object is available, and a "return" pipe for each child, where the Storable data is actually written. The status pipe is non-blocking and monitored with Tk's fileevent mechanism, while the "return" pipes are used in blocking mode. I switched to this approach after trying to pass the Storable data back on the status pipe and finding that non-blocking mode caused problems, not to mention that the status pipe is shared, so the messages returned on it must be kept short enough to fit in the OS buffers to avoid messages from different children being interleaved.
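A sketch of the status-pipe/return-pipe layout described above, with the core IO::Select standing in for Tk's fileevent (a substitution, not what jcb's code uses). Each child announces itself on the shared status pipe with one short line, then writes its Storable image to its own return pipe in blocking mode; the parent reacts to each announcement by reading that child's pipe with fd_retrieve and applying the result to its own @storage. The job count and payload sizes are hypothetical.

```perl
use strict;
use warnings;
use IO::Select;
use Storable qw(store_fd fd_retrieve);

# Shared status pipe: each child writes one short line when its result is ready.
pipe(my $status_r, my $status_w) or die "pipe: $!";

my %return_pipe;    # child pid => read end of that child's private return pipe
for my $job (1 .. 4) {
    pipe(my $ret_r, my $ret_w) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                                   # child
        close $status_r;
        close $ret_r;
        my @result = ("job $job " . ('z' x (70_000 * $job)));  # hypothetical result
        # Announce first: a line this short is written to the pipe atomically,
        # so announcements from different children cannot interleave.
        syswrite($status_w, "$$ ready\n") or die "syswrite: $!";
        # Then send the data in blocking mode on this child's own pipe.
        store_fd(\@result, $ret_w) or die "store_fd failed";
        close $ret_w or die "close: $!";
        exit 0;
    }
    close $ret_w;                      # parent keeps only the read end
    $return_pipe{$pid} = $ret_r;
}
close $status_w;                       # status pipe hits EOF once every child has exited

my $select = IO::Select->new($status_r);
my $buf = '';
my @storage;
STATUS: while (1) {
    for my $fh ($select->can_read(0.1)) {   # a GUI would run its own events between polls
        my $n = sysread($fh, $buf, 4096, length $buf);
        die "sysread: $!" unless defined $n;
        last STATUS if $n == 0;             # EOF: all children are done
        while ($buf =~ s/\A(\d+) ready\n//) {
            my $pid    = $1;
            my $ret_fh = delete $return_pipe{$pid};
            my $data   = fd_retrieve($ret_fh);   # blocking read of the whole image
            close $ret_fh;
            waitpid($pid, 0);
            push @storage, @$data;               # only the master touches @storage
        }
    }
}
close $status_r;
print scalar(@storage), " results received\n";
```

Because only the master applies updates to @storage, no cross-process lock is needed for the array itself.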

        Perhaps have a look at Sys::Mmap and File::Map before resorting to a disk-based DB just because it has a locking mechanism.

Re: Looking for alternative for IPC::Shareable (or increase size)
by bliako (Abbot) on Aug 06, 2020 at 09:29 UTC
    There is no pre-set limit to the number of processes that can bind to data; nor is there a pre-set limit to the complexity of the underlying data of the tied variables[2]. The amount of data that can be shared within a single bound variable is limited by the system's maximum size for a shared memory segment (the exact value is system-dependent).

    That's from the IPC::Shareable documentation, so it seems you hit an OS limit.

    You think you want *unlimited* shared memory segments. That can't happen, unless you specify a reasonable upper bound and then negotiate it with the OS. Or change the logic of your program: multiple shared segments to store a single string, perhaps?
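To illustrate negotiating a size with the OS, here is a minimal sketch using Perl's built-in System V calls (shmget, shmwrite, shmread, shmctl, with constants from the core IPC::SysV module) rather than IPC::Shareable: the segment is created at the size the payload needs, and shmget fails if that exceeds the kernel's per-segment maximum. With IPC::Shareable the corresponding knob is the size option passed to tie, as in the commented-out size line of DomX's snippet further down. The payload is hypothetical, and waiting for the writer to exit is only a stand-in for real locking.

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID);

my $payload = 'x' x 84_217;      # longer than IPC::Shareable's 65536-byte default
my $size    = length $payload;   # ASCII only, so characters == bytes

# Ask the kernel for a private segment sized for this payload.
# shmget returns undef if $size exceeds the system's per-segment maximum.
my $id = shmget(IPC_PRIVATE, $size, 0600);
defined $id or die "shmget: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {                 # child: the writer
    shmwrite($id, $payload, 0, $size) or die "shmwrite: $!";
    exit 0;
}

# Crude synchronisation for the sketch: read only after the writer has exited.
# Concurrent writers would need a real lock (for example a SysV semaphore).
waitpid($pid, 0);
my $writer_status = $?;

my $copy;
shmread($id, $copy, 0, $size) or die "shmread: $!";
shmctl($id, IPC_RMID, 0)      or die "shmctl: $!";   # remove the segment
$writer_status == 0 or die "writer failed";

print length($copy), "\n";
```

Note that an existing segment cannot be grown afterwards, so the size has to be chosen when the segment is created.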

      When your messages exceed the 64K limit, why not send them in parts? Message 1 of 6, 2 of 6, etc., each 64K in size. From some metadata, the receiver will know how to put them back together.
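A minimal sketch of the "message 1 of 6" idea: split_message (a hypothetical helper) cuts a string into parts tagged with a message id, a part number and a total, and receive_part (also hypothetical) reassembles them in any arrival order. The transport, for example pushing each part into the shared array, is left out. CHUNK is an assumption; since the shared segment also has to hold serialization overhead, a real chunk size would likely need to be somewhat below the segment size.

```perl
use strict;
use warnings;

use constant CHUNK => 65_536;    # assumed per-part limit

# Sender: split one long string into parts of at most CHUNK characters.
sub split_message {
    my ($id, $text) = @_;
    my $total = int((length($text) + CHUNK - 1) / CHUNK) || 1;   # at least one part
    my @parts;
    for my $i (0 .. $total - 1) {
        push @parts, {
            id   => $id,               # which message this part belongs to
            part => $i + 1,            # 1-based: "part 1 of N"
            of   => $total,
            data => substr($text, $i * CHUNK, CHUNK),
        };
    }
    return @parts;
}

# Receiver: collect parts per message id; return the whole string once complete.
my %pending;
sub receive_part {
    my ($p) = @_;
    my $slot = $pending{ $p->{id} } //= [];
    $slot->[ $p->{part} - 1 ] = $p->{data};
    return if grep { !defined } @{$slot}[0 .. $p->{of} - 1];     # still incomplete
    delete $pending{ $p->{id} };
    return join '', @$slot;
}

# Usage: an 84_217-character message becomes 2 parts (65536 + 18681 chars).
my $message = 'a' x 84_217;
my @parts   = split_message(42, $message);
my $whole;
for my $p (reverse @parts) {     # arrival order does not matter
    my $done = receive_part($p);
    $whole = $done if defined $done;
}
printf "%d parts, reassembled %d chars\n", scalar @parts, length $whole;
```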

Re: Looking for alternative for IPC::Shareable (or increase size)
by NERDVANA (Curate) on Aug 06, 2020 at 02:08 UTC
    You say that you are copying the data out before processing it... This sounds to me like what you really want is a pipe. The only reason to share memory between two processes is if you need to perform random access or use it more than once. If the data is large, you will also save memory by transferring it over a pipe rather than allocating megabytes of shared memory just to use as a temporary storage area.

    Perhaps the problem you were solving was that with shared mem you will never get deadlocked waiting on the other process to finish writing the data? For that, there are event libraries that run a callback when data is available on a pipe, and you can use that to keep the data moving in the background and then fire off a callback of your own when it is done.
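A sketch of that event-driven idea, using the core IO::Select as a minimal stand-in for an event library: the parent watches one pipe per child, appends whatever arrives to a per-child buffer, and calls a callback of its own ($on_done, a hypothetical name) when a child closes its end. The 0.1 s timeout is where a GUI would get a chance to run its own events; the job count and payload size are made up.

```perl
use strict;
use warnings;
use IO::Select;

my @storage;
# Hypothetical callback, fired once a child's complete output has arrived.
my $on_done = sub { my ($pid, $data) = @_; push @storage, $data };

my $select = IO::Select->new;
my (%buffer, %pid_of);                 # keyed by the (stringified) read handle
for my $job (1 .. 3) {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                   # child
        close $r;
        print {$w} "job $job:", 'q' x 100_000;   # stand-in for a large result
        close $w or die "close: $!";
        exit 0;
    }
    close $w;                          # parent keeps only the read end
    $select->add($r);
    $pid_of{$r} = $pid;
    $buffer{$r} = '';
}

while ($select->count) {
    for my $fh ($select->can_read(0.1)) {
        my $n = sysread($fh, $buffer{$fh}, 65_536, length $buffer{$fh});
        die "sysread: $!" unless defined $n;
        next if $n;                    # more data may follow
        # EOF: the child closed its end, so its message is complete.
        $select->remove($fh);
        my $pid  = delete $pid_of{$fh};
        my $data = delete $buffer{$fh};
        close $fh;
        waitpid($pid, 0);
        $on_done->($pid, $data);
    }
    # ... a GUI would process its own events here ...
}
print scalar(@storage), " messages received\n";
```

Giving each child its own pipe means messages never interleave, and using EOF as the end-of-message marker avoids any framing.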

      Dear NERDVANA,

      No, I actually need to be able to access this array from (sometimes many) different children. Pipes are insufficient, and thanks to IPC::Shareable's lock mechanism I of course never needed to think about deadlocks... ^^'

Re: Looking for alternative for IPC::Shareable (or increase size)
by stevieb (Canon) on Aug 06, 2020 at 00:55 UTC

    Can you please show some code that reproduces what you want to achieve, but is failing?

    I received permission last year to push new updates to IPC::Shareable (amongst a couple of other memory-sharing distributions). I have numerous updates in the queue, but I haven't uploaded a new version yet.

    Perhaps if you could give me a couple of examples, it might intrigue and interest me enough to finish the work I was doing, and incorporate what you want at the same time.

    If I knew some context and could see some of the critical code, I might even be able to recommend an existing solution to your dilemma.

    I used, was granted author permissions on, and have modified a few modules that allow sharing of data/variables etc between processes, threads and even over the network.

      Dear stevieb,

      I've already found some strings 84_217 characters long that I try to insert into the IPC::Shareable array @sender. I do this very often, but only here do I get lists of arrays whose length I can't predict...

      This is a snippet:
      my $allowed_childs = 10;

      # <other code>

      ## Main update agent
      my @agency = ();                # Agents (forks)
      my $recID  = &gen_random(4);    # IPC key as $transID, but only visible for children of the update agent

      # Separate transfer array for agents (sub-forks of fork)
      my $update_knot = tie(my @receiver, 'IPC::Shareable', $recID, {
          create  => 1,
          mode    => 0600,
          destroy => 1,
          size    => IPC::Shareable::SHM_BUFSIZ(),
          #size   => 131072 * 2,
      });

      # Array to store what was transferred through @receiver, because IPC-tied arrays are very small
      my @storage = ();

      # Transfer child elements to @storage
      my $_sub_gatherer = sub {
          my $allowed_childs = shift;
          if ( &wait_children($allowed_childs, \@agency) ) {
              my $knot = tie(my @cleaner, 'IPC::Shareable', $recID);
              $knot->shlock;
              # Move every element out of the shared array
              push(@storage, splice(@cleaner, 0, scalar(@cleaner)));
              $knot->shunlock;
              return(1);
          }
          return(0);
      };

      # <other code>

      # Get region-dependent type ids
      foreach my $region ( @regions ) {
          my $agent = fork;
          # Parent
          if ( defined($agent) && $agent ) {
              push(@agency, $agent);
              &debug_out("get_region_types(): Started download agent $agent");
              #Time::HiRes::sleep(0.01); # WORK Maybe no longer needed because of dl_lock()
              #&{$_sub_gatherer}(sprintf("%.0f", $max_forks * 0.5));
              &{$_sub_gatherer}(1);
              # Report to main window
              $perc_count += 15 / scalar(@regions);
              $ac_knot->shlock;
              $agent_carrier{progress_percent} = $perc_count / &{$_sub_percent_base}();
              $ac_knot->shunlock;
          }
          # Child
          elsif ( defined($agent) ) {
              srand();
              my @region_market_types = &downloader(0, "/markets/$region/types/");
              if ( @region_market_types ) {
                  my $knot = tie(my @sender, 'IPC::Shareable', $recID);
                  $knot->shlock;
                  # Here the failure occurs:
                  # Length of shared data exceeds shared segment size at ./myprogram.pl line 1210.
                  push(@sender, encode_json(\@region_market_types));
                  $knot->shunlock;
              }
              exit(0);
          }
          else {
              &debug_out( "update_types(): Can't fork", );
              &pprop_exit();
          }
      }

      # Wait for children
      &wait_children(1, \@agency);

      # <other code>

      sub wait_children {
          my $left    = shift;
          my $childs  = shift;
          my $waiting = 0;
          if ( $left < 1 ) { $left = 1; }
          # Poll until fewer than $left children are still alive
          while ( scalar(@{$childs}) >= $left ) {
              $waiting = 1;
              Time::HiRes::sleep(0.1);
              @{$childs} = grep { kill(0 => $_) } @{$childs};
          }
          return($waiting);
      }

        The man page for shmctl(2) doesn't give a way to resize an existing shared memory segment, so you can't expand it with any of the shared memory modules, since it's a limitation of the OS.

        It might be possible with mmap(2), but if other processes have already mapped the memory area, they would have to be notified of the change and unmap/remap the memory. This implies that it would have to be a file on disk so that other processes can unmap/map the same data.

        Otherwise, it has to be something that copies the data between the processes.

Re: Looking for alternative for IPC::Shareable (or increase size)
by Anonymous Monk on Aug 06, 2020 at 00:35 UTC
    The essential wisdom is this: that the sender should first store the complete message into a store that both of the parties have equal access to, then send only "the name of it" as the actual message.
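A sketch of the "send only the name" pattern: the child writes its complete result with Storable into a directory both processes can reach, renames it into place so the reader never sees a half-written file, and then sends just the file name over a pipe. Note that this does go through the filesystem; on Linux, pointing the directory at a tmpfs such as /dev/shm would keep it in RAM, but that is an assumption about the system, not something the sketch sets up. The payload is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Directory both processes can reach (an ordinary temp dir in this sketch).
my $dir = tempdir(CLEANUP => 1);

pipe(my $names_r, my $names_w) or die "pipe: $!";
my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {                                   # child (sender)
    close $names_r;
    my @result = ('m' x 84_217);                   # stand-in for a big result
    my $file = File::Spec->catfile($dir, "result-$$.sto");
    nstore(\@result, "$file.tmp") or die "nstore failed";  # write fully first...
    rename "$file.tmp", $file or die "rename: $!";         # ...then publish atomically
    print {$names_w} "$file\n";                    # the message is only the name
    close $names_w or die "close: $!";
    POSIX::_exit(0);    # skip END blocks so only the parent cleans up $dir
}

close $names_w;
my @storage;
while (my $file = <$names_r>) {                    # parent (receiver)
    chomp $file;
    push @storage, @{ retrieve($file) };
    unlink $file or warn "unlink $file: $!";
}
close $names_r;
waitpid($pid, 0);
print length($storage[0]), "\n";
```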
      This isn't "wisdom", it's a desperate yet lame attempt by someone to establish some shred of relevance. Give it up, man.

Node Type: perlquestion [id://11120361]
Front-paged by haukex