Maybe you're right. I added tons of logging next to each lock call to see at which point my app fails, and found that every crash happened near a call to this function:
sub UnshareHash {
    my $reference = shift;
    lock $reference if is_shared($reference);
    given (ref $reference) {
        when ('HASH') {
            return { map UnshareHash($_), %{$reference} }
        }
        when ('ARRAY') {
            return [ map UnshareHash($_), @{$reference} ]
        }
        when ('REF') {
            return \UnshareHash($$reference)
        }
        default {
            return $reference
        }
    }
}
I have a configuration object shared between threads, which sometimes need to clone/unshare part of it using the function above. I've changed the function to lock only the top-level structure:
sub UnshareHash {
    my $reference = shift;
    my $deep = shift;
    lock $reference
        if is_shared($reference) and not $deep;
    given (ref $reference) {
        when ('HASH') {
            return { map UnshareHash($_, 1), %{$reference} }
        }
        when ('ARRAY') {
            return [ map UnshareHash($_, 1), @{$reference} ]
        }
        when ('REF') {
            return \UnshareHash($$reference, 1)
        }
        default {
            return $reference
        }
    }
}
So far it looks promising: my app has now been running for about 40 hours straight. Before the change, the crash happened after a few hours at most, sometimes after a few minutes. But that may be just a coincidence; I'll have to wait some more.
But if it turns out to be true (i.e. UnshareHash() is the culprit), then I assume that recursive locking is the problem? That would be a bug in threads::shared, wouldn't it?