PerlMonks  

Re^14: Memory leak question

by SBECK (Chaplain)
on Oct 06, 2010 at 15:16 UTC (id://863811)


in reply to Re^13: Memory leak question
in thread Memory leak question

I think you just became the first person in the world who knows Date::Manip parsing internals better than I do!

Thanks for everything. I think this gives me enough information to track it down. It's going to take me some time to really digest everything you said, but this should get me a long way towards fixing the problem.

Re^15: Memory leak question
by SBECK (Chaplain) on Oct 06, 2010 at 16:51 UTC
    I just found out a little bit about the leak.

    Using the original Date::Manip code, there's a line in the _parse_datetime_iso8601 function which looks like:
    ($y,$m,$d,...) = @+{qw(y m d ...)};
    where I have just matched against the regexp from _iso8601_rx. If I comment this line out (and just set $y,$m,$d to some static values), there's no leak. Note that I STILL match the regexp; I simply never refer to the %+ hash.

    Unfortunately, I wasn't able to reproduce this in a simple test script, so I still need to investigate further, but I think this is an interesting result.
Re^15: Memory leak question
by SBECK (Chaplain) on Oct 06, 2010 at 19:06 UTC
    I was able to reproduce the leak in a trivial script, and I think that I'm down to the most basic illustration.
    $a = '(?<a>\d)';
    $b = '(?<b>\d)';
    $rx = qr/(?:${a}${b}|${a}:${b})/;
    #$string = "12";
    $string = "1:2";
    while (1) {
        $string =~ $rx;
        @tmp = @+{qw(a b)};
    }
    This leaks.

    If I modify $rx to include only one of the two choices, it doesn't leak. If I plug in a string which matches the first option (i.e. use the $string = "12" line), it doesn't leak. And if you comment out the @tmp = @+ line so you don't access %+, it doesn't leak.
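For poking at this without the infinite loop, here is a bounded sketch of the same pattern (the variable names are mine, and it does not measure memory, so the process size would still have to be watched with an external tool). What it does show is what %+ hands back when each name appears in both alternatives: per perlvar, $+{name} is the leftmost defined group with that name.

```perl
use strict;
use warnings;

# Same pattern as the reproducer: the names a and b each appear in both
# alternatives, so there are two groups per name.
my $a_rx = '(?<a>\d)';
my $b_rx = '(?<b>\d)';
my $rx   = qr/(?:${a_rx}${b_rx}|${a_rx}:${b_rx})/;

my %seen;
for my $string ('12', '1:2') {
    for (1 .. 1000) {                 # bounded, unlike the original while (1)
        $string =~ $rx or die "no match for '$string'";
        my @tmp = @+{qw(a b)};        # the %+ access that triggered the leak
        $seen{$string} = "@tmp";
    }
}
print "$_ => $seen{$_}\n" for sort keys %seen;
```

Both inputs should report "1 2": for "1:2" the b group in the first alternative never matches, so $+{b} falls through to the b group in the second alternative.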

    At this point, I guess I no longer believe that it is a Date::Manip problem. In other words, I don't think the above script is buggy... I think it points out a bug in perl itself. If you agree, I think I'll pass it on as a perl bug.

    Final (I hope) comment?
      If you agree, I think I'll pass it on as a perl bug.

      Yes, I absolutely agree. And your example demos the bug perfectly.

      Nice to know my instincts weren't too far off; I always suspect new features first. But try as I might, I couldn't come up with a simple demo that leaked. Congratulations on that.

      The downside is you'll have to wait a while for the fix, but at least you now know.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Filed as perlbug #78266

      One final thought :)

      There is another way of doing "named captures" without using the (?<name>...) construct or %+ or %-. It's a bit, um... obscure, and may be slower, but it might let you work around the problem in the interim without radically altering your existing code.

      #! perl -slw
      use strict;
      use re qw[ eval ];
      use Data::Dump qw[ pp ];

      my $reY  = '(\d{4})(?{ $match{ y } = $^N })';
      my $reM  = '(\d{2})(?{ $match{ m } = $^N })';
      my $reD  = '(\d{2})(?{ $match{ d } = $^N })';
      my $reH  = '(\d{2})(?{ $match{ h } = $^N })';
      my $reMN = '(\d{2})(?{ $match{ mn } = $^N })';
      my $reS  = '(\d{2})(?{ $match{ s } = $^N })';

      my $reDT = "$reY-$reM-$reD\\s+$reH:$reMN:$reS";

      our %match = ();
      '2010-10-06 20:55:31' =~ $reDT;
      pp \%match;

      __END__
      c:\test>junk57.pl
      { d => "06", h => 20, "m" => 10, mn => 55, "s" => 31, "y" => 2010 }
      Or better still, cut out the middleman and put the captures straight into the named variables themselves (I wish named captures worked this way, full stop):
      #! perl -slw
      use strict;
      use re qw[ eval ];
      use Data::Dump qw[ pp ];

      my $reY  = '(\d{4})(?{ $y = $^N })';
      my $reM  = '(\d{2})(?{ $m = $^N })';
      my $reD  = '(\d{2})(?{ $d = $^N })';
      my $reH  = '(\d{2})(?{ $h = $^N })';
      my $reMN = '(\d{2})(?{ $mn = $^N })';
      my $reS  = '(\d{2})(?{ $s = $^N })';

      my $reDT = "$reY-$reM-$reD\\s+$reH:$reMN:$reS";

      local our( $y, $m, $d, $h, $mn, $s );
      '2010-10-06 20:55:31' =~ $reDT;
      print "$y, $m, $d, $h, $mn, $s";

      __END__
      c:\test>junk57.pl
      2010, 10, 06, 20, 55, 31

      Note: the variables referenced inside the (?{ code }) blocks have to be globals, but judicious use of local and our makes that reasonably convenient. Also, I've had iffy results using qr// with this; I've never really understood why.
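One way the "local and our" advice might look when wrapped in a reusable function is sketched below. The name parse_date and the three-field date pattern are hypothetical, not anything in Date::Manip. The pattern is kept as a plain string rather than qr//, in line with the caveat above, and the function localizes the package variables so each call starts from undef and the caller's values come back afterwards.

```perl
use strict;
use warnings;
use re 'eval';    # needed: the pattern below is built from a string at run time

# Hypothetical helper (not part of Date::Manip): the (?{ }) blocks write into
# package variables, and parse_date() localizes them for the duration of a call.
our ($Y, $M, $D);
my $re_date = '(\d{4})(?{ $Y = $^N })-(\d{2})(?{ $M = $^N })-(\d{2})(?{ $D = $^N })';

sub parse_date {
    my ($str) = @_;
    local ($Y, $M, $D);               # restored automatically on return
    return unless $str =~ $re_date;   # empty list / undef on no match
    return ($Y, $M, $D);
}

my @ymd = parse_date('2010-10-06 20:55:31');
print "@ymd\n";
```

Because of the local, nothing leaks out of the call: after it returns, $Y, $M and $D are back to whatever they held before.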

      I realise that it would be considerable work to modify all your regex generators to use this method, but hey!

      You can always knock up a few regex to do it for you ;)


        Thanks for the suggestion. I actually already tested this type of regexp (it was suggested in a previous question I posted about named captures: http://perlmonks.org/?node_id=810388). Unfortunately, the relatively simple tests I benchmarked weren't "a little slower". They were 7x slower!
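For anyone who wants to repeat that kind of comparison, a minimal sketch using the core Benchmark module is below. The subs via_named and via_code and the cut-down date pattern are illustrative assumptions, not the tests referred to above, and the ratio will depend on the perl version and machine, so no particular figure should be expected.

```perl
use strict;
use warnings;
use re 'eval';
use Benchmark qw(cmpthese);

my $str = '2010-10-06 20:55:31';

# Variant 1: ordinary named captures, read back through %+.
my $rx_named = qr/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/;

sub via_named {
    return unless $str =~ $rx_named;
    return @+{qw(y m d)};
}

# Variant 2: (?{ }) blocks writing into localized package variables.
our ($Y, $M, $D);
my $re_code = '(\d{4})(?{ $Y = $^N })-(\d{2})(?{ $M = $^N })-(\d{2})(?{ $D = $^N })';

sub via_code {
    local ($Y, $M, $D);
    return unless $str =~ $re_code;
    return ($Y, $M, $D);
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { named => \&via_named, code_blocks => \&via_code });
```

Checking that both subs return the same fields first is worthwhile, since a benchmark of two variants that disagree tells you nothing.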

        So, unfortunately, I think I'm going to have to leave the current code in place, which means there will be a problem for anyone using Date::Manip in a persistent program. Sooner or later, the problem will be fixed. I'm going to add a note to this effect to the documentation, but for now, I think that's going to be the extent of it.

        Thanks again for an incredibly educational and fun (and frustrating... but not due to you) conversation!
