PerlMonks  

Re^12: Memory leak question

by SBECK (Pilgrim)
on Oct 06, 2010 at 12:24 UTC (#863771=note)


in reply to Re^11: Memory leak question
in thread Memory leak question

I wasn't aware that Module::Build was such a problem on Windows. I do very little development on Windows (basically just a small amount of debugging) and hadn't encountered that problem.

Don't bother doing a manual install: with over 900 timezone modules in Date::Manip, it wouldn't be a good use of your time. Instead, I have created a new bundle that uses Makefile.PL rather than Build.PL and put it at: http://sullybeck.com/Date-Manip-6.13a.tar.gz.

As a side note, the reason that Date::Manip currently uses Build.PL is that it gave me the flexibility to test which version of perl was running and then install either version 5 or version 6 of Date::Manip (Date::Manip 6 requires perl 5.10 or higher). I threw away version 5 for the temporary bundle I created for you, so I didn't need that flexibility.

I was planning on changing this anyway. After playing with this for a few versions, I've decided that I want to simply install both versions for everyone and then have Date::Manip be a wrapper that loads the appropriate version. That will let me go back to providing both a Build.PL and a Makefile.PL. This planned change was a bit lower on my priority list, but given your previous message, it has jumped much higher, and I'm going to try to include it in the next release.
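The wrapper plan described above could look roughly like the following. This is a minimal sketch, not Date::Manip's actual code: the back-end module names `Date::Manip::DM5` and `Date::Manip::DM6` and the helper subs `backend_for_perl` and `load_backend` are hypothetical, and the only rule it encodes is the one stated in the thread (version 6 needs perl 5.10 or higher).

```perl
use strict;
use warnings;

# Pick a back end from a perl version in the numeric form used by $]
# (e.g. 5.008008 or 5.010001). Both module names are hypothetical.
sub backend_for_perl {
    my ($perl_version) = @_;
    return $perl_version >= 5.010
        ? 'Date::Manip::DM6'    # version 6 needs perl 5.10 or higher
        : 'Date::Manip::DM5';   # version 5 for older perls
}

# Load the chosen back end at run time. Not called here, because the
# hypothetical modules are not installed.
sub load_backend {
    my $backend = backend_for_perl($]);
    (my $file = "$backend.pm") =~ s{::}{/}g;
    require $file;    # dies if the module cannot be found
    return $backend;
}

print backend_for_perl($]), "\n";
```

Comparing against the numeric `$]` rather than a version object keeps the check itself usable on the older perls the wrapper is meant to support.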

Thanks again.


Re^13: Memory leak question
by BrowserUk (Pope) on Oct 06, 2010 at 13:02 UTC

    Got it, thanks. This time it downloaded and installed in 2 minutes.

    I left the M::B build running last night whilst I watched a film, and 1 1/2 hours later it was still using 100% CPU and still hadn't done anything. As best I can tell, it just greps around the build tree looking for POD, reading every file over and over and over. Dog only knows why!

    I've looked into fixing it several times, but it is sooo (unnecessarily) complicated that I get nowhere.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^13: Memory leak question
by BrowserUk (Pope) on Oct 06, 2010 at 14:33 UTC

    I believe you are being bitten by regex engine leaks.

    Here's what I discovered.

    1. If I replace _iso8601_rx() with the bare minimum needed to parse the date/time in the test, the memory leaks disappear completely:
        # Stripped-down replacement: the cache holds only the three
        # patterns the test needs.
        my %cache;

        sub _iso8601_rx {
            my($self,$rx) = @_;
            my $dmt = $$self{'tz'};
            my $dmb = $$dmt{'base'};

            return $cache{ $rx } if exists $cache{ $rx };
        }

        $cache{cdate}    = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
        $cache{ctime}    = '(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)';
        $cache{fulldate} = "$cache{cdate}\\s+$cache{ctime}";   # "\\s" yields \s

        1;
    2. However, if I change it to use the fully expanded regexes, it goes back to leaking like a sieve.

    I thought it might be the use of (so many) named captures, so I tried very hard to make them leak: a single regex with 175,000 named captures, matching /g against a string that contained 10,000 matches for them, in a (very slow) loop. The process grew very large, but once it maxed out, it didn't leak at all.
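A scaled-down version of that stress test might look like this. It is a sketch under assumptions: the pattern shape (1,000 distinct named captures `n0` to `n999` in one alternation), the target string and the pass count are made up for illustration, and the script does not measure memory itself, so the process has to be watched externally (Task Manager, `top`) while raising the pass count.

```perl
use strict;
use warnings;

# Assumption: far fewer captures than the 175,000 described above.
my $n = 1_000;

# One alternation of distinct named captures: (?<n0>a0x)|(?<n1>a1x)|...
my $pattern = join '|', map { "(?<n$_>a${_}x)" } 0 .. $n - 1;
my $rx      = qr/$pattern/;

# A target string containing exactly one match for every capture.
my $target = join ' ', map { "a${_}x" } 0 .. $n - 1;

my ($hits, $odd) = (0, 0);
for my $pass (1 .. 10) {    # raise this and watch memory use externally
    while ($target =~ /$rx/g) {
        $hits++;
        # %+ lists only the named captures that took part in this match.
        $odd++ unless scalar(keys %+) == 1;
    }
}
print "$hits matches, $odd unexpected\n";
```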

    So then I remembered that I'd seen the regex trie optimisation cause problems with large alternations, but disabling it didn't change anything.
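For reference, perlvar documents `${^RE_TRIE_MAXBUF}` as the control for the trie optimisation, with a negative value disabling it. The sketch below is a minimal illustration; the month-name pattern and sample strings are assumptions, not taken from Date::Manip. Because the variable is read when a pattern is compiled, and a literal `qr//` is compiled at compile time, it is set in a `BEGIN` block.

```perl
use strict;
use warnings;

# Must be set before the pattern below is compiled; a negative value
# turns the trie optimisation off.
BEGIN { ${^RE_TRIE_MAXBUF} = -1 }

# An alternation of plain literals: the shape perl normally compiles into a trie.
my $rx = qr/(?<mon>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)/i;

my @found;
for my $date ('01-Feb-2010', '15-oct-2010', '2010-02-01') {
    push @found, lc $+{mon} if $date =~ $rx;
}
print "@found\n";    # feb oct
```

Compiling the same pattern under `use re 'debug'`, with and without the `BEGIN` line, should show whether a TRIE node is generated.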

    Then I tried your monster regexes in a standalone script, running them directly against the sample date in a loop:

        #! perl
        use strict;

        # The heredoc bodies for RXA, RXB and RXC follow this statement.
        my %cache = ( ctime => <<'RXA', cdate => <<'RXB', fulldate => <<'RXC' );
        ##... monster regex initialisation elided

        my $refull  = qr[$cache{ fulldate }]x;
        my $rectime = qr[$cache{ ctime }]x;
        my $recdate = qr[$cache{ cdate }]x;

        for (1..100e6) {
            "2010-02-01 01:02:03" =~ $refull;
            "2010-02-01 01:02:03" =~ $rectime;
            "2010-02-01 01:02:03" =~ $recdate;
        }

    It doesn't leak at all. Not a jot.

    So it's not the monster regexes alone; it's something about how they are being used, or how their results are being used, that triggers the leak.

    I'm kinda stuck for a direction in which to go now, but I hope that this will help you zero in on the cause. I'll keep looking.


      I think you just became the first person in the world who knows the Date::Manip parsing internals better than I do!

      Thanks for everything. It's going to take me some time to really digest everything you said, but I think this gives me enough information to get a long way towards tracking down and fixing the problem.
        I just found out a little bit about the leak.

        Using the original Date::Manip code, there's a line in the _parse_datetime_iso8601 function which looks like:
        ($y,$m,$d,...) = @+{qw(y m d ...)};
        where I have just matched against the regexp from _iso8601_rx. If I comment this line out (and just set $y,$m,$d to some static values), there's no leak. Note that I STILL match the regexp; I simply never refer to the %+ hash.

        Unfortunately, I wasn't able to reproduce this in a simple test script, so I still need to investigate further, but I think this is an interesting result.
        I was able to reproduce the leak in a trivial script, and I think that I'm down to the most basic illustration.
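            $a = '(?<a>\d)';
            $b = '(?<b>\d)';
            # Two alternatives that reuse the same capture names a and b.
            $rx = qr/(?:${a}${b}|${a}:${b})/;

            #$string = "12";    # matches the first alternative
            $string = "1:2";    # matches the second alternative

            while (1) {
                $string =~ $rx;
                @tmp = @+{qw(a b)};    # read the named captures via %+
            }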
        This leaks.

        If I modify $rx to include only one of the two alternatives, it doesn't leak. If I use a string that matches the first alternative (i.e. the $string = "12" line), it doesn't leak. And if I comment out the @tmp = @+{qw(a b)} line so that %+ is never accessed, it doesn't leak.
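Since both experiments above show the leak disappearing once %+ is no longer read, one possible workaround is to pull the captures out positionally instead. This is a hedged sketch, not a verified fix: `parse_pair` is a hypothetical helper, and it has not been checked against the leak itself. It relies on a list-context match returning every capture group, with the groups from the alternative that did not match coming back undef.

```perl
use strict;
use warnings;

# Same shape as the reduced test case: two alternatives reusing names a and b.
my $a_rx = '(?<a>\d)';
my $b_rx = '(?<b>\d)';
my $rx   = qr/(?:${a_rx}${b_rx}|${a_rx}:${b_rx})/;

# Hypothetical helper: read the captures positionally instead of via %+.
# In list context the match returns $1..$4; the pair belonging to the
# alternative that did not match is undef, so keep only the defined ones.
sub parse_pair {
    my ($string) = @_;
    my @caps = $string =~ $rx or return;    # empty list when nothing matches
    return grep { defined } @caps;
}

for my $s ('12', '1:2') {
    print join(',', parse_pair($s)), "\n";    # prints 1,2 both times
}
```

On perl 5.10 and later, a branch reset group such as `(?|(\d)(\d)|(\d):(\d))` is another option: both alternatives then fill groups 1 and 2, so `my ($x, $y) = $string =~ $rx` needs no filtering.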

        At this point, I guess I no longer believe that it is a Date::Manip problem. In other words, I don't think the above script is buggy; I think it points out a bug in perl itself. If you agree, I'll report it as a perl bug.

        Final (I hope) comment?
