Is it File::Map issue, or another 'helpful' Perl regex optimization?

by vr (Monk)
on Mar 17, 2017 at 23:52 UTC ( #1185090=perlquestion )
vr has asked for the wisdom of the Perl Monks concerning the following question:

I have a 50 MB file:

perl -e "print 'x' x (50*1024*1024)" > x

Suppose I slurp it and do some matching:

use strict;
use warnings;
my $s = do { local ( @ARGV, $/ ) = 'x'; <> };
$s =~ /x/;

$ /usr/bin/time -f %M perl fmap.pl

Maximum resident set size reported as 53596 kbytes. Fair enough. Then I learn about File::Map, and do this:

use strict;
use warnings;
use File::Map qw/ map_file /;
map_file my $s, 'x', '<';
$s =~ /x/;

105844. Twice as much memory consumed. Actually, I'd expect otherwise; quoting the POD:

loading the pages lazily on access. This means you only 'pay' for the parts of the file you actually use.

-- a match consumes a single byte, hence only a "page" should be loaded, no? Not the whole file. Otherwise, what's the point of the example in the synopsis? OK, maybe I'm wrong and Perl's regex engine wants the string physically in RAM. But if the match is unsuccessful, e.g. $s =~ /y/;, then it's 54676. So it looks like a copy is made on each successful match:

$s =~ /x/; $s =~ /x/; $s =~ /x/; $s =~ /x/; $s =~ /x/;

Then: 310784.

But not in a loop: $s =~ /x/ for 1 .. 5; then it's 105848 again.
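For reference, here is a Linux-only sketch (my own; it assumes the 50 MB file 'x' created above and reads the peak RSS from /proc/self/status instead of using /usr/bin/time) that reproduces these numbers from inside a single script:

use strict;
use warnings;
use File::Map qw/ map_file /;

# Peak resident set size in kB, as reported by the kernel (Linux only).
sub peak_rss_kb {
    open my $fh, '<', '/proc/self/status' or die "cannot read /proc/self/status: $!";
    while (<$fh>) { return $1 if /^VmHWM:\s+(\d+)\s+kB/ }
    return;
}

map_file my $s, 'x', '<';
printf "after map:     %d kB\n", peak_rss_kb();

# Each textually distinct match statement seems to hold on to its own
# full copy of the string (consistent with the 310784 figure above)...
$s =~ /x/; printf "after match 1: %d kB\n", peak_rss_kb();
$s =~ /x/; printf "after match 2: %d kB\n", peak_rss_kb();
$s =~ /x/; printf "after match 3: %d kB\n", peak_rss_kb();

# ...while repeating the same match op in a loop adds only one more copy.
$s =~ /x/ for 1 .. 5;
printf "after loop:    %d kB\n", peak_rss_kb();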

That's all rather weird. The same happens on Windows, too. (There was another issue on Windows: it suddenly refused to map a 'merely' 1 GB file, and it appears that CreateFileMapping expects a contiguous block of the required size in virtual memory, which may or may not be available even within the same day. That doesn't look very usable to me. But perhaps it's not a Perl issue.)

I'm asking, because at first I was enthusiastic about this patch. Now I'm not so sure.

Replies are listed 'Best First'.
Re: Is it File::Map issue, or another 'helpful' Perl regex optimization?
by dave_the_m (Prior) on Mar 18, 2017 at 09:06 UTC
    In modern perls, copy-on-write (COW) is used to make a "copy" of the string in the case of a successful match. This copy shares the same string buffer between two scalar values (but unshares them if either scalar value tries to modify its buffer). This avoids the old performance penalty that having $& etc anywhere in your script would impose upon all subsequent matches, while not crashing on something like eval '$&'.

    However, the type of string created by File::Map isn't suitable for being COWed, so perl copies the whole string instead.
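
    A minimal sketch (assuming a stock perl 5.20 or later with its default copy-on-write, plus Devel::Peek and File::Map) of how to see this difference:

    use strict;
    use warnings;
    use Devel::Peek qw( Dump );
    use File::Map   qw( map_file );

    # Small stand-in file, so the dumps stay readable.
    my $file = 'cow-demo.txt';
    open my $out, '>', $file or die $!;
    print {$out} 'x' x 64;
    close $out;

    # Slurped scalar: after a successful match its buffer should be shared
    # copy-on-write -- look for IsCOW in the FLAGS line of the dump.
    my $slurped = do { local ( @ARGV, $/ ) = $file; <> };
    $slurped =~ /x/;
    Dump( $slurped );

    # Mapped scalar: it carries magic and isn't suitable for COW, so the match
    # has to take a real, full-length copy internally -- no IsCOW flag here,
    # which is what roughly doubles the resident set with the 50 MB file above.
    map_file my $mapped, $file, '<';
    $mapped =~ /x/;
    Dump( $mapped );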

    I'll add it to my list of "things to see if we can improve in COW".

    Dave.

Re: Is it File::Map issue, or another 'helpful' Perl regex optimization? (neither)
by Anonymous Monk on Mar 18, 2017 at 00:04 UTC

      Adding advise( $s, 'sequential' ); made no difference :( - I mean, in terms of consumed memory.

        :)

        The module seems vague on claims and evidence, but:

        I just did some testing, and I get these numbers just from loading a 51 MB file I create.

        You'll need memusage-workingset-virtualmemory.pl to run it yourself

        So, when you map, it seems to signal to the OS how big the memory usage is going to get (the WVM field), and then the working set slowly increases up to the size of the file as the regular expression advances through the whole file "line" by line (a rough Linux-side way to watch this is sketched below).

        Is this faster than something else? More memory efficient? I dunno.

        I'm beginning to suspect this is how File::Map is supposed to work
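
        A rough Linux-side sketch of the same observation (assumptions of mine: reading VmRSS from /proc/self/status, and touching the mapping with substr rather than a regex so the match-copy discussed above doesn't get in the way):

        use strict;
        use warnings;
        use File::Map qw/ map_file /;

        # Current resident set size in kB (Linux only).
        sub rss_kb {
            open my $fh, '<', '/proc/self/status' or die "cannot read /proc/self/status: $!";
            while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)\s+kB/ }
            return;
        }

        map_file my $s, 'x', '<';
        printf "after map: %d kB resident\n", rss_kb();

        # Read the mapping a chunk at a time; only the pages actually touched
        # are faulted in, so the resident set should grow step by step up to
        # roughly the size of the file.
        my $chunk = 8 * 1024 * 1024;
        for ( my $off = 0 ; $off < length $s ; $off += $chunk ) {
            my $ignored = substr $s, $off, $chunk;
            my $touched = $off + $chunk;
            $touched = length $s if $touched > length $s;
            printf "touched %9d bytes: %d kB resident\n", $touched, rss_kb();
        }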
