PerlMonks  

match operator linear with PerlIO :mmap, exponential with Sys::Mmap

by daxim (Curate)
on Nov 18, 2018 at 12:20 UTC ( [id://1225986]=perlquestion )

daxim has asked for the wisdom of the Perl Monks concerning the following question:

input file `00`
<87>Nov 18 09:43:43 rotechili.localdomain sudo[568]: pam_systemd(sudo:session): Cannot create session: Already running in a session
<30>Nov 18 09:43:45 rotechili.localdomain dbus-daemon[1270]: [system] Successfully activated service 'org.freedesktop.locale1'
<13>Nov 18 09:44:14 rotechili.localdomain [RPM][708]: Transaction ID 5bf1265e finished: 0
<86>Nov 18 09:44:19 rotechili.localdomain sudo[568]: pam_unix(sudo:session): session closed for user root
duplicate it a bunch, takes 1 GB altogether
cat 00 00 > 01
cat 01 01 > 02
cat 02 02 > 03
cat 03 03 > 04
cat 04 04 > 05
cat 05 05 > 06
cat 06 06 > 07
cat 07 07 > 08
cat 08 08 > 09
cat 09 09 > 10
cat 10 10 > 11
cat 11 11 > 12
cat 12 12 > 13
cat 13 13 > 14
cat 14 14 > 15
cat 15 15 > 16
cat 16 16 > 17
cat 17 17 > 18
cat 18 18 > 19
cat 19 19 > 20
the code, file `logparser`
use 5.028;
use strictures;
use autodie;
use Sys::Mmap qw(mmap PROT_READ MAP_SHARED);

my ($approach, $file) = @ARGV;
my $str;
if ('perlio' eq $approach) {
    open my $fh, '<:mmap', $file;
    local $/;
    $str = readline $fh;
} else {
    open my $fh, '<', $file;
    mmap($str, 0, PROT_READ, MAP_SHARED, $fh) or die "mmap: $!";
};

my $mon = join '|', qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $pattern = qr"
    ^
    <(?'pri' \d{1,3} )>
    (?'mon' $mon )
    [ ]
    (?'day' (?: [ ]\d | \d\d ) )
    [ ]
    (?'time' \d\d:\d\d:\d\d )
    [ ]
    (?'host' [^ ]+ )
    [ ]
    (?'msg' [^\n]+ )
    $
"amosx;

while ($str =~ /$pattern/g) {
    () = %+;
    # do something useful with the results
};
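
As a quick sanity check of what the pattern captures, here is a minimal sketch that applies the same regex to a single record from the input file and prints the named groups. The helper name parse_line is hypothetical and not part of the question's script; plain strict and warnings stand in for strictures so that the sketch needs only core Perl.

```perl
use strict;
use warnings;

# Same month alternation and pattern as in the question's script.
my $mon = join '|', qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $pattern = qr"
    ^
    <(?'pri' \d{1,3} )>
    (?'mon' $mon )
    [ ]
    (?'day' (?: [ ]\d | \d\d ) )
    [ ]
    (?'time' \d\d:\d\d:\d\d )
    [ ]
    (?'host' [^ ]+ )
    [ ]
    (?'msg' [^\n]+ )
    $
"amosx;

# parse_line (hypothetical helper): returns a hash ref holding a copy of
# the named captures in %+, or undef/empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $pattern;
    return +{ %+ };
}

# First record of the input file `00`.
my $sample = '<87>Nov 18 09:43:43 rotechili.localdomain sudo[568]: '
           . 'pam_systemd(sudo:session): Cannot create session: '
           . 'Already running in a session' . "\n";

my $fields = parse_line($sample);
printf "%-4s => %s\n", $_, $fields->{$_} for sort keys %$fields;
```

Each record yields six fields (pri, mon, day, time, host, msg); everything after the host name, including the program name and PID, ends up in msg.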
linear run-time with PerlIO :mmap (when file size doubles, then run-time doubles)
$ perl -E'for (10..20) { print "file $_: "; system "time perl logparser perlio $_" }'
file 10: 0.06user 0.00system 0:00.07elapsed 98%CPU (0avgtext+0avgdata 11480maxresident)k
0inputs+0outputs (0major+1298minor)pagefaults 0swaps
file 11: 0.09user 0.00system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 12444maxresident)k
0inputs+0outputs (0major+1422minor)pagefaults 0swaps
file 12: 0.17user 0.00system 0:00.17elapsed 99%CPU (0avgtext+0avgdata 14264maxresident)k
0inputs+0outputs (0major+1662minor)pagefaults 0swaps
file 13: 0.28user 0.01system 0:00.29elapsed 99%CPU (0avgtext+0avgdata 17800maxresident)k
0inputs+0outputs (0major+1636minor)pagefaults 0swaps
file 14: 0.56user 0.00system 0:00.57elapsed 100%CPU (0avgtext+0avgdata 25432maxresident)k
0inputs+0outputs (0major+1582minor)pagefaults 0swaps
file 15: 1.03user 0.01system 0:01.05elapsed 99%CPU (0avgtext+0avgdata 39760maxresident)k
0inputs+0outputs (0major+3007minor)pagefaults 0swaps
file 16: 2.10user 0.02system 0:02.13elapsed 99%CPU (0avgtext+0avgdata 68880maxresident)k
0inputs+0outputs (0major+6871minor)pagefaults 0swaps
file 17: 4.25user 0.05system 0:04.31elapsed 99%CPU (0avgtext+0avgdata 127220maxresident)k
0inputs+0outputs (0major+14606minor)pagefaults 0swaps
file 18: 8.26user 0.09system 0:08.37elapsed 99%CPU (0avgtext+0avgdata 243520maxresident)k
0inputs+0outputs (0major+29051minor)pagefaults 0swaps
file 19: 16.27user 0.15system 0:16.43elapsed 99%CPU (0avgtext+0avgdata 476608maxresident)k
0inputs+0outputs (0major+59994minor)pagefaults 0swaps
file 20: 32.82user 0.22system 0:33.08elapsed 99%CPU (0avgtext+0avgdata 942436maxresident)k
0inputs+0outputs (0major+121878minor)pagefaults 0swaps
exponential run-time with Sys::Mmap
$ perl -E'for (10..20) { print "file $_: "; system "time perl logparser sysmmap $_" }'
file 10: 0.16user 0.00system 0:00.16elapsed 99%CPU (0avgtext+0avgdata 12060maxresident)k
0inputs+0outputs (0major+1406minor)pagefaults 0swaps
file 11: 1.44user 0.01system 0:01.45elapsed 99%CPU (0avgtext+0avgdata 12700maxresident)k
0inputs+0outputs (0major+1635minor)pagefaults 0swaps
file 12: 4.68user 0.00system 0:04.69elapsed 99%CPU (0avgtext+0avgdata 14744maxresident)k
0inputs+0outputs (0major+2108minor)pagefaults 0swaps
file 13: 24.50user 0.01system 0:24.56elapsed 99%CPU (0avgtext+0avgdata 18164maxresident)k
0inputs+0outputs (0major+2025minor)pagefaults 0swaps
file 14: 96.97user 0.02system 1:37.10elapsed 99%CPU (0avgtext+0avgdata 25456maxresident)k
0inputs+0outputs (0major+2381minor)pagefaults 0swaps
file 15: ^Z
Can you speculate why I get these results? Am I using the variant with :mmap correctly?
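
One way to put numbers on the two growth claims is to compute the ratio between consecutive runs, since each step doubles the file size. The sketch below uses only the user times reported in the two benchmark listings; growth_ratios is a hypothetical helper name. A ratio near 2 per doubling indicates linear growth, a ratio near 4 indicates quadratic growth, and truly exponential growth in file size would make the ratio itself keep increasing.

```perl
use strict;
use warnings;

# User times in seconds, copied from the two benchmark runs in the question
# (files 10..20 for perlio, files 10..14 for sysmmap).
my %user_time = (
    perlio  => [0.06, 0.09, 0.17, 0.28, 0.56, 1.03, 2.10, 4.25, 8.26, 16.27, 32.82],
    sysmmap => [0.16, 1.44, 4.68, 24.50, 96.97],
);

# growth_ratios (hypothetical helper): ratio of each timing to the one
# before it. Every step corresponds to a doubling of the file size.
sub growth_ratios {
    my @t = @_;
    return map { $t[$_] / $t[$_ - 1] } 1 .. $#t;
}

for my $approach (sort keys %user_time) {
    my @ratios = growth_ratios(@{ $user_time{$approach} });
    printf "%-8s %s\n", $approach, join ' ', map { sprintf '%.1f', $_ } @ratios;
}
```

With these figures the PerlIO ratios settle close to 2, while the last two Sys::Mmap ratios are roughly 5 and 4. That looks closer to quadratic than to exponential growth in file size, although five data points are too few to say for certain.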

Replies are listed 'Best First'.
Re: match operator linear with PerlIO :mmap, exponential with Sys::Mmap
by bliako (Monsignor) on Nov 18, 2018 at 23:27 UTC

    Given that Sys::Mmap ties a scalar to the memory region returned by the OS-level mmap(), and that you get exponential run-time (not just a constant-factor slowdown) when using Sys::Mmap, my guess is that the regex engine restarts from the beginning of the string on each iteration of the loop, yet somehow still produces correct results.

    If you avoid matching against the tie'd variable, you should get equal performance. E.g., copy it with my $str1 = $str; and apply the regex to $str1.

    Each time Sys::Mmap's tied scalar FETCH()es, it dereferences the original scalar ref (see https://metacpan.org/source/SWALTERS/Sys-Mmap-0.19/Mmap.pm). Perhaps the dereference of the mmap-ed string combined with regex-ing it in a loop via the g modifier causes the problem.

    bw, bliako
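
The copy-first suggestion in the reply above can be sketched as follows. This is a minimal illustration only: count_matches_on_copy is a hypothetical helper, the sample string stands in for the mmap-backed $str so that the sketch runs without Sys::Mmap installed, and whether the copy actually restores linear run-time on the 1 GB data set is the reply's hypothesis, not something verified here.

```perl
use strict;
use warnings;

# count_matches_on_copy (hypothetical helper): takes a reference to the
# (possibly mmap-backed) scalar, so the only full copy is the explicit
# one made below.
sub count_matches_on_copy {
    my ($str_ref, $pattern) = @_;

    # Ordinary Perl string, no longer backed by the mapping.
    my $copy = $$str_ref;

    my $count = 0;
    while ($copy =~ /$pattern/g) {
        $count++;    # the question's loop would read %+ here
    }
    return $count;
}

# Stand-in data (assumption): two shortened records instead of a mapped file.
my $str = "<87>Nov 18 09:43:43 host a\n<30>Nov 18 09:43:45 host b\n";
print count_matches_on_copy(\$str, qr/^<\d{1,3}>/m), "\n";    # prints 2
```

In the question's script the call would be count_matches_on_copy(\$str, $pattern) after the mmap() call. The copy needs as much memory as the file itself, so it gives up the memory savings that mapping the file was presumably meant to provide.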
