Memory overhead of blessed hashes

by LanX (Sage)
on Feb 09, 2021 at 15:29 UTC ( #11128132=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I've just been in a discussion where several hundred thousand objects are causing huge memory consumption.

Some colleagues want to refactor the objects (here blessed hashes) into plain hashes; I doubt this will have a big impact.

As I understand it, hashes need a lot of memory, but the extra magic for blessing them shouldn't add much.

Question: Are there any other memory impacts when using objects apart from those displayed by Devel::Peek ?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

PS: FWIW: I suggested keeping the objects and switching to blessed arrays instead.

update

DISCLAIMER: Not my project ... hence no code available.

Replies are listed 'Best First'.
Re: Memory overhead of blessed hashes
by tobyink (Canon) on Feb 09, 2021 at 17:05 UTC

    Blessed arrayrefs will save a bit of memory. If the data being stored is simple enough that it can be serialized into strings, blessed references to strings are also possible. They are likely to be the most memory-efficient representation of objects, but they can be a pain if your objects aren't very simple. Inside-out objects are also pretty memory-efficient if you have lots of objects but only a few fields, because it's one hash per field instead of one hash per object.

    Do check whether it's really the objects causing the issue though — it might be something like a memory leak caused by more references to the objects than expected. (Cyclical references; objects getting closed over by coderefs; global variables holding references to the objects.)
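
To make the inside-out layout mentioned above concrete, here is a minimal sketch using only core modules. The class name My::Node, its two fields and the live_count helper are assumptions made up for illustration, not anything from the code under discussion.

```perl
use strict;
use warnings;

{
    package My::Node;    # hypothetical class name, for illustration only
    use Scalar::Util qw(refaddr);

    # One hash per field, shared by all objects and keyed by object address.
    my (%name_of, %size_of);

    sub new {
        my ($class, %args) = @_;
        # The object itself is just a blessed reference to an empty scalar.
        my $anon;
        my $self = bless \$anon, $class;
        $name_of{ refaddr $self } = $args{name};
        $size_of{ refaddr $self } = $args{size};
        return $self;
    }

    sub name { $name_of{ refaddr $_[0] } }
    sub size { $size_of{ refaddr $_[0] } }

    # Field data lives outside the object, so it must be deleted by hand.
    sub DESTROY {
        my $id = refaddr $_[0];
        delete $name_of{$id};
        delete $size_of{$id};
    }

    # Helper for the demo: how many objects currently have field data.
    sub live_count { scalar keys %name_of }
}

my $node = My::Node->new(name => 'root', size => 3);
print $node->name, " ", $node->size, "\n";    # root 3
undef $node;                                  # triggers DESTROY
print My::Node::live_count(), "\n";           # 0
```

With N objects and two fields this keeps two large hashes instead of N small ones. The price is the manual cleanup in DESTROY; forgetting it turns the field hashes into a leak of their own.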

      I agree about the memory leak, alas not my code.

      > Inside-out objects are also pretty memory-efficient if you have lots of objects but only a few fields, because it's one hash per field instead of one hash per object.

      Good point! ++

      Cheers Rolf

Re: Memory overhead of blessed hashes
by choroba (Archbishop) on Feb 09, 2021 at 15:58 UTC
    I tried the four variants on my Linux box (uncomment the one you want):
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @objs;
    for my $i (1 .. 1_000_000) {
        # push @objs, { id => $i, name => "Object of class", number => int rand 10 };
        # push @objs, bless { id => $i, name => "Object of class", number => int rand 10 }, 'My::Class';
        # push @objs, [$i, 'Object of class', int rand 10];
        push @objs, bless [$i, 'Object of class', int rand 10], 'My::Class';
    }
    print `ps -v | grep ^$$`;

    It seems bless doesn't really change anything, but a hash versus array does.

    23235 pts/0 S+ 0:00 0 2020 371111 359628 1.1 /usr/bin/perl ./1.pl
    23225 pts/0 S+ 0:00 0 2020 261035 249668 0.7 /usr/bin/perl ./1.pl

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Well, that's what I was expecting (tho I hoped for a factor-10 gain for arrays).

      I was asking because I couldn't be sure that the runtime isn't secretly trading memory for speed to improve method calls...

      Cheers Rolf

        I know that method lookups are cached, but I am quite certain (from the last time I read the documentation) that that cache is associated with the package STASH, rather than each object, since it logically applies to the class (package) and is shared between all objects of that class.

        I would have to dig into the code to be sure, but the memory overhead for bless appears to be very minimal — once an SV has been upgraded high enough to carry any kind of magic, it also has a STASH pointer, and AV and HV structures are complex enough that they always have those fields.

        Arrays in Perl do have considerably smaller overhead than hashes, although not a factor of 10. If you have a large number of some class of object, enough to cause memory usage problems, changing the internal representation to arrays and using constant to name the fields is likely to help. If encapsulation has been respected, this will require no changes outside of the object's implementation class.
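
The array-plus-constant layout suggested above might look like the following minimal sketch. The class name My::Class and the fields id, name and number are borrowed from the benchmark earlier in the thread; the constructor signature and the accessors are assumptions for illustration.

```perl
use strict;
use warnings;

{
    package My::Class;

    # Field names are compile-time constants that index into the array.
    use constant {
        ID     => 0,
        NAME   => 1,
        NUMBER => 2,
    };

    sub new {
        my ($class, %args) = @_;
        my $self = bless [], $class;
        $self->[ID]     = $args{id};
        $self->[NAME]   = $args{name};
        $self->[NUMBER] = $args{number};
        return $self;
    }

    # Callers only use accessors, so they never see the array layout.
    sub id     { $_[0][ID] }
    sub name   { $_[0][NAME] }
    sub number { $_[0][NUMBER] }
}

my $obj = My::Class->new(id => 1, name => 'Object of class', number => 7);
printf "%d %s %d\n", $obj->id, $obj->name, $obj->number;    # 1 Object of class 7
```

Because outside code goes through id, name and number, switching the representation from a hash to an array stays inside this one package, provided nobody reaches into the object directly.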

Re: Memory overhead of blessed hashes
by bliako (Monsignor) on Feb 09, 2021 at 16:47 UTC

    The thread "Perl object memory overhead" can be useful for benchmarking this in more depth.

    Probably not your case, but if these objects/hashes are constantly created, used and deleted, then forgotten references to them can stop Perl from freeing them (a value is only freed once its reference count drops to zero). Typical memory leak scenario.

      I agree that a memory leak is the most likely cause.

      It's not my code and I don't wanna get deeper involved as long as the stakeholders keep insisting on their voodoo technologies°

      But while consulting I got into an argument with another very well respected member of the Perl community talking about "the non trivial overhead of bless magic" and I wanted to make sure I'm not missing any inner optimizations.

      Cheers Rolf

      °) aka say qw/agile try-and-error cargo-cult experience pragmatic/[ rand 5 ] for 1..3

Re: Memory overhead of blessed hashes
by ForgotPasswordAgain (Priest) on Feb 09, 2021 at 17:04 UTC

    I'm not sure exactly what your situation is, so a lot of this is probably irrelevant:

    In my experience, using objects versus unblessed hashes can cause things to be slower (CPU overhead of blessing them) but doesn't add much memory overhead.

    In some situations it might make more sense (financially) to add more RAM instead of spending time reworking the code.

    It's probably not relevant here if you're using objects normally, but with deeply-nested hashes (when doing aggregation, for example) you can save a bit of memory by flattening the hash and concatenating the keys with $;.
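
As a rough sketch of the flattening trick above: one flat hash with composite keys replaces a hash of hashes. The function name flat_totals and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Sum values per (region, service) pair in ONE flat hash instead of a
# hash of hashes. The composite key is the parts joined with $; (the
# subscript separator, "\034" by default).
sub flat_totals {
    my @rows = @_;
    my %total;
    for my $row (@rows) {
        my ($region, $service, $value) = @$row;
        # Same key that the built-in shorthand $total{$region, $service} builds.
        $total{ join $;, $region, $service } += $value;
    }
    return \%total;
}

my $total = flat_totals(
    [ 'eu', 'web', 3 ],
    [ 'eu', 'web', 4 ],
    [ 'us', 'api', 5 ],
);

my ($sep) = ($;);    # copy the separator so it can be used in a regex
for my $key (sort keys %$total) {
    my ($region, $service) = split /\Q$sep\E/, $key;
    print "$region/$service: $total->{$key}\n";
}
# eu/web: 7
# us/api: 5
```

This only works if no key part can contain the separator character itself. The explicit join is used here so that the sketch does not depend on the multidimensional shorthand being enabled.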

    If you're pulling data from a database, keeping the data "on the C side" can save a lot of memory. At least by default with DBD::mysql, it will download the data all at once into C structures, which uses a lot less memory than the same data "inflated" into Perl structures. So for example when aggregating (sum, avg, ...), if you can fetch individual rows instead of all at once, it can save memory. Depending on how (elegant, simple) your code is, it can be challenging to refactor it to do this, though.

    End of memory dump. :)

Re: Memory overhead of blessed hashes
by davido (Cardinal) on Feb 10, 2021 at 15:25 UTC

    Perhaps Devel::Size isn't telling the whole story, but...

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Devel::Size qw(total_size);

    my(@o, @h);
    print "Total starting size for \@o: ", total_size(\@o), "\n";
    print "Total starting size for \@h: ", total_size(\@h), "\n";

    for (1..100000) {
        push @o, bless {}, 'Foo';
    }
    for (1..100000) {
        push @h, {};
    }

    print "Total ending size for \@o: ", total_size(\@o), "\n";
    print "Total ending size for \@h: ", total_size(\@h), "\n";

    This produces:

    Total starting size for @o: 64
    Total starting size for @h: 64
    Total ending size for @o: 15246272
    Total ending size for @h: 15246272

    The fact that a hashref is blessed doesn't seem to have any bearing on the total memory consumption. If I use Devel::Peek to see inside a blessed hashref versus a hash, I see one field in the SV that changes: STASH = 0x25fc9d8 "Foo". (Of course, the address will be different on every run). The FLAGS change too, but I don't think that changes memory consumption:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Devel::Peek;

    my $o  = bless {}, 'Foo';
    my $o2 = bless {}, 'Foo';
    my $h  = {};

    Dump($o);
    Dump($o2);
    Dump($h);

    This produces:

    SV = IV(0x1e03ee8) at 0x1e03ef8
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x1de0358
      SV = PVHV(0x1de5b70) at 0x1de0358
        REFCNT = 1
        FLAGS = (OBJECT,SHAREKEYS)
        STASH = 0x1dfa948 "Foo"
        ARRAY = 0x0
        KEYS = 0
        FILL = 0
        MAX = 7
    SV = IV(0x1e03e40) at 0x1e03e50
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x1de0508
      SV = PVHV(0x1de60d0) at 0x1de0508
        REFCNT = 1
        FLAGS = (OBJECT,SHAREKEYS)
        STASH = 0x1dfa948 "Foo"
        ARRAY = 0x0
        KEYS = 0
        FILL = 0
        MAX = 7
    SV = IV(0x1e03e58) at 0x1e03e68
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x1dfa990
      SV = PVHV(0x1de6130) at 0x1dfa990
        REFCNT = 1
        FLAGS = (SHAREKEYS)
        ARRAY = 0x0
        KEYS = 0
        FILL = 0
        MAX = 7

    Possibly worth noting, the STASH value is the same for $o and $o2, so both instances of the object point to the same "Foo" stash, which is encouraging.

    I know it's not your code, but if the code is getting memory pinched, and it's not related to a memory leak, another strategy may be to allow the objects to be serialized / deserialized quickly, and maintain an index for where to find their serializations. Is it really necessary to hold 100k+ in memory at once? If so, if the primary attribute of each object were just a path where the serialization of the remainder of the object's guts can be found, you might save space. Anyway, that's just a thought. There could be constraints preventing that approach.


    Dave

      > I know it's not your code,

      I don't know what they did, and I want to avoid another "told you so" situation.˛

      Just fighting off FUD theories that bless had a memory impact and trying to educate myself.

      > Is it really necessary to hold 100k+ in memory at once?

      From my understanding: they are building complicated trees (well multi-trees°) within a short time window.

      > If so, if the primary attribute of each object were just a path where the serialization of the remainder of the object's guts can be found, you might save space.

      That's a good idea.

      Tho in my experience Perl and the OS are pretty efficient at swapping out unused hashes, as long as they are small enough.

      Of course the performance depends on the frequency you need to access those, but the same applies to your serialization idea.

      Hmm ...

      Actually this is a good counter-argument to inside-out objects, because class variables holding data for all objects can't be swapped.

      So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nested levels.

      Cheers Rolf

      °) elements can have multiple parents (aggregation semantic)

      ˛) see also "Chuck Norris"-ing code

        > Just fighting off FUD theories that bless had a memory impact and trying to educate myself.

        To be fair, bless does have a small memory impact: packages used for object classes have slightly greater overhead (per-package) and blessed scalars must be upgraded to carry magic (which also adds the STASH pointer), but the per-object overhead for blessed aggregates is zero — AV and HV structures are large enough that they always have the slot for the STASH pointer.

        > Actually this is a good counter-argument to inside-out objects, because class variables holding data for all objects can't be swapped.

        Virtual memory does not know about that — swapping occurs at page granularity regardless of larger structures. If the hash table is large enough, and accesses do not result in scanning the entire table, portions of the hash table can be swapped out by the OS, even if other parts of the table are held in memory due to frequent access. If one SV on a page is frequently accessed, everything else on that page is also kept in memory.

        > So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nested levels.

        Your problem here seems to be the fixed per-hash HV overhead, which is a consequence of the existence of many small hashes in your program, whether blessed or plain.

        If each object consists of a relatively small part (the tree node and its search/index keys) plus a relatively large and generally opaque "data payload", you could use inside-out objects to reduce the hash overhead for the search/index keys and DBI/SQLite to store the payloads, possibly in an in-memory database. But once you have eliminated the per-object HV overhead, simply serializing the payloads and storing them in one more hash will probably give results comparable to an in-memory SQLite database, with much lower overhead. The exceptions are if you can actually move your entire data tree into SQLite and use SQL to access it, or if the payloads really are a large part of the problem and SQLite allows you to move them out to disk while keeping the tree structure in the inside-out objects.
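
A minimal sketch of the "serialize the payloads into one more hash" option described above, using the core Storable module. The class name My::Record, the numeric id and the payload shape are assumptions for illustration; the real objects are not known.

```perl
use strict;
use warnings;

{
    package My::Record;    # hypothetical class name
    use Storable qw(freeze thaw);

    # One hash for all objects: id => serialized payload string.
    my %frozen_payload;

    sub new {
        my ($class, $id, $payload) = @_;
        $frozen_payload{$id} = freeze($payload);   # $payload must be a reference
        return bless \$id, $class;                 # the object is only its id
    }

    sub id { ${ $_[0] } }

    # Deserialize on demand; each call returns a fresh copy.
    sub payload { thaw( $frozen_payload{ ${ $_[0] } } ) }

    sub DESTROY { delete $frozen_payload{ ${ $_[0] } } }
}

my $rec  = My::Record->new(42, { title => 'root', tags => [ 'a', 'b' ] });
my $data = $rec->payload;
print "$data->{title} has ", scalar @{ $data->{tags} }, " tags\n";    # root has 2 tags
```

Each payload is then a single string instead of a nested structure of many SVs, at the cost of a thaw on every access. Whether that trade pays off depends on how often the payloads are read, which would need measuring on the real data.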

Re: Memory overhead of blessed hashes
by shmem (Chancellor) on Feb 10, 2021 at 09:35 UTC
Re: Memory overhead of blessed hashes
by kschwab (Vicar) on Feb 10, 2021 at 01:41 UTC

    Maybe Cache::FastMmap to hold any bulky per-object data instead? Not knowing what's being done, though, I can't tell if that helps.

Re: Memory overhead of blessed hashes
by Anonymous Monk on Feb 09, 2021 at 20:27 UTC

    It's my understanding that bless just populates a pointer-value in the data structure to allow -> semantics to be used against it.

    Although you don't say how large each object/hash is, "100k of them" does not sound like a problem unless you have a leak. Look for any point in the code where one object acquires a reference to another. If you need that, "weaken" those references. (Scalar::Util)

    Test::Memory::Cycle can also help you look for these problems – it's generally cheap enough that you can salt your code with calls to it and just leave them there.
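
To illustrate the weaken advice above, here is a small self-contained sketch. The class My::TreeNode, its live counter and the helpers link_nodes and leaked_after are invented for the demonstration; only weaken from Scalar::Util is a real library call.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

{
    package My::TreeNode;    # hypothetical class name
    our $live = 0;           # number of nodes not yet destroyed

    sub new {
        my ($class) = @_;
        $live++;
        return bless { parent => undef, children => [] }, $class;
    }

    sub DESTROY { $live-- }
}

# Link child to parent in both directions; optionally weaken the back-reference.
sub link_nodes {
    my ($parent, $child, $weak) = @_;
    push @{ $parent->{children} }, $child;
    $child->{parent} = $parent;
    weaken($child->{parent}) if $weak;    # back-reference no longer counts
}

# Build a parent/child pair, let it go out of scope, and report how many
# nodes are still alive afterwards.
sub leaked_after {
    my ($weak) = @_;
    my $before = $My::TreeNode::live;
    {
        my ($p, $c) = (My::TreeNode->new, My::TreeNode->new);
        link_nodes($p, $c, $weak);
    }
    return $My::TreeNode::live - $before;
}

print "strong back-reference leaks ", leaked_after(0), " nodes\n";    # 2
print "weak back-reference leaks ",   leaked_after(1), " nodes\n";    # 0
```

With the strong back-reference, parent and child keep each other's reference count above zero, so neither is freed until the program exits. Weakening the child-to-parent link breaks the cycle without changing how the tree is used.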
