Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: [OT] Stats problem

by salva (Abbot)
on Feb 26, 2015 at 15:20 UTC ( #1117965=note: print w/replies, xml ) Need Help??


in reply to [OT] Stats problem

In any case, the probability of detecting a buffer overrun using a fixed magic value is already 1 - 2**-32.

Isn't that enough for you?

Replies are listed 'Best First'.
Re^2: [OT] Stats problem
by BrowserUk (Pope) on Feb 26, 2015 at 19:44 UTC
    In any case, the probability of detecting a buffer overrun using a fixed magic value is already 1 - 2**-32. Isn't that enough for you?

    The problem is that in real life the data written by overruns is rarely ever random.

    The problem I saw yesterday was that the over run occurred at both the source and destination of a memcpy().

    Running it under the debugger with a debug malloc() in force and what happened is that as both source and destination fields were equal sized and equally aligned, the debug telltale signature that followed both chunks was identical; so the over run copied the telltale following the source buffer exactly on top of the telltale following the destination buffer.

    So the heap check didn't detect the overrun!

    But data following the telltale following the source also got copied over the data following the telltale at the destination; which showed up later when an entirely unrelated variable contained an impossibly high value and crashed the program. And I spent a long time looking in the wrong place.

    Hence, I reasoned that if the telltales were all different the heap check would have found the error earlier and pointed me at the real culprit.

    But, how could you make the telltales all different and still know what to look for? And that's when the offset idea hit me.

    Even for 64-bit addresses, it is fast to derive a 32-bit offset from the nearest preceding 4GB boundary.

    And the only way the same problem could occur is if the source and destination pointers were exactly 4GB apart; which is considerably less likely than the 16/32/64-byte granularities often used by standard mallocs.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      BrowserUk:

      For that particular use case, you might try something I did when I had a similar problem: Put a matching marker just *before* the address you return in your malloc routine. (In mine, I had a counter inside of malloc, so the marker just happened to be the number of times malloc was called (plus the starting value). I bracketed the requested memory area with the marker, and filled the block with junk. I added the 'before' marker because I was tracking down a buffer overrun as well. (Another thing I did was to put the caller's address and the requested allocation size in the preamble as a debugging aid. Fit nicely in a 'paragraph' in 32-bit mode.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        That would certainly be advantageous in that it would allow you to definitively detect which address the overrun had occurred through.

        The downside is that it adds another 4 bytes to the overhead for every allocation. Probably not a problem for a debug malloc library -- although one of the problems I had was that when I enabled the debug malloc, the program which was designed to run close to the limits of my physical memory without going into swapping, moved into swapping and ran like it was swimming n molasses as a result. So much so that I had to modify the parameters to reduce the memory usage to 1/4 to avoid swapping; which in turn meant that the problem I was trying to track down never actually occured.

        The real advantage of the method I described is that it uses no extra space in actual allocation as it is only written to free space slots. Either when the virtual memory is first obtained; or when a pointer is freed. Of course, it will then only detect overruns if the happen to encounter a free slot; but with the advantage that they can remain enabled, at very little overhead, in production code.

        I also had the notion of using two interleaving, free space chains. The first would be used, leaving (at least one) free space between all allocations, until it filled. Only then would the second be used. That might afford early(er) and more accurate detection of overruns.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      If you don't mind the debug mode speed hit, you could use an alternate memcpy() and friends.

      The idea being, the memory functions will check the values they're copying, and assert that they're not writing the telltale value, since it should never be used as a value in your program.

        I was using a debug malloc(). That was the point.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      I see, what you propose makes sense.

      You can also mix both the offset and the magic marker using xor:

      mem[ix] = ix ^ magic_value;
        You can also mix both the offset and the magic marker using xor:

        I'm not sure there would be any advantage in that? I don't think the combined value would retain the rarity factor of the magic number; nor the uniqueness of the offset.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      As the 4GB offset repeats periodically, why not some unpredictable, non-periodic function that can be easily created and checked?
      • random value based on 64 bit address as seed (last 32 bits)
      • MD5 hash (last 32 bits)
      • XOR of upper and lower 32 bits of address

      I'm sure you can come up with better, faster functions.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        non-periodic function that can be easily created and checked?

        Any time you squeeze a 64-bit value (address) into a 32-bit pot, you are going to get repeats.

        The good thing with using the offset directly is that you know that the repeats are always going to be 4 billion bytes apart. And thus only occur if the program uses more than 4GB of heap; and on my 8GB only occur twice. (Not strictly true if I allowed my machine to go into swapping!)

        With any non-periodic function, the repeats will (must) still occur, the only difference is that the spacing will vary, and be less. It could even put then in adjacent memory slots; or certainly a lot closer together.

        Intuitively -- though as I observed elsewhere, there is nothing much that is intuitive about this -- the danger of the copy-over problem seems more likely the closer together they are.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1117965]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2020-06-06 15:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you really want to know if there is extraterrestrial life?



    Results (41 votes). Check out past polls.

    Notices?