PerlMonks  

Re^4: Leap second coming up. Check your date handling code (Cloudflare DNS outage)

by Corion (Patriarch)
on Jan 02, 2017 at 14:39 UTC


in reply to Re^3: Leap second coming up. Check your date handling code
in thread Leap second coming up. Check your date handling code

In the aftermath of this leap second, Cloudflare experienced an outage and blogged about it. It seems the root cause was code that expected the seconds value to increase monotonically. The Go library in use handled the additional second by letting time go backwards by one second, which led to negative durations for some events, and those negative durations were not handled gracefully.

I think this would not have been a problem for Cloudflare if they too had stretched the duration of a second, at least for their machines running RRDNS. Of course, this is Monday-morning quarterbacking, as I wasn't part of the decision process there. Also, knowing and understanding how time and durations are used within your code is not easy unless you explicitly analyze your code for the usage of both.


Replies are listed 'Best First'.
Re^5: Leap second coming up. Check your date handling code (Cloudflare DNS outage)
by 1nickt (Canon) on Jan 02, 2017 at 16:41 UTC

    Very interesting, and quite surprising that they experienced such a problem. Speaks to the immaturity of Go I suppose.

    We are sorry that our customers were affected by this bug and are inspecting all our code to ensure that there are no other leap second sensitive uses of time intervals.
    ... might have been nice to do that beforehand. Doesn't seem like it would have been too hard to spot:
    - if rttMax == 0 {
    + if rttMax <= 0 {
          rttMax = DefaultTimeout
      }
    ... if only they were coding in Perl and could use $rtt_max ||= $default_timeout;, LOL


    The way forward always starts with a minimal test.

      How would your approach of

      $rtt_max ||= $default_timeout;

      have worked out if $rtt_max was -1 ?

      Also, before tying the maturity of a language to problems with its programs, have you checked whether time is monotonic?

      Personally, I would assume that the same problematic constellation would happen with Perl, and mocking both the time and the round trip times from DNS queries in a realistic manner only makes sense if your code base is small and you already suspect a problem in that location IMO.
