PerlMonks  

Re^3: Leap second coming up. Check your date handling code

by Corion (Patriarch)
on Dec 27, 2016 at 12:36 UTC ([id://1178526])


in reply to Re^2: Leap second coming up. Check your date handling code
in thread Leap second coming up. Check your date handling code

It all depends on what you're using "time" for and what you're using as your ground truth.

As far as I understand, Google uses synchronized time for lots of stuff like query ordering, vector clocks etc., and it is important for them to have one ground truth, up to the point that they install their own GPS clocks in datacenters. As these are mostly for use by Google, it's up to them to decide how they will handle the additional second, and I can understand from a risk assessment point of view that it's likely less risky to have slightly longer seconds than to have one additional second with the number 60 and to audit your code for the parts where that becomes relevant.

The situation becomes interesting when the private use/decision of Google leaks out into the real world for (say) Google Cloud Engine users or whoever else relies on Google infrastructure and timekeeping.

Personally I can't imagine situations where the exact and synchronized duration of a second is important to you but you don't have your own synchronized clock(s). Still, you'll have to be prepared for an apparent one-second gap when comparing the timestamps of Google infrastructure with the timestamps of your own infrastructure. Over time, the two kinds of timestamps will diverge until 2017-01-01 00:00:00Z, when they will suddenly converge again.
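To make the divergence concrete, here is a minimal sketch of how far a smeared clock would lag behind a clock that counts the leap second normally. It assumes a linear smear that ends exactly at the leap second, as described above, and the 20-hour window and the name smear_lag are hypothetical choices for illustration, not Google's documented parameters.

```perl
use strict;
use warnings;

# Assumption: linear smear over a hypothetical 20-hour window that ends
# together with the inserted leap second.
my $window = 20 * 3600;    # smear length in seconds

# Lag (in seconds) of the smeared clock behind an unsmeared clock, given
# the number of real seconds elapsed since the smear began.
sub smear_lag {
    my ($elapsed) = @_;
    my $total = $window + 1;    # the window plus the inserted leap second
    # Outside the smear both clocks agree.
    return 0 if $elapsed <= 0 || $elapsed >= $total;
    # Inside the smear the lag grows linearly towards one second.
    return $elapsed / $total;
}

for my $hours (0, 5, 10, 15, 20) {
    printf "%2d h into the smear: lag %.3f s\n",
        $hours, smear_lag($hours * 3600);
}
printf "after the leap second: lag %.3f s\n", smear_lag($window + 1);
```

The lag creeps up to almost a full second and then drops back to zero once the unsmeared clock has inserted its extra second.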

If this is your first time dealing with diverging clocks, it will be an interesting learning experience, especially if you switched to GPS-based time exactly to avoid this situation.

Re^4: Leap second coming up. Check your date handling code (Cloudflare DNS outage)
by Corion (Patriarch) on Jan 02, 2017 at 14:39 UTC

    In the aftermath of this leap second, Cloudflare experienced an outage and blogged about it. It seems the root cause was code that expected a monotonically ascending value for seconds, but the additional second was handled (by the Go library used) by letting time go backwards one second, which led to negative durations for some events, which finally were not handled gracefully.
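The failure mode is easy to reproduce in Perl with a simulated clock. This is only a sketch, not Cloudflare's code: the helper measure and the two hard-coded timestamps are made up for illustration, and CLOCK_MONOTONIC from Time::HiRes is assumed to be available, which is not the case on every platform.

```perl
use strict;
use warnings;
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

# Hypothetical helper: time $work according to $clock, a code reference
# that returns the current time in seconds.
sub measure {
    my ($clock, $work) = @_;
    my $start = $clock->();
    $work->();
    return $clock->() - $start;
}

# Simulated wall clock that is stepped back one second during the leap
# second: the second reading is earlier than the first.
my @wall = (1483228799.9, 1483228799.1);
my $rtt  = measure(sub { shift @wall }, sub { });
printf "wall clock duration: %.1f s\n", $rtt;

# A monotonic clock is not stepped, so durations stay non-negative.
my $mono = measure(sub { clock_gettime(CLOCK_MONOTONIC) }, sub { });
printf "monotonic duration is non-negative: %s\n",
    $mono >= 0 ? "yes" : "no";
```

Any code that later treats the first result as a positive interval is in for a surprise.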

    I think this would not have been a problem for Cloudflare if they too had stretched the duration of a second, at least for their machines running RRDNS. Of course, this is literally Monday quarterbacking as I wasn't part of the decision process there. Also, knowing and understanding how time and durations are used within your code is not an easy thing if you don't explicitly analyze your code for the usage of both.

      Very interesting, and quite surprising that they experienced such a problem. Speaks to the immaturity of Go I suppose.

      We are sorry that our customers were affected by this bug and are inspecting all our code to ensure that there are no other leap second sensitive uses of time intervals.
      ... might have been nice to do that beforehand. Doesn't seem like it would have been too hard to spot:
      -       if rttMax == 0 {
      +       if rttMax <= 0 {
                      rttMax = DefaultTimeout
              }
      ... if only they were coding in Perl and could use $rtt_max ||= $default_timeout;, LOL


      The way forward always starts with a minimal test.

        How would your approach of

        $rtt_max ||= $default_timeout;

        have worked out if $rtt_max was -1 ?
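A quick check of that question, as a sketch with a made-up default of 2 seconds and hypothetical helper names: ||= only replaces false values, and -1 is true in Perl, so the negative value would survive.

```perl
use strict;
use warnings;

my $default_timeout = 2;    # hypothetical default, in seconds

# The one-liner from the thread: replaces only false values
# (0, '0', '' and undef).
sub or_assign {
    my ($rtt_max) = @_;
    $rtt_max ||= $default_timeout;
    return $rtt_max;
}

# Explicit range check, mirroring the "<= 0" comparison in the Go fix.
sub range_check {
    my ($rtt_max) = @_;
    $rtt_max = $default_timeout if !defined $rtt_max || $rtt_max <= 0;
    return $rtt_max;
}

for my $rtt (0, -1, 0.25) {
    printf "rtt_max=%5s  ||= gives %5s  range check gives %5s\n",
        $rtt, or_assign($rtt), range_check($rtt);
}
```

Only the explicit comparison catches the negative round trip time.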

        Also, before tying the maturity of a language to problems with its programs, have you checked whether time is monotonic?

        Personally, I would assume that the same problematic constellation would happen with Perl, and mocking both the time and the round trip times from DNS queries in a realistic manner only makes sense if your code base is small and you already suspect a problem in that location IMO.
