Counter Rollover Bites Boeing 787

This post originally appeared on Dr. Koopman’s Better Embedded System Software Blog. Reprinted with permission.

Counter rollover is a classic mistake in computer software.  And, it just bit the Boeing 787.

The Problem:

The Boeing 787 aircraft's electrical power control units shut down if powered without interruption for 248 days (a bit over 8 months). In the likely case that all the control units were turned on at about the same time, that means they all shut down at the same time -- potentially in the middle of a flight. Fortunately, the power is usually not left on for 8 continuous months, so apparently this has not actually happened in flight.  But the problem was seen in a long-duration simulation and could happen in a real aircraft. (There are backup power supplies, but do you really want to be relying on them over the middle of an ocean?  I thought not.) The fix is turning off the power and turning it back on every 120 days.

That's right -- the FAA is telling the airlines they have to do a maintenance reboot of their planes every 120 days.

(Sources: NY Times; FAA)

 

Analysis:
Just for fun, let's do the math and figure out what's going on.

248 days * 24 hours/day * 60 minutes/hour * 60 seconds/minute = 21,427,200 seconds

Hmmm ... what if those systems keep time as a 32-bit signed integer in hundredths of a second? The maximum positive value for such a counter would give:

0x7FFFFFFF = 2147483647 hundredths of a second; 2147483647 / 100 / (24*60*60) = 248.55 days.

If they had used a 32-bit unsigned integer, it would still overflow, just after twice as long: 497.1 days.

Bingo!
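For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes, as the analysis above guesses, a 32-bit counter ticking once per hundredth of a second; that is a hypothesis about the 787 software, not a confirmed design detail. The names days_until_overflow and TICKS_PER_SECOND are made up for illustration, and ctypes.c_int32 is used only to mimic what a 32-bit signed variable does one tick past its maximum.

```python
import ctypes

SECONDS_PER_DAY = 24 * 60 * 60
TICKS_PER_SECOND = 100      # assumption: one tick per hundredth of a second
INT32_MAX = 0x7FFFFFFF      # largest 32-bit signed value
UINT32_MAX = 0xFFFFFFFF     # largest 32-bit unsigned value


def days_until_overflow(max_count, ticks_per_second=TICKS_PER_SECOND):
    """Days of continuous uptime before a tick counter passes max_count."""
    return max_count / ticks_per_second / SECONDS_PER_DAY


signed_days = days_until_overflow(INT32_MAX)
unsigned_days = days_until_overflow(UINT32_MAX)

# One tick past the maximum: a 32-bit signed counter wraps to a large
# negative number instead of continuing to count up.
wrapped = ctypes.c_int32(INT32_MAX + 1).value

print(f"signed 32-bit:   {signed_days:.2f} days")
print(f"unsigned 32-bit: {unsigned_days:.2f} days")
print(f"INT32_MAX + 1 stored in 32 bits: {wrapped}")
```

The signed result lands at about 248.55 days and the unsigned one at about 497.10 days, matching the figures above. Any code that expects elapsed time to keep increasing is in trouble once the counter goes negative.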

 

Other Examples:

This is not the first time a counter rollover has caused a problem.  Some examples are:

  • IBM: Interface adapters hang after 497 days of uptime [IBM]

  • Windows 95: hang after 49.7 days without reboot, counting in milliseconds [Microsoft]  
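The same detective work can be run in reverse: take a reported uptime limit and ask what tick rate would make a counter of a given width overflow at that point. The sketch below assumes both systems used 32-bit unsigned counters, which is a guess based on the numbers rather than something the vendor notes are quoted as saying here. The function name implied_ticks_per_second is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_ticks_per_second(uptime_days, counter_bits):
    """Tick rate at which a counter of counter_bits bits overflows after uptime_days."""
    return 2**counter_bits / (uptime_days * SECONDS_PER_DAY)


# Assumption: both reports involve 32-bit unsigned tick counters.
windows_rate = implied_ticks_per_second(49.7, 32)
ibm_rate = implied_ticks_per_second(497, 32)

print(f"49.7 days -> about {windows_rate:.0f} ticks/second")
print(f"497 days  -> about {ibm_rate:.0f} ticks/second")
```

The 49.7-day figure comes out at roughly 1000 ticks per second, consistent with counting in milliseconds, and the 497-day figure at roughly 100 ticks per second, consistent with counting in hundredths of a second.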

There are also plenty of date roll-over bugs:

  • Y2K: on 1 January 2000 (overflow of 2-digit year from 99 to 00) [Wikipedia]

  • GPS: 1024 week rollover on 22 August 1999 [USCG]

  • Year 2038: signed 32-bit Unix time will roll over on 19 January 2038 [Wikipedia]
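Two of these dates fall straight out of the counter width and the starting epoch. The sketch below is a rough check, assuming Unix time held as a signed 32-bit count of seconds since 1 January 1970 and a GPS week number that wraps after 1024 weeks counted from 6 January 1980. It ignores leap seconds, so the GPS result is good to the day, not to the second.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# A signed 32-bit seconds counter holds values up to 2**31 - 1,
# so it wraps 2**31 seconds after the epoch.
unix_rollover = UNIX_EPOCH + timedelta(seconds=2**31)

# The GPS week number wraps after 1024 weeks (leap seconds ignored here).
gps_rollover = GPS_EPOCH + timedelta(weeks=1024)

print("Unix 32-bit rollover:", unix_rollover.isoformat())
print("GPS week rollover:   ", gps_rollover.date().isoformat())
```

Both computed dates agree with the list above: 19 January 2038 for Unix time and 22 August 1999 for the GPS week counter.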


There are also somewhat related capacity overflow issues such as 512K day for IPv4 routers.

If you want to dig further, there is a "zoo" of related problems on Wikipedia:  "Time formatting and storage bugs"

You can find more insights from Dr. Koopman on his blog.


Copyright 2015, Edge Case Research, LLC. All rights reserved.