
Design for High Reliability

Cloudbus has experience engineering both physical-security and safety-of-life applications, where product failure or malfunction could result in theft, property damage, injury or even loss of life. We apply rigorous static analysis, fuzzing and simulation techniques to our firmware to discover weaknesses and fix them before they can cause unexpected failures or be exploited by attackers.

Concept: Reliability by design

To be fully reliable, software must be designed for predictability at both the operational and source-code levels. The biggest risk to the quality of high-reliability systems is often the people who work on them, particularly those who maintain the code after the original authors have moved on. Cloudbus applies rigorous discipline to source-code architecture to design for maximum predictability, readability and confidence. Embedded firmware is developed to dual-build for the target device and for PC simulators, stress- and fuzz-test fixtures and static-analysis rigs. A wide array of run-time techniques is also employed, such as:

  • heap and stack canaries
  • trap pages/regions
  • multiple independent watchdogs
  • internal memory validation checksums
  • execution chokepoints
  • product-specific self-checking constraints
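
As an illustration of two of the techniques listed above (canaries and internal memory validation checksums), here is a minimal sketch in C. It is not Cloudbus's implementation: the names guarded_block_t, guarded_seal and guarded_ok, the canary pattern and the choice of an Adler-32 checksum are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CANARY 0xC5A35A3Cu  /* arbitrary illustrative pattern (assumption) */

/* Hypothetical block of critical state, fenced by canaries and
 * covered by a checksum so corruption can be detected at run time. */
typedef struct {
    uint32_t head_canary;
    uint8_t  payload[16];
    uint32_t checksum;      /* Adler-32 over payload */
    uint32_t tail_canary;
} guarded_block_t;

/* Adler-32 checksum: small and cheap enough for a microcontroller. */
static uint32_t adler32(const uint8_t *p, size_t n)
{
    uint32_t a = 1u, b = 0u;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Call after every legitimate write to the payload. */
void guarded_seal(guarded_block_t *g)
{
    assert(g != NULL);
    g->head_canary = CANARY;
    g->tail_canary = CANARY;
    g->checksum = adler32(g->payload, sizeof g->payload);
}

/* Returns 1 if both canaries and the checksum are intact, else 0. */
int guarded_ok(const guarded_block_t *g)
{
    assert(g != NULL);
    return g->head_canary == CANARY
        && g->tail_canary == CANARY
        && g->checksum == adler32(g->payload, sizeof g->payload);
}
```

A periodic self-check task would call guarded_ok on each guarded block and report any failure, rather than acting on data that may have been overwritten.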

Case study: Trust, but verify

Designed by Cloudbus for [undisclosed client], 2014-2015

The requirements for [a smart-home product] included firmware for a single on-board microcontroller that runs all safety-critical systems, where the life or death of the user is at stake. The firmware also had to operate continuously for ten years with no provision for software updates. Finally, the chip had to use the absolute minimum of energy to achieve this, while being responsible for switching all other power domains of the product on and off.

To be reliable, this product's firmware must completely distrust the hardware it's running on. This includes its internal RAM.
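
One common way to distrust RAM is to store each critical value together with its bitwise complement and to verify the pair on every read. The sketch below shows that general technique only. The case study does not say which mechanism was used, and the names safe_u32_t, safe_u32_write and safe_u32_read are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical redundant variable: the value plus its bitwise inverse.
 * volatile keeps the compiler from optimizing away the second copy. */
typedef struct {
    volatile uint32_t value;
    volatile uint32_t inverse;
} safe_u32_t;

/* Store the value and its complement together. */
void safe_u32_write(safe_u32_t *s, uint32_t v)
{
    assert(s != NULL);
    s->value = v;
    s->inverse = ~v;
}

/* Returns true and writes *out only if the two copies still agree.
 * A false return means at least one copy was corrupted in RAM. */
bool safe_u32_read(const safe_u32_t *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t v = s->value;
    uint32_t inv = s->inverse;
    if ((v ^ inv) != 0xFFFFFFFFu) {
        return false;
    }
    *out = v;
    return true;
}
```

A failed read detects the fault but cannot correct it. The caller would typically report the failure and fall back to a safe default.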

Cloudbus's solution included a deadline scheduler and a continuous series of self-checking assertions that drive a system-health arbiter. Too many failures in too short a time trigger a shutdown and safing of the system in question. All events are recorded and periodically uploaded to the web for analysis.
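
The "too many failures in too short a time" rule can be sketched as a small sliding-window counter. This is an illustrative assumption, not the arbiter Cloudbus built: the names health_arbiter_t, arbiter_init and arbiter_report_failure, and the thresholds MAX_FAILURES and WINDOW_TICKS, are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FAILURES 3u     /* failures tolerated per window (assumption) */
#define WINDOW_TICKS 1000u  /* window length in timer ticks (assumption) */

/* Hypothetical arbiter for one subsystem: remembers the timestamps of
 * the most recent MAX_FAILURES failures in a ring buffer. */
typedef struct {
    uint32_t stamps[MAX_FAILURES];
    uint32_t count;  /* failures recorded, saturating at MAX_FAILURES */
    uint32_t next;   /* ring index of the next slot to overwrite */
    bool     safed;  /* latched once the subsystem has been shut down */
} health_arbiter_t;

void arbiter_init(health_arbiter_t *a)
{
    assert(a != NULL);
    memset(a, 0, sizeof *a);
}

/* Record a failed self-check at time `now`. Returns true once the
 * subsystem should be (or already has been) put into its safe state. */
bool arbiter_report_failure(health_arbiter_t *a, uint32_t now)
{
    assert(a != NULL);
    a->stamps[a->next] = now;
    a->next = (a->next + 1u) % MAX_FAILURES;
    if (a->count < MAX_FAILURES) {
        a->count++;
    }
    if (a->count == MAX_FAILURES) {
        /* After advancing, `next` points at the oldest stored failure.
         * Unsigned subtraction stays correct across tick wraparound. */
        uint32_t oldest = a->stamps[a->next];
        if ((uint32_t)(now - oldest) <= WINDOW_TICKS) {
            a->safed = true;
        }
    }
    return a->safed;
}
```

With these thresholds, isolated failures spread over time are tolerated, while three failures within 1000 ticks latch the safe state. In a real product each self-checking assertion would feed an arbiter like this, and the latch would drive the shutdown and event-logging paths.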

PC simulation, stress tests and fuzzing were used to verify the integrity of the software systems. This solution met all the stability, reliability, performance and maintainability goals of the product.


Copyright © 2015 Cloudbus / Outbreak, Inc.