Testing Baremetal Firmware at nubix
We write our baremetal firmware in C. Our devices are extremely small, and some must draw so little power that they can run on a single battery for about 10 years. We usually have no chance to debug the firmware on the device: there is not enough room for a gdb server or JTAG, so we need to be sure that it works beforehand. This requires great care in planning, writing and testing the firmware.
For testing we use, besides the usual best practices, some unusual strategies.
Best practices overview:
C Testing Frameworks
- manual: assert/longjmp
- GNU autotools (portable qemu/valgrind)
- cmake/ctest: enable_testing(), by return code only
- tap (ok 1, ok 2, finds e.g. concurrency flaws)
C++ Testing Frameworks
- Google gtest with gmock
- Boost Test
- dedicated HW
- LTP (Linux Testing Project)
- cbmc (Formal symbolic verification)
- unit tests
- integration tests
- regression tests
- format/style tests
- test coverage
Of course you can test simple functions by themselves, on Linux. You just need to link the module to a test driver, run it under valgrind or AddressSanitizer, call it with some input values and check for the expected output values. Almost everybody does that.
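Such a test driver can be very small. The following is only a sketch: `battery_percent` is a hypothetical module function invented for illustration (not one of our real modules), and the driver prints TAP-style `ok`/`not ok` lines and returns the number of failures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical module under test: map a battery voltage to a charge level.
   The 2000 mV / 3000 mV limits are assumed values for illustration. */
uint8_t battery_percent(uint16_t millivolts)
{
    const uint16_t empty_mv = 2000, full_mv = 3000;
    if (millivolts <= empty_mv) return 0;
    if (millivolts >= full_mv) return 100;
    return (uint8_t)(((uint32_t)(millivolts - empty_mv) * 100u)
                     / (uint32_t)(full_mv - empty_mv));
}

/* Print one TAP result line; return 1 on failure, 0 on success. */
static int check(int cond, int n, const char *name)
{
    printf("%s %d - %s\n", cond ? "ok" : "not ok", n, name);
    return cond ? 0 : 1;
}

/* Test driver: returns the number of failed checks. */
int run_battery_tests(void)
{
    int failed = 0;
    printf("1..4\n"); /* TAP plan: four checks follow */
    failed += check(battery_percent(1500) == 0,   1, "below empty clamps to 0");
    failed += check(battery_percent(2000) == 0,   2, "empty is 0");
    failed += check(battery_percent(2500) == 50,  3, "midpoint is 50");
    failed += check(battery_percent(3300) == 100, 4, "above full clamps to 100");
    return failed;
}
```

A `main` that returns non-zero when `run_battery_tests()` reports failures is enough for a return-code-only runner such as ctest. Built with `-fsanitize=address` or run under valgrind, the same driver also catches memory errors in the module.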
If you cannot extract a module from your firmware to be compiled on linux, you need to abstract away your hardware a bit. E.g. you might need a virtual UART, virtual interrupt handler vector, virtual memory mappings, virtual sensor input (ADC, GPIO, …), virtual networking and so on.
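A minimal sketch of such an abstraction for the UART case, assuming a hypothetical `struct uart_ops` interface (the names are invented here, not taken from any SDK): the firmware code only sees a function pointer, the target build fills it with the register-level driver, and the Linux build fills it with a virtual UART that records the bytes in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hardware abstraction: all the firmware knows about a UART. */
struct uart_ops {
    void (*put_byte)(void *ctx, unsigned char byte);
    void *ctx; /* driver-private state */
};

/* Firmware code under test: writes a line terminated by CR LF. */
void log_line(const struct uart_ops *uart, const char *msg)
{
    while (*msg)
        uart->put_byte(uart->ctx, (unsigned char)*msg++);
    uart->put_byte(uart->ctx, '\r');
    uart->put_byte(uart->ctx, '\n');
}

/* Virtual UART for the host: captures the output in a buffer. */
struct virtual_uart {
    unsigned char buf[64];
    size_t len;
};

void virtual_uart_put_byte(void *ctx, unsigned char byte)
{
    struct virtual_uart *vu = ctx;
    if (vu->len < sizeof vu->buf) /* silently drop on overflow */
        vu->buf[vu->len++] = byte;
}
```

A test then creates a `struct virtual_uart`, passes it to `log_line` through a `struct uart_ops`, and compares the captured buffer with the expected bytes. The same pattern works for ADC or GPIO input, where the virtual side feeds prepared values instead of capturing them.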
Or you can emulate your firmware in renode or qemu. I have quite a lot of stm32 patches for qemu on my github, but what works best with qemu is a primitive 8/16-bit AVR. Our devices are largely unpopular, because they are so small and cheap, and apparently very unfriendly for non-expert IoT developers. For every board we make (yes, we make our own board HW) I could prepare a special qemu and/or renode profile, but usually writing a simulator is easier. We usually do not have the luxury of a Raspberry Pi or other Linux board.
To find the most common C memory errors, unit-tests need to check for
- uninitialized memory reads
- use-after-free (aka dangling pointer)
- out-of-bounds (read and write, at heap, stack and global)
Optionally they also check for memory leaks, and test functions with more than just the given test values: either randomized, fuzzed, or over all possible ranges. Otherwise coverage holds only for the given values.
A nice overview of the tools that cover these errors is at https://github.com/google/sanitizers/wiki/AddressSanitizerComparisonOfMemoryTools (without the outdated mudflap and guard pages, and I added cbmc):
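For small input types, "all possible ranges" is cheap enough to run in every unit test. The sketch below uses a hypothetical `isqrt16` function, made up for this example, and checks a property (the result is the floor of the square root) for all 65536 inputs instead of comparing a handful of hand-picked values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function under test: integer square root, rounded down. */
uint8_t isqrt16(uint16_t x)
{
    uint16_t root = 0;
    /* Try each result bit from the highest to the lowest. */
    for (uint16_t bit = 0x80; bit != 0; bit = (uint16_t)(bit >> 1)) {
        uint16_t trial = (uint16_t)(root | bit);
        if ((uint32_t)trial * trial <= x)
            root = trial;
    }
    return (uint8_t)root;
}

/* Exhaustive property test: for every input, r*r <= x < (r+1)*(r+1).
   Returns the number of inputs that violate the property. */
unsigned long isqrt16_count_failures(void)
{
    unsigned long failures = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        uint32_t r = isqrt16((uint16_t)x);
        if (r * r > x || (r + 1) * (r + 1) <= x)
            failures++;
    }
    return failures;
}
```

For 32-bit or wider inputs the loop is no longer practical, which is where randomized inputs, fuzzing or a model checker take over.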
| ARCH | x86, ARM, … | x86, ARM, … | x86 | all(?) | x86, ARM, … |
| UMR | no, use msan | yes | yes | no | some |
DBI: dynamic binary instrumentation
CTI: compile-time instrumentation
UMR: uninitialized memory reads
UAF: use-after-free (aka dangling pointer) AddressSanitizerExampleUseAfterFree
UAR: use-after-return AddressSanitizerExampleUseAfterReturn
– Heap OOB: AddressSanitizerExampleHeapOutOfBounds
– Stack OOB: AddressSanitizerExampleStackOutOfBounds
– Global OOB: AddressSanitizerExampleGlobalOutOfBounds
x86: includes 32- and 64-bit.
gperftools: various performance tools/error detectors bundled with TCMalloc.
Heap checker (leak detector) is only available on Linux.
Debug allocator provides both guard pages and canary values for more precise detection of OOB writes
“Testing can never show the absence of bugs, only the presence.” (Buxton, Randell 1970)
To check more than the test values of the unit tests, one can also use a model checker. We recommend cbmc. cbmc formally verifies your firmware along all code paths with all possible input values. This verifies the tests for all symbolized user inputs over their full value ranges, not just the values chosen in the unit tests. It is easy to set up: replace the compiler invocation in the makefile with the “cbmc <cbmc_options>” command and symbolize your input data. cbmc is the industry-standard tool for formal verification, with the limitation that it cannot prove termination, i.e. it has problems with longer loops (>256) or longer lists (>64). So with overlong linked lists you need to check that yourself, or use a better formal verifier, such as ATS, Isabelle or coq.
make verify is part of every test-run.
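Symbolizing the input data can look like the following sketch. `scale_adc` and the harness names are hypothetical, chosen for this example. Under cbmc, a declared but undefined function whose name starts with `nondet_` yields an arbitrary value of its return type, so the assertion is checked for every possible `uint16_t`. With a normal compiler the `__CPROVER__` part is skipped and the same check can be called with concrete values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function under test: 12-bit ADC reading to percent. */
uint8_t scale_adc(uint16_t raw)
{
    if (raw > 4095)
        raw = 4095; /* clamp anything above the 12-bit range */
    return (uint8_t)((uint32_t)raw * 100u / 4095u);
}

/* The property, usable both as a unit test and as a proof obligation. */
void check_scale_adc(uint16_t raw)
{
    assert(scale_adc(raw) <= 100);
}

#ifdef __CPROVER__
/* Only seen by cbmc: no body, so the return value is unconstrained. */
uint16_t nondet_uint16(void);

/* Proof harness: one symbolic input stands for all 65536 values. */
void verify_scale_adc(void)
{
    check_scale_adc(nondet_uint16());
}
#endif
```

Assuming the file is called `scale_adc.c`, an invocation along the lines of `cbmc scale_adc.c --function verify_scale_adc` would be the body of such a `make verify` rule. Loops need an additional `--unwind` bound, which is where the loop and list limits mentioned above come from.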
- llvm libfuzzer
are the current state of the art, and simple to use. You just need a few days to set it up for some basic testing. llvm libfuzzer might be the easiest with clang, as it finds the most bugs with the least amount of work. We usually use all 3 for good coverage. But this needs debugging builds with asserts and/or asan, which might need a statically linked asan runtime (-static-libasan with gcc, -static-libsan with clang) and/or a simulator. With a big enough chip you can also try it on HW, but then you need a backtracing library which prints the backtrace via UART. With a formally verified firmware you won’t need a fuzzer, and you won’t find any errors, but you can try.
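A libfuzzer target is a single function, `LLVMFuzzerTestOneInput`, which libfuzzer calls with mutated byte buffers. The sketch below fuzzes a hypothetical `parse_frame` function with an invented frame layout (length byte, payload, XOR checksum); neither is our real protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser under test. Frame: [len][payload ...][xor checksum].
   Returns the payload length, or -1 for a malformed frame. */
int parse_frame(const uint8_t *data, size_t size, uint8_t *out, size_t out_cap)
{
    if (size < 2)
        return -1;
    size_t len = data[0];
    if (len + 2 != size || len > out_cap)
        return -1;
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= data[1 + i];
    if (sum != data[1 + len])
        return -1;
    memcpy(out, data + 1, len);
    return (int)len;
}

/* libfuzzer entry point: called repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    uint8_t payload[32];
    int n = parse_frame(data, size, payload, sizeof payload);
    /* Either rejected, or the payload fits into the buffer. */
    assert(n == -1 || (n >= 0 && (size_t)n <= sizeof payload));
    return 0;
}
```

With clang this builds with `-g -fsanitize=fuzzer,address`; libfuzzer supplies `main`, and asan reports any out-of-bounds access the generated inputs provoke.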
A simulator models the CPU, memory and input/output of your board on Linux. It usually runs in a 32-bit environment on AMD64, cross-compiled to i686, but arm7 is also possible. That’s why I had to switch from macOS back to Linux: recent macOS versions no longer support 32-bit.
The simple sensor boards need a complicated setup of a simulator loop to react to input events, and a parallel thread to send these events. Networking and UART/ADC/GPIO/I2C/SPI are trivially simulated; absolute pointers, as typical in firmware, and interrupts are not.
For our mesh-networking library I simulated a mesh of 200 virtual devices with point-to-point or broadcast events over the air, simulated by memcpy’ing the radio traffic from one device memory space to another. No need for native threads. This way you can trivially simulate realistic mesh topologies on a single machine, and the simulation matches the observed traffic and timing very well. These tests are of course added to our CI: if the simulation produces errors, unusually bad traffic overflows or deadlocks, the CI fails. I can also randomly add HW or radio failures to stress-test the algorithms.
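The memcpy idea can be shown in a few lines. This is a toy sketch and not our mesh library: the line topology, the radio range of 3 neighbours and the simple flooding rule are all assumptions made for the example. Each round, every node that received the packet in an earlier round relays it once, and "over the air" is a `memcpy` into the receive buffers of the nodes in range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NODES       200 /* virtual devices, placed on a line (assumption) */
#define RADIO_RANGE 3   /* node i reaches nodes i-3 .. i+3 (assumption) */
#define PACKET_SIZE 8

struct node {
    uint8_t rx[PACKET_SIZE]; /* the device's receive buffer */
    bool has_packet;
    bool relayed;
    int rx_round;            /* round in which the packet arrived */
};

static struct node mesh[NODES];

/* Radio broadcast: copy the packet into every neighbour in range. */
static void broadcast(int from, int round)
{
    for (int to = 0; to < NODES; to++) {
        int dist = to > from ? to - from : from - to;
        if (dist == 0 || dist > RADIO_RANGE || mesh[to].has_packet)
            continue;
        memcpy(mesh[to].rx, mesh[from].rx, PACKET_SIZE);
        mesh[to].has_packet = true;
        mesh[to].rx_round = round;
    }
}

static bool all_received(void)
{
    for (int i = 0; i < NODES; i++)
        if (!mesh[i].has_packet)
            return false;
    return true;
}

/* Flood a packet from node 0. Returns the number of rounds until every
   node has it, or -1 if the flood stalls before reaching all nodes. */
int flood_rounds(const uint8_t packet[PACKET_SIZE])
{
    memset(mesh, 0, sizeof mesh);
    memcpy(mesh[0].rx, packet, PACKET_SIZE);
    mesh[0].has_packet = true;
    for (int round = 1; round <= NODES; round++) {
        bool progress = false;
        for (int i = 0; i < NODES; i++) {
            /* Only nodes that received in an earlier round relay now. */
            if (mesh[i].has_packet && !mesh[i].relayed && mesh[i].rx_round < round) {
                broadcast(i, round);
                mesh[i].relayed = true;
                progress = true;
            }
        }
        if (all_received())
            return round;
        if (!progress)
            return -1;
    }
    return -1;
}

/* Read access to a node's receive buffer, for the test assertions. */
const uint8_t *node_rx(int i)
{
    assert(i >= 0 && i < NODES);
    return mesh[i].rx;
}
```

A CI test calls `flood_rounds` and fails on `-1` or on an unexpectedly high round count. Radio failures can be injected by randomly skipping the `memcpy` in `broadcast`.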
Usually a firmware is a simple nested state-machine, with not many interrupts breaking the usual control flow, and not many events triggered by it. With a simple design and proper abstraction of the private states of each module you can keep out of the typical spaghettiware as seen in most commercial firmware. If your state-machine gets too complicated or you need extra threads besides the main and modem tasks, you should think of formal verification of the parallel states to avoid concurrency bugs, such as deadlocks, livelocks, races, resource starvation, thread-safety, re-entrancy, priority inversion, …
Thankfully we don’t have to rely on POSIX with its myriad unsafe blocking calls and shared-memory thread libraries; callbacks make concurrency problems much easier to handle. To formally verify these problems you’d need to use TLA+, starling, promela or a similar proof system. The starling syntax is very close to cbmc.
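One way to keep such a state machine simple and testable is a transition table. The states and events below are hypothetical, a made-up sensor cycle rather than our firmware. Because the whole control flow is data, a test can walk every state/event pair.

```c
#include <assert.h>

/* Hypothetical sensor cycle: sleep, measure, send, sleep again. */
enum state { ST_SLEEP, ST_MEASURE, ST_SEND, ST_COUNT };
enum event { EV_TIMER, EV_ADC_DONE, EV_TX_DONE, EV_TX_FAIL, EV_COUNT };

/* transitions[state][event] is the next state; events that are
   irrelevant in a state leave the state unchanged. */
static const enum state transitions[ST_COUNT][EV_COUNT] = {
    /*                TIMER       ADC_DONE    TX_DONE     TX_FAIL    */
    [ST_SLEEP]   = { ST_MEASURE, ST_SLEEP,   ST_SLEEP,   ST_SLEEP   },
    [ST_MEASURE] = { ST_MEASURE, ST_SEND,    ST_MEASURE, ST_MEASURE },
    [ST_SEND]    = { ST_SEND,    ST_SEND,    ST_SLEEP,   ST_SLEEP   },
};

/* One step of the machine; out-of-range input falls back to sleep. */
enum state step(enum state s, enum event e)
{
    if ((unsigned)s >= ST_COUNT || (unsigned)e >= EV_COUNT)
        return ST_SLEEP;
    return transitions[s][e];
}

/* Walk every state/event pair and count results outside the state set. */
int count_invalid_transitions(void)
{
    int invalid = 0;
    for (int s = 0; s < ST_COUNT; s++)
        for (int e = 0; e < EV_COUNT; e++)
            if ((unsigned)step((enum state)s, (enum event)e) >= ST_COUNT)
                invalid++;
    return invalid;
}
```

Nested machines can use one such table per module, with the module's state kept private to its source file.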
The usual application developer is told that he will never in his lifetime struggle with compiler bugs, and that all the bugs are made by himself or a colleague. Well, in our case we constantly struggle with libc or compiler bugs. For stm8 we used to use sdcc, which creates horrible assembly code in some array access cases, so we had to rewrite some critical functions in ASM. With GCC we caught quite a lot of optimizer bugs, esp. with gcc-9, which is blacklisted, but also with gcc-11 as of now. avr-gcc is very fine though. clang is esp. nice with my bounds-checking library, much better than the builtin _FORTIFY_SOURCE glibc attempts. We need to use similarly safe -Oboring compiler switches as the Linux kernel, but we are safe to use -flto link-time optimization to reduce the code size across our modules. Every byte counts.
bolt optimized binaries are also in test for extreme optimization cases.
-Wall -Wextra -Werror -pedantic is mandatory.
Despite all the rumors, compilers are still extremely badly written. E.g. they cannot deal with sparse arrays by themselves, as needed by e.g. Unicode properties or IP ranges. switch/case is handled either by if/else chains (linear lookup) or by jump tables for dense arrays, but not yet by logarithmic lookup via binary search, or even constant-time lookup via minimal perfect hashes.
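The logarithmic variant is easy to write by hand: keep the sparse ranges in a sorted table and binary-search it. The table below is a tiny illustrative sample with made-up property codes, not real Unicode property data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a sparse range table: [first, last] maps to a property. */
struct range {
    uint32_t first, last;
    uint8_t prop;
};

/* Sorted, non-overlapping sample ranges; property codes are invented. */
static const struct range ranges[] = {
    { 0x0030, 0x0039, 1 }, /* ASCII digits */
    { 0x0041, 0x005A, 2 }, /* ASCII uppercase letters */
    { 0x0061, 0x007A, 3 }, /* ASCII lowercase letters */
    { 0x4E00, 0x9FFF, 4 }, /* CJK Unified Ideographs block */
};

/* Binary search over the ranges: O(log n) comparisons instead of a
   linear if/else chain. Returns 0 when no range contains cp. */
uint8_t lookup_prop(uint32_t cp)
{
    size_t lo = 0, hi = sizeof ranges / sizeof ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < ranges[mid].first)
            hi = mid;
        else if (cp > ranges[mid].last)
            lo = mid + 1;
        else
            return ranges[mid].prop;
    }
    return 0;
}
```

The table lives in flash as `const` data and costs 12 bytes per range with this layout on typical ABIs, far less than a dense array over the whole code point range.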
There’s now C11 _Generic but still no proper constexpr in C, and gcc throws when the constexpr is not const, so you cannot check if an expression is a constexpr. You can with clang though.
There is still no string library, when you consider strings not as glorified zero-terminated memory buffers, but as strings according to the Unicode standard, esp. UTF-8. coreutils and most essential tools also still cannot find denormalized strings, wrap strings and such. We do have such tools and are working with the C++ and C standard committees to get the situation improved, e.g. having identifiable identifiers or filenames, or getting better compilers or security. Well, security is a topic you should better avoid, as the situation is still unacceptable.
libc bugs and limitations
I also maintain our own safe libc with the proper bounds-checking Annex K API, and a proper string API, because most libc’s refuse to adopt the standard. And I also wrote a C++ container library for C, the CTL. This is cross-tested against the most popular C++ STL libraries, and of course finds all the STL bugs or limitations you didn’t know existed. Of course I use formal verification in the test suite (as long as the lists are very short), and caught several API deficiencies, such as the 3-way vs. 2-way comparison callbacks for sort or search. My CTL compare argument supports both, the STL does not. It also uses safe iterators, not the entirely unsafe STL iterators, which are just glorified pointers or indices, but not safe ranges. The hash table is implemented via swiss-tables, about 10x faster and 2x smaller than your typical hash table, with security profiles nobody else cares about. People still believe that slow hash functions can protect you from DoS attacks. The ordered set is implemented via B-trees, not just simple red-black trees as in C++.
For static containers you might want to use the etl or the SSTL.
Various firmware SDKs come with their own variants of a tiny C library; most of it is “dreck”. E.g. the default STM32 HAL does busy polling in its event loop, which cannot be used with low-power devices: it would drain the battery in a few weeks. Either the SDK comes with the sources, so you can improve it easily, or you have to reverse engineer it and replace it with your own fixed variants.
Point is, without proper testing you’ll never find these kinds of problems.
Of course the coolest parts of embedded development are the integration tests, i.e. the hardware and firmware debugging and verification.
This uses a luxury STM32 development board with JTAG / openocd debugging via UART via USB. It helps younger developers a lot to write proper drivers and firmware.
And this setup tests one of our single-chip NB-IoT sensor and modem boards with the oscilloscope. This is the only part which you cannot do in your home-office.
Reini Urban, nubix Software Design GmbH Dresden, Germany