Detecting data races in the Linux kernel

The Linux kernel is a highly optimized multi-core system. As a consequence, kernel developers frequently use relaxed synchronization models: instead of protecting shared variables with locks, the code is carefully engineered so that variables can be read outside critical sections while still providing a consistent view of the data. For instance, the kernel often allocates an object, initializes it, and then inserts it in a list. The inserts to the list are protected by a lock, but the list can be read without taking any lock. The code is thus only correct if the inserted object has been fully and correctly initialized before being inserted in the list. The kernel relies on memory barriers to order the initialization and the insert.


However, memory barriers are notoriously tricky to use correctly. A quick search in kernel commits found hundreds of bugs related to missing barriers. These bugs are tricky to understand and to fix.


In this project, we want to create a tool that would help developers get formal guarantees on code that relies on ordering and barriers for correctness. The key idea of the project is to use existing comments in the kernel code to figure out ordering constraints between variables or functions. Kernel developers often comment on the intent of the code, and informally document the ordering constraints of their code. We want to use these comments to figure out if the commented constraints are respected throughout the entire kernel.