Commit a4fefd7e authored by schneider's avatar schneider

Merge branch 'rahix/mutex' into 'master'

RFC: Proper Mutexes

See merge request card10/firmware!345
parents ced8c213 86c7418a
Mutex
=====
Ontop of FreeRTOS, we have our own mutex implementation. **Never use the
FreeRTOS mutexes directly! Always use this abstraction layer instead**. This
mutex implementation tries to make reasoning about program flow and locking
behavior easier. And most importantly tries to help with debugging possible
dead-locks.
Design
------
There are a few guiding design principles:
- Mutexes can only be used from tasks, **never** from interrupts!
- Timers can use mutexes, but only with :c:func:`mutex_trylock`, **never** with
:c:func:`mutex_lock` (Because they are not allowed to block).
- Locking can *never* fail (if it does, we consider this a fatal error ⇒ panic).
- No recursive locking.
- An unlock can only occur from the task which previously acquired the mutex.
- An unlock is only allowed if the mutex was previously acquired.
For a more elaborate explanation of the rationale behind these rules take a
look at the :ref:`mutex-design-reasons`.
Definitions
-----------
.. c:autodoc:: epicardium/modules/mutex.h
.. _mutex-design-reasons:
Reasons for this Design
-----------------------
Locking can *never* fail
^^^^^^^^^^^^^^^^^^^^^^^^
This might seem like a bold claim at first but in the end, it is just a matter
of definition and shifting responsibilities. Instead of requiring all code to
be robust against a locking attempt failing, we require all code to properly
lock and unlock their mutexes and thus never producing a situation where
locking would fail.
Because all code using any of the mutexes is contained in the Epicardium
code-base, we can - *hopefully* - audit it properly behaving ahead of time and
thus don't need to add code to ensure correctness at runtime. This makes
downstream code easier to read and easier to reason about.
History of this project has shown that most code does not properly deal with
locking failures anyway: There was code simply skipping the mutexed action on
failure, code blocking a module entirely until reboot, and worst of all: Code
exposing the locking failure to 'user-space' (Pycardium) instead of retrying.
This has lead to spurious errors where technically there would not need to be
any.
Only from tasks
^^^^^^^^^^^^^^^
Locking a mutex from an ISR, a FreeRTOS software timer or any other context
which does not allow blocking is complicated to do right. The biggest
difficulty is that a task might be holding the mutex during execution of such a
context and there is no way to wait for it to release the mutex. This requires
careful design of the program flow to choose an alternative option in such a
case. A common approach is to 'outsource' the relevant parts of the code into
an 'IRQ worker' which is essentially just a task waiting for the IRQ to wake it
up and then attempts to lock the mutex.
If you absolutely do need it (and for legacy reasons), software timers *can*
lock a mutex using :c:func:`mutex_trylock` (which never blocks). I strongly
recommend **not** doing that, though. As shown above, you will have to deal
with the case of the mutex being held by another task and it is very well
possible that your timer will get starved of the mutex because the scheduler
has no knowledge of its intentions. In most cases, it is a better idea to use
a task and attempt locking using :c:func:`mutex_lock`.
.. todo::
We might introduce a generic IRQ worker queue system at some point.
No recursive locking
^^^^^^^^^^^^^^^^^^^^
Recursive locking refers to the ability to 'reacquire' a mutex already held by
the current task, deeper down in the call-chain. Only the outermost unlock
will actually release the mutex. This feature is sometimes implemented to
allow more elegant abstractions where downstream code does not need to know
about the mutexes upstream code uses and can still also create a larger region
where the same mutex is held.
But exactly by hiding the locking done by a function, these abstractions make
it hard to trace locking chains and in some cases even make it impossible to
create provably correct behavior. As an alternative, I would suggest using
different mutexes for the different levels of abstraction. This also helps
keeping each mutex separated and 'local' to its purpose.
Only unlock from the acquiring task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of the above mentioned mutex locking semantics, there should never be a
need to force-unlock a forgein mutex. Even in cases of failures, all code
should still properly release all mutexes it holds. One notable exceptions is
``panic()``\s which will abort all ongoing operations anyway.
Only unlock once after acquisition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Justified with an argument of robustness, sometimes the :c:func:`mutex_unlock`
call is written in a way that allows unlocking an already unlocked mutex. But
robustness of downstream code will not really be improved by the upstream API
dealing with arguably invalid usage. For example, this could encourage
practices like unlocking everything again at the end of a function "just to be
sure".
Instead, code should be written in a way where the lock/unlock pair is
immediately recognizable as belonging together and is thus easily auditable to
have correct locking behavior. A common pattern to help with readability in
this regard is the *Single Function Exit* which looks like this:
.. code-block:: cpp
int function()
{
int ret;
mutex_lock(&some_mutex);
ret = foo();
if (ret) {
/* Return with an error code */
ret = -ENODEV;
goto out_unlock;
}
ret = bar();
if (ret) {
/* Return the return value from foo */
goto out_unlock;
}
ret = 0;
out_unlock:
mutex_unlock(&some_mutex);
return ret;
}
......@@ -52,6 +52,7 @@ Last but not least, if you want to start hacking the lower-level firmware, the
debugger
pycardium-guide
memorymap
epicardium/mutex
epicardium/sensor-streams
.. toctree::
......
......@@ -52,6 +52,8 @@
#define INCLUDE_vTaskDelay 1
#define INCLUDE_uxTaskGetStackHighWaterMark 1
#define INCLUDE_xTimerPendFunctionCall 1
#define INCLUDE_xSemaphoreGetMutexHolder 1
/* Allow static allocation of data structures */
#define configSUPPORT_STATIC_ALLOCATION 1
......
......@@ -25,15 +25,12 @@
#include "modules/log.h"
#include "modules/modules.h"
#include "api/common.h"
#include "modules/mutex.h"
#define SSLOG_DEBUG(...) LOG_DEBUG("fatfs", __VA_ARGS__)
#define SSLOG_INFO(...) LOG_INFO("fatfs", __VA_ARGS__)
#define SSLOG_ERR(...) LOG_ERR("fatfs", __VA_ARGS__)
#ifndef EPIC_FAT_STATIC_SEMAPHORE
#define EPIC_FAT_STATIC_SEMAPHORE 0
#endif
/* clang-format off */
#define EPIC_FAT_MAX_OPENED (1 << (EPIC_FAT_FD_INDEX_BITS))
#define EPIC_FAT_FD_GENERATION_BITS (31 - (EPIC_FAT_FD_INDEX_BITS))
......@@ -67,8 +64,6 @@ struct EpicFileSystem {
static const int s_libffToErrno[20];
static const char *f_get_rc_string(FRESULT rc);
static bool globalLockAccquire();
static void globalLockRelease();
static void efs_close_all(EpicFileSystem *fs, int coreMask);
/**
......@@ -97,11 +92,7 @@ static void efs_init_stat(struct epic_stat *stat, FILINFO *finfo);
static EpicFileSystem s_globalFileSystem;
#if (EPIC_FAT_STATIC_SEMAPHORE == 1)
static StaticSemaphore_t s_globalLockBuffer;
#endif
static SemaphoreHandle_t s_globalLock = NULL;
static struct mutex fatfs_lock = { 0 };
static void cb_attachTimer(void *a, uint32_t b)
{
......@@ -118,11 +109,7 @@ void fatfs_init()
assert(!s_initCalled);
s_initCalled = true;
#if (EPIC_FAT_STATIC_SEMAPHORE == 1)
s_globalLock = xSemaphoreCreateMutexStatic(&s_globalLockBuffer);
#else
s_globalLock = xSemaphoreCreateMutex();
#endif
mutex_create(&fatfs_lock);
s_globalFileSystem.generationCount = 1;
fatfs_attach();
......@@ -142,26 +129,24 @@ int fatfs_attach()
{
FRESULT ff_res;
int rc = 0;
if (globalLockAccquire()) {
EpicFileSystem *fs = &s_globalFileSystem;
if (!fs->attached) {
ff_res = f_mount(&fs->FatFs, "/", 0);
if (ff_res == FR_OK) {
fs->attached = true;
SSLOG_INFO("attached\n");
} else {
SSLOG_ERR(
"f_mount error %s\n",
f_get_rc_string(ff_res)
);
rc = -s_libffToErrno[ff_res];
}
}
globalLockRelease();
} else {
SSLOG_ERR("Failed to lock\n");
mutex_lock(&fatfs_lock);
EpicFileSystem *fs = &s_globalFileSystem;
if (!fs->attached) {
ff_res = f_mount(&fs->FatFs, "/", 0);
if (ff_res == FR_OK) {
fs->attached = true;
SSLOG_INFO("attached\n");
} else {
SSLOG_ERR(
"f_mount error %s\n", f_get_rc_string(ff_res)
);
rc = -s_libffToErrno[ff_res];
}
}
mutex_unlock(&fatfs_lock);
return rc;
}
......@@ -227,24 +212,12 @@ static const char *f_get_rc_string(FRESULT rc)
return p;
}
static bool globalLockAccquire()
{
return (int)(xSemaphoreTake(s_globalLock, FF_FS_TIMEOUT) == pdTRUE);
}
static void globalLockRelease()
{
xSemaphoreGive(s_globalLock);
}
int efs_lock_global(EpicFileSystem **fs)
{
*fs = NULL;
if (!globalLockAccquire()) {
return -EBUSY;
}
mutex_lock(&fatfs_lock);
if (!s_globalFileSystem.attached) {
globalLockRelease();
mutex_unlock(&fatfs_lock);
return -ENODEV;
}
*fs = &s_globalFileSystem;
......@@ -259,7 +232,7 @@ int efs_lock_global(EpicFileSystem **fs)
void efs_unlock_global(EpicFileSystem *fs)
{
(void)fs;
globalLockRelease();
mutex_unlock(&fatfs_lock);
}
static bool efs_get_opened(
......
#include "modules/log.h"
#include "modules/mutex.h"
#include "api/dispatcher.h"
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#define TIMEOUT pdMS_TO_TICKS(2000)
TaskHandle_t dispatcher_task_id;
static StaticSemaphore_t api_mutex_data;
SemaphoreHandle_t api_mutex = NULL;
struct mutex api_mutex = { 0 };
void dispatcher_mutex_init(void)
{
api_mutex = xSemaphoreCreateMutexStatic(&api_mutex_data);
mutex_create(&api_mutex);
}
/*
......@@ -27,12 +26,9 @@ void vApiDispatcher(void *pvParameters)
LOG_DEBUG("dispatcher", "Ready.");
while (1) {
if (api_dispatcher_poll()) {
if (xSemaphoreTake(api_mutex, TIMEOUT) != pdTRUE) {
LOG_ERR("dispatcher", "API mutex blocked");
continue;
}
mutex_lock(&api_mutex);
api_dispatcher_exec();
xSemaphoreGive(api_mutex);
mutex_unlock(&api_mutex);
}
ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
}
......
......@@ -2,6 +2,7 @@
#include "modules/log.h"
#include "modules/modules.h"
#include "modules/config.h"
#include "modules/mutex.h"
#include "api/dispatcher.h"
#include "api/interrupt-sender.h"
#include "l0der/l0der.h"
......@@ -10,7 +11,6 @@
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#include <string.h>
#include <stdbool.h>
......@@ -26,8 +26,7 @@
#define PYINTERPRETER ""
static TaskHandle_t lifecycle_task = NULL;
static StaticSemaphore_t core1_mutex_data;
static SemaphoreHandle_t core1_mutex = NULL;
static struct mutex core1_mutex = { 0 };
enum payload_type {
PL_INVALID = 0,
......@@ -99,10 +98,7 @@ static int do_load(struct load_info *info)
LOG_INFO("lifecycle", "Loading \"%s\" ...", info->name);
}
if (xSemaphoreTake(api_mutex, BLOCK_WAIT) != pdTRUE) {
LOG_ERR("lifecycle", "API blocked");
return -EBUSY;
}
mutex_lock(&api_mutex);
if (info->do_reset) {
LOG_DEBUG("lifecycle", "Triggering core 1 reset.");
......@@ -120,7 +116,7 @@ static int do_load(struct load_info *info)
*/
res = hardware_reset();
if (res < 0) {
return res;
goto out_free_api;
}
switch (info->type) {
......@@ -134,8 +130,8 @@ static int do_load(struct load_info *info)
res = l0der_load_path(info->name, &l0dable);
if (res != 0) {
LOG_ERR("lifecycle", "l0der failed: %d\n", res);
xSemaphoreGive(api_mutex);
return -ENOEXEC;
res = -ENOEXEC;
goto out_free_api;
}
core1_load(l0dable.isr_vector, "");
} else {
......@@ -149,12 +145,14 @@ static int do_load(struct load_info *info)
LOG_ERR("lifecyle",
"Attempted to load invalid payload (%s)",
info->name);
xSemaphoreGive(api_mutex);
return -EINVAL;
res = -EINVAL;
goto out_free_api;
}
xSemaphoreGive(api_mutex);
return 0;
res = 0;
out_free_api:
mutex_unlock(&api_mutex);
return res;
}
/*
......@@ -254,11 +252,7 @@ static void load_menu(bool reset)
{
LOG_DEBUG("lifecycle", "Into the menu");
if (xSemaphoreTake(core1_mutex, BLOCK_WAIT) != pdTRUE) {
LOG_ERR("lifecycle",
"Can't load because mutex is blocked (menu).");
return;
}
mutex_lock(&core1_mutex);
int ret = load_async("menu.py", reset);
if (ret < 0) {
......@@ -278,7 +272,7 @@ static void load_menu(bool reset)
}
}
xSemaphoreGive(core1_mutex);
mutex_unlock(&core1_mutex);
}
/* Helpers }}} */
......@@ -298,14 +292,9 @@ void epic_system_reset(void)
*/
int epic_exec(char *name)
{
if (xSemaphoreTake(core1_mutex, BLOCK_WAIT) != pdTRUE) {
LOG_ERR("lifecycle",
"Can't load because mutex is blocked (epi exec).");
return -EBUSY;
}
mutex_lock(&core1_mutex);
int ret = load_sync(name, true);
xSemaphoreGive(core1_mutex);
mutex_unlock(&core1_mutex);
return ret;
}
......@@ -318,13 +307,9 @@ int epic_exec(char *name)
*/
int __epic_exec(char *name)
{
if (xSemaphoreTake(core1_mutex, BLOCK_WAIT) != pdTRUE) {
LOG_ERR("lifecycle",
"Can't load because mutex is blocked (1 exec).");
return -EBUSY;
}
mutex_lock(&core1_mutex);
int ret = load_async(name, false);
xSemaphoreGive(core1_mutex);
mutex_unlock(&core1_mutex);
return ret;
}
......@@ -361,17 +346,14 @@ void return_to_menu(void)
void vLifecycleTask(void *pvParameters)
{
lifecycle_task = xTaskGetCurrentTaskHandle();
core1_mutex = xSemaphoreCreateMutexStatic(&core1_mutex_data);
if (xSemaphoreTake(core1_mutex, 0) != pdTRUE) {
panic("lifecycle: Failed to acquire mutex after creation.");
}
mutex_create(&core1_mutex);
mutex_lock(&core1_mutex);
LOG_DEBUG("lifecycle", "Booting core 1 ...");
core1_boot();
vTaskDelay(pdMS_TO_TICKS(10));
xSemaphoreGive(core1_mutex);
mutex_unlock(&core1_mutex);
/* If `main.py` exists, start it. Otherwise, start `menu.py`. */
if (epic_exec("main.py") < 0) {
......@@ -386,11 +368,7 @@ void vLifecycleTask(void *pvParameters)
while (1) {
ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
if (xSemaphoreTake(core1_mutex, BLOCK_WAIT) != pdTRUE) {
LOG_ERR("lifecycle",
"Can't load because mutex is blocked (task).");
continue;
}
mutex_lock(&core1_mutex);
if (write_menu) {
write_menu = false;
......@@ -406,6 +384,6 @@ void vLifecycleTask(void *pvParameters)
do_load((struct load_info *)&async_load);
xSemaphoreGive(core1_mutex);
mutex_unlock(&core1_mutex);
}
}
......@@ -14,6 +14,7 @@ module_sources = files(
'light_sensor.c',
'log.c',
'max30001.c',
'mutex.c',
'panic.c',
'personal_state.c',
'pmic.c',
......
......@@ -2,8 +2,8 @@
#define MODULES_H
#include "FreeRTOS.h"
#include "semphr.h"
#include "gpio.h"
#include "modules/mutex.h"
#include <stdint.h>
#include <stdbool.h>
......@@ -15,7 +15,7 @@ void panic(const char *format, ...)
/* ---------- Dispatcher --------------------------------------------------- */
void vApiDispatcher(void *pvParameters);
void dispatcher_mutex_init(void);
extern SemaphoreHandle_t api_mutex;
extern struct mutex api_mutex;
extern TaskHandle_t dispatcher_task_id;
/* ---------- Hardware Init & Reset ---------------------------------------- */
......
#include "modules/mutex.h"
#include <assert.h>
void _mutex_create(struct mutex *m, const char *name)
{
/* Assert that the mutex has not been initialized already */
assert(m->name == NULL);
/*
* The name is just the parameter stringified which is almost always a
* pointer. If it is, skip over the '&' because it adds no value as
* part of the name.
*/
if (name[0] == '&') {
m->name = &name[1];
} else {
m->name = name;
}
m->_rtos_mutex = xSemaphoreCreateMutexStatic(&m->_rtos_mutex_data);
}
void mutex_lock(struct mutex *m)
{
int ret = xSemaphoreTake(m->_rtos_mutex, portMAX_DELAY);
/* Ensure locking was actually successful */
assert(ret == pdTRUE);
}
bool mutex_trylock(struct mutex *m)
{
int ret = xSemaphoreTake(m->_rtos_mutex, 0);
return ret == pdTRUE;
}
void mutex_unlock(struct mutex *m)
{
/* Ensure that only the owner can unlock a mutex */
assert(mutex_get_owner(m) == xTaskGetCurrentTaskHandle());
int ret = xSemaphoreGive(m->_rtos_mutex);
/*
* Ensure that unlocking was successful; that is, the mutex must have
* been acquired previously (no multiple unlocks).
*/
assert(ret == pdTRUE);
}
TaskHandle_t mutex_get_owner(struct mutex *m)
{
return xSemaphoreGetMutexHolder(m->_rtos_mutex);
}
#ifndef _MUTEX_H
#define _MUTEX_H
#ifndef __SPHINX_DOC
/* Some headers are not recognized by hawkmoth for some odd reason */
#include <stddef.h>
#include <stdbool.h>
#include "FreeRTOS.h"
#include "semphr.h"
#else
typedef unsigned int size_t;
typedef _Bool bool;
typedef void *TaskHandle_t;
typedef void *SemaphoreHandle_t;
typedef int StaticSemaphore_t;
#endif /* __SPHINX_DOC */
/**
* Mutex data type.
*/
struct mutex {
/* Name of this mutex, kept for debugging purposes. */
const char *name;
/* FreeRTOS mutex data structures. */
SemaphoreHandle_t _rtos_mutex;
StaticSemaphore_t _rtos_mutex_data;
};
/**
* Create a new mutex.
*
* Call this function as early as possible, in an obvious location so it is easy
* to find. Mutexes should be defined statically so they stay alive for the
* entire run-time of the firmware.
*/
#define mutex_create(mutex) _mutex_create(mutex, #mutex)
void _mutex_create(struct mutex *m, const char *name);
/**
* Lock a mutex.
*
* If the mutex is held by another task, :c:func:`mutex_lock` will block the
* current task until the mutex is unlocked.
*
* .. warning::
*
* This function is **not** safe to use in a timer!
*/
void mutex_lock(struct mutex *m);
/**
* Try locking a mutex.
*
* If the mutex is currently locked by another task, :c:func:`mutex_trylock`
* will return ``false`` immediately. If the attmept to lock was successful, it
* will return ``true``.
*
* This funciton is safe for use in timers.
*/
bool mutex_trylock(struct mutex *m);
/**
* Unlock a mutex.
*
* You **must** call this function from the same task which originally locked
* the mutex.
*/
void mutex_unlock(struct mutex *m);
/**
* Get the current owner of the mutex.
*
* Returns the task-handle of the task currently holding the mutex. If the
* mutex is unlocked, ``NULL`` is returned.
*/
TaskHandle_t mutex_get_owner(struct mutex *m);
#endif /* _MUTEX_H */
#include <string.h>
#include "FreeRTOS.h"
#include "semphr.h"
#include "queue.h"
#include "epicardium.h"
#include "modules/log.h"
#include "modules/stream.h"
#include "modules/mutex.h"
/* Internal buffer of registered streams */
static struct stream_info *stream_table[SD_MAX];
/* Lock for modifying the stream info table */
static StaticSemaphore_t stream_table_lock_data;
static SemaphoreHandle_t stream_table_lock;
static struct mutex stream_table_lock;
int stream_init()
{
memset(stream_table, 0x00, sizeof(stream_table));
stream_table_lock =
xSemaphoreCreateMutexStatic(&stream_table_lock_data);
mutex_create(&stream_table_lock);
return 0;
}
int stream_register(int sd, struct stream_info *stream)
{
int ret = 0;
if (xSemaphoreTake(stream_table_lock, STREAM_MUTEX_WAIT) != pdTRUE) {
LOG_WARN("stream", "Lock contention error");
ret = -EBUSY;
goto out;
}
mutex_lock(&stream_table_lock);
if (sd < 0 || sd >= SD_MAX) {
ret = -EINVAL;
goto out_release;
goto out;
}
if (stream_table[sd] != NULL) {
/* Stream already registered */
ret = -EACCES;
goto out_release;
goto out;
}
stream_table[sd] = stream;
out_release:
xSemaphoreGive(stream_table_lock);
out:
mutex_unlock(&stream_table_lock);
return ret;
}
int stream_deregister(int sd, struct stream_info *stream)
{
int ret = 0;
if (xSemaphoreTake(stream_table_lock, STREAM_MUTEX_WAIT) != pdTRUE) {
LOG_WARN("stream", "Lock contention error");
ret = -EBUSY;
goto out;
}
mutex_lock(&stream_table_lock);
if (sd < 0 || sd >= SD_MAX) {
ret = -EINVAL;
goto out_release;
goto out;
}
if (stream_table[sd] != stream) {
/* Stream registered by someone else */
ret = -EACCES;
goto out_release;
goto out;