FreeRTOS Queues on STM32: Copy Semantics, the ISR Boundary, and the Anti-Patterns That Cause Silent Data Loss

Most FreeRTOS queue tutorials stop at xQueueSend and xQueueReceive. They show you a producer task, a consumer task, a queue of integers, and call it a day. That part is easy, and it is rarely what causes a field failure.

The failures come from what those tutorials skip. A queue copies by value, not by reference, and misreading that as pass-by-reference leads to dangling pointers and silently corrupted data. Calling the wrong API variant across the ISR boundary lets your RTOS corrupt its own kernel structures. Sizing a queue by feel means discovering, six months into production, that a 50 ms burst fills it in the first 2 ms and every subsequent item is dropped without complaint. This article covers that ground.

Where This Sits in the Series

This lesson builds directly on FreeRTOS Tasks on STM32: task creation, priorities, and the scheduler are assumed knowledge. If you haven't read the Memory Management lesson, skim its static-allocation section first; every queue created here uses xQueueCreateStatic and avoids the heap entirely. The interrupt pattern post covers the general ISR-to-task handoff; here we apply that pattern specifically to queues. The next lesson covers semaphores and mutexes, and once you understand what a queue guarantees at the ISR boundary, the mutex ownership model and its priority-inheritance semantics land much more cleanly.

The One Crux: A Queue Copies Data, Not References

This is the conceptual spine the entire article returns to. When you call xQueueSend(), FreeRTOS copies uxItemSize bytes, starting at the address pvItemToQueue points to, into the queue's internal storage. When you call xQueueReceive(), FreeRTOS copies those bytes out into the buffer you supply. The queue owns its storage. After xQueueSend returns, you can overwrite, free, or corrupt the original variable and the queue item is unaffected. After xQueueReceive returns, modifying the received copy has zero effect on the queue.

This sounds obvious. In practice it generates a class of bugs that are invisible in unit tests and surface only under concurrent load: passing a pointer into a queue (thinking the queue “holds” the data), passing a pointer to a stack-allocated struct that goes out of scope before the receiver runs, or storing a pointer to a shared buffer and writing to that buffer before the receiver has processed the previous item. Every one of these compiles cleanly. Every one of them is wrong.
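A host-side sketch makes the difference concrete. It is plain C with no FreeRTOS in it: model_copy_in and model_copy_out are hypothetical stand-ins for what a send and a receive do with one slot (copy uxItemSize bytes), and Sample_t is an illustrative item type. The demo queues the same buffer once by value and once by pointer, then overwrites the buffer before the "receiver" runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative item type for this demo only. */
typedef struct {
    uint32_t timestamp_us;
    uint16_t value;
} Sample_t;

/* What a send does with one slot: copy uxItemSize bytes from pvItemToQueue. */
static void model_copy_in(uint8_t *slot, const void *pvItemToQueue, size_t uxItemSize)
{
    assert(slot != NULL && pvItemToQueue != NULL);
    memcpy(slot, pvItemToQueue, uxItemSize);
}

/* What a receive does: copy uxItemSize bytes out of the slot. */
static void model_copy_out(void *pvBuffer, const uint8_t *slot, size_t uxItemSize)
{
    assert(pvBuffer != NULL && slot != NULL);
    memcpy(pvBuffer, slot, uxItemSize);
}

/* Queues one buffer by value and by pointer, overwrites the buffer, then */
/* reports what each receiver would observe.                              */
static void demo_overwrite_before_receive(uint16_t *seenByValue, uint16_t *seenByPointer)
{
    Sample_t shared = { 100U, 1U };
    Sample_t *pShared = &shared;
    uint8_t valueSlot[sizeof(Sample_t)];   /* queue created with sizeof(Sample_t)   */
    uint8_t ptrSlot[sizeof(Sample_t *)];   /* queue created with sizeof(Sample_t *) */

    model_copy_in(valueSlot, &shared, sizeof(Sample_t));    /* copies the contents */
    model_copy_in(ptrSlot, &pShared, sizeof(Sample_t *));   /* copies the address  */

    shared.value = 2U;   /* sender reuses its buffer before the receiver runs */

    Sample_t byValue;
    Sample_t *byPointer = NULL;
    model_copy_out(&byValue, valueSlot, sizeof(Sample_t));
    model_copy_out(&byPointer, ptrSlot, sizeof(Sample_t *));

    *seenByValue = byValue.value;        /* private copy: still 1   */
    *seenByPointer = byPointer->value;   /* points at shared: now 2 */
}
```

The by-value receiver still sees 1; the by-pointer receiver sees 2, a write that happened after the item was queued. Nothing in the API distinguishes the two cases; only uxItemSize and what you pass as pvItemToQueue do.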

Internals: What the Queue Actually Is

A FreeRTOS queue is a circular buffer of fixed-size slots in a contiguous memory region, plus two lists of blocked tasks: one for tasks waiting to send (queue full) and one for tasks waiting to receive (queue empty). The control structure Queue_t (defined in queue.c) holds pointers into the storage area (its start and end, the next slot to write, and the last slot read), the current item count, and the queue's length and item size. The storage area is uxLength × uxItemSize bytes.
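A minimal, single-threaded host model of the storage half of a queue is sketched below, using only the C standard library. RingQueue_t and the ring_* functions are hypothetical: the real Queue_t tracks pointers rather than indices, adds the two wait lists, and wraps every operation in a critical section, none of which is modeled. What the model does show is the fixed slot arithmetic (slot i starts at i × uxItemSize), the wrap-around, and that both directions are a memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side model: fixed-size slots in one contiguous buffer. */
typedef struct {
    uint8_t *storage;    /* length * itemSize bytes, owned by the caller */
    size_t   length;     /* number of slots (uxLength)                   */
    size_t   itemSize;   /* bytes per slot (uxItemSize)                  */
    size_t   writeIdx;   /* next slot to write                           */
    size_t   readIdx;    /* next slot to read                            */
    size_t   count;      /* items currently stored                       */
} RingQueue_t;

static void ring_init(RingQueue_t *q, uint8_t *storage, size_t length, size_t itemSize)
{
    assert(q != NULL && storage != NULL && length > 0U && itemSize > 0U);
    q->storage  = storage;
    q->length   = length;
    q->itemSize = itemSize;
    q->writeIdx = 0U;
    q->readIdx  = 0U;
    q->count    = 0U;
}

/* Copies itemSize bytes in. Returns 0 when full; the real API would */
/* block (task context) or return errQUEUE_FULL (ISR context).       */
static int ring_send(RingQueue_t *q, const void *item)
{
    if (q->count == q->length) {
        return 0;
    }
    memcpy(&q->storage[q->writeIdx * q->itemSize], item, q->itemSize);
    q->writeIdx = (q->writeIdx + 1U) % q->length;   /* wrap around */
    q->count++;
    return 1;
}

/* Copies itemSize bytes out. Returns 0 when empty. */
static int ring_receive(RingQueue_t *q, void *out)
{
    if (q->count == 0U) {
        return 0;
    }
    memcpy(out, &q->storage[q->readIdx * q->itemSize], q->itemSize);
    q->readIdx = (q->readIdx + 1U) % q->length;
    q->count--;
    return 1;
}
```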

With static allocation, which most certified and hard-real-time projects require, you provide both regions at creation time:

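/* sensor_pipeline.h */
#ifndef SENSOR_PIPELINE_H
#define SENSOR_PIPELINE_H

#include <stdint.h>

#define SENSOR_QUEUE_DEPTH   16U
#define SENSOR_ITEM_SIZE     sizeof(SensorFrame_t)

typedef struct {
    uint32_t timestamp_us;   /* captured from TIM2 at ADC EOC */
    uint16_t adc_raw[8];     /* 8-channel simultaneous sample */
    uint8_t  channel_mask;
    uint8_t  overrun_flag;
} SensorFrame_t;             /* 22 bytes of members; sizeof == 24 with tail padding */

void SensorPipeline_Init(void);

#endif /* SENSOR_PIPELINE_H */

/* sensor_pipeline.c  (needs configSUPPORT_STATIC_ALLOCATION == 1) */
#include "FreeRTOS.h"
#include "queue.h"
#include "sensor_pipeline.h"

static StaticQueue_t       s_queueStruct;
static uint8_t             s_queueStorage[SENSOR_QUEUE_DEPTH * SENSOR_ITEM_SIZE];
static QueueHandle_t       s_sensorQueue;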

void SensorPipeline_Init(void)
{
    s_sensorQueue = xQueueCreateStatic(
        SENSOR_QUEUE_DEPTH,
        SENSOR_ITEM_SIZE,
        s_queueStorage,
        &s_queueStruct
    );
    configASSERT(s_sensorQueue != NULL);
}

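s_queueStorage is 16 × 24 = 384 bytes of SRAM: SensorFrame_t has 22 bytes of members, padded to 24 so that every array element keeps uint32_t alignment. At no point does FreeRTOS touch the heap. The configASSERT on the handle is non-negotiable. With static allocation the function should never return NULL, but invalid arguments (a zero uxItemSize paired with a non-null storage pointer, or a null storage pointer with a non-zero uxItemSize) can make it do so, and you want that caught at boot, not six hours into a soak test.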
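The arithmetic is easy to get wrong by hand, so it is worth letting the compiler check it. A sketch, assuming a C11 compiler and the usual 4-byte alignment for uint32_t (true for the ARM EABI and common desktop ABIs); SENSOR_QUEUE_STORAGE_BYTES is a hypothetical helper macro, not part of the header above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSOR_QUEUE_DEPTH   16U

typedef struct {
    uint32_t timestamp_us;
    uint16_t adc_raw[8];
    uint8_t  channel_mask;
    uint8_t  overrun_flag;
} SensorFrame_t;

/* Hypothetical helper: bytes the queue storage array must provide. */
#define SENSOR_QUEUE_STORAGE_BYTES  (SENSOR_QUEUE_DEPTH * sizeof(SensorFrame_t))

/* 4 + 16 + 1 + 1 = 22 bytes of members. The struct is padded to a      */
/* multiple of its 4-byte alignment, so sizeof is 24, not 22.           */
static_assert(offsetof(SensorFrame_t, adc_raw) == 4U, "adc_raw follows timestamp_us");
static_assert(offsetof(SensorFrame_t, overrun_flag) == 21U, "last member at byte 21");
static_assert(sizeof(SensorFrame_t) == 24U, "tail padding rounds 22 up to 24");
static_assert(SENSOR_QUEUE_STORAGE_BYTES == 384U, "16 slots x 24 bytes");
```

If a later edit adds a field, the build fails at the assertion instead of the RAM budget silently drifting.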

Thread-Safety Cost

Every queue operation that modifies the control structure runs inside a short critical section (a taskENTER_CRITICAL / taskEXIT_CRITICAL pair, which on Cortex-M raises BASEPRI to configMAX_SYSCALL_INTERRUPT_PRIORITY: every interrupt allowed to call FreeRTOS APIs is masked, while more urgent interrupts keep running). The copy itself, a memcpy of uxItemSize bytes, happens inside that critical section. That is the latency cost you pay per send or receive: one critical-section entry, one memcpy, one critical-section exit, plus a possible scheduler invocation if a blocked task is unblocked. For small items (≤ 16 bytes) on an STM32H7 at 480 MHz the whole operation is on the order of a microsecond. Because the memcpy grows linearly with item size, large items stretch the masked window: the estimates in the table below put a 256-byte struct in the tens of microseconds, long enough to matter for a 1 kHz control loop.
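Numbers like these are worth measuring rather than assuming. On Cortex-M7 the usual tool is the DWT cycle counter (DWT->CYCCNT in CMSIS), read immediately before and after the call under test. Only the host-testable arithmetic around those two reads is sketched here; cycles_elapsed and cycles_to_ns are hypothetical helper names, and 480 MHz is an assumed core clock (use SystemCoreClock on your part).

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two 32-bit counter reads. Unsigned subtraction */
/* is modulo 2^32, so a single wrap of CYCCNT between reads is handled.  */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Converts cycles to nanoseconds for a core clock in Hz. The 64-bit      */
/* intermediate avoids overflow; the result must fit in 32 bits (~4.29 s). */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0U);
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / core_hz);
}
```

On target, after enabling the counter (set CoreDebug_DEMCR_TRCENA_Msk in CoreDebug->DEMCR, then DWT_CTRL_CYCCNTENA_Msk in DWT->CTRL), a measurement reads DWT->CYCCNT into t0, calls xQueueSend, reads it again into t1, and converts with cycles_to_ns(cycles_elapsed(t0, t1), SystemCoreClock). Repeat it many times and keep the maximum rather than the average: a critical-section budget is a worst-case figure.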

Real-Time Behavior: Blocking, Priority, and the ISR Boundary

Blocking and Unblocking

When a task calls xQueueReceive on an empty queue with a non-zero timeout, it is moved to the blocked state and placed on the queue's receive-wait list, which is ordered by priority (highest first, first-come-first-served among equal priorities). The scheduler immediately runs the next ready task. When a sender (task or ISR) posts an item, FreeRTOS copies the item into the queue's storage, removes the waiter at the head of the receive-wait list (the highest-priority one), and moves it to the ready list; when that task runs again, its xQueueReceive call copies the item out into its buffer and returns. If the unblocked task has higher priority than the current task, the context switch happens immediately for a task sender (with preemption enabled) or is deferred to the end of the ISR for an ISR sender, via portYIELD_FROM_ISR.

The symmetric case applies for senders: a task calling xQueueSend on a full queue blocks and waits on the send-wait list. An ISR must never block — calling xQueueSend (non-ISR variant) from an ISR is undefined behavior, covered in detail in the anti-patterns section.
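The ordering rule is small enough to model directly. FreeRTOS keeps event lists sorted by a value derived from task priority and inserts a new waiter after any existing waiters with the same value, so the head of the list is always the highest-priority, longest-waiting task. The WaitList_t model below is hypothetical and single-threaded; it reproduces only that ordering, and applies equally to the receive-wait and send-wait lists.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8U

/* Hypothetical model of one wait list. Higher number = higher priority. */
typedef struct {
    int    taskId[MAX_WAITERS];
    int    priority[MAX_WAITERS];
    size_t n;
} WaitList_t;

/* Block a task: insert it behind every waiter of equal or higher */
/* priority, so equal priorities stay in arrival (FIFO) order.    */
static void waitlist_block(WaitList_t *wl, int taskId, int priority)
{
    assert(wl->n < MAX_WAITERS);
    size_t pos = 0U;
    while (pos < wl->n && wl->priority[pos] >= priority) {
        pos++;
    }
    for (size_t i = wl->n; i > pos; i--) {
        wl->taskId[i]   = wl->taskId[i - 1U];
        wl->priority[i] = wl->priority[i - 1U];
    }
    wl->taskId[pos]   = taskId;
    wl->priority[pos] = priority;
    wl->n++;
}

/* A post unblocks the head of the list. Returns its id, or -1 if empty. */
static int waitlist_unblock_head(WaitList_t *wl)
{
    if (wl->n == 0U) {
        return -1;
    }
    int id = wl->taskId[0];
    for (size_t i = 1U; i < wl->n; i++) {
        wl->taskId[i - 1U]   = wl->taskId[i];
        wl->priority[i - 1U] = wl->priority[i];
    }
    wl->n--;
    return id;
}
```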

The ISR Boundary

FreeRTOS maintains two sets of queue APIs: the task-context variants (xQueueSend, xQueueReceive, etc.) and the ISR-safe variants (xQueueSendFromISR, xQueueReceiveFromISR, etc.). The ISR variants differ in three critical ways:

  • They take a BaseType_t *pxHigherPriorityTaskWoken output parameter instead of a timeout.
  • They use taskENTER_CRITICAL_FROM_ISR / taskEXIT_CRITICAL_FROM_ISR (which save/restore BASEPRI) rather than the task-context variants.
  • They never block. If the queue is full (SendFromISR) or empty (ReceiveFromISR), they return errQUEUE_FULL / errQUEUE_EMPTY immediately.

The pxHigherPriorityTaskWoken parameter is how the ISR requests a context switch without calling the scheduler directly. If posting to the queue unblocks a task with higher priority than the task that was interrupted, FreeRTOS sets *pxHigherPriorityTaskWoken = pdTRUE; it never sets it back to pdFALSE, so initialize the variable to pdFALSE yourself. You must then call portYIELD_FROM_ISR(xHigherPriorityTaskWoken) at the end of the ISR, which on Cortex-M pends a PendSV exception so the switch happens as soon as the ISR returns. Skipping this call still compiles and runs; it just means the unblocked task waits for the next scheduling point (at worst the next tick interrupt) instead of running immediately, destroying the latency guarantee you built the queue for.
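Because the FromISR functions only ever set the flag, one variable can be shared by every FromISR call in a handler: initialize it to pdFALSE once, pass it to each call, and yield once at the end. The sketch below models just that contract on a host. BaseType_t, pdTRUE and pdFALSE are local stand-ins for the FreeRTOS definitions, and model_post_from_isr is a hypothetical function, not a FreeRTOS API; current FreeRTOS versions also accept NULL for the real pointer, which the model mirrors.

```c
#include <assert.h>
#include <stddef.h>

/* Host stand-ins for the FreeRTOS definitions (BaseType_t is long on */
/* the Cortex-M ports).                                               */
typedef long BaseType_t;
#define pdFALSE ((BaseType_t)0)
#define pdTRUE  ((BaseType_t)1)

/* Hypothetical model of one FromISR post. unblockedPriority is the */
/* priority of the task the post unblocks, or -1 if none waited.    */
/* The flag is only ever set, never cleared; NULL is tolerated.     */
static void model_post_from_isr(int unblockedPriority, int interruptedPriority,
                                BaseType_t *pxHigherPriorityTaskWoken)
{
    assert(interruptedPriority >= 0);
    if (unblockedPriority > interruptedPriority && pxHigherPriorityTaskWoken != NULL) {
        *pxHigherPriorityTaskWoken = pdTRUE;
    }
}
```

In a real handler that translates to one BaseType_t woken = pdFALSE; at the top, &woken passed to every FromISR call, and a single portYIELD_FROM_ISR(woken); as the last statement. Resetting the variable between calls would lose an earlier pdTRUE.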

Complete STM32H7 Example: ADC EOC ISR → Processing Task

The scenario: an STM32H743 runs ADC1 in DMA mode, 8 channels, triggered by TIM3 at 1 kHz. The DMA transfer-complete (or ADC EOC) ISR fires at 1 kHz, assembles a SensorFrame_t, and hands it to a processing task; the processing itself must not run inside the ISR. Peripheral init is elided; the pattern is the same regardless of trigger source.

/* ----------------------------------------------------------------
 * sensor_pipeline.c  —  STM32H743, FreeRTOS, static allocation only
 * Peripheral init elided; ADC1 DMA configured externally.
 * ---------------------------------------------------------------- */
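#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"
#include "sensor_pipeline.h"   /* SENSOR_QUEUE_DEPTH and SensorFrame_t come from here */
#include "stm32h7xx_hal.h"

/* CubeMX-generated HAL handles (names assumed; match your project) */
extern DMA_HandleTypeDef hdma_adc1;
extern TIM_HandleTypeDef htim2;

/* Application hooks whose bodies are elided */
void SensorPipeline_IncrementOverrunCount(void);      /* ISR-safe diagnostic counter */
void ProcessSensorFrame(const SensorFrame_t *frame);  /* filtering, scaling, etc.    */

/* ---------- Queue ----------------------------------------------- */
static StaticQueue_t  s_queueStruct;
static uint8_t        s_queueStorage[SENSOR_QUEUE_DEPTH * sizeof(SensorFrame_t)];
static QueueHandle_t  s_sensorQueue;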

/* ---------- Task ------------------------------------------------- */
#define PROCESS_TASK_STACK  512U
static StaticTask_t        s_processTaskTCB;
static StackType_t         s_processTaskStack[PROCESS_TASK_STACK];
static TaskHandle_t        s_processTask;

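/* ---------- DMA receive buffer (written by DMA, read by ISR) ----- */
/* Must live in RAM that DMA1 can reach (AXI SRAM here, not DTCM); the    */
/* ".axi_sram" section name is linker-script specific. If the region is   */
/* cacheable, either mark it non-cacheable with the MPU or invalidate the */
/* D-cache over it before each read, which also requires the buffer to   */
/* be 32-byte aligned and padded to whole cache lines.                    */
static volatile uint16_t s_dmaBuf[8]
    __attribute__((section(".axi_sram"), aligned(32)));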

/* ----------------------------------------------------------------
 * ISR: called when DMA transfer to s_dmaBuf completes (1 kHz)
 * ---------------------------------------------------------------- */
void DMA1_Stream0_IRQHandler(void)
{
    /* This IRQ's NVIC priority must be no more urgent than                */
    /* configMAX_SYSCALL_INTERRUPT_PRIORITY, or FromISR calls are unsafe.  */
    BaseType_t higherPriorityTaskWoken = pdFALSE;

    /* Acknowledge the stream 0 transfer-complete flag */
    __HAL_DMA_CLEAR_FLAG(&hdma_adc1, DMA_FLAG_TCIF0_4);

    /* Build frame entirely on the ISR stack: no heap, no shared pointer */
    SensorFrame_t frame;
    frame.timestamp_us  = __HAL_TIM_GET_COUNTER(&htim2); /* assumes TIM2 free-runs at 1 MHz */
    frame.channel_mask  = 0xFFU;
    frame.overrun_flag  = 0U;
    for (uint8_t i = 0U; i < 8U; i++) {
        frame.adc_raw[i] = s_dmaBuf[i];   /* copy out of DMA buffer */
    }

    /* Post to queue: ISR variant, never blocks */
    BaseType_t result = xQueueSendFromISR(s_sensorQueue,
                                          &frame,
                                          &higherPriorityTaskWoken);
    if (result != pdPASS) {
        /* Queue full (errQUEUE_FULL): count it, never block or assert */
        SensorPipeline_IncrementOverrunCount();
    }

    /* If posting unblocked a higher-priority task, request a context switch */
    portYIELD_FROM_ISR(higherPriorityTaskWoken);
}

/* ----------------------------------------------------------------
 * Processing task: runs at high priority, blocks on empty queue
 * ---------------------------------------------------------------- */
static void ProcessingTask(void *pvParameters)
{
    (void)pvParameters;
    SensorFrame_t frame;   /* receive buffer on task stack */

    for (;;) {
        /* Block indefinitely — this task has no other work */
        if (xQueueReceive(s_sensorQueue, &frame, portMAX_DELAY) == pdTRUE) {
            /* frame is a private copy; safe to read/write freely */
            ProcessSensorFrame(&frame);   /* elided — filtering, scaling, etc. */
        }
    }
}

/* ----------------------------------------------------------------
 * Init: call before vTaskStartScheduler()
 * ---------------------------------------------------------------- */
void SensorPipeline_Init(void)
{
    s_sensorQueue = xQueueCreateStatic(
        SENSOR_QUEUE_DEPTH,
        sizeof(SensorFrame_t),
        s_queueStorage,
        &s_queueStruct
    );
    configASSERT(s_sensorQueue != NULL);

    s_processTask = xTaskCreateStatic(
        ProcessingTask,
        "SensorProc",
        PROCESS_TASK_STACK,
        NULL,
        configMAX_PRIORITIES - 2U,  /* high priority, below only critical tasks */
        s_processTaskStack,
        &s_processTaskTCB
    );
    configASSERT(s_processTask != NULL);
}

Walkthrough

  • DMA buffer is separate from the queue item. s_dmaBuf is written by DMA hardware. The ISR copies its contents into a local SensorFrame_t on the ISR stack, then posts that struct. By the time the next DMA transfer overwrites s_dmaBuf, the queue already holds its own copy.
  • ISR stack frame, not a static global. frame is declared inside the ISR and exists only for the duration of that invocation. Because xQueueSendFromISR has copied it by value before returning, this is safe. A static local buys nothing here: it costs permanent RAM and turns a private temporary into state shared by every invocation, so don't do it.
  • Return value checked, not ignored. On queue full, a diagnostic counter increments. In a shipping product this counter feeds a health monitor. Silently discarding the error hides overrun conditions that only appear under load.
  • portYIELD_FROM_ISR is always called. On the Cortex-M ports the macro tests the flag and pends PendSV only when it is pdTRUE, so calling it unconditionally costs a single compare-and-branch when higherPriorityTaskWoken is pdFALSE, and it is required for correct latency when the flag is pdTRUE.
  • Processing task uses portMAX_DELAY. With no other work, sleeping on the queue is the right pattern: the task uses CPU only when frames arrive, with no polling overhead. portMAX_DELAY means "wait forever" only when INCLUDE_vTaskSuspend is 1; otherwise it is a very long finite timeout, which is why the loop still checks the return value.
  • Static allocation throughout. No heap touchpoint. The RAM footprint is fixed and known at link time: 384 bytes for the queue storage (16 × 24), 2 KB for the task stack (512 words of 4 bytes), plus the control structures.

Memory and Latency: The Numbers to Reason About

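Parameter                                  | Typical value (STM32H7 @ 480 MHz) | Notes
Critical section entry/exit                | ~3–5 cycles (~10 ns)              | BASEPRI write + ISB
xQueueSend, 4-byte item                    | ~0.5 µs                           | Includes critical section + memcpy
xQueueSend, 24-byte struct (SensorFrame_t) | ~1–2 µs                           | memcpy dominates
xQueueSend, 256-byte struct                | ~10–20 µs                         | Consider pointer-to-pool instead
Context switch on unblock                  | ~1–3 µs                           | PendSV latency on Cortex-M7
ISR→task latency (queue)                   | ~3–8 µs                           | ISR post + PendSV + task entry

These are rough figures: cache configuration, memory placement, and compiler optimization all move them, so measure on your own board before budgeting against them.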

The practical sizing rule: compute your worst-case burst — how many items can arrive before the consumer runs? A 1 kHz ISR, a consumer task that can be preempted for up to 5 ms by a higher-priority task: worst case is 5 items in flight. Multiply by 3–4 as a margin and you have your queue depth. A depth of 16 for a 1 kHz source is conservative but reasonable. A depth of 2 is an overrun waiting to happen.
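The rule is simple enough to keep next to the queue definition, so the depth is derived rather than guessed. burst_items and queue_depth are hypothetical helpers; window_us is the worst-case time the consumer can be held off, which you have to measure, and margin is the 3 to 4x factor suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Items a periodic source can enqueue while the consumer is held off  */
/* for window_us microseconds, rounded up: ceil(rate_hz * window_us / 1e6). */
static uint32_t burst_items(uint32_t rate_hz, uint32_t window_us)
{
    uint64_t scaled = (uint64_t)rate_hz * window_us;
    return (uint32_t)((scaled + 999999ULL) / 1000000ULL);
}

/* Queue depth = worst-case burst multiplied by a safety margin. */
static uint32_t queue_depth(uint32_t rate_hz, uint32_t window_us, uint32_t margin)
{
    assert(margin >= 1U);
    return burst_items(rate_hz, window_us) * margin;
}
```

With the numbers above (1 kHz, 5 ms, margin 3) this gives 15, which a depth of 16 covers.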

For items larger than ~64 bytes where copy latency matters, consider a fixed-block memory pool (a hand-rolled, statically allocated one; the heap_x allocators behind pvPortMalloc are general-purpose rather than fixed-block) and queue a pointer to the pool block instead. The queue then holds a single pointer per item (4 bytes on Cortex-M), the copy is trivial, and the large buffer is transferred by pointer ownership. This requires careful ownership semantics, since the sender must not touch the block again until the receiver has consumed and released it, but the latency profile is dramatically better.
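A minimal sketch of the pool side, as a single-threaded host model: pool_init, pool_alloc and pool_free are hypothetical names, and nothing here is thread- or ISR-safe. The ownership flow is what matters: the producer takes a block, fills it, and sends the Block_t * through a queue created with uxItemSize = sizeof(Block_t *); the consumer receives the pointer, processes the block, and frees it. Between send and free, only the consumer may touch the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS      4U
#define POOL_BLOCK_SIZE  256U

/* Hypothetical fixed-block pool, statically allocated. */
typedef struct {
    uint8_t data[POOL_BLOCK_SIZE];
} Block_t;

static Block_t  s_blocks[POOL_BLOCKS];
static Block_t *s_freeList[POOL_BLOCKS];   /* stack of free block pointers */
static size_t   s_freeCount;

static void pool_init(void)
{
    for (size_t i = 0U; i < POOL_BLOCKS; i++) {
        s_freeList[i] = &s_blocks[i];
    }
    s_freeCount = POOL_BLOCKS;
}

/* Producer side: take ownership of a block, or NULL if the pool is empty. */
static Block_t *pool_alloc(void)
{
    if (s_freeCount == 0U) {
        return NULL;
    }
    return s_freeList[--s_freeCount];
}

/* Consumer side: hand ownership back once the block has been processed. */
static void pool_free(Block_t *b)
{
    assert(b >= &s_blocks[0] && b < &s_blocks[POOL_BLOCKS]);
    assert(s_freeCount < POOL_BLOCKS);   /* more frees than allocations */
    s_freeList[s_freeCount++] = b;
}
```

On target, one way to make alloc and free safe from both tasks and ISRs is to keep the free pointers in a second FreeRTOS queue of Block_t *: receiving from it allocates, sending to it frees, and the FromISR variants work from interrupt context. The array-based free list above only stands in for that on a host.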

Anti-Patterns: Bugs That Compile, Pass a Smoke Test, and Fail in the Field

1. Queuing a Pointer to a Stack-Allocated Struct

/* WRONG: s_queue was created with uxItemSize = sizeof(SensorFrame_t *) */
void SomeTask(void *pv)
{
    (void)pv;
    for (;;) {
        SensorFrame_t frame = BuildFrame();
        SensorFrame_t *pFrame = &frame;
        /* The queue copies sizeof(SensorFrame_t *) bytes: the ADDRESS of */
        /* frame, not its contents. The receiver dereferences a pointer   */
        /* into this task's stack, which the next iteration overwrites.   */
        xQueueSend(s_queue, &pFrame, 0);
    }
}

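This one requires a setup mistake to trigger: the queue was created with uxItemSize = sizeof(SensorFrame_t *) instead of sizeof(SensorFrame_t), so it stores a pointer. The receiver dereferences that pointer, but by the time it runs, frame may already have been rebuilt by the next loop iteration (or, if frame lived in a function that has since returned, its stack slot belongs to someone else). You get whatever now lives at that address. This works perfectly in a low-load test (the receiver runs before the sender loops again, so the stack contents are still intact) and corrupts data in production (the sender keeps running and overwrites the stack slot first). The fix is to create the queue with uxItemSize = sizeof(SensorFrame_t) and pass &frame, so the contents are copied.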

2. Calling the Task-Context API from an ISR

/* WRONG — will corrupt the kernel */
void TIM3_IRQHandler(void)
{
    SensorFrame_t frame = BuildFrame();
    xQueueSend(s_queue, &frame, 0);  /* NOT the FromISR variant */
}

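xQueueSend uses the task-level critical section (taskENTER_CRITICAL / taskEXIT_CRITICAL). On the Cortex-M ports that section keeps a nesting count and, when the count returns to zero, writes BASEPRI back to 0, which can unmask interrupts in the middle of your ISR; the call may also try to block or yield, neither of which is legal in interrupt context. If configASSERT is defined, the Cortex-M ports catch this at runtime (vPortEnterCritical asserts that it is not called from an interrupt). Without asserts it may appear to work, especially while the queue isn't full and no task is waiting, until the first time a context switch interleaves with the corrupted state. Always use the FromISR suffix in interrupt context.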

3. Ignoring the Return Value of FromISR

/* WRONG — silent data loss */
void ADC_IRQHandler(void)
{
    SensorFrame_t frame = BuildFrame();
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(s_queue, &frame, &woken);  /* return value discarded */
    portYIELD_FROM_ISR(woken);
}

If the queue is full, xQueueSendFromISR returns errQUEUE_FULL and the frame is dropped. With no diagnostics, the application runs normally — just silently missing samples. In a sensor pipeline this means corrupted data. In a command pipeline it means lost commands. Always check the return value and maintain an overrun counter.
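An overrun counter only helps if something reads it. A sketch of one way to do that, with hypothetical names: the ISR is the only writer, and a health-monitor task polls for drops since its last check. The single-writer assumption matters; if several ISRs at different priorities share one counter, the increment needs a critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Written only by one ISR. An aligned 32-bit load is a single access on */
/* Cortex-M, so a reader never sees a torn value; volatile stops the     */
/* compiler from caching it in a register.                               */
static volatile uint32_t s_sensorOverruns;

/* ISR side: call from the errQUEUE_FULL branch. */
static void Overrun_RecordDrop(void)
{
    s_sensorOverruns++;
}

/* Task side: returns drops since the previous poll. Unsigned subtraction */
/* keeps the delta correct across counter wrap-around.                    */
static uint32_t Overrun_NewDropsSince(uint32_t *lastSeen)
{
    assert(lastSeen != NULL);
    uint32_t now = s_sensorOverruns;
    uint32_t delta = now - *lastSeen;
    *lastSeen = now;
    return delta;
}
```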

4. Skipping portYIELD_FROM_ISR

/* WRONG — latency bug, not a crash */
void DMA_IRQHandler(void)
{
    SensorFrame_t frame = BuildFrame();
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(s_queue, &frame, &woken);
    /* portYIELD_FROM_ISR(woken) omitted */
}

The processing task is unblocked (it is on the ready list) but it won't run until the next scheduling point: at worst the next tick interrupt (1 ms at configTICK_RATE_HZ = 1000), sooner only if the interrupted task happens to block or yield. If your system is designed for sub-millisecond ISR-to-task latency, this silently breaks that guarantee. It passes every functional test. It only surfaces when you measure latency with a logic analyzer or when a real-time deadline is missed.

5. Undersized Queue Depth

A queue of depth 1 or 2 for a 1 kHz ISR source will overrun the moment a higher-priority task holds the CPU for more than 1–2 ms — which is normal operation in any non-trivial firmware. The failure mode is intermittent overruns that correlate with unrelated activity (USB enumeration, flash write, long SPI transaction) and produce data gaps that are nearly impossible to reproduce in a test environment. Size queues based on worst-case consumer preemption time, not average-case throughput.

6. Sending a Pointer to a Shared Buffer Without Ownership Transfer

/* WRONG: race condition on a shared buffer.                      */
/* s_queue was created with uxItemSize = sizeof(SensorFrame_t *). */
static SensorFrame_t s_sharedFrame;

void ADC_IRQHandler(void)
{
    s_sharedFrame = BuildFrame();                   /* write to shared buffer    */
    SensorFrame_t *pFrame = &s_sharedFrame;
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(s_queue, &pFrame, &woken);    /* copies the pointer value  */
    /* Next ISR fires 1 ms later and overwrites s_sharedFrame.           */
    /* If the receiver hasn't run yet, it reads the overwritten version. */
    portYIELD_FROM_ISR(woken);
}

Here the queue was created with uxItemSize = sizeof(SensorFrame_t *) — it stores a pointer. The ISR writes to the shared buffer and queues a pointer to it. Before the receiver runs, the ISR fires again and overwrites the buffer. The receiver sees the latest write, not the one it was queued for. Works in tests (one ISR, one receive, no overlap). Fails under load. The fix is to either queue by value (copy the full struct) or use a proper memory pool where each slot is independently owned.

7. Using xQueuePeek as a Synchronization Mechanism

xQueuePeek copies the front item without removing it. A common misuse: a task peeks to see whether the front item is meant for it, then calls xQueueReceive expecting to get that same item. Between the peek and the receive, any other task or ISR can take that item or put a different one at the front. This is a classic TOCTOU (time-of-check to time-of-use) race. If you need to "look before you receive," use a mutex-protected flag or redesign so only one consumer exists for that queue. See the upcoming event-driven architecture post for correct single-consumer patterns.
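The single-consumer redesign usually looks like one queue of tagged messages and one task that receives every item and routes it. A sketch with hypothetical names (Msg_t, MsgType_t, consumer_dispatch): in firmware, consumer_dispatch would be called inside the consumer's xQueueReceive loop, with the queue created with uxItemSize = sizeof(Msg_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message set for one queue with one consumer. */
typedef enum { MSG_SENSOR_FRAME, MSG_CONFIG_CHANGE, MSG_SHUTDOWN } MsgType_t;

typedef struct {
    MsgType_t type;
    union {
        uint16_t adc_raw[8];
        uint32_t new_rate_hz;
    } payload;
} Msg_t;

typedef struct {
    uint32_t frames;
    uint32_t rate_hz;
    int      running;
} ConsumerState_t;

/* Routes one received item on its tag; no other task reads this queue. */
static void consumer_dispatch(ConsumerState_t *st, const Msg_t *msg)
{
    assert(st != NULL && msg != NULL);
    switch (msg->type) {
    case MSG_SENSOR_FRAME:
        st->frames++;
        break;
    case MSG_CONFIG_CHANGE:
        st->rate_hz = msg->payload.new_rate_hz;
        break;
    case MSG_SHUTDOWN:
        st->running = 0;
        break;
    default:
        break;   /* unknown tag: ignore, or count it as a protocol error */
    }
}
```

Every item occupies sizeof(Msg_t) bytes whatever its tag, so the largest payload sets the copy cost for all of them; keep it small, or route large payloads through a pool.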

Best Practices

  • Always use static allocation. xQueueCreateStatic gives you a fixed, known RAM footprint. In certified firmware, dynamic allocation during normal operation is typically prohibited.
  • Always configASSERT the handle immediately after creation. A null handle from a static-allocation failure means a coding error (wrong sizes, null pointers) that must die at boot, not corrupt data silently at runtime.
  • Use the FromISR variants exclusively in interrupt context. If you're not sure whether you're in an ISR, you have a design problem — ISR vs. task context should always be unambiguous at the call site.
  • Always call portYIELD_FROM_ISR. When pxHigherPriorityTaskWoken is pdFALSE the cost is a single test of the flag. The cost of skipping it when it's pdTRUE is a latency regression that may violate real-time requirements.
  • Check every return value. Queue full and queue empty are legitimate runtime conditions, not errors. Handle them explicitly (diagnostic counter, drop policy, back-pressure) rather than ignoring them.
  • Size queues for worst-case burst, not average rate. Compute the maximum number of items that can arrive while the consumer is preempted by higher-priority work, then add margin.
  • Keep item size small. For structs larger than ~64 bytes, consider a static memory pool and queue a pointer to a pool block. Profile the critical-section duration on your actual target before committing to large-item queues in timing-critical paths.
  • One consumer per queue. Multiple consumers from a single queue require careful analysis of which consumer receives which item. In almost every case, this is better modeled as separate queues or a different primitive.

Interview Questions

Q1: A queue is created with uxItemSize = sizeof(MyStruct_t) and depth 8. The sender posts a pointer to a local variable: xQueueSend(q, &localVar, 0). Is this correct?

Yes — xQueueSend copies sizeof(MyStruct_t) bytes from the address of localVar into the queue's internal storage. The local variable's scope is irrelevant after the call returns. The bug would be if the queue were created with uxItemSize = sizeof(MyStruct_t *) and a pointer were passed — then the queue stores the pointer value, and the pointer becomes dangling when the local variable goes out of scope.

Q2: Why does FreeRTOS provide separate FromISR variants instead of auto-detecting ISR context?

Auto-detection is possible (ARM's IPSR register reveals ISR context) but has problems: it adds overhead on every call, it doesn't map cleanly to the yield mechanism (you need the pxHigherPriorityTaskWoken output parameter regardless), and it hides what should be an explicit design decision at the call site. Explicit ISR variants make the ISR boundary visible in code review and static analysis.

Q3: Two tasks both call xQueueReceive with portMAX_DELAY on the same queue. An ISR posts one item. Which task receives it?

The highest-priority task waiting on the queue. FreeRTOS keeps the blocked-on-receive list in priority order, so that task is at the head. If both tasks have equal priority, the one that has been waiting longest receives it (FIFO within a priority level).

Q4: xQueueSendFromISR returns pdTRUE and sets *pxHigherPriorityTaskWoken = pdTRUE. The ISR then does additional work before calling portYIELD_FROM_ISR. Is there a problem?

No. portYIELD_FROM_ISR only pends PendSV; it does not switch context on the spot. FreeRTOS sets PendSV to configKERNEL_INTERRUPT_PRIORITY, normally the lowest priority on Cortex-M, so it runs only after this ISR (and any other active ISR) has returned. Doing additional work between xQueueSendFromISR and portYIELD_FROM_ISR is fine; the switch happens when the ISR exits and PendSV is serviced.

Q5: A queue of depth 10, item size 4 bytes, is shared between three ISRs and one consumer task. The task runs at high priority. Under what condition does the queue overflow?

If the aggregate post rate from all three ISRs exceeds the consumer's drain rate for long enough to fill all 10 slots. This is most likely when the consumer task is preempted by a task at higher priority, or when all three ISRs fire in close succession (burst condition). The depth of 10 provides a burst buffer; the critical question is whether the worst-case burst fits in 10 slots before the consumer can drain them.

Q6: You're seeing intermittent data loss in a sensor pipeline — roughly 1 in 500 frames is missing — but only when a USB stack task is running. Queue depth is 4. How do you diagnose and fix it?

The USB task, running at higher priority than the sensor consumer, holds the consumer off long enough for the ISR to fill the 4-slot queue. Diagnosis: add an overrun counter to the ISR's errQUEUE_FULL branch and log it alongside USB activity. Fix: increase the queue depth to cover the worst-case USB preemption window (measure the longest stretch the USB task holds the CPU, multiply by the ISR rate, then add margin), or lower the USB task priority below the sensor consumer.

Summary

FreeRTOS queues are a copy-by-value FIFO with two blocked-task lists and a short critical section protecting every operation. The API split between task-context and ISR-context variants is a hard boundary: cross it in the wrong direction and you corrupt kernel state. The copy semantics are a guarantee and a cost — guarantee that the sender and receiver own independent copies; cost that large items extend the critical-section duration. Queue depth is a real-time design decision, not a guess — compute it from worst-case burst and consumer preemption time. And the anti-patterns that actually cause field failures are not conceptual misunderstandings of the API; they are copy-vs-pointer confusion, wrong-context API calls, and silent discard of overrun conditions.

What's Next

The next lesson covers semaphores and mutexes — and the crux there is the ownership model. A semaphore has no owner; a mutex does, and that ownership is what enables priority inheritance. Once you understand how FreeRTOS queues work at the ISR boundary (which you now do), the mutex priority-inheritance mechanism — where a low-priority task temporarily inherits the priority of a high-priority blocker — fits naturally into the same mental model of blocked-task lists and scheduler-driven unblocking. That's also where the classic priority inversion scenario lives, and why getting it wrong in production is a category of failure that doesn't show up until you're running at full task load on a board with realistic interrupt rates.
