STM32CubeIDE: FreeRTOS Memory Management, Part 2

Task

Tasks are implemented as C functions. The only thing special about them is their prototype, which must return void and take a void pointer parameter.

An application can consist of many tasks. If the processor running the application contains a single core, then only one task can be executing at any given time. This implies that a task can exist in one of two states, Running and Not Running. This simplistic model is considered first, but keep in mind that it is an oversimplification. As shown below, the Not Running state actually contains a number of sub-states.

When a task is in the Running state the processor is executing the task’s code. When a task is in the Not Running state, the task is dormant, its status having been saved ready for it to resume execution the next time the scheduler decides it should enter the Running state. When a task resumes execution, it does so from the instruction it was about to execute before it last left the Running state.

Thread State & State Transition

Thread states

Threads can be in the following states:

  • RUNNING: The thread that is currently running is in the RUNNING state. Only one thread at a time can be in this state.
  • READY: Threads which are ready to run are in the READY state. Once the RUNNING thread has terminated, or is BLOCKED, the next READY thread with the highest priority becomes the RUNNING thread.
  • BLOCKED: Threads that are either delayed, waiting for an event to occur, or suspended are in the BLOCKED state.
  • TERMINATED: When osThreadTerminate is called, threads are TERMINATED with resources not yet released (applies to joinable threads).
  • INACTIVE: Threads that are not created or have been terminated with all resources released are in the INACTIVE state.
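The rule that the highest-priority READY thread becomes the RUNNING thread can be illustrated with a small host-side sketch. All `demo_` names are hypothetical and are not part of CMSIS-RTOS or FreeRTOS; the sketch only models the selection step and assumes that a larger number means a higher priority.

```c
#include <stddef.h>

/* Hypothetical model of the thread states listed above. */
typedef enum {
    DEMO_INACTIVE,
    DEMO_READY,
    DEMO_RUNNING,
    DEMO_BLOCKED,
    DEMO_TERMINATED
} demo_state_t;

typedef struct {
    demo_state_t state;
    int priority;   /* assumption: larger value = higher priority */
} demo_thread_t;

/* Return the index of the highest-priority READY thread, or -1 if none is READY. */
int demo_pick_next(const demo_thread_t *threads, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (threads[i].state != DEMO_READY) {
            continue;   /* BLOCKED, TERMINATED and INACTIVE threads are never selected */
        }
        if (best < 0 || threads[i].priority > threads[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

A real kernel keeps per-priority ready lists instead of scanning an array, but the outcome of the selection is the same.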

FreeRTOS Memory allocation

FreeRTOS manages its own heap. It uses a region of RAM called the heap to allocate memory for tasks, queues, timers, semaphores, and mutexes, and for dynamically created variables. The FreeRTOS heap is separate from the system heap defined at the compiler level.

  • When FreeRTOS requires RAM, it calls pvPortMalloc() instead of the standard malloc(). When it needs to free memory, it calls vPortFree() instead of the standard free().
  • FreeRTOS offers several heap management schemes that range in complexity and features. It includes five sample memory allocation implementations, each of which is described at the following link: http://www.freertos.org/a00111.html
  • The total amount of available heap space is set by configTOTAL_HEAP_SIZE, which is defined in FreeRTOSConfig.h.
  • The xPortGetFreeHeapSize() API function returns the total amount of heap space that remains unallocated (allowing the configTOTAL_HEAP_SIZE setting to be optimized). The same figure is also available in the xFreeBytesRemaining variable for heap management schemes 2 to 5.

Bytes used per item:

  • Scheduler itself: 236 bytes (can easily be reduced by using smaller data types).
  • For each queue you create, add: 76 bytes + queue storage area (see the FAQ "Why do queues use that much RAM?").
  • For each task you create, add: 64 bytes (includes 4 characters for the task name) + the task stack size.
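
To make the idea of a dedicated heap with its own allocation call more concrete, here is a minimal host-side sketch of an allocator that carves blocks out of a fixed array and never frees them, in the spirit of the simplest scheme (heap_1.c). It is not the FreeRTOS source: the `demo_` names, the 8-byte alignment and the 3072-byte size are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TOTAL_HEAP_SIZE 3072u  /* plays the role of configTOTAL_HEAP_SIZE */
#define DEMO_ALIGNMENT       8u     /* assumed alignment of every block */

/* Backing storage; uint64_t elements keep the array 8-byte aligned. */
static uint64_t demo_heap_storage[DEMO_TOTAL_HEAP_SIZE / 8u];
static size_t demo_next_free = 0;   /* offset of the first unallocated byte */

/* Allocate 'wanted' bytes from the array, or return NULL if they do not fit. */
void *demo_port_malloc(size_t wanted)
{
    size_t mask = (size_t)DEMO_ALIGNMENT - 1u;
    size_t rounded = (wanted + mask) & ~mask;   /* round up to the alignment */

    if (wanted == 0u || rounded < wanted ||
        rounded > DEMO_TOTAL_HEAP_SIZE - demo_next_free) {
        return NULL;
    }
    void *block = (uint8_t *)demo_heap_storage + demo_next_free;
    demo_next_free += rounded;
    return block;
}

/* Counterpart of xPortGetFreeHeapSize() for this sketch. */
size_t demo_get_free_heap_size(void)
{
    return DEMO_TOTAL_HEAP_SIZE - demo_next_free;
}
```

Because the array is sized at build time, the free-size query is a cheap way to see how much of the configured total is actually needed.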

Task Control Block

  • Each created task (including the idle task) requires a Task Control Block (TCB) and a stack that are allocated in the heap.
  • The TCB size in bytes depends on the options enabled in FreeRTOSConfig.h.
  • With the minimum configuration, the TCB size is 24 words, i.e. 96 bytes.
  • If configUSE_TASK_NOTIFICATIONS is enabled, add 8 bytes (2 words).
  • If configUSE_TRACE_FACILITY is enabled, add 8 bytes (2 words).
  • If configUSE_MUTEXES is enabled, add 8 bytes (2 words).
  • The task stack size is passed as an argument when creating a task. It is expressed in 32-bit words, not in bytes:
    • osThreadDef(Task_A, Task_A_Function, osPriorityNormal, 0, stacksize);
  • For each task, FreeRTOS allocates the following amount of heap:
    • number of bytes = TCB_size + (4 x task stack size)
  • configMINIMAL_STACK_SIZE defines the minimum stack size, in words, that can be used. The idle task automatically uses this value as its stack size.
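
The per-task formula above is easy to turn into a small helper for budgeting the heap. The sketch below only encodes the figures quoted in the list (96-byte minimum TCB, 8 bytes per enabled option, 4 bytes per stack word); the `demo_` function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_TCB_MIN_BYTES 96u  /* 24 words with the minimum configuration */
#define DEMO_OPTION_BYTES   8u  /* each option listed above adds 2 words */

/* TCB size when 'enabled_options' (0 to 3) of the listed options are on. */
uint32_t demo_tcb_bytes(uint32_t enabled_options)
{
    return DEMO_TCB_MIN_BYTES + DEMO_OPTION_BYTES * enabled_options;
}

/* number of bytes = TCB_size + (4 x task stack size), stack size in words */
uint32_t demo_task_heap_bytes(uint32_t tcb_bytes, uint32_t stack_words)
{
    return tcb_bytes + 4u * stack_words;
}
```

For example, a task with a 128-word stack costs 96 + 512 = 608 bytes with the minimum TCB, and 112 + 512 = 624 bytes when two of the options are enabled.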

The necessary task stack size can be fine-tuned using the uxTaskGetStackHighWaterMark() API as follows:

  • Start with a large stack size that lets the task run without issue (for example, 4 KB).
  • uxTaskGetStackHighWaterMark() returns the minimum amount of free stack space ever encountered for the task. The value is expressed in words, so on a 32-bit MCU a return value of 1 means that 4 bytes of stack were never used. Monitor the return value of this function within the task.
  • Calculate the new stack size as the initial stack size minus the minimum free stack space, with both values in the same unit.
  • The method requires that the task has run long enough to go through its worst-case path (in terms of stack consumption).
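
The calculation in the steps above can be sketched as a small helper. The function name and the extra safety margin are assumptions added for illustration; with a margin of 0 the result is exactly "initial size minus minimum free space". Both inputs must use the same unit.

```c
#include <stdint.h>

/*
 * Suggest a new stack size from the initial size and the smallest amount of
 * free stack ever observed. 'margin' is a hypothetical safety reserve on top
 * of the measured peak usage. All three values must share one unit.
 */
uint32_t demo_tuned_stack_size(uint32_t initial_size,
                               uint32_t min_free,
                               uint32_t margin)
{
    /* Peak usage = initial size - minimum free space (clamped at zero). */
    uint32_t peak_used = (min_free >= initial_size) ? 0u
                                                    : initial_size - min_free;
    return peak_used + margin;
}
```

For instance, with an initial size of 1024 words, a minimum free space of 900 words and a 32-word margin, the suggested size is 124 + 32 = 156 words.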

Queue, Timer & Semaphore Memory Details

  • For each message queue, FreeRTOS allocates the following amount of heap:
    • number of bytes = 76 + queue_storage_area
    • queue_storage_area (in bytes) = (element_size * nb_elements) + 16
  • When timers are enabled (configUSE_TIMERS enabled), the scheduler automatically creates the timers service task (daemon) when it starts. The timers service task is used internally to control and monitor all timers that the user creates. The timers task parameters are set through the following defines:
    • configTIMER_TASK_PRIORITY: priority of the timers task
    • configTIMER_TASK_STACK_DEPTH: timers task stack size (in words)
  • To save heap (i.e. RAM footprint), it is recommended to disable configUSE_TIMERS when the application does not use timers.
  • The scheduler also automatically creates a message queue used to send commands to the timers task (timer start, timer stop, …).
  • The number of elements in this queue (the number of messages it can hold) is configurable through the define:
    • configTIMER_QUEUE_LENGTH
  • Each semaphore declared by the user application requires 88 bytes of heap.
  • Each mutex declared by the user application requires 88 bytes of heap.
  • To save heap (i.e. RAM footprint), it is recommended to disable configUSE_MUTEXES when the application does not use mutexes (this also reduces the static size of each task's TCB).
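
The queue, semaphore and mutex figures above can be combined into a rough heap estimate. This is only arithmetic on the numbers quoted in the list; the `demo_` names are hypothetical and the real sizes depend on the FreeRTOS version and configuration.

```c
#include <stdint.h>

/* number of bytes = 76 + queue_storage_area,
 * queue_storage_area = (element_size * nb_elements) + 16 */
uint32_t demo_queue_heap_bytes(uint32_t element_size, uint32_t nb_elements)
{
    uint32_t queue_storage_area = (element_size * nb_elements) + 16u;
    return 76u + queue_storage_area;
}

/* Each semaphore and each mutex costs 88 bytes of heap. */
uint32_t demo_sync_heap_bytes(uint32_t semaphores, uint32_t mutexes)
{
    return 88u * (semaphores + mutexes);
}
```

For example, a queue of ten 4-byte elements needs 76 + 40 + 16 = 132 bytes, and two semaphores plus one mutex need 264 bytes.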

How can we reduce the amount of RAM used?

  • FreeRTOS+Trace can trace memory allocation and memory free events, and so be useful in analysing and therefore optimising memory usage.
  • In most cases direct to task notifications can be used in place of binary semaphores. Unlike binary semaphores, which are generic objects that have to be created, direct to task notifications are sent directly to a task and do not use any RAM.
  • Each flag (bit) in an event group can be used as a binary semaphore, so replace multiple binary semaphores with a single event group.
  • Use the uxTaskGetStackHighWaterMark() function to see which tasks can be allocated a smaller stack.
  • Use the xPortGetFreeHeapSize() and (where available) the xPortGetMinimumEverFreeHeapSize() API functions to see how much FreeRTOS heap is being allocated but never used, and adjust accordingly.
  • If heap_1.c, heap_2.c, heap_4.c or heap_5.c are being used, and nothing in your application ever calls malloc() directly (as opposed to pvPortMalloc()), then ensure the linker has not allocated a heap to the C library, because it will never be used.
  • Set configMAX_PRIORITIES and configMINIMAL_STACK_SIZE (found in FreeRTOSConfig.h) to the minimum values acceptable to your application.
  • Recover the stack used by main(). The stack used upon program entry is not required once the RTOS scheduler has been started (unless your application calls vTaskEndScheduler(), which is only supported directly in the distribution for the PC and Flashlite ports, or uses the stack as an interrupt stack as is done in the ARM Cortex-M and RX ports). Every task has its own stack allocated so the stack allocated to main() is available for reuse once the RTOS scheduler has started.
  • Minimise the stack used by main(). The idle task is automatically created when you create the first application task. The stack used upon program entry (before the RTOS scheduler has started) must therefore be large enough for a nested call to xTaskCreate() (or xTaskCreateStatic()). Creating the idle task manually can halve this stack requirement. To create the idle task manually:
    1. Locate the function prvInitialiseTaskLists() in Source/tasks.c.
    2. The idle task is created at the bottom of the function by a call to xTaskCreate(). Cut this line from Source/tasks.c and paste it into main().
  • Rationalise the number of tasks. The idle task is not required if:
    1. Your application has a task that never blocks, and …
    2. Your application does not make any calls to vTaskDelete().
  • Reduce the data size used by the definition BaseType_t (this can increase execution time).
  • There are other minor tweaks that can be performed (for example the task priority queues don’t require event management), but if you get down to this level – you need more RAM!

How is RAM allocated to tasks?

If a task is created using the xTaskCreate() API function then the RAM required by the task is allocated inside the xTaskCreate() API function from the FreeRTOS heap.

If a task is created using the xTaskCreateStatic() API function then the RAM required by the task is provided by the application writer, and no memory allocation occurs.

The stack used by main() is not used by tasks, but (depending on the port) may be used by interrupts.

How is RAM allocated to queues?

If a queue is created using the xQueueCreate() API function then the RAM required by the queue is allocated inside the xQueueCreate() API function from the FreeRTOS heap.

If a queue is created using the xQueueCreateStatic() API function then the RAM required by the queue is provided by the application writer, and no memory allocation occurs.

How big should the stack be?

Tasks can be created using either the xTaskCreate() or xTaskCreateStatic() API function. The function’s usStackDepth parameter specifies the size of the stack that will be allocated to the task being created (in words, not bytes!). It is common for people to ask how to determine the usStackDepth value, but, except in one way described below, there is little difference between determining how much stack is required when using an RTOS and when writing a bare metal application (an application that does not use an operating system).

Exactly as when writing a bare metal application, the amount of stack required is dependent on the following application specific parameters:

  • The function call nesting depth
  • The number and size of function scope variable declarations
  • The number of function parameters
  • The processor architecture
  • The compiler
  • The compiler optimization level
  • The stack requirements of interrupt service routines – which for many RTOS ports is zero as the RTOS will switch to use a dedicated interrupt stack on entry to an interrupt service routine.

The processor context is saved onto a task’s stack each time the scheduler temporarily stops running the task in order to run a different task. The saved processor context is then popped off the task’s stack the next time the task runs. The stack space required to save the processor context is the only addition to a task’s stack requirement that comes from the RTOS itself.

The steps for creating an STM32 executable project, along with more background on FreeRTOS tasks and memory management, are available in the linked article. Next, we look at some memory optimization and analysis techniques.

Use of uxTaskGetStackHighWaterMark()

Go to Middleware > FreeRTOS > Include parameters > uxTaskGetStackHighWaterMark > Enable.

Keep the heap size at its default (3072 bytes) and create two tasks: the default task and myTask02.

uxTaskGetStackHighWaterMark setting

Create myTask02 with the following settings:

Priority: osPriorityLow
Stack Size: 128 Words
Entry Function: StartTask02
Code Generation: Default
Parameter: NULL
Allocation: Dynamic

Default task & Task2 with 128 word stack size
/* USER CODE END Variables */
osThreadId defaultTaskHandle;
osThreadId myTask02Handle;

/* Private function prototypes -----------------------------------------------*/
/* USER CODE BEGIN FunctionPrototypes */

void MX_FREERTOS_Init(void) {
  /* USER CODE BEGIN Init */

  /* USER CODE END Init */

  /* Create the thread(s) */
  /* definition and creation of defaultTask */
  osThreadDef(defaultTask, StartDefaultTask, osPriorityNormal, 0, 128);
  defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);

  /* definition and creation of myTask02 */
  osThreadDef(myTask02, StartTask02, osPriorityIdle, 0, 128);
  myTask02Handle = osThreadCreate(osThread(myTask02), NULL);

  /* USER CODE BEGIN RTOS_THREADS */
  /* add threads, ... */
  /* USER CODE END RTOS_THREADS */
}

/* USER CODE BEGIN Header_StartDefaultTask */
/**
  * @brief  Function implementing the defaultTask thread.
  * @param  argument: Not used
  * @retval None
  */
/* USER CODE END Header_StartDefaultTask */
void StartDefaultTask(void const * argument)
{
  /* USER CODE BEGIN StartDefaultTask */
  /* Minimum free stack (in words) ever seen for the default task itself */
  UBaseType_t DefaultTaskWaterMark;
  DefaultTaskWaterMark = uxTaskGetStackHighWaterMark(defaultTaskHandle);
  /* Infinite loop */
  for(;;)
  {
    osDelay(1000);
    DefaultTaskWaterMark = uxTaskGetStackHighWaterMark(defaultTaskHandle);
  }
  /* USER CODE END StartDefaultTask */
}

/* USER CODE BEGIN Header_StartTask02 */
/**
* @brief Function implementing the myTask02 thread.
* @param argument: Not used
* @retval None
*/
/* USER CODE END Header_StartTask02 */
void StartTask02(void const * argument)
{
  /* USER CODE BEGIN StartTask02 */
	UBaseType_t TaskTwoWaterMark;
	TaskTwoWaterMark = uxTaskGetStackHighWaterMark(myTask02Handle);

  /* Infinite loop */
  for(;;)
  {
    osDelay(1000);
    TaskTwoWaterMark = uxTaskGetStackHighWaterMark(myTask02Handle);
  }
  /* USER CODE END StartTask02 */
}

Use of xPortGetFreeHeapSize()

Keep the heap size at its default (3072 bytes). With 624 bytes allocated to each of the two tasks (the default task and task 2), calling xPortGetFreeHeapSize() returns a remaining heap size of 1824 bytes (3072 - 2 x 624).

/* USER CODE BEGIN Header_StartDefaultTask */
/**
  * @brief  Function implementing the defaultTask thread.
  * @param  argument: Not used
  * @retval None
  */
/* USER CODE END Header_StartDefaultTask */
void StartDefaultTask(void const * argument)
{
  /* USER CODE BEGIN StartDefaultTask */
  /* Read the remaining heap size (in bytes) */
  uint32_t sizeofheap = xPortGetFreeHeapSize();
  /* Infinite loop */
  for(;;)
  {
    osDelay(1);
  }
  /* USER CODE END StartDefaultTask */
}

Another way to check the remaining heap size is:

Go to Middleware > FreeRTOS > FreeRTOS Heap Usage.

Heap Usage in FreeRTOS

Stack Overflow Detection

Go to Middleware > FreeRTOS > Config parameters > Check for stack overflow > Option1/Option2.

Stack Overflow Detection – Method 1

It is likely that the stack will reach its greatest (deepest) value after the RTOS kernel has swapped the task out of the Running state because this is when the stack will contain the task context. At this point the RTOS kernel can check that the processor stack pointer remains within the valid stack space. The stack overflow hook function is called if the stack pointer contains a value that is outside of the valid stack range.

This method is quick but not guaranteed to catch all stack overflows. Set configCHECK_FOR_STACK_OVERFLOW to 1 to use this method.

Stack Overflow Detection – Method 2

When a task is first created its stack is filled with a known value. When swapping a task out of the Running state the RTOS kernel can check the last 16 bytes within the valid stack range to ensure that these known values have not been overwritten by the task or interrupt activity. The stack overflow hook function is called should any of these 16 bytes not remain at their initial value.

This method is less efficient than method one, but still fairly fast. It is very likely to catch stack overflows but is still not guaranteed to catch all overflows.

Set configCHECK_FOR_STACK_OVERFLOW to 2 to use this method.
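
The idea behind method 2 can be sketched on a host machine with a plain byte array standing in for a task stack. This is not the kernel's implementation: the `demo_` names are hypothetical, the fill value 0xA5 is this sketch's choice of "known value", and the stack is assumed to grow downwards so that the last 16 bytes of the valid range are the lowest addresses of the array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FILL_BYTE   0xA5u  /* known value written at task creation */
#define DEMO_CHECK_BYTES 16u    /* size of the region checked on a switch-out */

/* Fill the whole stack with the known value, as done when a task is created. */
void demo_stack_init(uint8_t *stack, size_t size)
{
    memset(stack, DEMO_FILL_BYTE, size);
}

/*
 * Return 1 if any of the last DEMO_CHECK_BYTES bytes of the valid stack range
 * no longer holds the known value, 0 otherwise. With a downward-growing stack
 * those bytes are stack[0] .. stack[DEMO_CHECK_BYTES - 1].
 */
int demo_stack_overflowed(const uint8_t *stack, size_t size)
{
    size_t n = (size < DEMO_CHECK_BYTES) ? size : DEMO_CHECK_BYTES;
    for (size_t i = 0; i < n; i++) {
        if (stack[i] != DEMO_FILL_BYTE) {
            return 1;
        }
    }
    return 0;
}
```

The sketch also shows why the method is not guaranteed to catch every overflow: a write that jumps past the checked region without touching it, or that happens to store the fill value, goes unnoticed.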

Enable the stack overflow callback function

After generating the code, the callback function below is available to the user.

/* USER CODE BEGIN 4 */
__weak void vApplicationStackOverflowHook(xTaskHandle xTask, signed char *pcTaskName)
{
   /* Run time stack overflow checking is performed if
   configCHECK_FOR_STACK_OVERFLOW is defined to 1 or 2. This hook function is
   called if a stack overflow is detected. */
}
/* USER CODE END 4 */

Use of Stack Analyzer

The STM32CubeIDE Static Stack Analyzer calculates the stack usage based on the built program. It analyzes the .su files, generated by gcc, and the elf file in detail, and presents the resulting information in the view. The view contains two tabs, the List and Call Graph tabs.
The List tab is populated with the stack usage for each function included in the program. The tab lists one line per function, each line consisting of the Function, Local cost, Type, Location and Info columns.

As described above, we created two tasks with 624 bytes allocated to each and kept the default heap size of 3072 bytes. In the Stack Analyzer window below, we can see that the maximum stack usage is 144 bytes.

Max cost of each task stack is 144 bytes
/* USER CODE END Header_StartDefaultTask */
void StartDefaultTask(void const * argument)
{
  /* USER CODE BEGIN StartDefaultTask */
  /* Infinite loop */
  for(;;)
  {
    osDelay(1);
  }
  /* USER CODE END StartDefaultTask */
}

/* USER CODE END Header_StartTask02 */
void StartTask02(void const * argument)
{
  /* USER CODE BEGIN StartTask02 */
  /* Infinite loop */
  for(;;)
  {
    osDelay(1000);
  }
  /* USER CODE END StartTask02 */
}

Next, we add a 50-byte local buffer inside Task2. Checking the Stack Analyzer again shows that the max cost value has increased to 200 bytes.

Max cost of Task2 is increased to 200 bytes

Use of Semaphore & Event Group

Each flag (bit) in an event group can be used as a binary semaphore, so replace multiple binary semaphores with a single event group.

For demonstration purposes, we create one event group and one binary semaphore.

The event group takes 32 bytes and the binary semaphore takes 88 bytes.

Binary semaphore memory utilization is 88 bytes
Event flags memory utilization is 32 bytes
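
Using the two sizes measured above, the saving from replacing several binary semaphores with the flags of one event group is simple arithmetic. The helper below is a hypothetical illustration; it assumes that all the flags fit in a single event group and that the 32-byte and 88-byte figures apply to your configuration.

```c
#include <stdint.h>

#define DEMO_BINARY_SEMAPHORE_BYTES 88  /* measured above */
#define DEMO_EVENT_GROUP_BYTES      32  /* measured above */

/*
 * Heap bytes saved by using one event group instead of 'flags_needed'
 * separate binary semaphores (one semaphore per flag).
 */
int32_t demo_bytes_saved(int32_t flags_needed)
{
    return flags_needed * DEMO_BINARY_SEMAPHORE_BYTES - DEMO_EVENT_GROUP_BYTES;
}
```

Replacing a single semaphore saves 56 bytes, and replacing four saves 320 bytes.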

Reference:

  1. freertos.org
  2. st.com

Software Tools:

  1. STM32CubeIDE
  2. STM32CubeMx

Conclusion:

We have demonstrated several FreeRTOS memory optimization and analysis techniques.

If you enjoyed this article, share your feedback.
