Boost the Speed of your STM32 Microcontrollers by 31% Using Core-Coupled Memory

When working on projects with computation-intensive routines and/or near-real-time performance requirements, developers usually welcome a “lightning-fast” RAM. This is one of the reasons why STMicro included Core Coupled Memory (CCM) RAM in a number of its STM32 microcontroller series, and Dim Tass recently demonstrated how to use it in a blog post on his website.

Unlike flash storage, Core Coupled Memory (CCM) offers zero-wait-state access, so instructions executed from it run noticeably faster than when the firmware runs from flash. According to STMicro, it was included in the microcontrollers for scenarios that involve “real-time and computation-intensive routines [including] digital power conversion control loops (switch-mode power supplies, lighting), field-oriented 3-phase motor control, [and] real-time DSP (digital signal processing)”.

Tass described CCM as potentially one of the features STMicro uses to set the microcontrollers that include it apart from the competition. In his words, “Vendors need to make themselves stand out from their competitors and this is done in many different ways. Of course, the most important is the price, but some times that’s not enough, because even the low price doesn’t mean that the controller fits your project”.

For his demo, Tass used an STM32F303CC development board, which has 256kB of flash storage, 40kB of static RAM (SRAM) and 8kB of CCM RAM. As a benchmark he chose the LZ4 compression algorithm, together with a custom CMake build that can run the code from flash, SRAM, or CCM RAM, and he measured the execution time from each of the three memories at different clock speeds.

At the board's default clock speed of 72MHz and a block size of 8k, the LZ4 benchmark took between 279 and 304 milliseconds when executed from flash. Running it from SRAM cut the time to 251ms, and switching to CCM lowered it still further, to 172ms. To push the limits, Tass then overclocked the device to 128MHz and repeated the test on all three memories with the same block size: execution time dropped to between 156 and 171ms from flash, 141ms from SRAM and 97ms from CCM. At both clock speeds, the CCM run took roughly 31% less time than the SRAM run.
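The article does not reproduce Tass's build files. As a rough sketch of what running a routine from CCM involves with a GCC-based toolchain, the routine is tagged with GCC's section attribute and its image is copied from flash into CCM at startup. This assumes a linker script that provides a .ccmram section in the CCM region; the USE_CCMRAM flag, the section name and the function names below are illustrative, not taken from Tass's project.

```c
#include <stddef.h>
#include <stdint.h>

/* USE_CCMRAM is a hypothetical build flag (for example set by a CMake
 * option when targeting the STM32F303). It assumes the linker script
 * defines a ".ccmram" output section in the CCM region at 0x10000000,
 * with its load image stored in flash. Without the flag the macro expands
 * to nothing, so the same source builds for flash/SRAM or on a host PC. */
#ifdef USE_CCMRAM
#define CCMRAM_FUNC __attribute__((section(".ccmram"), noinline))
#else
#define CCMRAM_FUNC
#endif

/* Example hot routine: a Q15 fixed-point dot product, the kind of
 * multiply-accumulate loop found in DSP filters and control loops. */
CCMRAM_FUNC int32_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * h[i]; /* Q15 x Q15 gives a Q30 product */
    }
    return (int32_t)(acc >> 15);     /* scale the Q30 sum back to Q15 */
}

/* Startup copy: code linked to run from CCM is stored in flash, so the
 * reset handler must copy it across before the first call. On target the
 * arguments would come from linker-script symbols (names such as
 * _sccmram, _eccmram and _siccmram are assumptions; they depend entirely
 * on your linker script). */
void copy_ccm_image(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

On a host build the macro is empty, so the routine can be tested like any other function: dot_q15 over four samples of 0.5 (16384 in Q15) with four coefficients of 0.5 returns 32768, which is 1.0 in Q15.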

Commenting on the CCM's performance, Tass said:

“I was expecting that it would be a bit faster, but I didn’t expect that the difference would be that great. 31% faster is a lot of performance gain, you can’t ignore this, especially in time-critical code.”
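The percentages can be reproduced from the reported timings. The 31% figure corresponds to the drop in run time from SRAM to CCM rather than from flash. A small C helper, with illustrative function names, makes the arithmetic explicit:

```c
/* Percentage by which run time drops when moving from a baseline memory
 * to a faster one: (t_base - t_new) / t_base * 100. */
double runtime_reduction_pct(double t_base_ms, double t_new_ms)
{
    return (t_base_ms - t_new_ms) / t_base_ms * 100.0;
}

/* Ratio of run times, i.e. how many times faster the new memory is. */
double speedup_factor(double t_base_ms, double t_new_ms)
{
    return t_base_ms / t_new_ms;
}
```

With the reported numbers, runtime_reduction_pct(251, 172) is about 31.5% at 72MHz and runtime_reduction_pct(141, 97) about 31.2% at 128MHz. Measured against the 279 to 304ms flash timings, the reduction at 72MHz is about 38% to 43%. Expressed as a ratio, speedup_factor(251, 172) is about 1.46, so a 31% cut in run time corresponds to roughly 46% more throughput.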

The project, including all of its files, is documented on Tass’s website.

About Emmanuel Odunlade

Hardware Design Engineer | #IoT Consultant | All things #ML | Entrepreneur | Serial Writer | Passionate about innovation and technology as tools for solving problems in developing countries. Spends his spare time writing and advocating for the growth of the Maker/DIY culture in Africa.
