Shrink Your MCU code size with GCC ARM Embedded 4.7
September 11, 2013
GNU Tools for ARM embedded processors, or GCC ARM Embedded for short, version 4.7 is now available.
The previous release, version 4.6, had more than 30,000 downloads. Alongside new features such as Mac OS hosting, GDB enhancements, and other optimizations, the most exciting feature in version 4.7 is the reduction in generated code size.
Why code size?
The reason, familiar to most MCU software developers, is the extreme resource limitation and cost sensitivity of MCU programming. For those who haven't experienced this, here are some quotations from our users:
"Please please please remember that we are seeing more and more memory limited parts in this world - for example, 4KB flash, 1KB RAM - and every word of "stack space" used, never mind the flash size consumed by code."
"If the total code size exceeds the internal flash memory of the MCU (as in my case) I must ..."
GCC ARM Embedded 4.7 reduces code size by optimizing the compiler and associated libraries
Optimizing the compiler for generated code size is nothing new: GCC at optimization level -Os already generates smaller code. But most active GCC development currently focuses on performance, which leaves room for size optimization to catch up. GCC ARM Embedded 4.7 includes the latest code size optimizations committed by the ARM compiler team.
Among the many code size optimizations is basic block reordering for size, which reorders basic blocks to reduce long jumps. There is also a hoisting enhancement, which attempts to move as many common expressions as possible to a common predecessor while keeping register pressure reasonably low. Other optimizations include more hard register copying and less use of the upper eight ARM core registers (refer to the ARMv6-M Architecture Reference Manual). Measured on an ARM Cortex-M0 processor with code size benchmarks, version 4.7 with -Os generates 2% less code than previous versions.
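To make the hoisting idea concrete, here is a minimal source-level sketch. The compiler performs this transformation on its internal representation, not on your C code, and the function names below are hypothetical, chosen only for illustration. Both functions compute the same result; the second shows the shape the optimizer aims for, with the shared expression evaluated once in the common predecessor of the two branches.

```c
#include <stdint.h>

/* Before hoisting: both branches contain the expression (a * b + c),
 * so a naive translation emits the multiply-add twice. */
int32_t scale_unhoisted(int32_t a, int32_t b, int32_t c, int flag)
{
    if (flag)
        return (a * b + c) + 1;
    else
        return (a * b + c) - 1;
}

/* After hoisting: the common expression is computed once, before the
 * branch, and held in a single temporary. */
int32_t scale_hoisted(int32_t a, int32_t b, int32_t c, int flag)
{
    int32_t t = a * b + c;   /* hoisted into the common predecessor */
    return flag ? t + 1 : t - 1;
}
```

The trade-off mentioned above is visible even here: the temporary occupies a register across the branch, which is why the optimization has to keep register pressure in check.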
The diet plan for libraries
Libraries also need optimizing, because the libraries included in GCC ARM Embedded were not originally designed for MCU programming. Newlib, the C library in the toolchain, implements printf functions that are so complicated they require about 37 KB of flash and 5 KB of RAM to run a simple hello-world program. That's far too large for MCU programming, where you might need printf functionality only for debugging and logging purposes. The good news is that there is plenty of unnecessary "fat" in the libraries that can be cut.
The diet plan for libraries is to cut the unnecessary features, re-implement features with simpler logic, and build while optimizing for size. The result is a set of new libraries called newlib-nano: as the name suggests, based on newlib but much smaller.
Newlib-nano cuts some features that were added after C89 and are believed to be rarely used in MCU programming. Limiting format conversions to those in the C89 standard greatly reduces the format string processing code in printf. Removing the iov buffering significantly reduces the size of all I/O functions. Removing wide-character support from non-wide-character functions further shrinks the string I/O logic. Newlib-nano also makes extensive use of weak symbols to exclude features that are rarely used in normal MCU programs. For example, referencing floating point I/O and atexit as weak functions dramatically cuts the size of printf() and exit().
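The weak symbol technique can be sketched in a few lines of C. This is an illustration of the general mechanism under GCC on an ELF target, not newlib-nano's actual source; fmt_float, have_float_format and format_value are hypothetical names. The optional feature is only referenced weakly, so if no object file in the link defines it, the linker resolves its address to zero instead of pulling in the code or reporting an error.

```c
#include <stddef.h>

/* Hypothetical optional feature. It is declared weak and deliberately not
 * defined in this file: if nothing else in the link provides it, the
 * symbol's address is 0 and none of its code ends up in the image. */
extern int fmt_float(char *out, size_t len, double value) __attribute__((weak));

/* Returns 1 when the optional formatter was linked in, 0 otherwise. */
int have_float_format(void)
{
    return fmt_float != NULL;
}

/* Formats value when support is present; otherwise writes a "?"
 * placeholder. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the buffer is too small for the placeholder. */
int format_value(char *out, size_t len, double value)
{
    if (fmt_float != NULL)
        return fmt_float(out, len, value);
    if (len < 2)
        return -1;
    out[0] = '?';
    out[1] = '\0';
    return 1;
}
```

A program that needs the feature has to force it into the link. For newlib-nano, the toolchain's readme describes doing this for floating point printf by adding -u _printf_float to the link command.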
Newlib-nano also re-implements the memory allocation functions. The original ones perform better overall, but their complex logic increases code size. The so-called nano-allocator uses simple, naive algorithms to handle allocation, de-allocation, and defragmentation. It works effectively when the total memory that can be allocated is small. More importantly, it is only about one sixth of the original size.
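For a feel of how little code a simple allocator needs, here is a minimal first-fit sketch over a fixed arena. It is not the nano-allocator's source: the names tiny_malloc and tiny_free, the 1 KB arena, and the address-ordered free list are all assumptions made for illustration. Allocation takes the first free block that is large enough, and freeing merges a block with its neighbours, which is the defragmentation step.

```c
#include <stddef.h>

#define ARENA_SIZE 1024u                  /* hypothetical heap size */

typedef struct block {
    size_t size;                          /* bytes in block, header included */
    struct block *next;                   /* next free block, by address */
} block_t;

#define HDR (sizeof(block_t))             /* header size and allocation unit */

static block_t arena[ARENA_SIZE / sizeof(block_t)];
static block_t *free_list;
static int initialized;

static void heap_init(void)
{
    arena[0].size = sizeof arena;         /* one free block spans the arena */
    arena[0].next = NULL;
    free_list = &arena[0];
    initialized = 1;
}

void *tiny_malloc(size_t n)
{
    if (!initialized)
        heap_init();
    if (n == 0 || n > sizeof arena)
        return NULL;
    /* Round the payload up to whole units and add one unit for the header. */
    size_t need = ((n + HDR - 1) / HDR + 1) * HDR;

    for (block_t **link = &free_list; *link != NULL; link = &(*link)->next) {
        block_t *b = *link;
        if (b->size < need)
            continue;
        if (b->size - need >= 2 * HDR) {  /* split, keep the tail free */
            block_t *rest = (block_t *)((unsigned char *)b + need);
            rest->size = b->size - need;
            rest->next = b->next;
            *link = rest;
            b->size = need;
        } else {
            *link = b->next;              /* hand out the whole block */
        }
        return b + 1;                     /* payload follows the header */
    }
    return NULL;                          /* no block is large enough */
}

void tiny_free(void *p)
{
    if (p == NULL)
        return;
    block_t *b = (block_t *)p - 1;
    block_t *prev = NULL, *cur = free_list;
    while (cur != NULL && cur < b) {      /* find the address-ordered slot */
        prev = cur;
        cur = cur->next;
    }
    b->next = cur;
    if (prev != NULL)
        prev->next = b;
    else
        free_list = b;
    /* Defragment: merge with the following, then the preceding neighbour. */
    if (cur != NULL && (unsigned char *)b + b->size == (unsigned char *)cur) {
        b->size += cur->size;
        b->next = cur->next;
    }
    if (prev != NULL && (unsigned char *)prev + prev->size == (unsigned char *)b) {
        prev->size += b->size;
        prev->next = b->next;
    }
}
```

Both operations walk the free list linearly, which is acceptable precisely because the heap on a small MCU holds only a handful of blocks. The sketch aligns payloads only to the size of the header and ignores thread safety.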
Newlib-nano is built with optimization level -Os. This results in smaller memcpy and memset, because newlib selects a simple version of these functions when it detects that it is being built with -Os. Building for size also drops some large speed optimizations in the C++ libraries. An additional build flag for newlib-nano is -fno-exceptions, which disables exception handling in the libraries. Some MCU C++ developers consider this an acceptable trade-off.
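The selection between a small and a fast implementation can be sketched with the __OPTIMIZE_SIZE__ macro, which GCC predefines when compiling with -Os. The function below is a simplified illustration with a hypothetical name, not newlib's memcpy: the unrolled loop stands in for whatever larger, faster code the speed-oriented build would use.

```c
#include <stddef.h>

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if !defined(__OPTIMIZE_SIZE__)
    /* Speed-oriented path, compiled out under -Os: copy four bytes per
     * iteration to reduce loop overhead at the cost of extra code. */
    while (n >= 4) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
        n -= 4;
    }
#endif

    /* Size-oriented path: a plain byte loop. In the speed build it only
     * handles the remaining tail. */
    while (n-- > 0)
        *d++ = *s++;

    return dst;
}
```

Either build copies the same bytes; only the amount of generated code differs.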
Conclusion
To summarize, newlib-nano can cut the size of hello-world programs by around 80%. In extreme cases for C++ programs, the size reduction could exceed 90%.
It is easy to use newlib-nano in real projects with GCC ARM Embedded 4.7. Normally, it is only necessary to specify one additional linker option. The driver specs files in the toolchain then link against the newlib-nano libraries instead of the normal ones.
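As a sketch of what this looks like in practice: in the released toolchain the additional option is --specs=nano.specs. The build lines in the comment below are assumptions for a Cortex-M0 target; startup code, linker script, and system-call stubs are project-specific and left out, and say_hello is a hypothetical function name.

```c
/*
 * hello.c
 *
 * Assumed build lines (remaining project-specific options elided as "..."):
 *
 *   default newlib:
 *     arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m0 hello.c ...
 *
 *   newlib-nano:
 *     arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m0 --specs=nano.specs hello.c ...
 */
#include <stdio.h>

/* Prints the greeting and returns printf's result, which is the number of
 * characters written. */
int say_hello(void)
{
    return printf("hello, world\n");
}
```

Running arm-none-eabi-size on the two resulting images is a quick way to compare the flash and RAM footprint of each library variant for your own project.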
Patches included in this release are either already in mainline or on their way there. It will take some time to upstream the more aggressive changes in newlib-nano.
Overall, GCC ARM Embedded 4.7 represents a big leap in the open source Cortex-M development toolchain. Why not check it out yourself below?