Wednesday, August 27, 2014

ARM Cortex-A53 core emerges as a viable solution for a wide range of performance targets

Cortex-A53 seeing strong adoption for upcoming SoCs

We have already seen that ARM's Cortex-A7, an extremely low cost in-order pipeline CPU core optimized for power efficiency, has been a perfect match for current high-performance mainstream 28nm process technology, and multi-core Cortex-A7 CPUs currently drive the vast majority of smartphones in the market, except the premium segment.

At the same time, the new Cortex-A53 is seeing very strong adoption for new SoCs for which production is currently ramping up as well as upcoming platforms. The Cortex-A53, an in-order pipeline CPU that is basically an extension of the Cortex-A7 to ARM's 64-bit ARMv8 instruction set architecture with somewhat higher performance, has recently been adopted for mainstream platforms by leading smartphone SoC providers Qualcomm and MediaTek, in products spanning virtually the entire spectrum from entry-level to premium-level.

Issues remain with ARM's 32-bit performance cores such as Cortex-A17

Meanwhile, the latest revision of ARM's established high-performance 32-bit core, the Cortex-A15 r3p3, seems to have finally become mature enough to be used in a leading smartphone platform, the Samsung Galaxy Alpha, in the form of the big.LITTLE-based Exynos 5430 manufactured using Samsung's leading-edge 20nm process. The more efficient heterogeneous multi-processing (Global Task Scheduling) implementation of big.LITTLE also seems to have reached maturity and become usable in practice. However, the power efficiency of any Cortex-A15 core, even after numerous optimizations, is still likely to be mediocre.

Doubts still abound about the newer 32-bit ARM cores that were supposed to fix the high power consumption of the Cortex-A15, namely the Cortex-A12 and Cortex-A17. Few new chips using these cores have reached stable volume production yet, with suggestions of continuing power consumption issues and a learning curve to achieve stable volume production. Leading SoC companies like MediaTek, Allwinner and Rockchip earlier announced SoCs with these cores in a big.LITTLE configuration or as a straightforward quad-core, but the path to market arrival already appears to be longer than expected for most of these platforms, with several of them not likely to ship commercially at all or only into low-volume non-mobile applications where power consumption is less of an issue.

Cortex-A53 applicable to a wide range of performance points

The issues with cores such as the Cortex-A17 seem to have accelerated the move towards the Cortex-A53, ironically not because of the Cortex-A53's support for the 64-bit ARMv8 architecture, but much more because it is conveniently positioned as a faster version of the proven Cortex-A7, well suited to the latest process nodes, allowing it to be clocked higher than the Cortex-A7 in addition to being intrinsically faster. The Cortex-A53 is likely to be used mainly as a 32-bit CPU in the short term (being fully compatible with the 32-bit ARMv7-A instruction set architecture).

It can provide high performance with relatively low power consumption in configurations with a substantial number of cores (such as eight), making it suitable even for premium devices, while being suitable for lower segments in configurations with a smaller number of cores (such as four). Clock frequency targets can be adjusted for a particular segment (ARM offers specific support for optimizing a particular core either for more speed or for better power efficiency). At the same time, because the die size is not much greater than that of the Cortex-A7 (which has a very small die size), and several times smaller than that of more performance-oriented CPU cores, a high number of Cortex-A53 cores can be used without serious implications for manufacturing cost. In effect, the move to Cortex-A53-based architectures for performance-oriented SoCs is likely to dramatically lower cost while also greatly improving power efficiency.

Leading platforms using Cortex-A53 targeting volume production in 2H 2014

Already, several configuration types of SoC using only Cortex-A53 cores have been announced, spanning much of the performance spectrum:

Quad-core Cortex-A53:
  • MediaTek MT6732, speed quoted as 1.5 GHz, targeting entry-level devices, expected to arrive this year.
  • Qualcomm Snapdragon 410 (MSM8916), 1.4 or 1.2 GHz, was scheduled to be sampling in Q2 and is likely already in production. As of the IFA trade show in early September, numerous new smartphones using this platform have already been announced and are starting to become commercially available.
  • Qualcomm Snapdragon 610 (MSM8936), 1.8 GHz, sampling planned for Q3 2014.
Octa-core Cortex-A53 (symmetric, all cores can clock up to same maximum speed):
  • MediaTek MT6752, 1.7 GHz, targeting mainstream devices, also expected to arrive this year.
  • MediaTek MT6795, speed quoted as 2.2 GHz, targeting premium devices, scheduled for a Q4 2014 introduction. It does not look like a coincidence that the model number of this chip is similar to that of the big.LITTLE MT6595 using Cortex-A17 and Cortex-A7 cores that was announced much earlier, strongly suggesting that the MT6595 will have relatively limited viability and that the MT6795 is in fact the smarter, cheaper and more power-efficient replacement for it.
Octa-core Cortex-A53 in a pseudo-big.LITTLE configuration (four cores clocked higher, four clocked lower):
  • Qualcomm Snapdragon 615 (MSM8939), 1.8 GHz x4 + 1.0 GHz x4, sampling planned for Q3 2014.
Qualcomm has also announced SoCs using the Cortex-A53 in combination with higher-performance Cortex-A57 cores in a big.LITTLE configuration:
  • Qualcomm Snapdragon 810 (MSM8994), quad-core Cortex-A57 + quad-core Cortex-A53, sampling 2H 2014.
  • Qualcomm Snapdragon 808 (MSM8992), dual-core Cortex-A57 + quad-core Cortex-A53, sampling 1H 2015.
However, it remains to be seen whether the Cortex-A57 will improve upon the performance characteristics and suitability for leading-edge processes of cores such as the Cortex-A17, on which it is likely to be based. If we make the assumption that the two CPU cores inside the Apple A7 (manufactured at 28nm) have characteristics that are close to the Cortex-A57, which is not at all illogical, then that would mean that the Cortex-A57 is unlikely to be significantly more power efficient than preceding cores like Cortex-A15 and A17, although the jury would still be out for the potential improvement on a more advanced process such as 20nm.

Impact on multi-threading in major mobile operating systems

It is already becoming clear that many-core (for example octa-core) CPU configurations consisting of low-to-medium-performance but very power-efficient cores like the Cortex-A7 or Cortex-A53 are the most cost-effective and power-efficient way to pursue higher performance levels in modern mobile devices. In that respect, MediaTek's MT6592 octa-core Cortex-A7-based SoC released at the end of 2013, far from deserving some of the scolding it received from certain competitors and other observers, was in fact a revolutionary design and a sign of things to come.

In the Android OS, the presence of more CPU cores has a significant positive effect on performance and usability, without increasing power consumption and in practice actually facilitating low-power devices. To what extent an increase in the number of cores from quad-core to octa-core contributes to performance has been debated.

There are suggestions that symmetrical (not big.LITTLE) octa-core CPU configurations, such as implemented in the MT6592, can in fact provide significant performance benefits for common use-cases. For example, although SoCs typically have a specific, proprietary video decoding/encoding core (VPU) to accelerate video playback with minimal power consumption by the CPU, the set of video standards supported by the VPU is often limited, and variations within a specific media format or the playback window configuration may require a full or partial fall-back to software decoding by the CPU. On the Android platform, the ffmpeg/libav libraries used for software video decoding can readily take advantage of the extra cores, essentially doubling processing capacity, which can easily make the difference between smooth and unacceptably stuttering video playback. Another example is a multi-window UI as offered with certain Android platforms, allowing multiple applications to run in the "foreground" concurrently, each potentially using several threads/cores. Finally, the ubiquitous Chrome browser, a very common use-case, is inherently multi-threaded.
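As a rough illustration of why the benefit of going from quad-core to octa-core is debated, the following sketch applies Amdahl's law, a standard textbook model that is not taken from the sources of this article. The "essentially doubling" case corresponds to a workload that parallelizes perfectly; the parallel fractions used below (0.5, 0.9, 1.0) are hypothetical values chosen only for illustration, not measurements of any decoder, browser or SoC.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when `parallel_fraction` of the work
    can be spread evenly across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def quad_to_octa_gain(parallel_fraction: float) -> float:
    """Relative gain of eight identical cores over four identical cores."""
    return amdahl_speedup(parallel_fraction, 8) / amdahl_speedup(parallel_fraction, 4)


if __name__ == "__main__":
    # Hypothetical parallel fractions, for illustration only.
    for p in (0.5, 0.9, 1.0):
        print(f"parallel fraction {p:.2f}: octa/quad gain = {quad_to_octa_gain(p):.2f}x")
```

Under this simplified model, the gain from the extra four cores approaches a factor of two only when nearly all of the work is parallel, which is why workloads such as multi-threaded software video decoding are the ones most likely to benefit.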

For other operating systems, the situation may be less clear. For example, there is likely to be much less emphasis on multi-threaded applications in Apple's iOS, based on a long legacy of application processors limited to one or two cores with increasing performance, the latest of which was the Apple A7. If the Apple A8 in the upcoming iPhone 6 and other upcoming Apple devices in fact uses Cortex-A53 "class" CPU cores in a many-core configuration (which I would not rule out at all), then that would have repercussions for iOS application development by stimulating the use of a much higher degree of multi-threadedness to take better advantage of the new processor.

Sources: Wikipedia (Snapdragon (system on chip)), Wikipedia (MediaTek), ARM (Cortex-A53), ARM (Cortex-A7), ARM (Cortex-A17), ARM (POP IP), DigiTimes (64-bit AP shipments growing fast in 2H 2014)

Updated September 19, 2014.
Updated January 9, 2015 (rephrase statement about viability of MT6595).
