Friday, June 5, 2015

Smartphone platforms migrate to 64-bit (AArch64) mode

Recently, most existing and new mobile SoCs have started to appear configured in native 64-bit mode (AArch64) in conjunction with a 64-bit version of Android 5. Although SoCs targeting premium-level devices that are already shipping were the first to support AArch64 (including Tegra K1-64, Exynos 7420 and Snapdragon 810), recent entries in the Geekbench results database show that cost-sensitive platforms are also migrating to native 64-bit mode in upcoming smartphones.

This move involves Cortex-A53-based platforms such as MediaTek's MT6735, MT6752, MT6753 and MT6795, Qualcomm's Snapdragon 615 (MSM8939) as well as a new Snapdragon 410 (MSM8916) platform (which was previously limited to ARMv7), and HiSilicon's Kirin 620 and Kirin 930.

Initial ARMv8 platforms used hybrid AArch32 mode


Several ARMv8-based SoCs have been shipping for some time, but most have been using AArch32 mode, a hybrid mode which takes advantage of some of the architectural improvements in ARMv8 but does not expose native 64-bit mode to applications. Snapdragon 410 did not take advantage of ARMv8 at all, running entirely in ARMv7 mode.

One reason why full AArch64 mode was not adopted right away is that it comes with a performance penalty due to the increased storage requirements for program code and pointers, which puts greater demands on the memory subsystem of the SoC. Cost-sensitive smartphone models are especially sensitive to this due to a lower amount of RAM and smaller on-chip CPU caches. A decrease in the price of RAM chips has allowed the amount of RAM in cost-sensitive models to increase (e.g. more devices shipping with 2GB RAM), making AArch64 mode more appealing.
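To make the pointer overhead concrete, here is a minimal sketch that estimates the size of a small record with 4-byte and 8-byte pointers. The record layout (two pointers plus a 4-byte integer) and the rule of rounding up to pointer alignment are my own illustrative assumptions, not measurements from any of the devices discussed here.

```python
def node_size(pointer_bytes: int, n_pointers: int, payload_bytes: int) -> int:
    """Size of a record holding n_pointers pointers plus a payload,
    rounded up to pointer alignment (a common, but not universal, ABI rule)."""
    raw = n_pointers * pointer_bytes + payload_bytes
    align = pointer_bytes
    return (raw + align - 1) // align * align


# Hypothetical doubly linked list node: two pointers and a 4-byte integer.
size32 = node_size(pointer_bytes=4, n_pointers=2, payload_bytes=4)  # 32-bit pointers
size64 = node_size(pointer_bytes=8, n_pointers=2, payload_bytes=4)  # 64-bit pointers

print(size32, size64, size64 / size32)
```

Under these assumptions the same node takes twice as much memory with 64-bit pointers, so fewer nodes fit in a cache of a given size. Real programs are rarely this pointer-heavy, so the actual overhead is usually smaller.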

AArch64 also has benefits, in particular for floating point and data-intensive applications that use NEON vector instructions.

Comparison of CPU benchmark results


The migration to AArch64 mode across the board makes it easier to compare CPU benchmarks of different SoCs, which was previously made more difficult by the fact that some SoCs used AArch64 mode while others were still limited to AArch32.

In the following sections, I will return to Geekbench CPU test results and try to make apples-to-apples comparisons for different groups of SoCs.

Quad-core Cortex-A53 SoCs


Quad-core SoCs included are MT6732, MT6735 and Snapdragon 410. Note that the version of Snapdragon 410 tested most likely reflects a newer silicon revision that has not yet widely appeared in end devices, since previous versions of Snapdragon 410 (MSM8916) were always limited to ARMv7 mode (seemingly being unable to run in AArch32 mode).

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        (GHz)  Single IPC   Multi Par   Single IPC   Multi Par

MT6732     2705430    1.50    783   1.36  3108  3.97   795   1.29  3017  3.79
MT6735     2650175    1.30    646   1.36  2604  4.03   656   1.23  2047  3.12
MSM8916-64 2708213    1.21    626   1.34  2481  3.96   615   1.24  1280  2.08

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        (GHz)  Single IPC   Multi Par   Single Multi

MT6732     2705430    1.50    631   1.23  2490  3.95  1030   1156
MT6735     2650175    1.30    526   1.19  2091  3.98   901    965
MSM8916-64 2708213    1.21    508   1.23  1969  3.88   447    505

The "IPC" value as shown in the tables is an index calculated from a comparison with the performance of common Cortex-A7-based SoCs, normalized to the same clock speed. The parallelism value ("Par") is the performance scaling from single-core to multi-core for the specific Geekbench subtest.
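As a rough sketch of how these two columns can be derived, the snippet below recomputes the "Par" values from the integer scores in the quad-core table and shows one plausible form of the IPC index. The Cortex-A7 baseline scores are not listed in this article, so `ipc_index` takes them as parameters; the function names and the exact formula are my own shorthand for the definition given above.

```python
# Single-core and multi-core scores copied from the quad-core integer table.
scores = {
    "MT6732":     {"clock": 1.50, "jpeg": (783, 3108), "lua": (795, 3017)},
    "MT6735":     {"clock": 1.30, "jpeg": (646, 2604), "lua": (656, 2047)},
    "MSM8916-64": {"clock": 1.21, "jpeg": (626, 2481), "lua": (615, 1280)},
}


def parallelism(single: float, multi: float) -> float:
    """Scaling from single-core to multi-core for one subtest ("Par")."""
    return multi / single


def ipc_index(score: float, clock_ghz: float,
              ref_score: float, ref_clock_ghz: float) -> float:
    """Score per GHz relative to a reference core's score per GHz.
    The reference (Cortex-A7) values must be supplied by the caller."""
    return (score / clock_ghz) / (ref_score / ref_clock_ghz)


for soc, row in scores.items():
    jpeg_par = parallelism(*row["jpeg"])
    lua_par = parallelism(*row["lua"])
    print(f"{soc:<11} JPEG Par {jpeg_par:.2f}  Lua Par {lua_par:.2f}")
```

The recomputed ratios agree with the "Par" columns to within rounding of the last digit.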

The IPC values are fairly consistent, as would be expected from the same CPU core (Cortex-A53) running the same ISA (instruction set architecture). When scaling to multiple cores, MT6732 does best, as shown by the scaling in the Lua benchmarks. This is not surprising as MT6732 is not an entry-level SoC given its cost structure, being better described as belonging to the mid-range segment. It is likely to have a better memory subsystem (in particular, a larger and faster L2 cache) than the other chips.

MediaTek's new entry-level chip, MT6735, apart from running at a somewhat higher clock speed (1.3 GHz vs 1.2 GHz), outperforms the 64-bit version of Snapdragon 410 when normalized to the same clock speed, which is especially evident in the Lua multi-core test and memory tests. The Lua results could be a reflection of L2 cache size and/or speed. Memory performance (based on the Stream Copy subtest) of both MediaTek chips is roughly double that of Snapdragon 410 (something which was already evident in the respective 32-bit platform results).
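The clock-normalized comparison can be sketched as follows, using the Lua multi-core and Stream Copy scores from the tables above. Dividing a score by the reported clock speed is a simplification of my own, since it assumes performance scales linearly with frequency.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Score normalized to 1 GHz, assuming linear scaling with clock speed."""
    return score / clock_ghz


# Lua multi-core scores and reported clock speeds from the tables above.
mt6735_lua = per_ghz(2047, 1.30)
msm8916_lua = per_ghz(1280, 1.21)

# Stream Copy single-core scores (not normalized: memory speed is largely
# independent of the CPU clock).
copy_ratio_mt6735 = 901 / 447
copy_ratio_mt6732 = 1030 / 447

print(f"Lua multi per GHz: MT6735 {mt6735_lua:.0f}, MSM8916-64 {msm8916_lua:.0f}")
print(f"Stream Copy vs MSM8916-64: MT6735 {copy_ratio_mt6735:.2f}x, "
      f"MT6732 {copy_ratio_mt6732:.2f}x")
```

Both Stream Copy ratios come out at two or slightly above, which is what "roughly double" refers to.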

Mid-range octa-core Cortex-A53-based SoCs


The octa-core Cortex-A53-based SoCs targeting the mid-range segment include MediaTek's performance-oriented MT6752, the recent cost-reduced MT6753, Qualcomm's Snapdragon 615 (MSM8939), and HiSilicon's Kirin 620 (Hi6210).

These SoCs use different CPU clock speed configurations. MediaTek's MT6752 and MT6753 run all cores at the same maximum clock speed: 1.69 GHz for MT6752 and (at least in the tested device) seemingly only about 1.1 GHz for MT6753, even though Geekbench reports a maximum clock speed of 1.3 GHz. HiSilicon's Kirin 620 can run all cores up to a maximum speed of 1.2 GHz.

Qualcomm's Snapdragon 615 uses a pseudo-big.LITTLE, hierarchical architecture with one performance cluster of four cores running up to 1.65 GHz in the most recent version of the platform (previous versions ran up to 1.5 GHz), with the other power-efficient cluster running at a significantly lower clock speed. MediaTek's announcement of the MT6755 (Helio P10) shows that MediaTek is also transitioning to hierarchical CPU clusters for new chips, similar to Snapdragon 615.

Having one power-optimized CPU cluster helps power efficiency for low CPU demand scenarios such as smartphone standby or light usage. The fact that Snapdragon 615 is not very power efficient, despite the low-clocked cluster, is mostly due to the low-performance 28LP manufacturing process used.

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        (GHz)  Single IPC   Multi Par   Single IPC   Multi Par

MSM8939    2704276    1.65    837   1.32  4269  5.10   789   1.16   667  0.85
MT6752     2709869    1.69    890   1.37  6719  7.55   907   1.31  6531  7.20
MT6753     2699665    1.10?   572   1.35  4298  7.51   587   1.30  4282  7.29
Hi6210     2704356    1.20    630   1.36  3473  5.51   626   1.27  2156  3.44

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        (GHz)  Single IPC   Multi Par   Single Multi

MSM8939    2704276    1.65    661   1.17  4019  6.08    512   569
MT6752     2709869    1.69    714   1.24  5637  7.89   1024  1158
MT6753     2699665    1.10?   463   1.23  3597  7.77    802   958
Hi6210     2704356    1.20    506   1.24  3419  6.76    833  1030

IPC values are fairly consistent for MT6752, Hi6210 and MT6753 (when a likely clock speed of 1.1 GHz is assumed), but Snapdragon 615 consistently shows somewhat lower IPC, possibly related to the earlier revision (r0p1) of the Cortex-A53 core used. It is also possible that, similar to what seems to be the case for the MT6753 entry used (Meizu M2 Note), the actual maximum CPU clock speed is lower than the one advertised and reported to Geekbench.

Multi-core performance scaling approaches 8.0 for the MediaTek chips, which can be expected due to the symmetrical CPU cluster configuration. Multi-core scaling for Kirin 620 is lower than expected for the integer tests, especially Lua, possibly due to L2 cache performance constraints.

Snapdragon 615 shows a lower scaling factor because half of its cores are clocked at a lower speed. The Lua scaling is particularly low, however: the multi-core score is in fact often worse than the single-core result, and only modestly higher in other cases. This could be due to L2 cache constraints for one of the clusters and associated synchronisation issues in the multi-threading implementation used by the Geekbench test.
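For reference, the best-case scaling factor of a symmetrical and an asymmetrical eight-core configuration can be estimated with a simple model. The sketch below assumes that each core contributes in proportion to its clock speed and that nothing else (caches, memory, scheduling) limits scaling. The 1.0 GHz figure for the low-power cluster is purely an assumption for illustration; the clock speed of that cluster is not given in the data used here.

```python
def ideal_scaling(clusters, reference_ghz):
    """Upper bound on multi-core scaling relative to one core at reference_ghz.
    clusters is a list of (core_count, clock_ghz) tuples."""
    return sum(count * clock / reference_ghz for count, clock in clusters)


# Symmetrical octa-core (MT6752-style): all eight cores at the same clock.
symmetric = ideal_scaling([(8, 1.69)], reference_ghz=1.69)

# Asymmetrical (Snapdragon 615-style): four cores at 1.65 GHz plus four
# slower cores. ASSUMED_LOW_GHZ is a placeholder, not a measured value.
ASSUMED_LOW_GHZ = 1.0
asymmetric = ideal_scaling([(4, 1.65), (4, ASSUMED_LOW_GHZ)], reference_ghz=1.65)

print(f"symmetric: {symmetric:.2f}, asymmetric: {asymmetric:.2f}")
```

Under that assumption the asymmetrical upper bound is about 6.4, so the measured 6.08 for Mandelbrot is close to the model, while the integer results (5.10 for JPEG Compress and 0.85 for Lua) fall well short of it.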

Looking at memory performance, MT6752 has the highest performance, closely followed by MT6753 and Hi6210. Qualcomm's Snapdragon 615 is well behind, probably due to the older/slower interconnect bus used.

MT6753 benchmark results suggest performance issue


Even though a clock speed of 1.30 GHz is reported to Geekbench by the operating system in the MT6753-equipped Meizu M2 Note, actual Geekbench subtest results are not consistent with a Cortex-A53 core running at that clock speed. There is variability in the results between different runs, which could be caused by thermal throttling. Many of the results seem to correspond to an effective clock speed of approximately 1.10 GHz, although for some runs the score of certain tests (including JPEG Compress) does approach the level expected for a clock speed of 1.3 GHz. Most of the time however, performance is significantly lower than expected, as if the clock speed is throttled to around 1.1 GHz for long periods of time.
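The effective clock speed estimate can be reproduced with a small calculation. The sketch below takes the MT6752 single-core scores from the tables above as the reference and assumes that both chips have the same per-clock performance and that the MT6752 really runs at its reported 1.69 GHz. These are working assumptions, not confirmed facts.

```python
# Single-core scores from the octa-core tables above.
MT6752 = {"clock": 1.69, "jpeg": 890, "lua": 907, "mandelbrot": 714}
MT6753 = {"jpeg": 572, "lua": 587, "mandelbrot": 463}


def effective_clock(score: float, ref_score: float, ref_clock_ghz: float) -> float:
    """Clock speed at which the reference chip would produce the given score,
    assuming the score scales linearly with clock speed."""
    return score * ref_clock_ghz / ref_score


estimates = {
    test: effective_clock(score, MT6752[test], MT6752["clock"])
    for test, score in MT6753.items()
}

for test, ghz in estimates.items():
    print(f"{test:<10} {ghz:.2f} GHz")
```

All three subtests land between roughly 1.09 and 1.10 GHz, well below the 1.30 GHz reported to Geekbench.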

The lower than expected performance could be related to the manufacturing process. The MT6753 was designed with cost-reduction in mind, and may use TSMC's 28LP process which has low cost but lower performance. Qualcomm's Snapdragon 410 and 615 also use this process, limiting their performance (and in the case of Snapdragon 615 resulting in heat production). MT6753 was announced as supporting a clock speed up to 1.5 GHz, and the lower-than-expected attainable clock speed may force MediaTek to adjust the specifications for the chip if the issue is not resolved.

Sources: Geekbench browser

Updated 6 June 2015.

Thursday, June 4, 2015

MediaTek announces Helio P10 and MT6753 arrives in shipping devices

MediaTek has announced Helio P10 (MT6755), a performance mid-range smartphone SoC that is the successor of MT6752. Featuring an octa-core Cortex-A53 configuration, Helio P10 improves upon MT6752 by using TSMC's new 28HPC+ manufacturing process, which delivers power efficiency and performance improvements while remaining relatively cost-effective. It can reach a higher maximum CPU clock speed up to 2 GHz and upgrades the GPU to a Mali-T860 MP2. It is expected to be commercially available in end devices by the end of 2015.

Features shared with Helio X10


The new SoC incorporates a few features from Helio X10 (MT6795), MediaTek's current high-end offering, including dual ISPs with 21MP camera support and improved capture capability, as well as improved audio quality.

Otherwise, the SoC has significant similarities to MediaTek's MT6752 which it succeeds, most likely including a 32-bit external memory interface, which keeps SoC cost and phone PCB cost down. With MT6752, MediaTek already demonstrated the ability to achieve memory performance adequate for a 1080p device within the constraints of a 32-bit memory interface.
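To give a feel for the numbers involved, here is a back-of-the-envelope sketch comparing the theoretical peak bandwidth of a 32-bit memory interface with the bandwidth needed just to refresh a 1080p display. The data rate of 1600 MT/s, the 60 Hz refresh rate and the 4 bytes per pixel are illustrative assumptions of mine; the actual memory configuration of these chips is not stated here.

```python
def peak_bandwidth(bus_width_bits: int, transfers_per_second: int) -> int:
    """Theoretical peak memory bandwidth in bytes per second."""
    return bus_width_bits // 8 * transfers_per_second


def display_refresh_bandwidth(width: int, height: int,
                              bytes_per_pixel: int, refresh_hz: int) -> int:
    """Bytes per second read to scan out one uncompressed frame buffer."""
    return width * height * bytes_per_pixel * refresh_hz


ASSUMED_DATA_RATE = 1_600_000_000  # transfers per second (assumption)

bw32 = peak_bandwidth(32, ASSUMED_DATA_RATE)
bw64 = peak_bandwidth(64, ASSUMED_DATA_RATE)
scanout = display_refresh_bandwidth(1920, 1080, 4, 60)

print(f"32-bit: {bw32 / 1e9:.1f} GB/s, 64-bit: {bw64 / 1e9:.1f} GB/s")
print(f"1080p scan-out: {scanout / 1e9:.2f} GB/s "
      f"({scanout / bw32:.1%} of the 32-bit peak)")
```

Display scan-out alone is a small share of the peak figure under these assumptions. GPU rendering, camera processing and the CPU add much more traffic on top of that, so sustained efficiency matters more than the headline peak number.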

The 28HPC+ process is an upgrade of the existing 28HPC (high-performance compact) process (which is also relatively new, used by Allwinner's A83T and other SoCs), which improves performance and cost relative to the established 28HPM (high-performance mobile) process. Existing MediaTek chips like MT6752 and MT6795 most likely use 28HPM, which is established and has also been used for previous-generation SoCs such as MT6592 and Snapdragon 801/805.

MediaTek migrating to big.LITTLE CPU configurations in new SoCs


A significant departure from existing octa-core MediaTek SoCs such as MT6752 and Helio X10 (MT6795) is the pseudo-big.LITTLE CPU configuration, whereby one cluster of four Cortex-A53 cores is clocked at a higher frequency (up to 2 GHz in this case), while the second cluster of Cortex-A53 cores is optimized for lower frequencies, being clocked at a lower maximum frequency (1.1 GHz according to AnandTech).

Together with the previously announced high-end Helio X20 (MT6797) and tablet/Chromebook-oriented chips such as MT8173, Helio P10 marks a migration to (pseudo-)big.LITTLE, hierarchical CPU designs at MediaTek. While symmetrical octa-core designs such as MT6752 and MT6795 reach very high multi-core processing power by allowing all cores to run at the maximum frequency, there are signs that this configuration impacts power efficiency for tasks that require less CPU power, which can be run on power-optimized low-frequency cores.

In practice, this may be reflected in somewhat mediocre standby battery life for smartphones using MT6752 or MT6795, even though power efficiency for demanding tasks that utilize all cores is likely to be pretty good.

Budget mid-range MT6753 reaches end-market


Meanwhile, MediaTek's previously announced MT6753, a cost-effective budget mid-range SoC, has arrived in a commercially shipping device in the form of the Meizu M2 Note. Despite the name chosen by Meizu, the new model actually has lower performance than the existing Meizu M1 Note, because the MT6753 is a less costly, lower-end chip compared to the MT6752 inside the M1 Note, with considerably lower maximum CPU speeds for the eight CPU cores, as well as a lower-performance GPU. There are also signs that the memory interface and the actual memory frequency used by the M2 Note are slower. The lower cost of the MT6753 platform is reflected in the low selling price of the Meizu M2 Note.

MT6753 implements several cost-reducing features, including a lower maximum clock speed (reported to be 1.3 GHz for the M2 Note), most likely associated with a cheaper manufacturing process (either 28LP or 28HPC) than the 28HPM process of the MT6752. A significant factor for lower performance is likely to be a reduced size of the L2 CPU cache inside the MT6753. MT6753 is likely to become a significant volume driver in MediaTek's 4G product line.

However, early Geekbench entries for the Meizu M2 Note suggest that the CPU cores of the MT6753 SoC used in this model are mostly unable to reach the planned clock frequency. The Geekbench results are mostly consistent with an average maximum CPU clock speed of about 1.1 GHz, significantly lower than the 1.3 GHz reported by the OS and the 1.5 GHz mentioned when the MT6753 was originally announced a few months ago. My following blog article about the use of AArch64 provides more details on this subject.

MT6753 has lower-performance GPU than MT6752


MT6753 also has a significantly lower-performance and smaller GPU (Mali-T720 MP3), compared to the Mali-T760 MP2 inside MT6752. MT6753 marks the first Mali implementation with three pixel processing cores; previous Mali GPUs had one, two, four, six or eight pixel processing cores. Most likely, Mali-T720 does not have the memory bandwidth usage optimizations that are present in Mali-T760, which together with the more limited pixel processing throughput means that devices with a 1080p display such as the Meizu M2 Note may be impacted in terms of 1080p game performance and power efficiency for graphics-intensive operations.

World modem support in new MediaTek platforms


All new MediaTek SoCs (including Helio P10 (MT6755), MT6753, the low-end quad-core MT6735 and the announced high-end Helio X20 (MT6797)) have world-modem support, facilitating compatibility with more cellular networks used worldwide, including legacy CDMA networks in the US and other countries. This makes MediaTek SoCs more attractive to smartphone manufacturers targeting multiple or worldwide markets.

Sources: MediaTek (Helio P10 announcement), AnandTech (Helio P10 article)

Updated 6 June 2015.