Showing posts with label Samsung. Show all posts

Friday, June 5, 2015

Smartphone platforms migrate to 64-bit (AArch64) mode

Recently, most existing and new mobile SoCs have started to become available configured in native 64-bit mode (AArch64) in conjunction with a 64-bit version of Android 5. Although SoCs targeting premium-level devices that are already shipping were the first to support AArch64 (including Tegra K1-64, Exynos 7420 and Snapdragon 810), recent entries in the Geekbench results database show that cost-sensitive platforms are also migrating to native 64-bit mode in upcoming smartphones.

This move involves Cortex-A53-based platforms such as MediaTek's MT6735, MT6752, MT6753 and MT6795, Qualcomm's Snapdragon 615 (MSM8939) as well as a new Snapdragon 410 (MSM8916) platform (which was previously limited to ARMv7), and HiSilicon's Kirin 620 and Kirin 930.

Initial ARMv8 platforms used hybrid AArch32 mode


Several ARMv8 based SoCs have been shipping for some time, but most have been using AArch32 mode, a hybrid mode which takes advantage of some of the architectural improvements in ARMv8 but does not expose native 64-bit mode to applications. Snapdragon 410 did not even take any advantage of ARMv8, running in 100% ARMv7 mode.

One reason why full AArch64 mode has not been adopted right away is that it does come with a performance penalty due to the increased storage requirements for program code and pointers, which puts greater demands on the memory subsystem of the SoC. Cost-sensitive smartphone models are especially sensitive to this due to a lower amount of RAM and smaller on-chip CPU caches. A decrease in the price of RAM chips has allowed the amount of RAM in cost-sensitive models to increase (e.g. more devices shipping with 2GB RAM), making AArch64 mode more appealing.
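As a rough illustration of the pointer-size effect, the sketch below estimates the size of a hypothetical linked-list node (two pointers plus a 4-byte integer) with 4-byte and 8-byte pointers. The node layout and the `node_size` helper are assumptions made for illustration only; nothing here was measured on the SoCs discussed.

```python
def node_size(pointer_bytes, n_pointers=2, payload_bytes=4):
    """Size in bytes of a hypothetical node, padded to pointer alignment."""
    raw = n_pointers * pointer_bytes + payload_bytes
    # Round up to a multiple of the pointer size, as a typical C ABI would.
    return -(-raw // pointer_bytes) * pointer_bytes

size_32 = node_size(4)  # 4-byte pointers (AArch32 / ARMv7)
size_64 = node_size(8)  # 8-byte pointers (AArch64)
print(size_32, size_64, size_64 / size_32)
```

For this particular layout the node doubles in size, which means fewer nodes fit in the same amount of cache and RAM. Real programs mix pointers with other data, so the overall increase is usually smaller.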

AArch64 also has benefits, in particular for floating point and data-intensive applications that use NEON vector instructions.

Comparison of CPU benchmark results


The migration to AArch64 mode across the board makes it easier to compare CPU benchmarks of different SoCs, which was previously made more difficult by the fact that some SoCs used AArch64 mode while others were still limited to AArch32.

In the following sections, I will return to Geekbench CPU test results and try to make apples-to-apples comparisons for different groups of SoCs.

Quad-core Cortex-A53 SoCs


Quad-core SoCs included are MT6732, MT6735 and Snapdragon 410. Note that the version of Snapdragon 410 tested most likely reflects a newer silicon revision that has not yet widely appeared in end devices, since previous versions of Snapdragon 410 (MSM8916) were always limited to ARMv7 mode (seemingly being unable to run in AArch32 mode).

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        speed  Single IPC   Multi Par   Single IPC   Multi Par

MT6732     2705430    1.50    783   1.36  3108  3.97   795   1.29  3017  3.79
MT6735     2650175    1.30    646   1.36  2604  4.03   656   1.23  2047  3.12
MSM8916-64 2708213    1.21    626   1.34  2481  3.96   615   1.24  1280  2.08

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        speed  Single IPC   Multi Par   Single Multi

MT6732     2705430    1.50    631   1.23  2490  3.95  1030   1156
MT6735     2650175    1.30    526   1.19  2091  3.98   901    965
MSM8916-64 2708213    1.21    508   1.23  1969  3.88   447    505

The "IPC" value as shown in the tables is an index calculated from a comparison with the performance of common Cortex-A7-based SoCs, normalized to the same clock speed. The parallelism value ("Par") is the performance scaling from single-core to multi-core for the specific Geekbench subtest.
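The two derived columns can be sketched in a few lines of Python. The parallelism value follows directly from the table. The IPC index needs a Cortex-A7 reference score per GHz that is not given in this post, so `baseline_per_ghz` below is a placeholder parameter, and both function names are made up for illustration.

```python
def parallelism(single_score, multi_score):
    """Scaling from single-core to multi-core for one subtest."""
    return multi_score / single_score

def ipc_index(score, clock_ghz, baseline_per_ghz):
    """Score per GHz relative to a reference score per GHz.

    baseline_per_ghz stands in for the Cortex-A7 reference, which is
    an assumption here because the post does not state its value.
    """
    return (score / clock_ghz) / baseline_per_ghz

# JPEG Compress (single, multi) scores taken from the integer table above.
jpeg = {"MT6732": (783, 3108), "MT6735": (646, 2604), "MSM8916-64": (626, 2481)}
par = {soc: round(parallelism(s, m), 2) for soc, (s, m) in jpeg.items()}
print(par)
```

The computed parallelism values agree with the "Par" column for JPEG Compress in the table.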

The IPC values are fairly consistent, as would be expected from the same CPU core (Cortex-A53) running the same ISA (instruction set architecture). When scaling to multiple cores, MT6732 does best, as shown by the scaling in the Lua benchmarks. This is not surprising as MT6732 is not an entry-level SoC given its cost structure, being better described as belonging to the mid-range segment. It is likely to have a better memory subsystem (in particular, a larger and faster L2 cache) than the other chips.

MediaTek's new entry-level chip, MT6735, apart from running at a somewhat higher clock speed (1.3 GHz vs 1.2 GHz), outperforms the 64-bit version of Snapdragon 410 when normalized to the same clock speed, which is especially evident in the Lua multi-core test and memory tests. The Lua results could be a reflection of L2 cache size and/or speed. Memory performance (based on the Stream Copy subtest) of both MediaTek chips is roughly double that of Snapdragon 410 (something which was already evident in the respective 32-bit platform results).
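The clock-normalized comparison can be reproduced from the table values. The sketch below divides the Lua multi-core score by the reported clock speed and expresses Stream Copy single-core scores relative to Snapdragon 410. The variable names are illustrative, and the reported clock speed is assumed to be the speed actually reached during the test.

```python
# (clock in GHz, Lua multi-core, Stream Copy single-core) from the tables above.
results = {
    "MT6732": (1.50, 3017, 1030),
    "MT6735": (1.30, 2047, 901),
    "MSM8916-64": (1.21, 1280, 447),
}

# Lua multi-core score per GHz of reported clock speed.
lua_per_ghz = {soc: lua / clk for soc, (clk, lua, _) in results.items()}

# Stream Copy score relative to the 64-bit Snapdragon 410.
reference = results["MSM8916-64"][2]
stream_ratio = {soc: stream / reference for soc, (_, _, stream) in results.items()}

for soc in results:
    print(soc, round(lua_per_ghz[soc]), round(stream_ratio[soc], 2))
```

The Stream Copy ratios come out at about 2.0 for MT6735 and 2.3 for MT6732.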

Mid-range octa-core Cortex-A53-based SoCs


The octa-core Cortex-A53-based SoCs targeting the mid-range segment include MediaTek's performance-oriented MT6752, the recent cost-reduced MT6753, Qualcomm's Snapdragon 615 (MSM8939), and HiSilicon's Kirin 620 (Hi6210).

These SoCs use different CPU clock speed configurations. MediaTek's MT6752 and MT6753 run all cores at the same maximum clock speed, 1.69 GHz for MT6752 and (at least in the tested device) seemingly only about 1.1 GHz for MT6753, even though Geekbench reports a maximum clock speed of 1.3 GHz. HiSilicon's Kirin 620 can run all cores up to a maximum speed of 1.2 GHz.

Qualcomm's Snapdragon 615 uses a pseudo-big.LITTLE, hierarchical architecture with one performance cluster of four cores running up to 1.65 GHz in the most recent version of the platform (previous versions ran up to 1.5 GHz), with the other power-efficient cluster running at a significantly lower clock speed. MediaTek's announcement of the MT6755 (Helio P10) shows that MediaTek is also transitioning to hierarchical CPU clusters for new chips, similar to Snapdragon 615.

Having one power-optimized CPU cluster helps power efficiency for low CPU demand scenarios such as smartphone standby or light usage. The fact that Snapdragon 615 is not very power efficient, despite the low-clocked cluster, is mostly due to the low-performance 28LP manufacturing process used.

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        speed  Single IPC   Multi Par   Single IPC   Multi Par

MSM8939    2704276    1.65    837   1.32  4269  5.10   789   1.16   667  0.85
MT6752     2709869    1.69    890   1.37  6719  7.55   907   1.31  6531  7.20
MT6753     2699665    1.10?   572   1.35  4298  7.51   587   1.30  4282  7.29
Hi6210     2704356    1.20    630   1.36  3473  5.51   626   1.27  2156  3.44

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        speed  Single IPC   Multi Par   Single Multi

MSM8939    2704276    1.65    661   1.17  4019  6.08    512   569
MT6752     2709869    1.69    714   1.24  5637  7.89   1024  1158
MT6753     2699665    1.10?   463   1.23  3597  7.77    802   958
Hi6210     2704356    1.20    506   1.24  3419  6.76    833  1030

IPC values are fairly consistent for MT6752, Hi6210 and MT6753 (when a likely clock speed of 1.1 GHz is assumed), but Snapdragon 615 consistently shows somewhat lower IPC, possibly related to the earlier revision (r0p1) of the Cortex-A53 core used. It is also possible that, similar to what seems to be the case for the MT6753 entry used (Meizu M2 Note), the actual maximum CPU clock speed is lower than the one advertised and reported to Geekbench.

Multi-core performance scaling approaches 8.0 for the MediaTek chips, which can be expected due to the symmetrical CPU cluster configuration. Multi-core scaling for Kirin 620 is lower than expected for the integer tests, especially Lua, possibly due to L2 cache performance constraints.

Snapdragon 615 shows a lower scaling factor because half of its cores are clocked at a lower speed. The Lua scaling is particularly low, however: the multi-core score is in fact often worse than the single-core result, and only modestly higher in other cases. This could be due to L2 cache constraints for one of the clusters and associated synchronisation issues in the multi-threading implementation used by the Geekbench test.
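To put the observed scaling factors in context, an upper bound for multi-core scaling can be sketched by weighting each core by its clock speed relative to the fastest cluster. This assumes performance scales linearly with clock speed and that the single-core test runs on the fastest cluster. The 1.0 GHz figure for the low-clocked Snapdragon 615 cluster is a hypothetical placeholder, since the actual value is not given here.

```python
def ideal_scaling(clusters):
    """Upper bound on multi-core scaling.

    clusters is a list of (core_count, clock_ghz) tuples. Each core
    counts for its clock speed relative to the fastest cluster.
    """
    fastest = max(clock for _, clock in clusters)
    return sum(count * clock / fastest for count, clock in clusters)

symmetric = ideal_scaling([(8, 1.69)])             # MT6752-style: all cores equal
asymmetric = ideal_scaling([(4, 1.65), (4, 1.0)])  # 1.0 GHz is a hypothetical value

print(round(symmetric, 2), round(asymmetric, 2))
```

Under these assumptions a symmetric eight-core chip tops out at 8.0, while the two-cluster configuration tops out at roughly 6.4, so a Lua scaling factor below 1.0 cannot be explained by the clock speed difference alone.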

Looking at memory performance, MT6752 has the highest performance, closely followed by MT6753 and Hi6210. Qualcomm's Snapdragon 615 is well behind, probably due to the older/slower interconnect bus used.

MT6753 benchmark results suggest performance issue


Even though a clock speed of 1.30 GHz is reported to Geekbench by the operating system in the MT6753-equipped Meizu M2 Note, actual Geekbench subtest results are not consistent with a Cortex-A53 core running at that clock speed. There is variability in the results between different runs, which could be caused by thermal throttling. Many of the results seem to correspond to an effective clock speed of approximately 1.10 GHz, although for some runs the score of certain tests (including JPEG Compress) does approach the level expected for a clock speed of 1.3 GHz. Most of the time however, performance is significantly lower than expected, as if the clock speed is throttled to around 1.1 GHz for long periods of time.
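An effective clock speed estimate of this kind can be sketched as a simple proportion, assuming that single-core scores of the same Cortex-A53 core scale linearly with clock speed. The sketch uses the MT6752 single-core results at the 1.69 GHz listed in the tables as the reference. This is an illustrative method, not necessarily the one used to arrive at the figure above, and `effective_clock` is a made-up helper name.

```python
REF_CLOCK_GHZ = 1.69  # MT6752 clock speed as listed in the tables

# Single-core scores from the tables above.
ref_scores = {"JPEG Compress": 890, "Lua": 907, "Mandelbrot": 714}  # MT6752
obs_scores = {"JPEG Compress": 572, "Lua": 587, "Mandelbrot": 463}  # MT6753

def effective_clock(observed, reference, reference_clock_ghz):
    """Clock speed at which the reference chip would match the observed score."""
    return reference_clock_ghz * observed / reference

estimates = {
    test: effective_clock(obs_scores[test], ref_scores[test], REF_CLOCK_GHZ)
    for test in obs_scores
}
for test, ghz in estimates.items():
    print(test, round(ghz, 2))
```

All three subtests land close to 1.1 GHz, well below the reported 1.3 GHz.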

The lower than expected performance could be related to the manufacturing process. The MT6753 was designed with cost-reduction in mind, and may use TSMC's 28LP process which has low cost but lower performance. Qualcomm's Snapdragon 410 and 615 also use this process, limiting their performance (and in the case of Snapdragon 615 resulting in heat production). MT6753 was announced as supporting a clock speed up to 1.5 GHz, and the lower-than-expected attainable clock speed may force MediaTek to adjust the specifications for the chip if the issue is not resolved.

Sources: Geekbench browser

Updated 6 June 2015.

Thursday, May 21, 2015

Battery performance based on Geekbench battery test results

A while ago, Primate Labs added a battery performance test to the Geekbench benchmark suite, which has been frequently used on this blog and elsewhere to analyze CPU processing performance. The battery performance test gives the opportunity to better gauge the power efficiency of different CPU architectures, especially for the type of workload that the Geekbench battery test represents.

Battery test overview


The battery test is intended to be run starting from a fully charged battery until the battery is completely run down. It appears to target a certain fixed level of CPU processing that is sustained throughout the test. In the test results, a duty cycle parameter is given for several time points, which more or less represents CPU utilization. Slower CPU cores (such as quad-core Cortex-A7-based SoCs) have a higher duty cycle percentage, while high-performance "big" cores such as Cortex-A57 and Krait-400 show a lower percentage.

In practice, most battery test results in the Geekbench database were terminated early in the benchmark process and do not give useful information. The test runs that completed a full run-down from 100% to close to 0% battery do give a usable indication of battery efficiency. The benchmark expresses battery performance as a number, similar to Geekbench CPU performance scores. This score is correlated with the duration and duty cycle using a certain formula, reflecting the amount of CPU work done and the battery running time. The score is heavily influenced by the actual capacity of the battery used in the device.

Overview of results for common SoCs


The following table shows approximate Geekbench battery test scores for common SoCs used in smartphone models for which a battery capacity specification is available. The table is ordered by SoC model name.


Device                    SoC              Score      Capacity  Duration    Score /
                                           (Range)    (mAh)     (hrs:min)   mAh

Apple iPhone 5S           Apple A7         1220-2090  1560      2:00-3:30   0.78-1.34
Apple iPhone 6            Apple A8         1550-2360  1810      2:35-4:00   0.86-1.30
Apple iPhone 6 Plus       Apple A8         2580-3250  2915      4:20-5:25   0.89-1.11
Meizu MX Pro              Exynos 5430      2080-2730  3350      7:45-10:10  0.62-0.81
Samsung Galaxy Alpha      Exynos 5430      1850-2710  1860      4:30-5:00   0.99-1.46
Samsung Galaxy Note 4     Exynos 5433      3190-3650  3220      5:20-6:00   0.99-1.13
Samsung Galaxy S6 Edge    Exynos 7420      4100-4600  2600      7:00-7:45   1.58-1.77
Huawei Honor 6            Kirin 920        1580-2080  3100      2:40-3:30   0.51-0.67
Huawei Mate 7 (MT7-L09)   Kirin 925        2470-2820  4100      4:05-4:20   0.60-0.69
Huawei P8 (GRA-L09)       Kirin 930        3270-4150  2680      5:30-7:00   1.22-1.55
Lenovo A5000              MT6582           3740       4000      14:00       0.94
Xiaomi Redmi Note         MT6592           2850-3560  3200      7:30-9:00   0.89-1.11
Huawei G750-U10           MT6592           2960-3430  3000      7:45-9:00   0.99-1.14
Meizu MX4                 MT6595           2540-2780  3100      6:20-6:55   0.82-0.90
Lenovo A7000-A            MT6752M          4550-4950  2900      8:16-8:50   1.57-1.71
Meizu M1 Note             MT6752           4900-6310  3140      8:10-10:30  1.56-2.01
HTC Desire 820s           MT6752           3580-3730  2600      6:15-6:30   1.38-1.43
HTC One E9+               MT6795           3370       2800      6:00        1.20
Moto G                    MSM8226 (SD400)  1600-2000  2070      6:00-7:30   0.77-0.97
Xiaomi Redmi 1S           MSM8226 (SD400T) 1485       2000      5:30        0.74
Lenovo A6000              MSM8916 (SD410)  2700       2300      6:50        1.17
HTC Desire 826            MSM8939 (SD615)  1800       2600      4:25        0.69
Xiaomi Mi 4i              MSM8939          2520-2810  3120      5:50-7:30   0.81-0.90
HTC One M8                MSM8974 (SD801)  2500-3300  2600      4:20-5:50   0.96-1.27
Xiaomi Mi 4               MSM8974          3150       3080      7:45        1.02
Samsung Galaxy Note 4     APQ8084 (SD805)  2500-3550  3220      4:10-6:15   0.78-1.10
LG G4                     MSM8992 (SD808)  2500-3260  3000      4:15-5:30   0.89-1.09
HTC One M9                MSM8994 (SD810)  1400-2580  2840      2:20-4:20   0.49-0.91

Devices with low processing power but long battery life may be penalized by having to power the screen and wireless connectivity for a longer period during the test.

The ratio of the battery score and the battery capacity (in mAh) gives a very rough indication of the efficiency of a particular CPU architecture, although the comparison may be skewed by several factors.
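The score per mAh column is a plain division of the score range by the battery capacity. A minimal sketch using two rows from the table (the function name is illustrative):

```python
def score_per_mah(score_range, capacity_mah):
    """Battery score divided by capacity, for the low and high end of a range."""
    low, high = score_range
    return round(low / capacity_mah, 2), round(high / capacity_mah, 2)

# Score ranges and capacities taken from the table above.
s6_edge = score_per_mah((4100, 4600), 2600)  # Samsung Galaxy S6 Edge, Exynos 7420
one_m9 = score_per_mah((1400, 2580), 2840)   # HTC One M9, Snapdragon 810

print(s6_edge, one_m9)
```

The results match the last column of the table for these two devices.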

Results by SoC type


The previous generation of Cortex-A7-based SoCs such as Snapdragon 400 and MT6582 shows long running time due to the efficiency of the Cortex-A7 core, but the battery score appears to be affected by the limited CPU power. Snapdragon 410 does relatively well despite (or perhaps thanks to) being limited to ARMv7 mode.

SoCs with previous generation Cortex-A15 cores for performance in a big.LITTLE configuration, such as Kirin 920/925, show relatively low efficiency, as is to be expected given the relatively high power consumption Cortex-A15 is known for. Exynos 5430, which is manufactured on a relatively advanced 20 nm process, generally does better.

Octa-core mid-range: MediaTek does well


Among octa-core mid-range SoCs (the Cortex-A53-based MT6752, Qualcomm's Snapdragon 615, and MediaTek's previous-generation Cortex-A7-based MT6592), both the MT6752 and MT6592 make a strong showing, with MT6752 getting particularly high scores.

MT6752 has an optimized memory architecture with a 32-bit memory interface and is manufactured on TSMC's 28HPM process, which helps performance relative to Snapdragon 615. Although not tested by Geekbench, reports indicate that wireless standby power efficiency is not as great as the CPU efficiency for this SoC. A possible explanation is that the CPU cores are optimized for relatively heavy CPU loads such as the Geekbench battery test (the SoC is not big.LITTLE, so no cores are optimized for low power consumption at low frequencies), so that a low-load scenario such as standby produces less optimal power consumption.

Qualcomm's Snapdragon 615 (MSM8939) does relatively poorly, which can largely be explained by the asymmetric CPU configuration and lower-performance 28LP manufacturing process.

Performance segment SoCs


The poor performance of Snapdragon 810 (as illustrated by the HTC One M9) is apparent, with significantly worse battery efficiency than the previous generation Snapdragon 801 and 805. Snapdragon 808, which uses a later revision Cortex-A57 core and is used inside the LG G4, does somewhat better.

Largely due to the relatively advanced manufacturing process (14 nm FinFET for Exynos 7420), Samsung's latest SoCs do well, particularly Exynos 7420 used inside the Galaxy S6. Even Samsung's previous generation Exynos 5433 appears to be well ahead of Snapdragon 810 in terms of efficiency.

A limited number of results is available for two Cortex-A53-based performance SoCs (characterized by a wide memory interface and more powerful GPU than mid-range solutions), MediaTek's MT6795 (Helio-X10) and HiSilicon's Kirin 930. Kirin 930 shows relatively good efficiency in this benchmark, possibly ahead of MediaTek's MT6795. Kirin 930 has a two-level hierarchy in which one cluster of Cortex-A53 cores is optimized for a higher and the other for a lower frequency, while in MT6795 all cores can reach the maximum frequency.

Source: Geekbench Browser (Battery search)

Updated 28 May 2015.

Thursday, April 30, 2015

More details emerge about Cortex-A72 CPU core

Recently, more details have become available about the performance improvements implemented in ARM's Cortex-A72 core, which is a replacement for the high-performance Cortex-A57 core. Apart from the gains from using a more advanced process such as 14/16 nm FinFET, Cortex-A72 also implements fairly significant micro-architectural improvements affecting performance per cycle and power efficiency. AnandTech has published a detailed overview of these improvements.

Cortex-A57 based on Cortex-A15 and not fully optimized for power-efficiency


The Cortex-A57 CPU core, which was announced in 2012, has significant similarities to Cortex-A15, ARM's long-standing high-performance 32-bit CPU core, which has been known for relatively high power consumption. As such, it is not unexpected that improvements on the Cortex-A57 architecture (in the form of the Cortex-A72) have proven to be possible. Cortex-A57-based SoCs such as Snapdragon 810 have been known to throttle, being forced to reduce the clock speed due to excessive heat production and power use, resulting in reduced sustained performance. Apple's A7 and A8 processors use CPU cores that most likely have strong similarities with Cortex-A57, but which exhibit little throttling due to a lower maximum clock speed, a lower number of cores and other factors related to the chip design.

Increased level of sustained performance


ARM has made available a number of slides detailing the improvements in sustained performance and power efficiency in Cortex-A72 over Cortex-A57. On a 28 nm process and at a similar clock speed, ARM's charts indicate a roughly 20% reduction in power consumption.

Sustained performance is expected to be higher than Cortex-A57, implementations of which (such as Snapdragon 810 and Exynos 5433, and to a lesser degree Exynos 7420) have suffered from an inability to maintain high clock speeds and throttle back to a relatively low speed due to heat production and associated power consumption. ARM gives a figure of sustained 750 mW operation per core on a 16FF+ process with a clock speed around 2.5 GHz.

In terms of IPC (instructions per cycle), ARM's information shows improvements in all instruction-level performance segments, with a 1.16x improvement for "analytics", 1.38x for cryptography, 1.50x for memory, 1.26x for floating point and 1.16x for integer compute. The increase in memory performance appears to be significant.

Improved single-core performance evident in early Geekbench results


Early Geekbench results for the MT8173 SoC from MediaTek, which includes two Cortex-A72 cores, give an indication of practical performance of the Cortex-A72 core, although the exact clock speed the Cortex-A72 cores are running at is hard to determine. The following table shows single-core performance from a recent MT8173 Geekbench entry, comparing it to Exynos 7420 as used in the Samsung Galaxy S6. Both use 64-bit AArch64 mode.

SoC                        JPEG   Dijkstra  Lua   Mandelb. Stream SGEMM SFFT
                           Compr.                          Copy
28nm? MT8173 (Cortex-A72)  1429    1287     1675  1750     2217    979  1345
14nm Exynos 7420           1475    1082     1409  1147     1993    954  1379

The MT8173 easily matches the single-core performance of Exynos 7420, while showing significant improvements in the Mandelbrot floating point subtest and the memory-intensive Dijkstra subtest, and also the Lua subtest. Memory subtest (Stream Copy) performance is also better than Exynos 7420, despite the likely much wider memory interface of the latter, providing clear evidence of the improved memory performance (largely due to smarter prefetching) in Cortex-A72. Overall, since the MT8173 results reflect a SoC using 28 nm or perhaps 20 nm process technology, while Exynos 7420 uses Samsung's leading-edge 14 nm FinFET process, the ability of the MT8173 to beat Exynos 7420 in single-core performance while using a less advanced process is impressive and illustrates the performance improvements in the Cortex-A72 core.
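The relative differences are easier to see as per-subtest ratios. The sketch below divides the MT8173 scores by the Exynos 7420 scores from the table. Clock speeds are not factored in, since the MT8173 clock speed is uncertain.

```python
subtests = ["JPEG Compress", "Dijkstra", "Lua", "Mandelbrot",
            "Stream Copy", "SGEMM", "SFFT"]
mt8173 = [1429, 1287, 1675, 1750, 2217, 979, 1345]
exynos_7420 = [1475, 1082, 1409, 1147, 1993, 954, 1379]

# Ratio above 1.0 means the MT8173 (Cortex-A72) entry scored higher.
ratios = {
    name: round(a72 / a57, 2)
    for name, a72, a57 in zip(subtests, mt8173, exynos_7420)
}
for name, ratio in ratios.items():
    print(f"{name:14s} {ratio:.2f}")
```

Mandelbrot comes out at about 1.5x and Dijkstra and Lua at about 1.2x, while JPEG Compress and SFFT are within a few percent of parity.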

Reduced silicon area results in lower cost


Cortex-A72 has a silicon area that is 10% smaller than Cortex-A57 on an equivalent process, while delivering improvements in performance and power efficiency. Already SoCs have been announced or described that utilize Cortex-A72 cores, such as MediaTek's MT8173 for tablets, Qualcomm's Snapdragon 618 and 620 for smartphones, and MediaTek's MT6797 (Helio-X20) for smartphones.

There seems to be a clear trend of using just two Cortex-A72 cores (instead of the four cores used in many Cortex-A57 implementations), reducing cost and maximum power consumption. These cores are augmented by low-power, small-area Cortex-A53 cores running at a lower frequency. MT8173, Snapdragon 618 and Helio-X20 all use such a configuration.

Use of Cortex-A72 may be more effective than high-clocked Cortex-A53 cores


There are indications that Cortex-A53 cores running at a high frequency (such as implemented in MediaTek's MT6752 and MT6795 (Helio-X10), HiSilicon's Kirin 930 and to a lesser degree in Snapdragon 615 and the announced Snapdragon 415 and 420) run into a power efficiency bottleneck at higher clock speed, due to the relatively steep increase in power consumption as the clock speed of the Cortex-A53 core increases above 1.3-1.5 GHz. Solutions that combine a small number of Cortex-A72 cores with lower-clocked, power-efficient Cortex-A53 cores may prove to be a sweet spot in terms of practical performance and power efficiency for mid-range SoCs.

Source: AnandTech (Cortex-A72 Architecture Details article), Geekbench Browser

Tuesday, March 24, 2015

TSMC's 16 nm FinFET sees adoption by Qualcomm and Apple, competes with Samsung

TSMC will receive majority of Apple A9 business


According to reports, TSMC will receive the majority of Apple A9 SoC orders, which includes the A9 for next-generation iPhones and A9X for iPads. According to sources quoted by EE Times, Apple had originally planned to give Samsung a majority of the Apple A9 orders, but has recently shifted orders to TSMC, most likely using a 16 nm FinFET process.

Because ramping up production of a similar chip from a second source with different foundry technology is challenging and complicated, I believe it is likely that A9 production will be overwhelmingly (and perhaps exclusively) concentrated at TSMC. A parallel can be drawn with various reports from last year, which for a long time continued to echo incorrect projections that Samsung would serve a significant portion of the production of Apple's A8 generation SoCs, which turned out not to be the case.

In the meantime, TSMC's revenues continue to be at a relatively high level despite Q1 usually being seasonally down, with strong demand for 20 nm production, most likely reflecting continuing demand from Apple, which is offsetting weakness from Qualcomm for leading-edge processes. There have been rumours about an upcoming iPhone 6S and a lower cost iPhone 6C model which may involve substantial unit volumes. Apple's iPhone unit shipments have also been boosted by strong demand in China.

Low yield at Samsung and Exynos ramp contribute to TSMC orders


According to a source quoting sources in South Korea, TSMC's yield rate for its 16 nm FinFET process is better than that of Samsung's 14 nm process. Moreover, Samsung is seeing strong upcoming demand for its flagship Galaxy S6 smartphone, which uses the Exynos 7420 SoC produced on its 14 nm FinFET process, and most likely needs all the capacity it can get to ramp up production of this SoC. Samsung also increasingly uses Exynos 7420 and other internally-developed SoCs for other product lines, such as other smartphone models as well as tablets.

Qualcomm said to have limited-time exclusive use of TSMC's 16FF+ technology


According to a report by EE Times from a semiconductor industry conference in January, Qualcomm is likely to have locked up exclusive use of TSMC's 16FF+ process technology for about six months. The article appears to quote sources affiliated with Qualcomm that state that Qualcomm feels competitors such as MediaTek took advantage of previous-generation process technology (28HPM) that Qualcomm helped develop at TSMC, without having made the development investment that Qualcomm made.

However, this policy would be contrary to the principles based on which TSMC has operated for a long time, although the initial ramp of 20 nm at TSMC last year also seemed to be locked up by another company (Apple). It seems corporate pressure from these giant companies, backed by billions of dollars of cash, is forcing TSMC into these kinds of commitments.

The article mentions that the later access to 16FF+ won't affect MediaTek's mainstream products serving the mid-range to entry-level segments, because 28 nm technologies will continue to be used for such products in the market.

Leaked power consumption graphs suggest increased power efficiency


Power consumption graphs of current and upcoming high-end Qualcomm SoCs running a 3D game at high detail settings suggest power consumption and heat production of Qualcomm's unannounced Snapdragon 815 processor will be considerably lower than that of the Snapdragon 801 and Snapdragon 810, with Snapdragon 810 showing particularly unfavourable characteristics, as confirmed by widespread reports and reviews of Snapdragon 810-based devices.

Snapdragon 815 is unannounced and few details are known about it, with some reports suggesting the use of a next-generation Krait CPU core. Use of ARM Cortex-A72 processor cores appears to be not unlikely, since this core seems to be close to actual production. Most likely, the decreased heat production, which is likely to be associated with lower power consumption, is made possible by the use of the next-generation 16 nm FinFET process at TSMC.

Similar improvements in power consumption were observed for Snapdragon 620, which uses Cortex-A72 cores, when compared to the mid-range Snapdragon 615 SoC, which is reported to also have heating issues. Snapdragon 620, which has been announced, is also likely to have significantly higher CPU performance than Snapdragon 615 due to the use of Cortex-A72 cores, versus Cortex-A53 for Snapdragon 615, while also likely being produced on a much more efficient process (possibly TSMC's 16FF+), since Snapdragon 615 is manufactured on a low-efficiency 28LP process.

Sources: EE Times (ISS 2015 conference report), EE Times (Apple A9 orders article), STJS Gadgets Portal (Snapdragon heat production graphs)

Updated 25 March 2015 (Add comments about 20 nm Apple production at TSMC).

Thursday, March 19, 2015

Qualcomm releases new variant of Snapdragon 410 that supports ARMv8, targeting tablets and other applications

Qualcomm recently made announcements of products and reference designs based on the APQ8016 SoC, a new modem-less quad-core Cortex-A53-based SoC branded as Snapdragon 410. The chip is targeted at IoT applications, development boards and probably also Wi-Fi-only tablets, supporting Linux, Android and Windows 10. Although branded as Snapdragon 410, the chip is a new design that is likely to fix most of the performance deficiencies of the first-generation MSM8916 Snapdragon 410 SoC that has been targeted at smartphones. For example, the original Snapdragon 410 SoC appears not to support ARMv8 at all, while the new chip is clearly targeted at 64-bit platforms.

Development board released


Qualcomm recently announced the DragonBoard 410c, a development board with support for Linux and Android. It features a quad-core 1.2 GHz Cortex-A53 processor with Adreno 306 GPU, 533 MHz LPDDR2/LPDDR3 SDRAM, HDMI output and several I/O interfaces. The HDMI output is limited to 30fps at 1080p.

The board is designed to be compatible with the 96Boards initiative from Linaro, the non-profit engineering organization developing open source software for the ARM architecture.

With 64-bit support and a maximum clock speed of 1.2 GHz, the APQ8016 SoC that is used on the board most likely uses a more recent version of the Cortex-A53 core than the original Snapdragon 410 processor for smartphones, while being manufactured using the same 28LP process at TSMC.

New SoC probably targets tablets as volume driver


There are indications that the new chip will be used in Wi-Fi-only tablets, such as the recently announced Samsung Galaxy Tab A series. There have also been indications that Qualcomm is stepping up its efforts to target Chinese tablet manufacturers.

Qualcomm and MediaTek support mainline Linux kernel with open-source drivers for selected SoCs


Whereas in the past major smartphone SoC companies kept their closed-source drivers separate from the open-source Linux community, more recently companies such as Qualcomm and MediaTek have started releasing open source contributions for the Linux kernel to support selected SoC products. Both companies have also recently joined Linaro, the engineering organization developing open source software for the ARM architecture.

For both companies, the SoCs supported in the mainline Linux kernel are applications processors without an integrated modem. Qualcomm is supporting the APQ8016 mentioned above while MediaTek has contributed code for the MT8173 tablet processor.

Sources: Qualcomm (Dragonboard announcement), Qualcomm (Windows 10 IoT platform announcement), CNXSoft (DragonBoard 410c article)

Tuesday, March 10, 2015

Qualcomm's Snapdragon 808 fixes flaws of Snapdragon 810

Snapdragon 808 (MSM8992) is a performance-oriented SoC that Qualcomm announced last year together with Snapdragon 810. It has similarities to Snapdragon 810 (MSM8994), including the use of ARM Cortex-A57 CPU cores and Cortex-A53 cores in a big.LITTLE configuration. Snapdragon 808 appears to fix some of the performance flaws that are apparent in Snapdragon 810, especially the memory subsystem, while being significantly less costly.

Snapdragon 808 features


Features and differences with Snapdragon 810 include:

  • Snapdragon 808 has only two Cortex-A57 cores (revision r1p2) compared to four Cortex-A57 cores (revision r1p1) for Snapdragon 810. Both contain four Cortex-A53 cores.
  • Snapdragon 808 has a more economical dual-channel LPDDR3 memory interface, compared to the LPDDR4 interface of Snapdragon 810.
  • Snapdragon 808 has an Adreno 418 GPU, compared to Adreno 430 in Snapdragon 810, presumably with somewhat lower performance.
  • Manufactured on TSMC's 20 nm process, the same as Snapdragon 810.
  • 4K resolution video playback (H.264/H.265), on-device display resolution up to 2560x1600 (Snapdragon 810 theoretically supports 4K on-device display resolution, but all currently announced smartphones using Snapdragon 810 are limited to a resolution of 1920x1080).

 

Early benchmark results suggest Snapdragon 808 fixes performance flaws of Snapdragon 810


Early benchmarks for Snapdragon 808 have already appeared on the Geekbench Browser. We can compare Snapdragon 808's single-core performance with Snapdragon 810 and Exynos 7420, all of which run in AArch64 mode in the published benchmark results.

To reduce the impact of thermal throttling, the best Geekbench subtest results for a given device have been collected and combined in the table below. I have made an attempt to estimate the actual maximum clock speed of the Cortex-A57 cores during the benchmarks, partly based on the maximum frequency reported by Geekbench when it appears to apply to the "big" cores and not the "LITTLE" cores.

SoC          "big" CPU                    Arch     JPEG (int)  Lua (int)   Mandelb. (float)
                                                   Comp. IPC         IPC         IPC

MSM8992      2 x 1.69? GHz Cortex-A57r1p2 AArch64  1257  1.96  1385  1.99  1031  1.79
MSM8994      4 x 1.8? GHz Cortex-A57r1p1  AArch64  1358  1.96  1283  1.73  1100  1.79
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  1486  1.96  1409  1.74  1198  1.78

MT6795       8 x 1.95 GHz Cortex-A53r0p2  AArch64  1026  1.37  1053  1.31   823  1.24
MT6795T      8 x 2.16 GHz Cortex-A53r0p2  AArch64  1128  1.36  1173  1.32   912  1.24

The IPC figures are calibrated on the Cortex-A7 core, whose IPC is fixed at 1.00. Fixing the maximum clock speed at 1.8 GHz for the MSM8994 (Snapdragon 810) results (based on HTC One M9 entries) and at 1.69 GHz for the MSM8992 (Snapdragon 808) produces similar IPC figures for the JPEG Compress integer test and the Mandelbrot floating point test, making them reasonably plausible. The best Lua subtest result for the MSM8992 shows a higher IPC, which may reflect improved L2 cache performance in the MSM8992, which uses a later revision of the Cortex-A57 core.
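As a rough illustration of this calibration, the IPC index can be treated as the benchmark score per GHz divided by the score per GHz of the Cortex-A7 reference core. The sketch below is only an approximation under stated assumptions: the Cortex-A7 baseline of about 384.5 JPEG Compress points per GHz is not given in the post and has been back-calculated from four rows of the table, and the function names are hypothetical.

```python
# Assumed Cortex-A7 baseline: JPEG Compress points per GHz at an IPC index of 1.00.
# This value is NOT stated in the post; it is back-calculated from the table rows.
ASSUMED_A7_JPEG_POINTS_PER_GHZ = 384.5


def ipc_index(score, clock_ghz, baseline_per_ghz=ASSUMED_A7_JPEG_POINTS_PER_GHZ):
    """Score per GHz, normalised so that the baseline core has an index of 1.00."""
    return score / clock_ghz / baseline_per_ghz


def implied_baseline(score, clock_ghz, ipc):
    """Back out the points-per-GHz baseline implied by a single table row."""
    return score / clock_ghz / ipc


# SoC -> (single-core JPEG Compress score, estimated "big" core clock in GHz)
jpeg_rows = {
    "MSM8994": (1358, 1.80),
    "Exynos 7420": (1486, 1.97),
    "MT6795": (1026, 1.95),
    "MT6795T": (1128, 2.16),
}

if __name__ == "__main__":
    for soc, (score, clock) in jpeg_rows.items():
        print(f"{soc:12s} IPC index {ipc_index(score, clock):.2f}")
```

Because the index scales inversely with the assumed clock speed, any error in the clock estimate translates directly into an error in the IPC figure.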

The single-core CPU performance results show no surprises, with Snapdragon 808 showing good performance that is slightly lower than Snapdragon 810, proportional to the lower maximum clock frequency in the tested devices. However, the Lua test shows higher performance with Snapdragon 808. This is especially true for the multi-core test (results not shown), where Snapdragon 810 seems to be limited to a score of about 1200, with little gain when compared to single-core performance, while Snapdragon 808 consistently scores in the region of 4000.

Memory subsystem performs much better than Snapdragon 810


The following table lists Geekbench scores for some memory-dependent tests. 

SoC          "big" CPU                    Arch     Stream Copy  SGEMM SFFT  SGEMM SFFT
                                                   Single Multi             Multi Multi
MSM8992      2 x 1.69? GHz Cortex-A57r1p2 AArch64  1527   1733   767  1126  1678  2946
MSM8994      4 x 1.8? GHz Cortex-A57r1p1  AArch64  1428   1838   741  1009  1870  3649
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  2003   2622   957  1363  2888  5014

MT6795       8 x 1.95 GHz Cortex-A53r0p2  AArch64  1356   2068   484   618  1542  4764
MT6795T      8 x 2.16 GHz Cortex-A53r0p2  AArch64  1350   2140   529   694  1659  5333

Notably, Snapdragon 808 delivers memory performance similar to Snapdragon 810 at much lower cost, despite using only a regular LPDDR3 memory interface, as compared to the Snapdragon 810's LPDDR4 memory interface which in theory delivers almost twice the bandwidth. This provides clear evidence that the Snapdragon 810's memory interface is still flawed, while that of Snapdragon 808 is much more optimized. Snapdragon 808 even beats Snapdragon 810 in the single-core SGEMM and SFFT test, despite running at a lower clock speed, which probably also reflects a more optimized and functional memory controller. Even in the multi-core SGEMM and SFFT tests, Snapdragon 808 is not much behind Snapdragon 810 despite having only half the number of CPU cores.
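The comparison can be made concrete by dividing each Snapdragon 808 score by the corresponding Snapdragon 810 score. The following is a minimal sketch using the figures from the table above; the column labels are an assumption based on the table header (the unlabelled SGEMM and SFFT columns are taken to be single-core), and the helper name is hypothetical.

```python
# Column order assumed from the table header layout.
TESTS = [
    "Stream Copy (single)",
    "Stream Copy (multi)",
    "SGEMM (single)",
    "SFFT (single)",
    "SGEMM (multi)",
    "SFFT (multi)",
]

msm8992 = [1527, 1733, 767, 1126, 1678, 2946]  # Snapdragon 808
msm8994 = [1428, 1838, 741, 1009, 1870, 3649]  # Snapdragon 810


def relative_scores(a, b):
    """Element-wise ratio a/b; values above 1.0 mean the first SoC is ahead."""
    return [x / y for x, y in zip(a, b)]


if __name__ == "__main__":
    for test, ratio in zip(TESTS, relative_scores(msm8992, msm8994)):
        print(f"{test:22s} 808/810 = {ratio:.2f}")
```

A ratio above 1.0 in the single-core rows, combined with ratios well above 0.5 in the multi-core rows, is what one would expect if the chip with half the "big" cores is not held back by its memory subsystem.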

Comparison with MT6795


In the marketplace, Snapdragon 808 may compete with MediaTek's MT6795 (Helio X10), which is a cost-effective performance-segment SoC that only uses Cortex-A53 cores. Comparing Geekbench subtest results, MT6795 scores significantly lower than Cortex-A57-based SoCs such as Snapdragon 808 in single-core benchmarks, although the gap is not very large except in the SFFT benchmark. The MT6795 does relatively well in multi-core benchmarks, where it beats the Cortex-A57-based Snapdragon 808 and Snapdragon 810 in most cases by a considerable margin, especially in the JPEG Compress, Lua and Mandelbrot tests which are sensitive to the number of CPU cores (multi-core scores have not been listed for these tests in the tables above). As an example, MT6795 scores 8167 in the multi-core JPEG Compress test, twice the score of Snapdragon 808 and almost 40% higher than Snapdragon 810.
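The multi-core JPEG Compress scores for the two Snapdragon chips are not listed, but rough values can be backed out from the statement above. This is only a sketch: "twice" and "almost 40%" are treated as exact ratios, so the results are approximations (a lower bound in the case of Snapdragon 810), and the multi-core scaling factor is simply the multi-core score divided by the single-core score from the first table.

```python
MT6795_JPEG_MULTI = 8167  # multi-core score quoted in the text

# Single-core JPEG Compress scores from the first table.
jpeg_single = {"MT6795": 1026, "MSM8992": 1257, "MSM8994": 1358}

# Approximate multi-core scores implied by the text (assumptions, not measurements).
jpeg_multi = {
    "MT6795": MT6795_JPEG_MULTI,
    "MSM8992": MT6795_JPEG_MULTI / 2.0,   # "twice the score of Snapdragon 808"
    "MSM8994": MT6795_JPEG_MULTI / 1.40,  # "almost 40% higher": lower bound for 810
}


def scaling_factor(multi, single):
    """Multi-core score divided by single-core score."""
    return multi / single


if __name__ == "__main__":
    for soc in jpeg_single:
        factor = scaling_factor(jpeg_multi[soc], jpeg_single[soc])
        print(f"{soc:8s} multi ~{jpeg_multi[soc]:.0f}, scaling ~{factor:.2f}")
```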

Conclusion


Snapdragon 808 appears to be a much more optimized, less flawed SoC product than Snapdragon 810, and it may perform similarly to or even better than Snapdragon 810 in practical use cases due to the performance flaws present in Snapdragon 810. At the same time, Snapdragon 808 is likely to be considerably cheaper. The only caveat is the question of whether excessive heat production makes thermal throttling necessary to the same degree as with Snapdragon 810. With only two Cortex-A57 cores, the SoC should be less problematic in this regard.

Source: Geekbench Browser (MSM8992 results), Geekbench Browser (MSM8994 results), Qualcomm (MSM8992 specifications)

Updated 15 March 2015.

Early benchmarks appear for Cortex-A72-based SoC

ARM recently announced the new Cortex-A72 processor core, which is an improved version of the existing high-performance Cortex-A57 processor core.

Alongside the Cortex-A72 CPU core, ARM also announced the CCI-500 interconnect technology as well as the high-end Mali-T880 GPU. Devices incorporating the combination of these technologies are expected to become available in 2016.

However, SoCs using the Cortex-A72 CPU are likely to become available earlier. Qualcomm and MediaTek have both announced SoCs using the Cortex-A72 core with commercial availability in the second half of 2015, suggesting that the CPU core itself is at an advanced stage of introduction. Already, early benchmarks for MediaTek's MT8173 tablet SoC that incorporates the Cortex-A72 have become available.

Cortex-A72 appears to be an enhanced version of Cortex-A57 optimized for next-generation processes


In its announcement press release from 3 February 2015, ARM claims that more than ten partners have already licensed Cortex-A72, including HiSilicon, MediaTek and Rockchip. Cortex-A72 is based on ARM's ARMv8-A instruction set architecture, and can be combined with the existing Cortex-A53 in a big.LITTLE configuration. Cortex-A72 seems to be positioned as a replacement for Cortex-A57. The similarities with Cortex-A57 are very apparent, for example in the identically sized L1 instruction and data caches, and a feature set that is otherwise very similar.

On a 16 nm FinFET process, the core can sustain operation at speeds up to 2.5 GHz within the constraints of a mobile power envelope (e.g. smartphones), with scalability to higher speeds for larger form-factor devices. However, the first announced devices, such as MediaTek's MT8173, appear to use older processes such as the tried-and-trusted 28 nm HPM process at TSMC, so they are likely to have a lower maximum clock speed.

ARM claims increased performance and power efficiency, although these claims seem to be based on implementation on next-generation processes such as 16 nm FinFET that deliver a significant intrinsic improvement in these metrics. ARM mentions micro-architectural improvements that result in enhancements in floating point, integer and memory performance. When implemented on a 16 nm FinFET process, ARM expects Cortex-A72 to provide 85% higher performance when compared to the Cortex-A57 core on a 20 nm process within a similar smartphone power budget.

Overall, the differences with Cortex-A57 appear to be relatively minor, so that Cortex-A72 is best viewed as an enhanced version of Cortex-A57 that is optimized for next-generation processes such as 16 nm FinFET. Nevertheless, the first SoCs to use the Cortex-A72 core will be manufactured using a less advanced process.

Benchmarks appear for MediaTek's MT8173


MediaTek's MT8173 is a mid-range tablet processor mainly targeting Wi-Fi-only tablets, since it does not have an integrated modem. It has two Cortex-A72 cores and two Cortex-A53 cores in a big.LITTLE configuration. Because the chip is probably manufactured using the established 28HPM process at TSMC, the maximum clock speed of the Cortex-A72 cores is likely to be lower than the target for 16 nm FinFET. MediaTek claims a clock speed of up to 2.4 GHz, but a much lower frequency is apparent in early benchmark results.

The chip also features a PowerVR GX6250 GPU, which delivers higher performance than the G6200 GPU used inside MediaTek's existing MT8135 and MT6795.

Recently, early benchmarks for an MT8173 development board have appeared both in the Geekbench Browser and in the results database of GFXBench. The first Geekbench results already appeared in December 2014. The latest set of Geekbench results dates from the end of February 2015, although they do show a certain amount of variation that may reflect thermal throttling.

Single-core performance good, but not spectacular


As expected, the Geekbench results show good single-core performance, albeit not spectacular. As shown in the following table, single-core performance is in line with Cortex-A57-based SoCs such as Exynos 5433 and Exynos 7420. It should be noted that the MT8173 test SoC is most likely manufactured at 28 nm with a correspondingly low maximum CPU clock speed, while Exynos 5433 and 7420 are manufactured using smaller leading-edge processes at Samsung.


SoC          "big" CPU                    Arch     JPEG (int)  Lua (int)   Mandelb. (fp)
                                                   Comp. IPC         IPC         IPC
MT8173       2 x 1.6? GHz Cortex-A72      AArch32  1310  2.13  1380  2.10  1064  1.95
Exynos 5433  4 x 1.80 GHz Cortex-A57r1p0  AArch32  1456  2.10  1397  1.89  1174  1.91
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  1481  1.97  1409  1.74  1198  1.92

In this table, to determine the IPC index I have made an educated guess about the actual clock speed of MT8173 when running the benchmarks. Geekbench reports a 1.40 GHz clock speed (which probably applies to the Cortex-A53 cores), but 1.6 GHz seems to be a good match for the Cortex-A72 cores, giving an IPC that is just a little better than that of Cortex-A57. Note that Exynos 7420 runs in AArch64 mode, which skews direct IPC comparisons.
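The sensitivity of the IPC index to the guessed clock speed can be sketched by evaluating the same JPEG Compress score at several candidate frequencies. The baseline of about 384.5 JPEG Compress points per GHz for an IPC index of 1.00 is an assumption (it is back-calculated from the table rows, not stated in the post), and the list of candidate clocks is purely illustrative.

```python
# Assumed score-per-GHz value corresponding to an IPC index of 1.00.
# Back-calculated from the table; not a figure given in the post.
ASSUMED_BASELINE_PER_GHZ = 384.5

MT8173_JPEG_SINGLE = 1310      # from the table above
EXYNOS_5433_JPEG_SINGLE = 1456  # Cortex-A57 reference at 1.80 GHz


def ipc_index(score, clock_ghz, baseline=ASSUMED_BASELINE_PER_GHZ):
    """IPC index for a given score under an assumed clock speed."""
    return score / clock_ghz / baseline


# Illustrative candidates: Geekbench's reported 1.40 GHz up to MediaTek's claimed 2.4 GHz.
candidate_clocks_ghz = [1.4, 1.6, 1.8, 2.0, 2.4]

if __name__ == "__main__":
    a57 = ipc_index(EXYNOS_5433_JPEG_SINGLE, 1.80)
    print(f"Cortex-A57 (Exynos 5433 at 1.80 GHz): {a57:.2f}")
    for clock in candidate_clocks_ghz:
        idx = ipc_index(MT8173_JPEG_SINGLE, clock)
        print(f"MT8173 at {clock:.1f} GHz: {idx:.2f}")
```

A candidate of 1.4 GHz would imply an implausibly large IPC gain over Cortex-A57, while 2.4 GHz would imply an IPC well below it, which is why a value around 1.6 GHz looks like the most reasonable guess.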

Practical implications unclear


Without knowing the exact clock speed of the Cortex-A72 cores, it is hard to draw conclusions about the actual IPC improvement over Cortex-A57. If the MT8173 uses a 28 nm process, the ability to approach the single-core performance of Samsung's Exynos 7420, which is manufactured using a 14 nm FinFET process, is impressive. However, although MediaTek demonstrated the MT8173 in an actual tablet at MWC, it is unclear what kind of device the Alps development board in the benchmark entries actually represents, so it remains to be seen whether the benchmarks actually reflect the power budget of a tablet.

The multi-core performance reported is not very impressive, as expected because of the relatively small number of CPU cores. The JPEG Compress multi-core score shows a CPU scaling factor of 2.72, which is good and implies utilization of the Cortex-A53 cores. The Mandelbrot floating point benchmark shows similar scaling.

However, the Lua integer benchmark has a very low multi-core scaling factor of 1.41, which is lower than expected, even when allowing for the limited number of cores. For example, MediaTek's MT6795 achieves multi-core scaling of 7.5 in this benchmark, and the Exynos chips range from 3.9 to 5.0. Other chips with a low multi-core scaling factor for Geekbench's Lua subtest include Snapdragon 810 (Cortex-A57-based), MediaTek's MT6595 (Cortex-A17-based) and NVIDIA's Denver-based Tegra-K1 SoC. There are indications that this benchmark test heavily depends on on-chip cache (primarily L2 cache) size and speed.

GPU performance of MT8173's PowerVR GX6250 GPU improves on G6200


The MT8173 test device's GPU performance as shown in the GFXBench results database is not overly impressive, but suitable for a mid-range chip and an improvement over the PowerVR G6200 GPU used in other MediaTek SoCs such as MT6595 and MT6795. In the T-Rex Offscreen benchmark, the MT8173 registers a score of 1487, higher than the 1311 of the MT6595 (G6200)-equipped Meizu MX4. In the GFXBench 3.0 low-level tests, alpha blending scores higher than the MT6595 while the other low-level scores are comparable.

Sources: ARM (Cortex-A72 announcement press release), AnandTech (MediaTek MT8173 article), MediaTek (MT8173 announcement), Geekbench Browser (MT8173 test device results), GFXBench (MT8173 test device result)

Updated 10 March 2015.

Friday, March 6, 2015

China tablet processor market declines in Q1

According to a recent article published by DigiTimes Research, tablet applications processor unit shipments to Chinese manufacturers grew by 4.7% in Q4 2014 to reach 34.7 million units. However, shipments are estimated to decline by 24% in Q1 2015 when compared to Q4 2014. Year-over-year, shipments are expected to drop by about 8%, which marks the first time quarterly tablet processor shipments in China experience a year-over-year decline. Excess inventory from Q4 2014 is given as a cause for the decline in shipments.

MediaTek leads Chinese tablet market in Q1 2015


Based on information published by DigiTimes Research, MediaTek, Rockchip, Allwinner and Intel were the top four providers of tablet processors in China, in that order, in Q4 2014. For Q1 2015, MediaTek is estimated to expand its market share by about 1% to reach 28.5%, although absolute shipments will decline significantly due to the overall market decline.

Rockchip, who was the market share leader for most of 2014, is estimated to see its market share remain stable in Q1 2015, registering a 0.6% increase according to DigiTimes Research, who did not supply a market share figure for Rockchip, although it is probably in the region of 25%. DigiTimes mentioned that Rockchip's new chips launched at the end of 2014 (which includes the Cortex-A7-based RK3126 and RK3128) have not yet reached strong shipments.

Meanwhile, Allwinner continues its trend of a steady decline in market share, and is expected to have a share of 15.6% compared to 17.6% in Q4 2014. This allows it to be passed by Intel in terms of market share, with Intel's market share estimated to rise from 15% to 16.3% in Q1 2015.
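To illustrate how a rising market share can still mean falling shipments, the share figures can be combined with the overall market estimate of 34.7 million units in Q4 2014 and a 24% sequential decline in Q1 2015. The sketch below is a back-of-the-envelope calculation, not data from DigiTimes Research: the per-vendor unit figures are derived, and MediaTek's Q4 2014 share of 27.5% is an assumption that reads "about 1%" as one percentage point.

```python
Q4_2014_TOTAL_M = 34.7         # million units, from the DigiTimes Research figure
Q1_2015_SEQ_DECLINE = 0.24     # estimated quarter-over-quarter decline

# Market shares as fractions. MediaTek's Q4 share is an assumption (28.5% minus ~1 point).
share_q4_2014 = {"MediaTek": 0.275, "Allwinner": 0.176, "Intel": 0.150}
share_q1_2015 = {"MediaTek": 0.285, "Allwinner": 0.156, "Intel": 0.163}


def estimate_units(total_m, shares):
    """Estimated shipments in million units for each vendor."""
    return {vendor: total_m * share for vendor, share in shares.items()}


q1_2015_total_m = Q4_2014_TOTAL_M * (1.0 - Q1_2015_SEQ_DECLINE)
units_q4 = estimate_units(Q4_2014_TOTAL_M, share_q4_2014)
units_q1 = estimate_units(q1_2015_total_m, share_q1_2015)

if __name__ == "__main__":
    print(f"Estimated Q1 2015 market: {q1_2015_total_m:.1f} million units")
    for vendor in share_q1_2015:
        print(f"{vendor:10s} Q4 2014 ~{units_q4[vendor]:.1f}M -> Q1 2015 ~{units_q1[vendor]:.1f}M")
```

Under these assumptions every vendor listed ships fewer units in Q1 2015 than in Q4 2014, including those whose share increases.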

Intel's global market share has increased and is significant, especially revenue share


It should be noted that in terms of global market share, Intel has a stronger position than what would be inferred just from the Chinese market due to a strong position at brand-name tablet manufacturers outside of China, such as Asus and Acer. The other chip players in the Chinese tablet processor market, especially Rockchip and Allwinner, have a weak position outside of China. Due to the higher-end nature of Intel's product mix, Intel also has a higher revenue share, whereas the sales of companies such as Allwinner are mostly concentrated in low-end processors. It has been reported that Intel is abandoning its "contra-revenue" strategy of subsidizing tablet processor sales, which it probably can afford to do because its chip solutions are fairly competitive on their own.

Global brand names gain share, use different chip suppliers


In the global tablet market, brand name manufacturers are gaining share and dominate the dollar value of the market, including its semiconductor content. Apple and Samsung, who lead the global tablet market, use a lot of in-house chip solutions (100% in the case of Apple). Samsung also uses suppliers like Qualcomm and Marvell, who otherwise do not have a strong position in the Chinese tablet market.

MediaTek used to have a strong market share among Taiwanese tablet manufacturers such as Asus and Acer. However, its market share there seems to have been eroded significantly by strong adoption of Intel's Atom SoCs at these manufacturers (who have strong ties with Intel through PC manufacturing).

Popular tablet SoCs as of Q1 2015


By analyzing the tablet models offered on Chinese e-commerce portals, one can get some idea of what SoCs are currently used the most in tablets from China. I took a look at the tablet offerings on Banggood.com.

Rockchip's RK3188 (which probably means the RK3188T variant in most cases) is still widely used. Originally a mid-range performance segment SoC, there are indications that Rockchip built a significant inventory of this SoC (which is not particularly cheap in terms of manufacturing cost) last year, and the chip has been used in cheaper models as well. Rockchip's RK3126, which is more cost-effective than RK3188, is slowly starting to appear in new tablet models.

Meanwhile, Rockchip's high-end RK3288 is used in several models from Pipo, Teclast and FNF, and these seem to be reasonably popular for a high-end product. I have some concerns about power consumption and battery life regarding these products due to the processor cores used in the SoC.

The most popular MediaTek chips used in tablets are SoCs with 3G connectivity such as the low-end dual-core MT8312 and quad-core MT8382 (the equivalent of the MT6572 and MT6582 smartphone SoCs), as well as the more performance oriented octa-core MT6592/MT8392, which provides good performance and battery-life and has moved down to lower-priced tablet models. Additionally, the new 64-bit MT8752 with 4G (equivalent to the MT6752 smartphone SoC) is starting to appear in new models (Cube, Teclast). For WiFi-only tablets, the MT8127 (which has a relatively powerful GPU for a cheap SoC) is used in some low-to-mid-range tablets.

Allwinner's A31s, which was released in 2013 and was perhaps its last successful product introduction, appears to be still in production use. Low-end tablets are available with the A23 and A33 SoCs, although the A33 does not seem to have been very successful and has been affected by weakness in the low-end segment of the tablet market.

Allwinner's new octa-core A83T has started to appear in a few new models, and is probably replacing the high-end A80 Octa which is likely to have had low profit margins.

Finally, Intel's Z3735F, Z3735G and Z3736F Atom SoCs are widely used in tablets, although most prominently in higher-priced models that come equipped with Microsoft Windows.

Update (15 March): 3G smartphone chip inventory unloaded onto Chinese tablet market


In an article published on 13 March 2015, DigiTimes Research reported that due to a high inventory level of 3G smartphone solutions in China, such chips will be unloaded onto the Chinese tablet market by players such as MediaTek, Qualcomm and Spreadtrum.

3G-enabled chip solutions for tablets are usually very similar to the corresponding solutions for smartphones. For example, MediaTek's smartphone solutions have commonly been used in tablets, while MediaTek's official 3G-enabled tablet solutions most likely consist of a chip virtually identical to the smartphone version, with the main difference being a different model number (e.g. MT6582 vs MT8382). That MediaTek would target any excess inventory of 3G smartphone chipsets at the tablet market is not surprising.

However, I am a little sceptical about the volume that may be involved. The Chinese tablet market is clearly contracting in the near term, and the volumes in the tablet market are considerably smaller than in the smartphone market, even the declining 3G part of the smartphone SoC market. To put things into perspective, MediaTek's quarterly 3G smartphone chip shipments were on the order of 70 million in Q4 2014, while its 3G tablet chip shipments were probably in the range of 5 to 10 million.

The article also mentions Qualcomm, which in the past has not been a major player in the Chinese white-box tablet market. It mentions rumours that Qualcomm may form a partnership with Allwinner (which has been consistently losing market share) to penetrate the tablet market in China. The article also states that while Intel has introduced 3G tablet solutions, Intel's solutions are unlikely to be widely adopted until Intel introduces the 4G version of its Atom x3 (formerly SoFIA) platform.

Sources: DigiTimes (Q1 2015 China tablet AP market article), DigiTimes Research (smartphone chips inventory unloaded to tablet market)

Updated 15 March 2015.

Thursday, March 5, 2015

A deeper look at graphics benchmark results, including GFXBench 3.1 and Basemark X

In this post I will take a closer look at graphics benchmark results for different SoCs. I will look beyond just GFXBench (for which a new version has appeared), because the workload tested by well-known GFXBench tests such as T-Rex and Manhattan is not necessarily reflective of the actual gaming experience. Alternative benchmarks exist, such as Basemark X, which uses the Unity engine that is commonly used in games.

GFXBench 3.1 released for OpenGL ES 3.1, Snapdragon 805 does well


Kishonti recently released a new version of GFXBench, GFXBench 3.1 for OpenGL ES 3.1, that includes tests for the OpenGL ES 3.1 API standard supported by many recent devices. A few results from the new benchmark tests are already available, with the Adreno 420 GPU inside Snapdragon 805 closing most of the performance gap with the Mali-T760 MP6/MP8 in Samsung's Exynos SoCs in the Manhattan 3.1 test.

                                                      Offscreen Manhattan Manhattan
Device               SoC             GPU              T-Rex        3.0       3.1

NVIDIA Shield Tablet NVIDIA K1-32    Tegra K1 GPU        3692     1979      1443  
HTC One M9           Snapdragon 810  Adreno 430          2732     1413
Galaxy S6 Edge       Exynos 7420     Mali-T760 MP8?      3312     1607       793
Sams. Galaxy Note 4  Snapdragon 805  Adreno 420          2386     1153       773
Samsung Galaxy S6    Exynos 7420     Mali-T760 MP8?      3314     1609       634
Sams. Galaxy Note 4  Exynos 5433     Mali-T760 MP6       2163     1110       436
HTC One M8           Snapdragon 801  Adreno 330          1608      768
Teclast X98 Air      Atom Z3736F     Intel HD            1014      564       307
Google Nexus 10      Exynos 5250     Mali-T604 MP4        818      351       185

NVIDIA's 32-bit version of Tegra K1 leads (the 64-bit Denver-based version of Tegra K1, and Tegra X1, have not yet been tested). Performance of Snapdragon 805 as implemented in certain models of the Samsung Galaxy Note 4 holds up better in the Manhattan 3.1 test than Samsung's Exynos SoCs with Mali-T760 MP6/MP8. Whereas Exynos 7420 (used in the Galaxy S6) has a clear advantage in existing benchmarks (1609 vs 1153 for Manhattan and 3314 vs 2386 for T-Rex), it loses that advantage in the new Manhattan 3.1 test (although the Galaxy S6 Edge benchmark result suggests it is still slightly superior). Intel's Baytrail SoCs seem to hold up relatively well looking at the result for an Atom Z3736F-based tablet, albeit at a lower performance level.
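One simple way to quantify how well each GPU "holds up" is to divide the Manhattan 3.1 score by the Manhattan 3.0 score for every device in the table that has both. This is only an illustrative sketch: the two tests are different workloads, so the ratio is meaningful only for comparing devices with each other, and the name `retention` is a hypothetical label rather than a GFXBench metric.

```python
# (device, SoC) -> (Manhattan 3.0 offscreen, Manhattan 3.1 offscreen), from the table above.
scores = {
    ("NVIDIA Shield Tablet", "Tegra K1 (32-bit)"): (1979, 1443),
    ("Galaxy S6 Edge", "Exynos 7420"): (1607, 793),
    ("Galaxy Note 4", "Snapdragon 805"): (1153, 773),
    ("Galaxy S6", "Exynos 7420"): (1609, 634),
    ("Galaxy Note 4", "Exynos 5433"): (1110, 436),
    ("Teclast X98 Air", "Atom Z3736F"): (564, 307),
    ("Google Nexus 10", "Exynos 5250"): (351, 185),
}


def retention(manhattan_30, manhattan_31):
    """Fraction of the Manhattan 3.0 score that remains in Manhattan 3.1."""
    return manhattan_31 / manhattan_30


ratios = {key: retention(*pair) for key, pair in scores.items()}

if __name__ == "__main__":
    # Print devices from the highest to the lowest retention ratio.
    for (device, soc), ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{device:22s} {soc:18s} {ratio:.2f}")
```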

GFXBench 3.1 results for Snapdragon 801 and the new Snapdragon 810 are not yet available. However, given the fact that Snapdragon SoCs appear to generally do well in GFXBench, they can be expected to score fairly highly. I'll say more about the apparent advantage for Qualcomm's SoCs in GFXBench in the final section of this article.

Basemark X is a useful alternative to GFXBench


Basemark X is a gaming benchmark that utilizes the Unity engine that is commonly used in games, and developer Rightware claims that it actually reflects practical performance in games. Although it does include an on-screen demo, the actual benchmark scores appear to be derived from off-screen rendering at a fixed resolution, so that benchmark results can be compared objectively between different devices.

Previous generation SoCs: MT6582 beats Snapdragon 400 in Basemark X


Taking a look at previous-generation cost-sensitive SoCs, while MediaTek's ubiquitous quad-core 3G SoC MT6582 (which supports OpenGL ES 2.0 only, through its Mali-400 MP2 GPU) scores lower than Snapdragon 400 in GFXBench's OpenGL ES 2.0-based T-Rex test (about 230 vs 330), in Basemark X MT6582-based devices score higher than Snapdragon 400-based devices. This is despite the fact that Snapdragon 400 was/is often employed in devices with a considerably higher selling price than MT6582-based devices.

Device               SoC             GPU                 Display*   Medium   High

Samsung SM-G800F     Exynos 3470     Mali-400 MP4        1280x720    7527    2712
Vodafone 985N        MT6582          Mali-400 MP2         960x540    4950    1717
Acer E53             MT6582          Mali-400 MP2        1280x720    4870    1694
Wiko Rainbow         MT6582          Mali-400 MP2        1280x720    4826
Galaxy S3 Neo        Snapdragon 400T Adreno 305          1280x720    4540    1551
Moto G (XT1032)      Snapdragon 400  Adreno 305          1280x720    4440
HTC Desire 816d      Snapdragon 400T Adreno 305          1280x720    4354    1441
Samsung SM-A500F     Snapdragon 410  Adreno 306          1280x720    4132    1900
Samsung SM-A300F     Snapdragon 410  Adreno 306           960x540    4076    1892
Samsung SM-G530H     Snapdragon 410  Adreno 306           960x540    3987    1690
Samsung SM-G800A     Snapdragon 400  Adreno 305          1280x720    3946    1362
HTC Desire 820q      Snapdragon 410  Adreno 306          1280x720    3786

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: Samsung SM-G800F is the Galaxy S5 Mini (Exynos version), while SM-G800A is a Snapdragon 400 running at the non-standard maximum clock speed of 1.4 GHz; Vodafone 985N is the Vodafone Smart 4 Power; Acer E53 is the Acer Liquid E700; Galaxy S3 Neo runs the Snapdragon 400 SoC at a non-standard maximum speed of 1.4 GHz; HTC Desire 816d runs the Snapdragon 400 SoC at 1.6 GHz; SM-A500F is the Galaxy A5, while SM-A300F is the Galaxy A3; SM-G530H is the Galaxy Grand Prime.

At both the medium detail and high detail settings, MT6582-based devices consistently score higher in Basemark X than Snapdragon 400-based devices, and in the medium detail test they also score higher than Snapdragon 410-based devices. This gives a different picture than the one you get from just looking at GFXBench's T-Rex benchmark.

Snapdragon 410 performs worse than Snapdragon 400 in Basemark X medium-detail


Also notable is that Snapdragon 410, which is the successor of the Snapdragon 400 and would normally be expected to improve performance, actually has lower performance in practice as judged by the Basemark X medium detail benchmark. This matches earlier findings of performance flaws in Snapdragon 410. When running the high detail Basemark X benchmark, Snapdragon 410 does better and beats Snapdragon 400.
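These observations can be checked by averaging the Basemark X scores per SoC from the table above. The sketch below makes two assumptions that are not in the original analysis: Snapdragon 400 and 400T devices are grouped into one family despite their different clock speeds, and missing high detail results are simply skipped when averaging.

```python
# (device, SoC family, medium score, high score or None), from the table above.
rows = [
    ("Vodafone 985N", "MT6582", 4950, 1717),
    ("Acer E53", "MT6582", 4870, 1694),
    ("Wiko Rainbow", "MT6582", 4826, None),
    ("Galaxy S3 Neo", "Snapdragon 400", 4540, 1551),
    ("Moto G (XT1032)", "Snapdragon 400", 4440, None),
    ("HTC Desire 816d", "Snapdragon 400", 4354, 1441),
    ("Samsung SM-G800A", "Snapdragon 400", 3946, 1362),
    ("Samsung SM-A500F", "Snapdragon 410", 4132, 1900),
    ("Samsung SM-A300F", "Snapdragon 410", 4076, 1892),
    ("Samsung SM-G530H", "Snapdragon 410", 3987, 1690),
    ("HTC Desire 820q", "Snapdragon 410", 3786, None),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def average_by_soc(table):
    """Return {SoC: (mean medium score, mean high score)}, skipping missing values."""
    result = {}
    for soc in sorted({row[1] for row in table}):
        medium = [r[2] for r in table if r[1] == soc]
        high = [r[3] for r in table if r[1] == soc and r[3] is not None]
        result[soc] = (mean(medium), mean(high))
    return result


averages = average_by_soc(rows)

if __name__ == "__main__":
    for soc, (medium, high) in averages.items():
        print(f"{soc:15s} medium {medium:7.1f}  high {high:7.1f}")
```

With so few devices per SoC and differing clock speeds and display resolutions, the averages should be read as a rough indication only.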

Mid-range SoCs: Snapdragon 615 and MT6752 closely matched


When running GFXBench, Snapdragon 615 and MT6752 are closely matched, with Snapdragon 615 scoring about 830 to 850 in T-Rex while MT6752 scores just above 870. For T-Rex, devices using MediaTek's prior-generation octa-core MT6592 score in the range 650 to 750. In the OpenGL ES 3.0 API-based Manhattan benchmark, Snapdragon 615 and MT6752 are very closely matched, both scoring around 360. We will also take a look at Basemark X results.

The following table shows Basemark X results for the new competing mid-range SoCs Snapdragon 615, MT6752 and HiSilicon's octa-core Hi6210 (Kirin 620), as well as for the prior-generation octa-core MT6592 from MediaTek.

Device               SoC             GPU                 Display*   Medium   High

Lenovo P70-A         MT6752          Mali-T760 MP2       1280x720   11311 
Meizu M1 Note        MT6752          Mali-T760 MP2       1920x1080  11168    4636
HTC Desire 816G      MT6592          Mali-450 MP4        1280x720   10984
Huawei CHE2-TL00     Hi6210          Mali-450 MP4        1280x720   10546    3439
Oppo R8106           Snapdragon 615  Adreno 405          1920x1080  10277    4846 
HTC Desire 820       Snapdragon 615  Adreno 405          1280x720   10133    4814
Samsung SM-A700FD    Snapdragon 615  Adreno 405          1920x1080  10052    4757
Archos 50C Oxygen    MT6592          Mali-450 MP4        1280x720    9867    3702
HTC Desire 616d      MT6592M         Mali-450 MP4        1280x720    7976    3045

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: SM-A700FD is the Galaxy A7; Huawei CHE2-TL00 is a new version of the Honor 4X.

When running the standard medium-detail version of Basemark X, MediaTek's MT6752 has a moderate advantage over Snapdragon 615, while at the high detail setting Snapdragon 615 has a small advantage. Huawei's Kirin 620 performs adequately and is just ahead of Snapdragon 615 in the medium detail setting.

MediaTek's prior-generation octa-core MT6592 with Mali-450 MP4 GPU keeps up relatively well in Basemark X, with certain models (e.g. HTC Desire 816G) actually beating Snapdragon 615 in the medium detail setting.

Performance-oriented SoCs with Basemark X


The following table shows Basemark X results for several performance-oriented mobile SoCs.

Device               SoC             GPU                 Display*   Medium   High

Samsung Galaxy S6    Exynos 7420     Mali-T760 MP8?      2560x1440  36017
Galaxy S5 LTE-A      Snapdragon 805  Adreno 420          1920x1080  32685   18334
Google Nexus 6       Snapdragon 805  Adreno 420          2560x1440  30362   20265
Sams. Galaxy Note 4  Snapdragon 805  Adreno 420          2560x1440  31963   21152
Sams. Galaxy Note 4  Exynos 5433     Mali-T760 MP6       2560x1440  29335   19019 

Apple iPad Air 2     Apple A8X       PowerVR Series 6    2048x1536  41700   29239
Google Nexus 9       NVIDIA K1-64    Tegra-K1 GPU        2048x1536  37939   28646
Apple iPad Mini 3    Apple A7        PowerVR Series 6    2048x1536  26499   14780
Teclast X98 Air      Atom Z3736F     Intel HD            2048x1536  14825    7160
Teclast P90HD        Rockchip RK3288 Mali-T764           2048x1536  13053    5645
Onda V989 Core8      Allwinner A80   PowerVR G6230       2048x1536  11004    5724

Meizu MX4 Pro        Exynos 5430     Mali-T628 MP6       1920x1200  25547   12674
Samsung SM-G900A     Snapdragon 801  Adreno 330          1920x1080  25178   11930
Samsung SM-G850F     Exynos 5430     Mali-T628 MP6       1280x720   21872   10666
Meizu MX4            MT6595          PowerVR G6200       1920x1200  17038    7817
Huawei MT7-TL10      Kirin 925       Mali-T624 MP4       1920x1080  15973    6802

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: SM-G900A is the Samsung Galaxy S5 (US version), Huawei MT7-TL10 is the Huawei Mate 7.

Looking at the ultra-high-end smartphone segment (mostly with a display resolution of 2560x1440), Exynos 7420 provides superior performance in Basemark X. Snapdragon 805 follows, a small distance ahead of Exynos 5433 as used in the Samsung Galaxy Note 4.

In the high-end tablet segment, Apple's iPad Air 2 with the Apple A8X leads, but the Nexus 9 with NVIDIA's Tegra K1 (64-bit version) comes fairly close. Apple's prior-generation SoC also delivers good performance, while Intel's current Bay Trail SoC for the tablet market outperforms two high-end chips from established Chinese players in the tablet SoC market, Rockchip's RK3288 and Allwinner's A80 Octa.

Meanwhile, in the mainstream performance smartphone segment, Snapdragon 801 (formerly the performance leader in the market) still provides good performance, but is narrowly beaten by the 32-bit Exynos 5430 in the Meizu MX4 Pro. The Exynos 5430 is also used in the Galaxy Alpha (for which it provides higher-than-necessary performance given its relatively low screen resolution). The performance of MediaTek's MT6595 SoC, while not bad, falls short of most other high-end solutions. HiSilicon's Kirin 925 as implemented in the Huawei Mate 7 is just behind.

Conclusion


It appears that just concentrating on GFXBench may give a misleading picture with regard to 3D graphics performance of mobile SoCs. In particular it is apparent that Qualcomm's Snapdragon SoCs consistently do better in GFXBench than in other benchmarks such as Basemark X. This is particularly true for the lower-end Snapdragon 400 and higher-end Snapdragon 800 series; for Snapdragon 615, results are more consistent across different benchmarks.

Basemark X, which utilizes the Unity game engine commonly used in mobile games, may more accurately reflect real-world performance.

Sources: Rightware Power Board (Basemark X benchmark results), GFXBench results database

Updated 5 March 2015: Add Galaxy S6 Edge result for GFXBench 3.1.
Updated 15 March 2015.

Tuesday, March 3, 2015

A detailed comparison of Cortex-A53-based and other SoCs using Geekbench, and impact of AArch64

More Cortex-A53 CPU core-based SoCs have recently come to market and more benchmark results are now available, for example from the Geekbench results database. Firmware is also becoming more mature. This makes it possible to make better comparisons between different Cortex-A53-based SoCs (for example, octa-core SoCs) and compare the performance of the highest-performance chips with competitive chips that use more expensive CPU cores such as Krait 400 and Cortex-A57.

Overview of Cortex-A53-based SoCs


The following is a list of Cortex-A53 CPU core-based mobile SoCs that have appeared in the market or for which benchmark results have become available. All chips integrate 4G LTE modem functionality unless otherwise noted.

  • Snapdragon 410 (MSM8916), utilizing four early Cortex-A53r0p0 cores. Numerous cost-sensitive smartphones now use this chip. However, none of them appears to take any advantage of the new ARMv8 instruction set, with all of them running in ARMv7 compatibility mode. This is counter-intuitive because AArch32 (the 32-bit version of ARMv8), which is used by the other SoCs, already brings significant benefits. Snapdragon 410 generally performs significantly worse than other Cortex-A53-based SoCs, even when correcting for the low clock speed. This is also reflected in memory performance. The Adreno 306 GPU tends to be even a little slower than the Adreno 305 GPU in Snapdragon 400. The net result is a chip that is not much faster than Snapdragon 400 in many cases while having worse battery life.
  • Snapdragon 615 (MSM8939), equipped with an octa-core Cortex-A53r0p1 CPU configuration with four cores running (in practice) at 1.54 GHz or 1.50 GHz and four cores running at a lower maximum clock frequency (probably 1.0 GHz). This chip has appeared in an increasing number of new smartphone models. It runs in AArch32 mode. Performance is significantly lower than that of MediaTek's octa-core Cortex-A53-based SoCs, which can run all eight Cortex-A53 cores at the maximum frequency. Memory performance is improved from Snapdragon 410 but falls short of that of MediaTek's SoCs. The Adreno 405 GPU is fairly competitive and suitable for a mid-range SoC, although the 32-bit RAM interface of the SoC limits performance, especially at high resolutions. It is manufactured using TSMC's lower-performance 28LP process. There have been reports that the chip gets hot with intensive use and requires throttling.
  • MediaTek MT6732, with a quad-core Cortex-A53r0p2 CPU configuration running at a maximum clock speed of 1.5 GHz. Devices using the chip are starting to become available, and tablets using the tablet version have also been announced. Although it has only four CPU cores, it has good performance, beating Snapdragon 615 in single-core performance at a similar clock speed, and memory performance is significantly higher. The Mali-T760 MP2 GPU contributes to better GPU performance than previous MediaTek chips targeting cost-sensitive segments, although it falls short of that of Snapdragon 615 and MT6752. A tablet version of the chip exists as MT8732.
  • MediaTek MT6752, featuring an octa-core Cortex-A53r0p2 CPU configuration with a maximum clock frequency of 1.69 GHz. Several devices have come to market using this chip, including the Meizu M1 Note. Performance is excellent, with high scores in the Geekbench CPU benchmark, considerably higher than Snapdragon 615 and beating high-end SoCs such as Snapdragon 801 in several metrics. The Mali-T760 MP2 GPU is clocked higher than that of the MT6732, resulting in good GPU performance, comparable to that of Snapdragon 615, as measured with GFXBench, although the 32-bit memory interface will be a bottleneck at high resolutions. Manufactured using TSMC's high-performance 28HPM process. A tablet version of the chip exists as MT8752.
  • MediaTek MT6795, with an octa-core Cortex-A53r0p2 CPU with clock speed up to 2.16 GHz. With a dual-channel memory interface and high resolution support, this SoC targets a higher performance segment than the previously mentioned chips, for which it can potentially offer much better performance/dollar because of the small die size of Cortex-A53 cores. Originally announced to become available in commercial devices before the end of 2014, it was delayed, but competitive benchmark scores for what appear to be more mature versions of the chip have recently shown up. It appears to be configured with full AArch64 mode. Performance is excellent, with single-core performance closing much of the gap with the high-end Snapdragon 801, while multi-core performance is significantly higher. There appears to be a "Turbo" version running the CPU up to 2.16 GHz, while the regular version clocks at 1.95 GHz. At the MWC on 2 March 2015, MediaTek apparently rebranded the MT6795 as Helio X10.
  • MediaTek's MT6735 is a SoC for entry-level smartphones for which benchmark results have not yet become available. It has a quad-core Cortex-A53 CPU configuration and a Mali-T720 GPU, a downgrade from the Mali-T760 GPU in MT6732. The recently announced MT6753, with eight Cortex-A53 cores running up to 1.5 GHz, is compatible with the MT6735 and also has a Mali-T720 GPU (probably MP4). Other chips that have shown up in product announcements include the MT8161 (probably the equivalent of the MT6735 without modem) and MT8165 (equivalent to MT8732 without modem).
  • Qualcomm has announced additional octa-core Cortex-A53-based chips, Snapdragon 415 and Snapdragon 425. These probably utilize a symmetrical Cortex-A53 configuration with all cores running at the same maximum clock frequency, unlike Snapdragon 615. Otherwise, the new SoCs are similar to Snapdragon 615, with the same Adreno 405 GPU. According to Qualcomm, devices using these chips will become commercially available in the second half of 2015.
  • Kirin 620 (Hi6210) from HiSilicon (Huawei) is an octa-core Cortex-A53r0p3-based SoC running up to 1.2 GHz. The GPU is a Mali-450 MP4. Although performance (including single-core performance) is better than Snapdragon 410, it is not as optimized as chips such as MT6752 and runs at a relatively low clock speed. Multi-core performance scaling is less than expected.

Geekbench integer and memory scores comparison


The following table provides details about selected Geekbench integer and memory benchmark scores for different Cortex-A53-based SoCs, and also other smartphone SoCs from Qualcomm, MediaTek and Samsung for comparison.

                Arch    Max freq. JPEG C. IPC   JPEG C. Dijkstra      Stream Copy   Geekbench
                                  Single  x A7  Multi   Single Multi  Single Multi  Ref. number

Snapdragon 410  ARMv7     1.19      596   1.30   2384     810   2135   431   492    1551964
Snapdragon 615  AArch32 1.50/1.0    820   1.42   4979     886   3646   572   703    2015694
MT6732          AArch32   1.50      843   1.46   3357    1041   3002  1001  1199    1546611
MT6752          AArch32   1.69      952   1.46   7554    1144   4483  1071  1191    1583540
MT6795          AArch64   1.95     1026   1.37   8167     990   3802  1356  2068    2002894
MT6795T         AArch64   2.16     1128   1.36   8962    1064   4109  1350  2140    1984431
Hi6210          AArch32   1.20      660   1.43   3501     744   2772   602   900    1999304

Snapdragon 400  ARMv7     1.19      462   1.01   1860     700   2132   534   551    1938063
Snapdragon 801  ARMv7     2.46     1347   1.42   5437    1174   3586  1931  2144    1491681
Snapdragon 805  ARMv7     2.65     1475   1.45   4105    1230   4058  2117  2910    1502687
Snapdragon 810  AArch64  ?/1.55    1358          5972    1073   3584  1428  1838    2017257
MT6582          ARMv7     1.30      506   1.01   2027     748   2354   250   396    2017732
MT6592          ARMv7     1.66      643   1.01   5086     891   3327   261   388    2000008
MT6595          ARMv7   2.20/1.69  1350   1.59   6080    1844   5612  1652  1986    1591744
Exynos 5430     ARMv7   1.80/1.3   1056   1.52   5140    1102   3918  1457  1559    1556780
Exynos 5433     AArch32   1.89     1456   2.10   6209    1523   5728  1396  1458    2017193
Exynos 7420     AArch64  ?/1.50    1481          7168    1065   4596  1953  2579    2012972

The low performance of Snapdragon 410 is apparent in the scores: normalized IPC (instructions per cycle, expressed relative to the equivalent of a 1.0 GHz Cortex-A7) for the CPU-speed-sensitive single-core JPEG Compress benchmark is lower than that of other Cortex-A53-based SoCs, probably due to being limited to ARMv7. The Dijkstra benchmark even scores lower on Snapdragon 410 than on an equivalently clocked Snapdragon 400, and memory performance is also lower.
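
The normalized IPC column can be approximately reproduced from the single-core score and the maximum clock frequency. The following is a minimal sketch, not the exact method used for the table: the baseline of roughly 385 points per GHz for a Cortex-A7 is an assumption inferred from the Snapdragon 400, MT6582 and MT6592 rows, and the function name is purely illustrative.

```python
# Assumption: a Cortex-A7 scores roughly 385 JPEG Compress points per GHz.
# This baseline is inferred from the Cortex-A7 rows of the table and is
# not stated anywhere in the article.
A7_POINTS_PER_GHZ = 385.0


def normalized_ipc(score, max_ghz, baseline=A7_POINTS_PER_GHZ):
    """Single-core score per GHz, relative to the assumed Cortex-A7 baseline."""
    return score / max_ghz / baseline


# (single-core JPEG Compress score, maximum clock in GHz), taken from the table
rows = {
    "Snapdragon 410": (596, 1.19),
    "MT6752": (952, 1.69),
    "MT6795T": (1128, 2.16),
}

ipc = {name: round(normalized_ipc(s, f), 2) for name, (s, f) in rows.items()}
for name, value in ipc.items():
    print(f"{name}: {value:.2f}")
```

With this assumed baseline, the three rows come out at 1.30, 1.46 and 1.36, matching the table.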

Snapdragon 615, while improving on Snapdragon 410, also appears to be less optimized than MT6732/MT6752 in terms of single-core IPC, despite a very similar clock frequency. Looking at multi-core performance, MT6752 is significantly faster than Snapdragon 615, largely due to being able to run all eight cores at the maximum clock frequency. MT6732 and MT6752 also have significantly higher memory performance, reaching an impressive score for devices with a 32-bit memory interface.

The higher clock speed of MT6795 (Helio X10) brings benefits for integer performance, but due to the use of the AArch64 instruction set, normalized IPC is lower (1.36 vs 1.46 for JPEG Compress). This is especially true for the Dijkstra benchmark, where AArch64 mode imposes a significant penalty (this is also seen on other platforms utilizing AArch64).

Overall, a high-speed Cortex-A53 configuration such as implemented in the MT6795T comes fairly close to Snapdragon 801 for single-core performance, while being significantly faster for multi-core performance, at a significantly lower cost. Several metrics are also in the same ballpark as the current high-end leader Exynos 7420.

Analysis of the Geekbench Lua subtest


The Lua integer benchmark appears to be particularly sensitive to memory subsystem efficiency, including L2 cache size and memory bandwidth, as well as being dependent on CPU speed. It is the kind of code that is likely to occur frequently in practice on a smartphone.

                Arch      Lua     IPC   Lua    CPU    #CPUs
                          Single  x A7  Multi  Par.

Snapdragon 410  ARMv7      603    1.23  2137   3.54   4
Snapdragon 615  AArch32    709    1.15  1644   2.32   4 + 4
MT6732          AArch32    753    1.22  2419   3.21   4
MT6752          AArch32    842    1.21  2361   2.80   8
MT6795          AArch64   1053    1.31  8203   7.79   8
MT6795T         AArch64   1173    1.32  8847   7.54   8
Hi6210          AArch32    587    1.19  1740   2.96   8

Snapdragon 400  ARMv7      476    0.97  1874   3.94   4
Snapdragon 801  ARMv7      980    0.97  2880   2.94   4
Snapdragon 805  ARMv7     1016    0.93  2917   2.87   4
Snapdragon 810  AArch64   1283          1065   0.83   4 + 4
MT6582          ARMv7      514    0.96  1644   3.20   4
MT6592          ARMv7      651    0.95  1344   2.06   8
MT6595          ARMv7     1509    1.67  2498   1.66   4 + 4
Exynos 5430     ARMv7      981    1.33  1861   1.90   4 + 4
Exynos 5433     AArch32   1397    1.89  5478   3.92   4 + 4
Exynos 7420     AArch64   1409          7088   5.03   4 + 4

In this test, Snapdragon 410 performs reasonably well. MT6752's multi-core performance seems limited by a bottleneck, probably external memory bandwidth. MT6795's performance is impressive; while single-core performance falls a little short of Cortex-A57 based SoCs, for multi-core performance it blows past them, with CPU parallelism fully exploited. It seems the bottleneck present with the MT6752 (presumably memory bandwidth and the L2 cache memory size available to each core) is not present with the MT6795.

Qualcomm's Snapdragon 810 consistently scores in the 1000-1200 range for both the single-core and multi-core test, while the multi-core score would have been expected to be significantly higher. This appears to reflect a serious deficiency in the memory subsystem of the SoC (which might be related not only to the LPDDR4 SDRAM controller, but also to the on-chip L2 cache), which might also have negative implications for smoothness in everyday use.
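
The "CPU Par." column in the Lua table is simply the multi-core score divided by the single-core score. A small sketch using four rows from the table (the function and variable names are illustrative only) shows how far apart the SoCs are in exploiting their cores:

```python
def parallel_speedup(single, multi):
    """Ratio of multi-core to single-core score (the 'CPU Par.' column)."""
    return multi / single


# (Lua single-core score, Lua multi-core score), taken from the table
lua_scores = {
    "Snapdragon 410": (603, 2137),
    "MT6752": (842, 2361),
    "MT6795": (1053, 8203),
    "Snapdragon 810": (1283, 1065),
}

par = {name: round(parallel_speedup(s, m), 2) for name, (s, m) in lua_scores.items()}
for name, value in par.items():
    print(f"{name}: {value:.2f}x")
```

A value close to the number of cores (MT6795) indicates nearly ideal scaling, while a value below 1.0 (Snapdragon 810) means the multi-core run was actually slower than the single-core run.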

Geekbench floating point subtests


Finally, let's look at floating point performance. The Mandelbrot subtest tests pure floating point performance, while the SGEMM and SFFT tests also significantly depend on memory performance.


                Arch      Mandelbrot                 SGEMM         SFFT
                          Single  IPC   Multi  Par.  Single Multi  Single Multi

Snapdragon 410  ARMv7      448    1.10  1794   4.00   245    489    317   1258
Snapdragon 615  AArch32    583    1.14  3611   6.19   303    688    426   2517
MT6732          AArch32    585    1.14  2336   3.99   337    653    430   1727
MT6752          AArch32    661    1.15  5257   7.95   384   1148    481   3870
MT6795          AArch64    823    1.24  6406   7.78   484   1542    618   4764
MT6795T         AArch64    912    1.24  7245   7.94   529   1659    694   5333
Hi6210          AArch32    467    1.14  3509   7.51   264    876    343   2178

Snapdragon 400  ARMv7      405    1.00  1620   4.00   203    634    285   1182
Snapdragon 801  ARMv7      788    0.94  3104   3.94   907   2816    992   3518
Snapdragon 805  ARMv7      848    0.94  3389   4.00  1011   2669   1130   4135
Snapdragon 810  AArch64   1100          5144   4.68   749   1828   1009   3643
MT6582          ARMv7      444    1.00  1765   3.98   230    512    328   1316
MT6592          ARMv7      568    1.00  4430   7.80   282    696    419   3397
MT6595          ARMv7     1284    1.71  5822   4.53   748   2337   1187   4255
Exynos 5430     ARMv7      990    1.61  4745   4.79   657   2491    896   3971
Exynos 5433     AArch32   1174    1.91  4883   4.16   751   2369   1044   4031
Exynos 7420     AArch64   1198          6129   5.12   945   2888   1313   4874

From these numbers it is clear that Cortex-A53 improves floating point performance somewhat when compared to Cortex-A7 at the same clock speed. When eight cores can run in parallel at high speed, multi-core floating point performance is impressive, as demonstrated by MT6752 and MT6795. Snapdragon 801 and 805 are looking a bit dated in this department.

In the memory-intensive SGEMM and SFFT tests, Snapdragon 400 comes close to Snapdragon 410, illustrating the lack of performance improvement by Snapdragon 410. In fact MediaTek's previous generation MT6582 matches the floating point performance of Snapdragon 410 across all tests.

The Cortex-A57 based SoCs have the highest single-core floating point performance, although the Cortex-A17-based MT6595 is also very strong. Exynos 5433 and Exynos 7420 beat Snapdragon 810 in most floating point tests, although the difference is not as large as it used to be with earlier results for Snapdragon 810.

Conclusion


It is clear that octa-core Cortex-A53-based SoCs can deliver strong performance at a relatively low cost, and this is particularly true for MediaTek's new chips, MT6752 and MT6795. The MT6795, with its higher clock speed and dual-channel memory interface, can match current high-end chips in most metrics, being not much slower in single-core performance while being superior in multi-core.

One open question is whether the high maximum clock frequency of the MT6795 and MT6795T, which deliver impressive performance/dollar, translates to acceptable power consumption and battery life. For the Samsung-manufactured Exynos 5433, power consumption of Cortex-A53 has been observed to increase quickly at higher frequencies, but MT6795 is manufactured on a different process at TSMC and probably makes use of specific design optimizations for high clock speeds (ARM POP IP core hardening technology) that make power consumption more acceptable.

Sources: Geekbench Browser

Updated 10 March 2015.

Sunday, March 1, 2015

Samsung announces Galaxy S6 with Exynos 7420 SoC manufactured on "14nm" FinFET process

At the Mobile World Congress today (Sunday 1 March), Samsung announced the Galaxy S6 and Galaxy S6 Edge, featuring numerous improvements over the previous-generation Galaxy S5, including a SoC manufactured on Samsung's 14 nm FinFET-based process. The Galaxy S6 is planned to be available in 20 countries starting on April 10th, 2015.

New model implements several improvements


The improvements in the new model include the following:
  • Exynos 7420 SoC manufactured on 14 nm FinFET process with 20 nm interconnects. The CPU is a big.LITTLE configuration with four Cortex-A57 and four Cortex-A53 cores, similar to Exynos 5433. The maximum clock speeds are 2.1 GHz and 1.5 GHz, respectively. Samsung claims 20% better performance and 35% better efficiency for the new chip when compared to Exynos 5433, which is manufactured using Samsung's 20 nm HKMG process.
  • The GPU has been rumoured to be a faster version of the Exynos 5433's Mali-T760 MP6 (either a higher clock rate or an MP8 configuration).
  • Early benchmarks indicate a significant increase in CPU and memory performance combined with a measurable increase in GPU performance (which is required because of the higher screen resolution).
  • Runs in 64-bit AArch64 mode, which has several advantages, as well as some disadvantages.
  • Uses new LPDDR4 SDRAM (3 GB), which has higher memory bandwidth at a given memory bus width due to higher effective clock speeds.
  • The cameras have been improved, including greater light gathering capability.
  • The 5.1" AMOLED screen's resolution is QHD (2560x1440), which has about 78% more pixels than the FullHD (1920x1080) screen in the Galaxy S5. The higher CPU, GPU and memory performance are essential to keep pace with the increased demands caused by the higher resolution.
  • Utilizes the new UFS 2.0 interface for embedded flash memory, providing SSD-like performance according to Samsung.
  • Cat 6 LTE mode.
  • Touchwiz user-interface on top of 64-bit Android 5.0 is said to be more intuitive and less demanding in terms of processing requirements.
At the same time, Samsung has dropped the MicroSD slot and the battery is non-removable. The battery capacity is also slightly smaller than that of the Galaxy S5.
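
The pixel-count comparison between the QHD and FullHD screens in the list above can be checked with a few lines of arithmetic. This is only a quick illustration; the variable names are arbitrary.

```python
# Screen resolutions as given in the list of improvements
qhd_pixels = 2560 * 1440      # Galaxy S6
fullhd_pixels = 1920 * 1080   # Galaxy S5

# Additional pixels the GPU and memory subsystem have to handle, in percent
extra_pct = (qhd_pixels / fullhd_pixels - 1) * 100

print(f"QHD: {qhd_pixels} pixels, FullHD: {fullhd_pixels} pixels")
print(f"Increase: {extra_pct:.1f}%")
```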

The Galaxy S6 Edge, like the Galaxy Note 4 Edge, features a screen that curves around the edges. It is priced significantly higher than the Galaxy S6, which will not be cheap either.

Quick ramp of 14nm FinFET process brings challenges to Samsung


The initial 14 nm FinFET process used by Samsung has been reported to use 20 nm interconnects with a 14 nm feature size. As such it is more of an evolutionary step from 20 nm than a full-blooded 14 nm FinFET process would be, comparable to some degree with TSMC's 16FF process.

Still, Samsung will face a huge challenge ramping up the process in sufficient volume and with acceptable yield rates to equip the high volume of Galaxy S6 units expected. Rumours have mentioned low yield for the process in the recent past as Samsung started ramping up (test) production. Given the massive investment in the new process and non-optimal yield rates, it is unlikely that Samsung will significantly benefit financially from production of the chip in the near term in terms of gross margin and other chip production-related metrics.

However, the performance lead of the Galaxy S6 made possible by the new chip could have significant positive implications for the sales and financial performance of Samsung's smartphone division, allowing Samsung to recoup some of its investment.

A few months ago, Samsung already signed an agreement with Apple whereby Samsung would supply part of the production capacity for future Apple processors. If this bears fruit it would allow Samsung to recoup more of its investment in 14 nm FinFET technology in the future.

Early benchmark performance impressive


In early benchmark scores reported in Geekbench's result database, a device that probably is the Galaxy S6 shows impressive performance, well ahead of most existing SoCs and devices. In a direct comparison with an Exynos 5433-equipped Galaxy Note 4, the performance gain is fairly significant for most benchmarks (up to 30% for integer tests, higher for floating point), with a few negative outliers such as SHA2 and the Dijkstra integer subtest. The Dijkstra subtest also scores lower on other 64-bit AArch64 platforms, suggesting it suffers from particular AArch64 features such as the doubled size for pointer storage.
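
To illustrate why doubled pointer size can hurt a pointer-heavy workload such as a graph search, the sketch below compares the size of a hypothetical graph node under 32-bit and 64-bit pointers. The node layout (two pointers plus a 32-bit weight) and the 64-byte cache line are assumptions made for illustration; they are not taken from the Geekbench source code, and a real C ABI would typically add alignment padding to the 64-bit layout.

```python
import struct

# Hypothetical adjacency-list node: two pointers (next edge, target vertex)
# plus a 32-bit edge weight. "<" selects standard sizes without padding.
NODE_BYTES_32 = struct.calcsize("<IIi")  # 4-byte pointers (ARMv7 / AArch32)
NODE_BYTES_64 = struct.calcsize("<QQi")  # 8-byte pointers (AArch64)


def nodes_per_cache_line(node_bytes, line_bytes=64):
    """How many whole nodes fit in one cache line (64 bytes is assumed)."""
    return line_bytes // node_bytes


print(f"32-bit node: {NODE_BYTES_32} bytes, "
      f"{nodes_per_cache_line(NODE_BYTES_32)} per cache line")
print(f"64-bit node: {NODE_BYTES_64} bytes, "
      f"{nodes_per_cache_line(NODE_BYTES_64)} per cache line")
```

Under these assumptions fewer nodes fit in each cache line in 64-bit mode, so the same traversal touches more memory.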

Memory performance is also significantly higher, aided by high clock rate and high amount of bandwidth delivered by the LPDDR4 memory interface, which unlike Qualcomm's Snapdragon 810 does not seem to have serious flaws.

Sources: AnandTech (Samsung announces the Galaxy S6 and Galaxy S6 Edge), AnandTech (Samsung Unpacked, MWC 2015 Live Blog), Geekbench Browser (Samsung SM-G925F)

Tuesday, February 17, 2015

Cortex-A53 not as power efficient as Cortex-A7

Recent detailed technical review articles published by AnandTech based on a comparison of Samsung Exynos SoCs have elucidated some of the details about the performance of the Cortex-A53 core, including processing performance, power consumption and die size. Overall, it appears that while Cortex-A53 is significantly faster than Cortex-A7 at the same clock speed, die size and power consumption on an equivalent manufacturing process has increased by a greater amount, leading to lower performance/Watt.

Direct comparison of Cortex-A7 and Cortex-A53 on the same process


In a recently published technical review article about the ARM Cortex-A53 and Cortex-A57 CPU cores and the Mali-T760 GPU core, based on Samsung's Exynos-based Galaxy Note 4 model, AnandTech has provided details about the performance, power consumption and die size of the 64-bit Cortex-A53 core relative to its 32-bit predecessor, Cortex-A7. It has done so by comparing measurements of the Cortex-A53 cores inside the Exynos 5433 used in the Note 4 with the Cortex-A7 cores inside the Exynos 5430 used in the Galaxy Alpha. Both SoCs are produced using a similar 20 nm process at Samsung, making a direct comparison possible.

Cortex-A7 is an in-order pipeline CPU core with moderate performance but an extremely small die size and very low power consumption. The Cortex-A53 core has been designed by ARM as a logical extension of Cortex-A7 to ARM's 64-bit ARMv8 instruction set with higher performance. However, in doing so die size and power efficiency have suffered somewhat.

CPU performance increased in Cortex-A53


According to the designer of Cortex-A53 at ARM, Cortex-A53 increases SPECint-2000 performance from 0.35 SPEC/MHz to 0.50 SPEC/MHz when compared to the Cortex-A7 core. In Geekbench integer benchmarks, disregarding cryptography benchmarks which show a large increase, performance is still about 50% higher for Cortex-A53 when compared to Cortex-A7 at the same clock speed, with the biggest gains coming in multi-threaded performance (aided by the increased memory performance).

For floating point benchmarks the performance increase reported by AnandTech is dramatic, with most benchmarks showing a two to three times performance increase. However, there seems to be a discrepancy between these benchmark results and benchmark results available from the Geekbench results database for Cortex-A53 and Cortex-A7-based devices, which show only a moderate floating point performance increase for Cortex-A53 over Cortex-A7. Most likely, AnandTech is erroneously reporting Cortex-A57 core floating point performance in this case (this matches Geekbench results that I previously tabulated).
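
For reference, the SPECint-2000 figures quoted above translate into a per-MHz gain as follows. This is a trivial calculation on the two quoted numbers only; the variable names are arbitrary.

```python
# SPECint-2000 per MHz, as quoted from ARM
a7_spec_per_mhz = 0.35
a53_spec_per_mhz = 0.50

# Relative per-clock gain of Cortex-A53 over Cortex-A7, in percent
gain_pct = (a53_spec_per_mhz / a7_spec_per_mhz - 1) * 100

print(f"Per-clock SPECint-2000 gain: {gain_pct:.1f}%")
```

The result is a little over 40%, in the same range as the roughly 50% gain seen in the Geekbench integer benchmarks.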

Memory performance benchmarks performed by AnandTech show a relative increase in latency for a Cortex-A53 cluster between transfer sizes of 256 KB and 512 KB when compared to a Cortex-A7 cluster, despite the fact that this should fit inside the 512 KB L2 cache. However, as I previously noted in earlier blog articles, the benchmarks show that memory bandwidth has significantly increased with Cortex-A53 when compared to Cortex-A7, virtually doubling. This most likely contributes to the Cortex-A53 core's greater multi-threading performance in practice.

Power consumption of Cortex-A7 greatly reduced with Samsung's 20 nm process


AnandTech has published a detailed chart showing estimates for power consumption of the previous generation 32-bit Cortex-A7 and Cortex-A15 cores on both 20 nm and 28 nm processes at Samsung, based on Samsung's Exynos 5422 (28 nm) and Exynos 5430 (20 nm) SoCs.

While the high-performance Cortex-A15 cores are seeing a power reduction of about 25%, power consumption of the Cortex-A7 cores sees a significant 40% reduction with a 56% reduction at the highest CPU frequency of 1300 MHz. This can be partly explained by Samsung optimizing the Cortex-A7 cores inside Exynos 5430 for low power consumption using ARM's POP IP optimization platform.

Ironically, the excellent power characteristics of the Cortex-A7 at the latest processes such as Samsung's 20 nm process have not been taken advantage of in the market except in Samsung's Exynos big.LITTLE 5430, since Cortex-A7 adoption is mostly limited to 40 and 28 nm and all announced 20 nm SoCs use Cortex-A57 and Cortex-A53 cores. There seems to be an opportunity for ultra-efficient 20 nm Cortex-A7-based SoCs for certain product segments, while there is also a significant opportunity for 20 nm Cortex-A53-only SoCs that should be more power efficient than their 28 nm equivalents.

One could envision a hypothetical octa-core Cortex-A7-based SoC manufactured on TSMC's 20 nm HPM process delivering spectacular performance/Watt, with relatively high clock speeds being possible. AnandTech's article notes that TSMC's 28 nm and 20 nm HPM processes are most likely significantly more efficient than Samsung's equivalent process technology because they allow CPUs to operate at a lower voltage level. A similar argument applies to Cortex-A53-based SoCs manufactured at 20 nm, albeit with lower performance/Watt.

In terms of die size, AnandTech reports a significant reduction of 45% for the Cortex-A7 cores and 64% for the Cortex-A15 cores in the 20 nm Exynos 5430 vs the 28 nm Exynos 5422.

Cortex-A53 has significantly greater power consumption than Cortex-A7


AnandTech has published a detailed chart with power consumption characteristics of the Cortex-A53 cores inside Samsung's Exynos 5433 manufactured at 20nm. In their analysis, AnandTech notes a relatively large increase in power consumption when utilizing multiple Cortex-A53 cores at their highest frequency (1300 MHz on Exynos 5433), when compared to running at 1.0 GHz. This correlates with a voltage bump when going from 1.0 to 1.3 GHz.

Based on this analysis, the article concludes that power consumption is more than twice as large for Cortex-A53 when compared to Cortex-A7 at an equivalent clock speed of 1300 MHz on a similar manufacturing process (Samsung's 20 nm process). Although the Cortex-A53 core's CPU performance is greater, it is not twice as great, leading to clearly lower performance/Watt for Cortex-A53 when compared to Cortex-A7.
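
The performance/Watt argument can be made concrete with a rough calculation. The sketch below assumes a performance ratio of 1.5 (the approximate Geekbench integer gain mentioned earlier) and a power ratio of exactly 2.0; since the reported power increase is "more than twice", the result should be read as an upper bound rather than a measured value.

```python
def relative_perf_per_watt(perf_ratio, power_ratio):
    """Performance/Watt of one core relative to another, given both ratios."""
    return perf_ratio / power_ratio


# Assumptions: Cortex-A53 is ~1.5x as fast as Cortex-A7 at the same clock,
# and consumes at least 2.0x the power at 1300 MHz on Samsung's 20 nm process.
A53_PERF_RATIO = 1.5
A53_POWER_RATIO = 2.0

ratio = relative_perf_per_watt(A53_PERF_RATIO, A53_POWER_RATIO)
print(f"Cortex-A53 performance/Watt relative to Cortex-A7: at most {ratio:.2f}")
```

Under these assumptions Cortex-A53 delivers at most about three quarters of the performance/Watt of Cortex-A7 at that operating point.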

It is possible that the chip errata (hardware bugs) in earlier revisions of Cortex-A53 that I mentioned in previous articles play a role in reducing the measured performance and power efficiency of Cortex-A53. Exynos 5433 uses Cortex-A53r0p1, which is affected by this. The chip errata require more frequent cache flushing as a work-around, which can potentially affect performance as well as power consumption. The non-optimal state of big.LITTLE kernel scheduling code may exacerbate these problems. There is potential for later revisions of Cortex-A53 such as r0p3 to deliver higher efficiency because they are not affected by these hardware problems. Chips with Cortex-A53 revision r0p3 have not yet appeared on the market.

Chip-specific core optimizations make comparisons more difficult


It should be noted that specific optimization of the processor cores for a particular higher clock frequency target (e.g. in chips like MediaTek's MT6752 and MT6795) or for low power consumption at a lower clock frequency (for example, in a big.LITTLE configuration), using ARM's POP core hardening technology, has the potential to skew the comparison between different chips. MediaTek's MT6752 has already been reported to have acceptable power consumption while running at a relatively high maximum clock frequency, which would otherwise be incompatible with the steep rise in power consumption for clock speeds above 1.2 GHz observed in the charts for the Samsung chips.

Die size of Cortex-A53 increased compared to Cortex-A7


The die size of Cortex-A53 cores when compared to Cortex-A7 in Samsung's chips is about 1.75 times greater according to AnandTech, although it remains below one square millimeter, which is small for a CPU. When looking at the total cluster size, which includes the L2 cache (the same amount of 512 KB for Cortex-A53 and Cortex-A7), the die size of the cluster is 1.38 times greater. The larger die size has consequences for cost-sensitive SoCs for low-end mobile devices and IoT applications, for which Cortex-A7 remains more attractive. Cortex-A7 can also be employed as an embedded CPU in a functional block such as a baseband processor, just like Cortex-A5 is frequently used.

Consequences for mobile SoCs


The higher performance of Cortex-A53 when compared to Cortex-A7, especially memory bandwidth, makes high-clocked multi-core Cortex-A53-based SoCs suitable for mid-range performance segments. Examples of this are MediaTek's MT6752 and Qualcomm's Snapdragon 615 SoC. These SoCs also have higher GPU performance than that traditionally associated with Cortex-A7-based SoCs.

The increased power consumption and die size of Cortex-A53 cause Cortex-A7 to remain relevant, because it still delivers superior power efficiency, cost and die size, and consequently its performance/Watt and performance/dollar are better than those of Cortex-A53. Hypothetically, a 20 nm octa-core Cortex-A7-based SoC would deliver excellent power efficiency with quite acceptable performance due to higher clock speeds, and there may be a market for such a solution for smartphones. The main drawback would be that OS ecosystems such as Android are moving towards 64-bit implementations and can also make use of new cryptography instructions in ARMv8.

Sources: AnandTech (technical Exynos Galaxy Note 4 review)

Updated 1 March 2015 (Add section about core-hardening).