Showing posts with label Benchmarks. Show all posts

Friday, June 5, 2015

Smartphone platforms migrate to 64-bit (AArch64) mode

Recently, most existing and new mobile SoCs have started to ship configured in native 64-bit mode (AArch64), in conjunction with a 64-bit version of Android 5. Although SoCs targeting premium-level devices that are already shipping were the first to support AArch64 (including Tegra K1-64, Exynos 7420 and Snapdragon 810), recent entries in the Geekbench results database show that cost-sensitive platforms are also migrating to native 64-bit mode in upcoming smartphones.

This move involves Cortex-A53-based platforms such as MediaTek's MT6735, MT6752, MT6753 and MT6795, Qualcomm's Snapdragon 615 (MSM8939) as well as a new Snapdragon 410 (MSM8916) platform (which was previously limited to ARMv7), and HiSilicon's Kirin 620 and Kirin 930.

Initial ARMv8 platforms used hybrid AArch32 mode


Several ARMv8-based SoCs have been shipping for some time, but most have been using AArch32 mode, a hybrid mode which takes advantage of some of the architectural improvements in ARMv8 but does not expose native 64-bit mode to applications. Snapdragon 410 did not take any advantage of ARMv8 at all, running entirely in ARMv7 mode.

One reason why full AArch64 mode has not been adopted right away is that it comes with a performance penalty due to the increased storage requirements for program code and pointers, which puts greater demands on the memory subsystem of the SoC. Cost-sensitive smartphone models are especially sensitive to this due to a lower amount of RAM and smaller on-chip CPU caches. A decrease in the price of RAM chips has allowed the amount of RAM in cost-sensitive models to increase (e.g. more devices shipping with 2GB RAM), making AArch64 mode more appealing.
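
To give a rough feel for the pointer overhead, the following Python sketch compares the packed size of a small pointer-heavy record with 32-bit and 64-bit pointers. The node layout is a hypothetical example of my own, and alignment padding (which can widen the gap further) is deliberately ignored.

```python
import struct

# Hypothetical linked-list node: one 32-bit integer payload plus
# "next" and "prev" pointers. "<" requests a packed little-endian layout,
# so alignment padding is not counted in this sketch.
NODE_32BIT_POINTERS = "<iII"   # 4-byte pointers, as in AArch32 / ARMv7
NODE_64BIT_POINTERS = "<iQQ"   # 8-byte pointers, as in AArch64


def node_bytes(layout: str, count: int = 1) -> int:
    """Return the packed size in bytes of `count` nodes with this layout."""
    return struct.calcsize(layout) * count


size32 = node_bytes(NODE_32BIT_POINTERS)
size64 = node_bytes(NODE_64BIT_POINTERS)
print("bytes per node:", size32, "vs", size64)
print("growth factor:", round(size64 / size32, 2))
```

The same payload occupies more cache lines when pointers are wider, which is why small L2 caches are hit harder.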

AArch64 also has benefits, in particular for floating point and data-intensive applications that use NEON vector instructions.

Comparison of CPU benchmark results


The migration to AArch64 mode across the board makes it easier to compare CPU benchmarks of different SoCs, which was previously made more difficult by the fact that some SoCs used AArch64 mode while others were still limited to AArch32.

In the following sections, I will return to Geekbench CPU test results and try to make an apples-to-apples comparison for different groups of SoCs.

Quad-core Cortex-A53 SoCs


Quad-core SoCs included are MT6732, MT6735 and Snapdragon 410. Note that the version of Snapdragon 410 tested most likely reflects a newer silicon revision that has not yet widely appeared in end devices, since previous versions of Snapdragon 410 (MSM8916) were always limited to ARMv7 mode (seemingly being unable to run in AArch32 mode).

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        speed  Single IPC   Multi Par   Single IPC   Multi Par

MT6732     2705430    1.50    783   1.36  3108  3.97   795   1.29  3017  3.79
MT6735     2650175    1.30    646   1.36  2604  4.03   656   1.23  2047  3.12
MSM8916-64 2708213    1.21    626   1.34  2481  3.96   615   1.24  1280  2.08

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        speed  Single IPC   Multi Par   Single Multi

MT6732     2705430    1.50    631   1.23  2490  3.95  1030   1156
MT6735     2650175    1.30    526   1.19  2091  3.98   901    965
MSM8916-64 2708213    1.21    508   1.23  1969  3.88   447    505

The "IPC" value as shown in the tables is an index calculated from a comparison with the performance of common Cortex-A7-based SoCs, normalized to the same clock speed. The parallelism value ("Par") is the performance scaling from single-core to multi-core for the specific Geekbench subtest.
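
As a minimal sketch of how these two columns can be derived, the Python below computes the parallelism value and an IPC-style index from the JPEG Compress rows above. The Cortex-A7 reference value is an assumption: the actual reference scores are not listed here, so I use a placeholder back-derived from the MT6732 row, and the resulting index will not necessarily match every row of the table.

```python
def parallelism(single: float, multi: float) -> float:
    """Scaling from the single-core to the multi-core score ("Par")."""
    return multi / single


def ipc_index(single: float, clock_ghz: float, ref_score_per_ghz: float) -> float:
    """Single-core score per GHz, relative to a reference core's score per GHz."""
    return (single / clock_ghz) / ref_score_per_ghz


# JPEG Compress rows from the integer table: (single, multi, clock in GHz)
jpeg = {
    "MT6732": (783, 3108, 1.50),
    "MT6735": (646, 2604, 1.30),
    "MSM8916-64": (626, 2481, 1.21),
}

# ASSUMPTION: placeholder Cortex-A7 reference (score per GHz) for this subtest,
# back-derived from the MT6732 row (783 / 1.50 / 1.36). Not an official figure.
A7_REF_PER_GHZ = 384.0

for soc, (single, multi, clock) in jpeg.items():
    par = round(parallelism(single, multi), 2)
    ipc = round(ipc_index(single, clock, A7_REF_PER_GHZ), 2)
    print(soc, "Par:", par, "IPC index:", ipc)
```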

The IPC values are fairly consistent, as would be expected from the same CPU core (Cortex-A53) running the same ISA (instruction set architecture). When scaling to multiple cores, MT6732 does best, as shown by the scaling in the Lua benchmarks. This is not surprising as MT6732 is not an entry-level SoC given its cost structure, being better described as belonging to the mid-range segment. It is likely to have a better memory subsystem (in particular, a larger and faster L2 cache) than the other chips.

MediaTek's new entry-level chip, MT6735, apart from running at a somewhat higher clock speed (1.3 GHz vs 1.2 GHz), outperforms the 64-bit version of Snapdragon 410 when normalized to the same clock speed, which is especially evident in the Lua multi-core test and memory tests. The Lua results could be a reflection of L2 cache size and/or speed. Memory performance (based on the Stream Copy subtest) of both MediaTek chips is roughly double that of Snapdragon 410 (something which was already evident in the respective 32-bit platform results).
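
The clock-normalized comparison can be reproduced with a few lines of Python. This is only a sketch: it assumes that CPU-bound scores scale linearly with clock speed, and it leaves the Stream Copy score unnormalized because memory throughput does not depend on CPU clock in the same way.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Score normalized to 1 GHz, assuming linear scaling with clock speed."""
    return score / clock_ghz


# Values from the tables above: clock (GHz), Lua multi-core, Stream Copy single
mt6735 = {"clock": 1.30, "lua_multi": 2047, "stream_single": 901}
msm8916_64 = {"clock": 1.21, "lua_multi": 1280, "stream_single": 447}

lua_ratio = (per_ghz(mt6735["lua_multi"], mt6735["clock"])
             / per_ghz(msm8916_64["lua_multi"], msm8916_64["clock"]))
stream_ratio = mt6735["stream_single"] / msm8916_64["stream_single"]

print("Lua multi-core advantage per GHz:", round(lua_ratio, 2))
print("Stream Copy advantage:", round(stream_ratio, 2))
```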

Mid-range octa-core Cortex-A53-based SoCs


The octa-core Cortex-A53-based SoCs targeting the mid-range segment include MediaTek's performance-oriented MT6752, the recent cost-reduced MT6753, Qualcomm's Snapdragon 615 (MSM8939), and HiSilicon's Kirin 620 (Hi6210).

These SoCs use different CPU clock speed configurations. MediaTek's MT6752 and MT6753 run all cores at the same maximum clock speed, 1.69 GHz for MT6752 and (at least in the tested device) seemingly only about 1.1 GHz for MT6753, even though Geekbench reports a maximum clock speed of 1.3 GHz. HiSilicon's Kirin 620 can run all cores up to a maximum speed of 1.2 GHz.

Qualcomm's Snapdragon 615 uses a pseudo-big.LITTLE, hierarchical architecture with one performance cluster of four cores running up to 1.65 GHz in the most recent version of the platform (previous versions ran up to 1.5 GHz), with the other power-efficient cluster running at a significantly lower clock speed. MediaTek's announcement of the MT6755 (Helio P10) shows that MediaTek is also transitioning to hierarchical CPU clusters for new chips, similar to Snapdragon 615.

Having one power-optimized CPU cluster helps power efficiency for low CPU demand scenarios such as smartphone standby or light usage. The fact that Snapdragon 615 is not very power efficient, despite the low-clocked cluster, is mostly due to the low-performance 28LP manufacturing process used.

The following table shows selected integer test results from Geekbench entries for the mentioned SoCs, running in AArch64 mode.

SoC        Geekbench  Clock  JPEG Compress (int)      Lua (int)
           ref        speed  Single IPC   Multi Par   Single IPC   Multi Par

MSM8939    2704276    1.65    837   1.32  4269  5.10   789   1.16   667  0.85
MT6752     2709869    1.69    890   1.37  6719  7.55   907   1.31  6531  7.20
MT6753     2699665    1.10?   572   1.35  4298  7.51   587   1.30  4282  7.29
Hi6210     2704356    1.20    630   1.36  3473  5.51   626   1.27  2156  3.44

The table below shows selected floating point and memory results.

SoC        Geekbench  Clock  Mandelbrot (float)       Stream Copy (memory)
           ref        speed  Single IPC   Multi Par   Single Multi

MSM8939    2704276    1.65    661   1.17  4019  6.08    512   569
MT6752     2709869    1.69    714   1.24  5637  7.89   1024  1158
MT6753     2699665    1.10?   463   1.23  3597  7.77    802   958
Hi6210     2704356    1.20    506   1.24  3419  6.76    833  1030

IPC values are fairly consistent for MT6752, Hi6210 and MT6753 (when a likely clock speed of 1.1 GHz is assumed), but Snapdragon 615 consistently shows somewhat lower IPC, possibly related to the earlier revision (r0p1) of the Cortex-A53 core used. It is also possible that, similar to what seems to be the case for the MT6753 entry used (Meizu M2 Note), the actual maximum CPU clock speed is lower than the one advertised and reported to Geekbench.

Multi-core performance scaling approaches 8.0 for the MediaTek chips, which can be expected due to the symmetrical CPU cluster configuration. Multi-core scaling for Kirin 620 is lower than expected for the integer tests, especially Lua, possibly due to L2 cache performance constraints.

Snapdragon 615 shows a lower scaling factor because half of its cores run at a lower clock speed. The Lua scaling is particularly low, however: the multi-core score is in fact often worse than the single-core result, and only modestly higher in other cases. This could be due to L2 cache constraints for one of the clusters and associated synchronisation issues in the multi-threading implementation used by the Geekbench test.
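
To put the observed scaling factors in context, the sketch below estimates an upper bound for multi-core scaling when clusters run at different clock speeds. It assumes throughput scales linearly with clock speed and that the single-core run executes on the fastest cluster. The 1.0 GHz figure for the low-power cluster of Snapdragon 615 is an assumption, borrowed from the 1.50/1.0 GHz configuration listed for the earlier version of the platform.

```python
def ideal_scaling(clusters):
    """Upper bound on multi-core scaling for a list of (core_count, clock_ghz).

    Assumes per-core throughput is proportional to clock speed and that the
    single-core score was obtained on the fastest cluster.
    """
    fastest = max(clock for _, clock in clusters)
    total = sum(count * clock for count, clock in clusters)
    return total / fastest


# ASSUMPTION: 1.0 GHz low-power cluster for Snapdragon 615 (not stated for
# the 1.65 GHz version of the platform).
sd615_bound = ideal_scaling([(4, 1.65), (4, 1.0)])
mt6752_bound = ideal_scaling([(8, 1.69)])

print("Snapdragon 615 upper bound:", round(sd615_bound, 2))
print("MT6752 upper bound:", round(mt6752_bound, 2))
```

Under these assumptions the bound for Snapdragon 615 is well below 8 but still far above the Lua result, which supports the idea that something other than clock speed limits that test.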

Looking at memory performance, MT6752 has the highest performance, closely followed by MT6753 and Hi6210. Qualcomm's Snapdragon 615 is well behind, probably due to the older/slower interconnect bus used.

MT6753 benchmark results suggest performance issue


Even though a clock speed of 1.30 GHz is reported to Geekbench by the operating system in the MT6753-equipped Meizu M2 Note, actual Geekbench subtest results are not consistent with a Cortex-A53 core running at that clock speed. There is variability in the results between different runs, which could be caused by thermal throttling. Many of the results seem to correspond to an effective clock speed of approximately 1.10 GHz, although for some runs the score of certain tests (including JPEG Compress) does approach the level expected for a clock speed of 1.3 GHz. Most of the time however, performance is significantly lower than expected, as if the clock speed is throttled to around 1.1 GHz for long periods of time.
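
The effective clock speed can be estimated by scaling from a chip with the same core and a known clock speed. The Python sketch below uses MT6752 at 1.69 GHz as the reference and assumes identical IPC and linear scaling with frequency, which is a simplification (cache and memory differences are ignored).

```python
def effective_clock_ghz(score: float, ref_score: float, ref_clock_ghz: float) -> float:
    """Clock at which a core with the reference's IPC would reach `score`."""
    return score * ref_clock_ghz / ref_score


REF_CLOCK_GHZ = 1.69  # MT6752, as listed in the tables above

# Single-core scores from the tables above: (MT6753, MT6752)
subtests = {
    "JPEG Compress": (572, 890),
    "Lua": (587, 907),
    "Mandelbrot": (463, 714),
}

estimates = {
    name: effective_clock_ghz(score, ref, REF_CLOCK_GHZ)
    for name, (score, ref) in subtests.items()
}
for name, ghz in estimates.items():
    print(name, "->", round(ghz, 2), "GHz")
```

All three subtests land close to 1.1 GHz rather than the reported 1.3 GHz.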

The lower than expected performance could be related to the manufacturing process. The MT6753 was designed with cost-reduction in mind, and may use TSMC's 28LP process which has low cost but lower performance. Qualcomm's Snapdragon 410 and 615 also use this process, limiting their performance (and in the case of Snapdragon 615 resulting in heat production). MT6753 was announced as supporting a clock speed up to 1.5 GHz, and the lower-than-expected attainable clock speed may force MediaTek to adjust the specifications for the chip if the issue is not resolved.

Sources: Geekbench browser

Updated 6 June 2015.

Thursday, May 21, 2015

Battery performance based on Geekbench battery test results

A while ago, Primate Labs added a battery performance test to the Geekbench benchmark suite, which has been frequently used on this blog and elsewhere to analyze CPU processing performance. The battery performance test gives the opportunity to better gauge the power efficiency of different CPU architectures, especially for the type of workload that the Geekbench battery test represents.

Battery test overview


The battery test is intended to be run starting from a fully charged battery until the battery is completely run down. It appears to target a certain fixed level of CPU processing that is sustained throughout the test. In the test results, a duty cycle parameter is given for several time points, which more or less represents CPU utilization. Slower CPU cores (such as quad-core Cortex-A7-based SoCs) have a higher duty cycle percentage, while high-performance "big" cores such as Cortex-A57 and Krait-400 show a lower percentage.

In practice, most battery test results in the Geekbench database were terminated early in the benchmark process and do not give useful information. The test runs that completed a full run-down from 100% to close to 0% battery do give a usable indication of battery efficiency. The benchmark expresses battery performance as a number, similar to Geekbench CPU performance scores. This score is correlated with the duration and duty cycle using a certain formula, reflecting the amount of CPU work done and the battery running time. The score is heavily influenced by the actual capacity of the battery used in the device.

Overview of results for common SoCs


The following table shows approximate Geekbench battery test scores for common SoCs used in smartphone models for which a battery capacity specification is available. The table is ordered by SoC model name.


Device                    SoC              Score      Capacity  Duration    Score /
                                           (Range)    (mAh)     (hrs:min)   mAh

Apple iPhone 5S           Apple A7         1220-2090  1560      2:00-3:30   0.78-1.34
Apple iPhone 6            Apple A8         1550-2360  1810      2:35-4:00   0.86-1.30
Apple iPhone 6 Plus       Apple A8         2580-3250  2915      4:20-5:25   0.89-1.11
Meizu MX Pro              Exynos 5430      2080-2730  3350      7:45-10:10  0.62-0.81
Samsung Galaxy Alpha      Exynos 5430      1850-2710  1860      4:30-5:00   0.99-1.46
Samsung Galaxy Note 4     Exynos 5433      3190-3650  3220      5:20-6:00   0.99-1.13
Samsung Galaxy S6 Edge    Exynos 7420      4100-4600  2600      7:00-7:45   1.58-1.77
Huawei Honor 6            Kirin 920        1580-2080  3100      2:40-3:30   0.51-0.67
Huawei Mate 7 (MT7-L09)   Kirin 925        2470-2820  4100      4:05-4:20   0.60-0.69
Huawei P8 (GRA-L09)       Kirin 930        3270-4150  2680      5:30-7:00   1.22-1.55
Lenovo A5000              MT6582           3740       4000      14:00       0.94
Xiaomi Redmi Note         MT6592           2850-3560  3200      7:30-9:00   0.89-1.11
Huawei G750-U10           MT6592           2960-3430  3000      7:45-9:00   0.99-1.14
Meizu MX4                 MT6595           2540-2780  3100      6:20-6:55   0.82-0.90
Lenovo A7000-A            MT6752M          4550-4950  2900      8:16-8:50   1.57-1.71
Meizu M1 Note             MT6752           4900-6310  3140      8:10-10:30  1.56-2.01
HTC Desire 820s           MT6752           3580-3730  2600      6:15-6:30   1.38-1.43
HTC One E9+               MT6795           3370       2800      6:00        1.20
Moto G                    MSM8226 (SD400)  1600-2000  2070      6:00-7:30   0.77-0.97
Xiaomi Redmi 1S           MSM8226 (SD400T) 1485       2000      5:30        0.74
Lenovo A6000              MSM8916 (SD410)  2700       2300      6:50        1.17
HTC Desire 826            MSM8939 (SD615)  1800       2600      4:25        0.69
Xiaomi Mi 4i              MSM8939          2520-2810  3120      5:50-7:30   0.81-0.90
HTC One M8                MSM8974 (SD801)  2500-3300  2600      4:20-5:50   0.96-1.27
Xiaomi Mi 4               MSM8974          3150       3080      7:45        1.02
Samsung Galaxy Note 4     APQ8084 (SD805)  2500-3550  3220      4:10-6:15   0.78-1.10
LG G4                     MSM8992 (SD808)  2500-3260  3000      4:15-5:30   0.89-1.09
HTC One M9                MSM8994 (SD810)  1400-2580  2840      2:20-4:20   0.49-0.91

Devices with low processing power but long battery life may be penalized by having to power the screen and wireless connectivity for a longer period during the test.

The ratio of the battery score and the battery capacity (in mAh) gives a very rough indication of the efficiency of a particular CPU architecture, although the comparison may be skewed by several factors.
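
The last column of the table is simply the score range divided by the battery capacity. A small Python sketch, using three rows from the table as input:

```python
def score_per_mah(score_range, capacity_mah):
    """Return the (low, high) battery score divided by capacity in mAh."""
    low, high = score_range
    return round(low / capacity_mah, 2), round(high / capacity_mah, 2)


# (score range, battery capacity in mAh), taken from the table above
devices = {
    "Samsung Galaxy S6 Edge (Exynos 7420)": ((4100, 4600), 2600),
    "Lenovo A7000-A (MT6752M)": ((4550, 4950), 2900),
    "HTC One M9 (MSM8994)": ((1400, 2580), 2840),
}

for name, (scores, capacity) in devices.items():
    print(name, score_per_mah(scores, capacity))
```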

Results by SoC type


The previous generation of Cortex-A7-based SoCs such as Snapdragon 400 and MT6582 shows long running time due to the efficiency of the Cortex-A7 core, but the battery score appears to be affected by the limited CPU power. Snapdragon 410 does relatively well despite (or perhaps thanks to) being limited to ARMv7 mode.

SoCs with previous generation Cortex-A15 cores for performance in a big.LITTLE configuration, such as Kirin 920/925, show relatively low efficiency, as is to be expected given the relatively high power consumption Cortex-A15 is known for. Exynos 5430, which is manufactured on a relatively advanced 20 nm process, generally does better.

Octa-core mid-range: MediaTek does well


Among octa-core mid-range SoCs, which include the Cortex-A53-based MT6752, Qualcomm's Snapdragon 615 and MediaTek's previous-generation Cortex-A7-based MT6592, both the MT6752 and MT6592 make a strong showing, with MT6752 getting particularly high scores.

MT6752 has an optimized memory architecture with a 32-bit memory interface and is manufactured on TSMC's 28HPM process, which helps performance relative to Snapdragon 615. Although not tested by Geekbench, reports indicate that wireless standby power efficiency is not as great as the CPU efficiency for this SoC. A possible explanation is that the CPU cores are optimized for relatively heavy loads such as the Geekbench battery test (the SoC is not big.LITTLE, so it has no cores optimized for low power consumption at low frequencies), so that a low-load scenario such as standby results in less optimal power consumption.

Qualcomm's Snapdragon 615 (MSM8939) does relatively poorly, which can largely be explained by the asymmetric CPU configuration and lower-performance 28LP manufacturing process.

Performance segment SoCs


The poor performance of Snapdragon 810 (as illustrated by the HTC One M9) is apparent, with significantly worse battery efficiency than the previous generation Snapdragon 801 and 805. Snapdragon 808, which uses a later revision Cortex-A57 core and is used inside the LG G4, does somewhat better.

Largely due to the relatively advanced manufacturing process (14 nm FinFET for Exynos 7420), Samsung's latest SoCs do well, particularly Exynos 7420 used inside the Galaxy S6. Even Samsung's previous generation Exynos 5433 appears to be well ahead of Snapdragon 810 in terms of efficiency.

A limited number of results is available for two Cortex-A53-based performance SoCs (characterized by a wide memory interface and more powerful GPU than mid-range solutions), MediaTek's MT6795 (Helio-X10) and HiSilicon's Kirin 930. Kirin 930 shows relatively good efficiency in this benchmark, possibly ahead of MediaTek's MT6795. Kirin 930 has a two-level hierarchy in which one cluster of Cortex-A53 cores is optimized for a higher and the other for a lower frequency, while in MT6795 all cores can reach the maximum frequency.

Source: Geekbench Browser (Battery search)

Updated 28 May 2015.

Thursday, April 9, 2015

Cortex-A53 based SoCs: MT6735 shows up, power efficiency of MT6752 in question

More and more devices with Cortex-A53-based SoCs, mainly targeting the entry-level and mid-range segments, are coming into the market. Qualcomm's original Snapdragon 410 (MSM8916) has already shipped in large volume, and devices using Qualcomm's Snapdragon 615 (MSM8939), as well as MediaTek's MT6732 and MT6752, have also ramped up. Meanwhile, Huawei is introducing devices using its in-house HiSilicon Kirin 620 SoC.

In the Geekbench database, results for new SoCs that are not yet shipping in end products are showing up, including MediaTek's delayed performance-oriented MT6795 (Helio-X) and the appearance of a result for the MT6735, MediaTek's new offering for the cost-sensitive segment.

In this post, I will be examining updated benchmark results for these SoCs, as well as taking a look at battery life benchmarks. Power efficiency of Cortex-A53-based products does not appear to be as good as hoped, with significant variability present (for MT6752-based devices, for example).

Snapdragon 410 smartphone platform appears to be slightly updated


Qualcomm's Snapdragon 410 (MSM8916) smartphone platform, which has performance flaws probably associated with the use of an early-revision Cortex-A53 core, seems to have been slightly updated in some recent models and reference designs, with a minor performance improvement due to a slightly higher clock speed (1.21 GHz vs 1.19 GHz) and what appears to be somewhat improved memory performance, while still being limited to 32-bit ARMv7 mode.

This improvement could be the result of a new revision of the SoC with a few hardware tweaks and an associated reference design, although it does not appear to be a radical redesign that would, for example, upgrade the Cortex-A53 core to allow use of the ARMv8 instruction set. Qualcomm's modem-less stand-alone version of Snapdragon 410, APQ8016, does appear to be a new design that does not have the restrictions of the smartphone SoC and can run in full 64-bit mode (it targets development boards and tablets).

MediaTek's MT6735 shows up in Geekbench


A single result for MediaTek's MT6735 SoC has appeared in the Geekbench database. The MT6735 is MediaTek's much-needed offering for the entry-level market, with an integrated LTE modem with world-mode support. It has been described as a cost-down version of the MT6732, which is a quad-core Cortex-A53-based SoC with a Mali-T760 MP2 GPU. The MT6735 downgrades the GPU to a Mali-T720 (probably Mali-T720 MP4), which appears to be associated with lower manufacturing cost.

The MT6735 has an upgraded r0p3 revision of the Cortex-A53 core which, according to Linux kernel commits by ARM, fixes a few hardware errata which might improve performance and efficiency over previous revisions. The Geekbench entry shows the MT6735 running at a maximum clock speed of 1.3 GHz, which is lower than the 1.5 GHz of the MT6732. This could be due to the use of the cheaper 28LP process at TSMC, instead of the higher-performance 28HPM.

Notably, the device is running in full AArch64 mode, which has pros and cons for performance, but is unusual for a cost-sensitive platform because those platforms are usually sensitive to the higher demands on the memory subsystem from the increased addressing size and addressing space in AArch64 mode. Those platforms until recently only used AArch32, the 32-bit variant of the ARMv8 instruction set. The use of AArch64 makes comparisons a little difficult because it affects different benchmarks (including different Geekbench subtests) in different ways. The Android version (5.0) is also different from most existing entries for comparable SoCs, which use Android 4.4.4.

MT6752's power efficiency average, with high variability


According to most reviews that have appeared for MT6752-based devices such as the Meizu M1 Note, power efficiency and battery life are generally average, with significant variability between devices. The Cortex-A53 core, although delivering higher performance, clearly seems to be associated with reduced power efficiency as compared with Cortex-A7 in SoCs such as MediaTek's MT6582 and Qualcomm's Snapdragon 400, which generally have excellent battery life.

The variability in MT6752 performance could reflect variable performance yields in the manufacturing process, with some chips performing better (with lower voltage and power at a given frequency) than others. Frequently, chips are separated into speed bins and lower-performing ones may be sold as a cost-reduced variant running at a lower maximum clock speed. Indeed, a review of the Acer Liquid Jade S containing the MT6752M, which is likely from the poorest-performing speed bin of the MT6752, reports relatively poor battery life and some heat production. This suggests the variability may be quite large.

Update (21 May 2015): Recent information suggests that CPU power efficiency for this SoC is relatively high when CPU power is demanded, but standby efficiency (including wireless network standby) may be less impressive.

Overview of Geekbench results for Cortex-A53-based SoCs


The following tables show Geekbench results for a recent, representative entry for each Cortex-A53-based SoC. The first table below gives an overview of the devices, with SoC, CPU configuration, device model, Geekbench reference number, Android version and the instruction set architecture tested.

SoC                       CPU configuration                  Device               Geekbench Android Arch
                                                                                  reference version
Snapdragon 410 (MSM8916)  4 x 1.19 GHz Cortex-A53r0p0        Samsung SM-G360F     2275416  4.4.4   ARMv7
Snapdragon 410 (MSM8916)  4 x 1.21 GHz Cortex-A53r0p0        Xiaomi 2014817       2181099  4.4.4   ARMv7
Snapdragon 410 (MSM8916)  4 x 1.21 GHz Cortex-A53r0p0        Motorola Moto-E2     2275732  5.0.2   ARMv7
Snapdragon 615 (MSM8939)  4/4 x 1.50/1.0 GHz Cortex-A53r0p1  Samsung SM-A700FD    2274606  4.4.4   AArch32
MT6732                    4 x 1.50 GHz Cortex-A53r0p2        Elephone P6000 O2    2265175  4.4.4   AArch32
MT6735                    4 x 1.30 GHz Cortex-A53r0p3        "bq DENDE"           2268728  5.0     AArch64
MT6752                    8 x 1.69 GHz Cortex-A53r0p2        Lenovo P70-A         2276814  4.4.4   AArch32
MT8752                    8 x 1.69 GHz Cortex-A53r0p2        CUBE T7 (tablet)     2078854  4.4.4   AArch32
MT6795                    8 x 1.95 GHz Cortex-A53r0p2        Alps k6795v1_64_op01 2076054  5.0     AArch64
MT6795T                   8 x 2.16 GHz Cortex-A53r0p2        Unknown              2188071  5.0     AArch64
Kirin 620 (Hi6210)        8 x 1.20 GHz Cortex-A53r0p3        HUAWEI Che2-L11      2269931  4.4.2   AArch32
The Geekbench version used in the entries is 3.3.2 or 3.3.1.

Snapdragon 410-based devices are still limited to ARMv7 compatibility mode. Unusually for a cost-sensitive platform, the MT6735 test device uses AArch64 mode instead of AArch32 mode. Both the MT6735 and HiSilicon's Kirin 620 use a more recent version of the Cortex-A53 core, revision r0p3.

Integer subtest results


The following table shows results for integer subtests from Geekbench.

           CPU          JPEG Compress            Dijkstra                 Lua
                        Single IPC   Multi Par.  Single IPC   Multi Par.  Single IPC   Multi Par.
MSM8916    4 x 1.19      591   1.29  2379  4.03   816   1.09  2122  2.60   614   1.26  2229  3.63
MSM8916    4 x 1.21      602   1.29  2416  4.01   830   1.09  2182  2.63   632   1.27  2267  3.59
MSM8916    4 x 1.21      599   1.29  2404  4.01   739   0.97  2159  2.92   592   1.19  2168  3.66
MSM8939    4 x 1.50 + 4  832   1.44  4962  5.96   942   1.00  3469  3.68   744   1.21  2360  3.17
MT6732     4 x 1.50      842   1.46  3357  3.99  1035   1.10  3049  2.94   740   1.20  3049  4.12
MT6735     4 x 1.30      650   1.30  2563  3.94   712   0.87  1856  2.61   642   1.20  1902  2.96
MT6752     8 x 1.69      954   1.47  5810  6.09  1153   1.08  4817  4.18   850   1.22  2244  2.64
MT8752     8 x 1.69      952   1.46  7527  7.91  1200   1.13  4168  3.47   829   1.19  2294  2.77
MT6795     8 x 1.95     1026   1.37  8071  7.87   992   0.81  3886  3.92  1051   1.31  8075  7.68
MT6795T    8 x 2.16     1128   1.36  8991  7.97  1054   0.78  4159  3.95  1112   1.25  4159  3.74
AArch64 mode as used for the MT6735 and MT6795/MT6795T results has a significant influence, with the IPC (throughput per CPU cycle) for the JPEG Compress and Dijkstra tests being reduced when compared to AArch32 mode, while the IPC of the Lua test appears to be better in AArch64 mode, at least for the MT6795.

The MT6735 scores lower than the MT6732 in the Lua subtest, especially multi-core, even when correcting for the lower clock speed, which is probably the result of a smaller or slower L2 CPU cache inside the MT6735, which is targeted at the entry-level segment. The Dijkstra results are also lower, but that is probably mainly due to the use of AArch64 mode, which imposes a significant penalty on the results of this test.
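
The size of the apparent AArch64 effect can be quantified by comparing the single-core IPC values of the MT6732 (AArch32) and MT6735 (AArch64) from the table above. This is only a rough sketch: the two entries are different chips, so the difference mixes the effect of the instruction set mode with hardware differences such as the L2 cache.

```python
def relative_change_pct(aarch32_ipc: float, aarch64_ipc: float) -> float:
    """Percentage change in IPC when going from the AArch32 to the AArch64 entry."""
    return (aarch64_ipc - aarch32_ipc) / aarch32_ipc * 100.0


# Single-core IPC values from the table: (MT6732 in AArch32, MT6735 in AArch64)
ipc = {
    "JPEG Compress": (1.46, 1.30),
    "Dijkstra": (1.10, 0.87),
    "Lua": (1.20, 1.20),
}

changes = {name: relative_change_pct(a32, a64) for name, (a32, a64) in ipc.items()}
for name, pct in changes.items():
    print(name, round(pct, 1), "%")
```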

Finally, while earlier results for the MT6795 showed very impressive Lua multi-core throughput, the result for the recent MT6795T entry is significantly lower (although still respectable). This is possibly due to a smaller L2 cache size in the latest revision of the MT6795T, although other reasons cannot be ruled out.

Memory and floating point subtest results



           CPU           Stream Copy  SGEMM        SFFT         Mandelbrot
                         Single Multi Single Multi Single Multi Single IPC   Multi
MSM8916    4 x 1.19      551    655    258   536   316    1264    450  1.11  1796
MSM8916    4 x 1.21      505    615    267   515   322    1292    456  1.11  1819
MSM8916    4 x 1.21      424    518    247   517   320    1277    451  1.09  1810
MSM8939    4 x 1.50 + 4  581    651    255   678   425    2510    583  1.14  3442
MT6732     4 x 1.50     1000   1187    343   697   430    1728    586  1.15  2329
MT6735     4 x 1.30      944   1034    322   636   403    1574    526  1.19  2102
MT6752     8 x 1.69     1007   1115    375  1123   485    3894    662  1.15  5279
MT8752     8 x 1.69      891   1045    387  1162   486    3902    662  1.15  5280
MT6795     8 x 1.95     1296   2070    484  1536   629    5021    824  1.24  6350
MT6795T    8 x 2.16     1380   2129    543  1847   687    5565    912  1.24  7171
Hi6210     8 x 1.20      575    996    262   819   343    2098    468  1.14  2842
The results show the memory performance advantage of MediaTek's Cortex-A53-based SoCs remains, scoring significantly higher than Qualcomm's existing SoCs, probably due to the use of a faster internal interconnect bus.

The first entry for Snapdragon 410 (MSM8916) running at 1.19 GHz is a Samsung SM-G360F, which appears to use relatively high-clocked memory, increasing memory performance over standard configurations (not listed). The two devices with a 1.21 GHz configuration have different memory performance, with the Motorola Moto-E2 scoring lower than the Xiaomi device, probably due to the use of slower RAM. An impact from the use of Android 5 on the Moto-E2 cannot be ruled out.

Sources: Geekbench browser, GSMArena (Acer Liquid Jade S review)

Updated 16 April 2015.

Tuesday, March 10, 2015

Qualcomm's Snapdragon 808 fixes flaws of Snapdragon 810

Snapdragon 808 (MSM8992) is a performance-oriented SoC that Qualcomm announced last year together with Snapdragon 810. It has similarities to Snapdragon 810 (MSM8994), including the use of ARM Cortex-A57 CPU cores and Cortex-A53 cores in a big.LITTLE configuration. Snapdragon 808 appears to fix some of the performance flaws that are apparent in Snapdragon 810, especially the memory subsystem, while being significantly less costly.

Snapdragon 808 features


Features and differences with Snapdragon 810 include:

  • Snapdragon 808 has only two Cortex-A57 cores (revision r1p2) compared to four Cortex-A57 cores (revision r1p1) for Snapdragon 810. Both contain four Cortex-A53 cores.
  • Snapdragon 808 has a more economical dual-channel LPDDR3 memory interface, compared to the LPDDR4 interface of Snapdragon 810.
  • Snapdragon 808 has an Adreno 418 GPU, compared to Adreno 430 in Snapdragon 810, presumably with somewhat lower performance.
  • Manufactured on TSMC's 20 nm process, the same as Snapdragon 810.
  • 4K resolution video playback (H.264/H.265), on-device display resolution up to 2560x1600 (Snapdragon 810 theoretically supports 4K on-device display resolution, but all currently announced smartphones using Snapdragon 810 are limited to a resolution of 1920x1080).

 

Early benchmark results suggest Snapdragon 808 fixes performance flaws of Snapdragon 810


Early benchmarks for Snapdragon 808 have already appeared on the Geekbench Browser. We can compare Snapdragon 808's single-core performance with Snapdragon 810 and Exynos 7420, all of which run in AArch64 mode in the published benchmark results.

To reduce the impact of thermal throttling, the best Geekbench subtest results for a given device have been collected and combined in the table below. I have made an attempt to estimate the actual maximum clock speed of the Cortex-A57 cores during the benchmarks, partly based on the maximum frequency reported by Geekbench when it appears to apply to the "big" cores and not the "LITTLE" cores.

SoC          "big" CPU                    Arch     JPEG (int)  Lua (int)   Mandelb. (float)
                                                   Comp. IPC         IPC         IPC

MSM8992      2 x 1.69? GHz Cortex-A57r1p2 AArch64  1257  1.96  1385  1.99  1031  1.79
MSM8994      4 x 1.8? GHz Cortex-A57r1p1  AArch64  1358  1.96  1283  1.73  1100  1.79
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  1486  1.96  1409  1.74  1198  1.78

MT6795       8 x 1.95 GHz Cortex-A53r0p2  AArch64  1026  1.37  1053  1.31   823  1.24
MT6795T      8 x 2.16 GHz Cortex-A53r0p2  AArch64  1128  1.36  1173  1.32   912  1.24

The IPC figures are calibrated on the Cortex-A7 core, whose IPC is fixed at 1.00. Fixing the maximum clock speed at 1.8 GHz for the MSM8994 (Snapdragon 810) results (based on HTC One M9 entries) and at 1.69 GHz for the MSM8992 (Snapdragon 808) produces similar IPC figures for the JPEG Compress integer test and the Mandelbrot floating point test, making them reasonably plausible. The best Lua subtest result for the MSM8992 shows a higher IPC, which may reflect improved L2 cache performance in the MSM8992, which uses a later revision of the Cortex-A57 core.
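As a rough sketch of how such an index can be computed, the Python snippet below divides the score per GHz by a Cortex-A7 reference value. The formula is my reading of the method, not something the post spells out, and the reference point is an assumption: it is derived from the Cortex-A7-based MT6582 row in the Geekbench table of the 3 March post (JPEG Compress 506 at 1.30 GHz, listed with an index of 1.01). Only the JPEG Compress test is covered, since that is the only one for which a Cortex-A7 score is listed.

```python
# Assumed Cortex-A7 reference point (MT6582 row of the 3 March table).
A7_SCORE, A7_GHZ, A7_INDEX = 506, 1.30, 1.01

# Score per GHz that would correspond to an index of exactly 1.00.
REF_POINTS_PER_GHZ = A7_SCORE / A7_GHZ / A7_INDEX


def ipc_index(score, clock_ghz):
    """Per-clock performance relative to Cortex-A7 (= 1.00)."""
    return score / clock_ghz / REF_POINTS_PER_GHZ


# JPEG Compress single-core scores and clock estimates from the table above.
for soc, score, ghz in [
    ("MSM8994", 1358, 1.80),
    ("Exynos 7420", 1486, 1.97),
    ("MT6795", 1026, 1.95),
]:
    print(f"{soc:12s} {ipc_index(score, ghz):.2f}")
```

Under these assumptions the three rows come out at the values listed in the table. The MSM8992 row is more sensitive to the clock estimate: with exactly 1.69 GHz the same formula gives a slightly lower figure than the one listed, which fits the question mark behind that clock speed.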

The single-core CPU performance results show no surprises, with Snapdragon 808 showing good performance that is slightly lower than Snapdragon 810, proportional to the lower maximum clock frequency in the tested devices. However, the Lua test shows higher performance with Snapdragon 808. This is especially true for the multi-core test (results not shown), where Snapdragon 810 seems to be limited to a score of about 1200 with little gain when compared to single-core performance, while Snapdragon 808 consistently scores in the region of 4000.

Memory subsystem performs much better than Snapdragon 810


The following table lists Geekbench scores for some memory-dependent tests. 

SoC          "big" CPU                    Arch     Stream Copy  SGEMM SFFT  SGEMM SFFT
                                                   Single Multi (single)    Multi Multi
MSM8992      2 x 1.69? GHz Cortex-A57r1p2 AArch64  1527   1733   767  1126  1678  2946
MSM8994      4 x 1.8? GHz Cortex-A57r1p1  AArch64  1428   1838   741  1009  1870  3649
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  2003   2622   957  1363  2888  5014

MT6795       8 x 1.95 GHz Cortex-A53r0p2  AArch64  1356   2068   484   618  1542  4764
MT6795T      8 x 2.16 GHz Cortex-A53r0p2  AArch64  1350   2140   529   694  1659  5333

Notably, Snapdragon 808 delivers memory performance similar to Snapdragon 810 at much lower cost, despite using only a regular LPDDR3 memory interface, as compared to the Snapdragon 810's LPDDR4 memory interface, which in theory delivers almost twice the bandwidth. This suggests that the Snapdragon 810's memory interface is still flawed, while that of Snapdragon 808 is much more optimized. Snapdragon 808 even beats Snapdragon 810 in the single-core SGEMM and SFFT tests, despite running at a lower clock speed, which probably also reflects a more optimized and functional memory controller. Even in the multi-core SGEMM and SFFT tests, Snapdragon 808 is not far behind Snapdragon 810 despite having only half the number of Cortex-A57 cores.
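To make the comparison concrete, the following Python sketch computes the ratio of the Snapdragon 808 score to the Snapdragon 810 score for each column of the memory table above. The scores are copied from the table; the variable names are only illustrative.

```python
tests = ["Stream Copy single", "Stream Copy multi", "SGEMM single",
         "SFFT single", "SGEMM multi", "SFFT multi"]
msm8992 = [1527, 1733, 767, 1126, 1678, 2946]  # Snapdragon 808
msm8994 = [1428, 1838, 741, 1009, 1870, 3649]  # Snapdragon 810

# A ratio above 1.00 means Snapdragon 808 scores higher than Snapdragon 810.
ratios = {t: a / b for t, a, b in zip(tests, msm8992, msm8994)}

for test, ratio in ratios.items():
    print(f"{test:20s} {ratio:.2f}")
```

The single-core ratios are all above 1.00, while the multi-core ratios stay above 0.80 even though Snapdragon 808 has half the number of Cortex-A57 cores.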

Comparison with MT6795


In the marketplace, Snapdragon 808 may compete with MediaTek's MT6795 (Helio X10), which is a cost-effective performance-segment SoC that only uses Cortex-A53 cores. Comparing Geekbench subtest results, MT6795 scores significantly lower than Cortex-A57-based SoCs such as Snapdragon 808 in single-core benchmarks, although the gap is not very large except in the SFFT benchmark. The MT6795 does relatively well in multi-core benchmarks, where it beats the Cortex-A57-based Snapdragon 808 and Snapdragon 810 in most cases by a considerable margin, especially in the JPEG Compress, Lua and Mandelbrot tests, which are sensitive to the number of CPU cores (multi-core scores have not been listed for these tests in the tables above). As an example, MT6795 scores 8167 in the multi-core JPEG Compress test, twice the score of Snapdragon 808 and almost 40% higher than Snapdragon 810.

Conclusion


Snapdragon 808 appears to be a much more optimized, less flawed SoC product than Snapdragon 810, and may perform similarly to or even better than Snapdragon 810 in practical use cases due to the performance flaws present in Snapdragon 810. At the same time, Snapdragon 808 is likely to be considerably cheaper. The only caveat is the question of whether excessive heat production makes thermal throttling necessary to the same degree as with Snapdragon 810. With only two Cortex-A57 cores, the SoC should be less problematic in this regard.

Source: Geekbench Browser (MSM8992 results), Geekbench Browser (MSM8994 results), Qualcomm (MSM8992 specifications)

Updated 15 March 2015.

Early benchmarks appear for Cortex-A72-based SoC

ARM recently announced the new Cortex-A72 processor core, which is an improved version of the existing high-performance Cortex-A57 processor core.

Alongside the Cortex-A72 CPU core, ARM also announced the CCI-500 interconnect technology as well as the high-end Mali-T880 GPU. Devices incorporating the combination of these technologies are expected to become available in 2016.

However, SoCs using the Cortex-A72 CPU are likely to become available earlier. Qualcomm and MediaTek have both announced SoCs using the Cortex-A72 core with commercial availability in the second half of 2015, suggesting that the CPU core itself is at an advanced stage of introduction. Already, early benchmarks for MediaTek's MT8173 tablet SoC that incorporates the Cortex-A72 have become available.

Cortex-A72 appears to be an enhanced version of Cortex-A57 optimized for next-generation processes


In its announcement press release from 3 February 2015, ARM claims that more than ten partners have already licensed Cortex-A72, including HiSilicon, MediaTek and Rockchip. Cortex-A72 is based on ARM's ARMv8-A instruction set architecture, and can be combined with the existing Cortex-A53 in a big.LITTLE configuration. Cortex-A72 seems to be positioned as a replacement for Cortex-A57. The similarities with Cortex-A57 are very apparent, for example in the identically sized L1 instruction and data caches, and a feature set that is otherwise very similar.

On a 16 nm FinFET process, the core can sustain operation at speeds up to 2.5 GHz within the constraints of a mobile power envelope (e.g. smartphones), with scalability to higher speeds for larger form-factor devices. However, the first announced devices, such as MediaTek's MT8173, appear to use older processes such as the tried-and-trusted 28 nm HPM process at TSMC, so they are likely to have a lower maximum clock speed.

ARM claims increased performance and power efficiency, although these claims seem to be based on implementation on next-generation processes such as 16 nm FinFET that deliver a significant intrinsic improvement in these metrics. ARM mentions micro-architectural improvements that result in enhancements in floating point, integer and memory performance. When implemented on a 16 nm FinFET process, ARM expects Cortex-A72 to provide 85% higher performance when compared to the Cortex-A57 core on a 20 nm process within a similar smartphone power budget.

Overall, the differences with Cortex-A57 appear to be relatively minor, so that Cortex-A72 is best viewed as an enhanced version of Cortex-A57 that is optimized for next-generation processes such as 16 nm FinFET. Nevertheless, the first SoCs to use the Cortex-A72 core will be manufactured using a less advanced process.

Benchmarks appear for MediaTek's MT8173


MediaTek's MT8173 is a mid-range tablet processor mainly targeting Wi-Fi-only tablets, since it does not have an integrated modem. It has two Cortex-A72 cores and two Cortex-A53 cores in a big.LITTLE configuration. Because the chip is probably manufactured using the established 28HPM process at TSMC, the maximum clock speed of the Cortex-A72 cores is likely to be lower than the target for 16 nm FinFET. MediaTek claims a clock speed of up to 2.4 GHz, but a much lower frequency is apparent in early benchmark results.

The chip also features a PowerVR GX6250 GPU, which delivers higher performance than the G6200 GPU used inside MediaTek's existing MT8135 and MT6795.

Recently, early benchmarks for an MT8173 development board have appeared both in the Geekbench Browser and in the results database of GFXBench. The first Geekbench results already appeared in December 2014. The latest set of Geekbench results dates from the end of February 2015, although the results do show a certain amount of variation that may reflect thermal throttling.

Single-core performance good, but not spectacular


As expected, the Geekbench results show good single-core performance, albeit not spectacular. As shown in the following table, single-core performance is in line with Cortex-A57-based SoCs such as Exynos 5433 and Exynos 7420. It should be noted that the MT8173 test SoC is most likely manufactured at 28 nm with a correspondingly low maximum CPU clock speed, while Exynos 5433 and 7420 are manufactured using smaller leading-edge processes at Samsung.


SoC          "big" CPU                    Arch     JPEG (int)  Lua (int)   Mandelb. (fp)
                                                   Comp. IPC         IPC         IPC
MT8173       2 x 1.6? GHz Cortex-A72      AArch32  1310  2.13  1380  2.10  1064  1.95
Exynos 5433  4 x 1.80 GHz Cortex-A57r1p0  AArch32  1456  2.10  1397  1.89  1174  1.91
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0  AArch64  1481  1.97  1409  1.74  1198  1.92

In this table, to determine the IPC index I have made an educated guess about the actual clock speed of MT8173 when running the benchmarks. Geekbench reports a 1.40 GHz clock speed (which probably applies to the Cortex-A53 cores); 1.6 GHz seems to be a good match for the Cortex-A72 cores, providing just a little better IPC than Cortex-A57. Note that Exynos 7420 runs in AArch64 mode, which skews direct IPC comparisons.
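The reasoning behind such a clock speed guess can be illustrated with a small Python sketch that shows which IPC index each candidate clock speed would imply for the MT8173's JPEG Compress score of 1310. The Cortex-A7 reference value is an assumption, derived from the MT6582 row of the Geekbench table in the 3 March post (506 at 1.30 GHz with an index of 1.01), and the formula is my reading of the method rather than a stated one.

```python
# Assumed Cortex-A7 reference: JPEG Compress points per GHz at index 1.00.
REF_POINTS_PER_GHZ = 506 / 1.30 / 1.01

MT8173_JPEG = 1310  # single-core JPEG Compress score from the table above


def implied_index(score, clock_ghz):
    """IPC index (Cortex-A7 = 1.00) implied by a score at a given clock."""
    return score / clock_ghz / REF_POINTS_PER_GHZ


# Candidate clocks: reported by Geekbench, the guess, MediaTek's claimed maximum.
for ghz in (1.40, 1.60, 2.40):
    print(f"{ghz:.2f} GHz -> index {implied_index(MT8173_JPEG, ghz):.2f}")
```

Under these assumptions, 1.40 GHz would imply an index well above 2.4, far ahead of the 2.10 listed for the Cortex-A57 in Exynos 5433, while 2.4 GHz would imply an index below that of Cortex-A57. A clock of about 1.6 GHz gives a figure just above the Cortex-A57 value, which is why it looks like the most plausible match.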

Practical implications unclear


Without knowing the exact clock speed of the Cortex-A72 cores, it is hard to draw conclusions about the actual IPC improvement over Cortex-A57. If the MT8173 uses a 28 nm process, the ability to approach the single-core performance of Samsung's Exynos 7420, manufactured using a 14 nm FinFET process, is impressive. However, although MediaTek demonstrated the MT8173 in an actual tablet at MWC, it is unclear what kind of device the Alps development board in the benchmark entries actually represents, so it remains to be seen whether the benchmarks actually reflect the power budget of a tablet.

The multi-core performance reported is not very impressive, as expected because of the relatively small number of CPU cores. The JPEG Compress multi-core score shows a multi-core scaling factor of 2.72 relative to the single-core score, which is good and implies utilization of the Cortex-A53 cores. The Mandelbrot floating point benchmark shows similar scaling.

However, the Lua integer benchmark has a very low multi-core scaling factor of 1.41, which is lower than expected, even when allowing for the limited number of cores. For example, MediaTek's MT6795 achieves multi-core scaling of 7.5 in this benchmark, and the Exynos chips range from 3.9 to 5.0. Other chips with a low multi-core scaling factor for Geekbench's Lua subtest include Snapdragon 810 (Cortex-A57-based), MediaTek's MT6595 (Cortex-A17-based) and NVIDIA's Denver-based Tegra-K1 SoC. There are indications that this benchmark test heavily depends on on-chip cache (primarily L2 cache) size and speed.

GPU performance of MT8173's PowerVR GX6250 GPU improves on G6200


The MT8173 test device's GPU performance as shown in GFXBench results database is not overly impressive, but suitable for a mid-range chip and an improvement over the PowerVR G6200 GPU used in other MediaTek SoCs such as MT6595 and MT6795. In the T-Rex Offscreen benchmark, the MT8173 registers a score of 1487, higher than the 1311 of the MT6595 (G6200)-equipped Meizu MX4. In the GFXBench 3.0 low-level tests, alpha blending scores higher than the MT6595 while the other low-level scores are comparable.

Sources: ARM (Cortex-A72 announcement press release), AnandTech (MediaTek MT8173 article), MediaTek (MT8173 announcement), Geekbench Browser (MT8173 test device results), GFXBench (MT8173 test device result)

Updated 10 March 2015.

Thursday, March 5, 2015

A deeper look at graphics benchmark results, including GFXBench 3.1 and Basemark X

In this post I will take a closer look at graphics benchmark results for different SoCs. I will look beyond just GFXBench (for which a new version has appeared), because the workload tested by well-known GFXBench tests such as T-Rex and Manhattan is not necessarily reflective of the actual gaming experience. Alternative benchmarks exist, such as Basemark X, which uses the Unity engine that is commonly used in games.

GFXBench 3.1 released for OpenGL ES 3.1, Snapdragon 805 does well


Kishonti recently released a new version of GFXBench, GFXBench 3.1 for OpenGL ES 3.1, that includes tests for the OpenGL ES 3.1 API standard supported by many recent devices. A few results from the new benchmark tests are already available, with the Adreno 420 GPU inside Snapdragon 805 closing most of the performance gap with the Mali-T760 MP6/MP8 in Samsung's Exynos SoCs in the Manhattan 3.1 test.

                                                      Offscreen Manhattan Manhattan
Device               SoC             GPU              T-Rex        3.0       3.1

NVIDIA Shield Tablet NVIDIA K1-32    Tegra K1 GPU        3692     1979      1443  
HTC One M9           Snapdragon 810  Adreno 430          2732     1413
Galaxy S6 Edge       Exynos 7420     Mali-T760 MP8?      3312     1607       793
Sams. Galaxy Note 4  Snapdragon 805  Adreno 420          2386     1153       773
Samsung Galaxy S6    Exynos 7420     Mali-T760 MP8?      3314     1609       634
Sams. Galaxy Note 4  Exynos 5433     Mali-T760 MP6       2163     1110       436
HTC One M8           Snapdragon 801  Adreno 330          1608      768
Teclast X98 Air      Atom Z3736F     Intel HD            1014      564       307
Google Nexus 10      Exynos 5250     Mali-T604 MP4        818      351       185

NVIDIA's 32-bit version of Tegra K1 leads (the 64-bit Denver-based version of Tegra K1, and Tegra X1, have not yet been tested). Performance of Snapdragon 805 as implemented in certain models of the Samsung Galaxy Note 4 holds up better in the Manhattan 3.1 test than that of Samsung's Exynos SoCs with Mali-T760 MP6/MP8. Whereas Exynos 7420 (used in the Galaxy S6) has a clear advantage in the existing benchmarks (1609 vs 1153 for Manhattan 3.0 and 3314 vs 2386 for T-Rex), it loses that advantage in the new Manhattan 3.1 test (although the Galaxy S6 Edge benchmark result suggests it is still slightly superior). Intel's Bay Trail SoCs seem to hold up relatively well judging by the result for an Atom Z3736F-based tablet, albeit at a lower performance level.
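The shift between the older tests and Manhattan 3.1 is easiest to see when the Exynos 7420 scores are expressed relative to the Snapdragon 805 version of the Galaxy Note 4. The Python sketch below does this with the offscreen scores from the table above; the variable names are only illustrative.

```python
# Offscreen scores copied from the table above.
note4_s805 = {"T-Rex": 2386, "Manhattan 3.0": 1153, "Manhattan 3.1": 773}
exynos_7420 = {
    "Galaxy S6": {"T-Rex": 3314, "Manhattan 3.0": 1609, "Manhattan 3.1": 634},
    "Galaxy S6 Edge": {"T-Rex": 3312, "Manhattan 3.0": 1607, "Manhattan 3.1": 793},
}

# Ratio above 1.00: the Exynos 7420 device beats the Snapdragon 805 Note 4.
relative = {
    device: {test: score / note4_s805[test] for test, score in results.items()}
    for device, results in exynos_7420.items()
}

for device, ratios in relative.items():
    print(device, {test: round(r, 2) for test, r in ratios.items()})
```

In T-Rex and Manhattan 3.0 both Exynos 7420 devices are close to 40% ahead, while in Manhattan 3.1 the Galaxy S6 falls behind the Snapdragon 805 device and the Galaxy S6 Edge is only marginally ahead.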

GFXBench 3.1 results for Snapdragon 801 and the new Snapdragon 810 are not yet available. However, given that Snapdragon SoCs generally appear to do well in GFXBench, they can be expected to score fairly highly. I'll say more about the apparent advantage for Qualcomm's SoCs in GFXBench in the final section of this article.

Basemark X is a useful alternative to GFXBench


Basemark X is a gaming benchmark that utilizes the Unity engine that is commonly used in games, and developer Rightware claims that it actually reflects practical performance in games. Although it does include an on-screen demo, the actual benchmark scores appear to be derived from off-screen rendering at a fixed resolution, so that benchmark results can be compared objectively between different devices.

Previous generation SoCs: MT6582 beats Snapdragon 400 in Basemark X


Taking a look at previous-generation cost-sensitive SoCs, MediaTek's ubiquitous quad-core 3G SoC MT6582 (which supports OpenGL ES 2.0 only, through its Mali-400 MP2 GPU) scores lower than Snapdragon 400 in GFXBench's OpenGL ES 2.0-based T-Rex test (about 230 vs 330). In Basemark X, however, MT6582-based devices score higher than Snapdragon 400-based devices. This is despite the fact that Snapdragon 400 was/is often employed in devices with a considerably higher selling price than MT6582-based devices.

Device               SoC             GPU                 Display*   Medium   High

Samsung SM-G800F     Exynos 3470     Mali-400 MP4        1280x720    7527    2712
Vodafone 985N        MT6582          Mali-400 MP2         960x540    4950    1717
Acer E53             MT6582          Mali-400 MP2        1280x720    4870    1694
Wiko Rainbow         MT6582          Mali-400 MP2        1280x720    4826
Galaxy S3 Neo        Snapdragon 400T Adreno 305          1280x720    4540    1551
Moto G (XT1032)      Snapdragon 400  Adreno 305          1280x720    4440
HTC Desire 816d      Snapdragon 400T Adreno 305          1280x720    4354    1441
Samsung SM-A500F     Snapdragon 410  Adreno 306          1280x720    4132    1900
Samsung SM-A300F     Snapdragon 410  Adreno 306           960x540    4076    1892
Samsung SM-G530H     Snapdragon 410  Adreno 306           960x540    3987    1690
Samsung SM-G800A     Snapdragon 400  Adreno 305          1280x720    3946    1362
HTC Desire 820q      Snapdragon 410  Adreno 306          1280x720    3786

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: Samsung SM-G800F is the Galaxy S5 Mini (Exynos version), while SM-G800A is a Snapdragon 400 running at the non-standard maximum clock speed of 1.4 GHz; Vodafone 985N is the Vodafone Smart 4 Power; Acer E53 is the Acer Liquid E700; Galaxy S3 Neo runs the Snapdragon 400 SoC at a non-standard maximum speed of 1.4 GHz; HTC Desire 816d runs the Snapdragon 400 SoC at 1.6 GHz; SM-A500F is the Galaxy A5, while SM-A300F is the Galaxy A3; SM-G530H is the Galaxy Grand Prime.

In both the medium detail and high detail tests, MT6582-based devices consistently score higher in Basemark X than Snapdragon 400-based devices, and in the medium detail test they also beat Snapdragon 410-based devices. This gives a different picture than the one you get from just looking at GFXBench's T-Rex benchmark.
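The reversal between the two benchmarks can be quantified with a short Python sketch. The T-Rex figures are the approximate values quoted in the text, and the Basemark X medium-detail scores are copied from the table above. Averaging the devices per SoC (and grouping the 400T variants with Snapdragon 400) is my own simplification, not something the post does.

```python
from statistics import mean

# Approximate GFXBench T-Rex scores quoted in the text.
trex = {"MT6582": 230, "Snapdragon 400": 330}

# Basemark X medium-detail scores per device, from the table above.
basemark_medium = {
    "MT6582": [4950, 4870, 4826],
    "Snapdragon 400": [4540, 4440, 4354, 3946],  # includes 400T variants
}

# Snapdragon 400 relative to MT6582 in each benchmark.
trex_ratio = trex["Snapdragon 400"] / trex["MT6582"]
basemark_ratio = (mean(basemark_medium["Snapdragon 400"])
                  / mean(basemark_medium["MT6582"]))

print(f"T-Rex:             {trex_ratio:.2f}")
print(f"Basemark X medium: {basemark_ratio:.2f}")
```

With these inputs Snapdragon 400 is more than 40% ahead in T-Rex but more than 10% behind in Basemark X medium detail.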

Snapdragon 410 performs worse than Snapdragon 400 in Basemark X medium-detail


Also notable is that Snapdragon 410, which is the successor of the Snapdragon 400 and would normally be expected to improve performance, actually has lower performance in practice as judged by the Basemark X medium detail benchmark. This matches earlier findings of performance flaws in Snapdragon 410. When running the high detail Basemark X benchmark, Snapdragon 410 does better and beats Snapdragon 400.

Mid-range SoCs: Snapdragon 615 and MT6752 closely matched


When running GFXBench, Snapdragon 615 and MT6752 are closely matched, with Snapdragon 615 scoring about 830 to 850 in T-Rex while MT6752 scores just above 870. For T-Rex, devices using MediaTek's prior-generation octa-core MT6592 score in the range 650 to 750. In the OpenGL ES 3.0 API-based Manhattan benchmark, Snapdragon 615 and MT6752 are very closely matched, both scoring around 360. We will also take a look at Basemark X results.

The following table shows Basemark X results for the new competing mid-range SoCs Snapdragon 615, MT6752 and HiSilicon's octa-core Hi6210 (Kirin 620), as well as for the prior-generation octa-core MT6592 from MediaTek.

Device               SoC             GPU                 Display*   Medium   High

Lenovo P70-A         MT6752          Mali-T760 MP2       1280x720   11311 
Meizu M1 Note        MT6752          Mali-T760 MP2       1920x1080  11168    4636
HTC Desire 816G      MT6592          Mali-450 MP4        1280x720   10984
Huawei CHE2-TL00     Hi6210          Mali-450 MP4        1280x720   10546    3439
Oppo R8106           Snapdragon 615  Adreno 405          1920x1080  10277    4846 
HTC Desire 820       Snapdragon 615  Adreno 405          1280x720   10133    4814
Samsung SM-A700FD    Snapdragon 615  Adreno 405          1920x1080  10052    4757
Archos 50C Oxygen    MT6592          Mali-450 MP4        1280x720    9867    3702
HTC Desire 616d      MT6592M         Mali-450 MP4        1280x720    7976    3045

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: SM-A700FD is the Galaxy A7; Huawei CHE2-TL00 is a new version of the Honor 4X.

When running the standard medium-detail version of Basemark X, MediaTek's MT6752 has a moderate advantage over Snapdragon 615, while at the high detail setting Snapdragon 615 has a small advantage. Huawei's Kirin 620 performs adequately and is just ahead of Snapdragon 615 in the medium detail setting.

MediaTek's prior-generation octa-core MT6592 with Mali-450 MP4 GPU keeps up relatively well in Basemark X, with certain models (e.g. HTC Desire 816G) actually beating Snapdragon 615 in the medium detail setting.

Performance-oriented SoCs with Basemark X


The following table shows Basemark X results for several performance-oriented mobile SoCs.

Device               SoC             GPU                 Display*   Medium   High

Samsung Galaxy S6    Exynos 7420     Mali-T760 MP8?      2560x1440  36017
Galaxy S5 LTE-A      Snapdragon 805  Adreno 420          1920x1080  32685   18334
Google Nexus 6       Snapdragon 805  Adreno 420          2560x1440  30362   20265
Sams. Galaxy Note 4  Snapdragon 805  Adreno 420          2560x1440  31963   21152
Sams. Galaxy Note 4  Exynos 5433     Mali-T760 MP6       2560x1440  29335   19019 

Apple iPad Air 2     Apple A8X       PowerVR Series 6    2048x1536  41700   29239
Google Nexus 9       NVIDIA K1-64    Tegra-K1 GPU        2048x1536  37939   28646
Apple iPad Mini 3    Apple A7        PowerVR Series 6    2048x1536  26499   14780
Teclast X98 Air      Atom Z3736F     Intel HD            2048x1536  14825    7160
Teclast P90HD        Rockchip RK3288 Mali-T764           2048x1536  13053    5645
Onda V989 Core8      Allwinner A80   PowerVR G6230       2048x1536  11004    5724

Meizu MX4 Pro        Exynos 5430     Mali-T628 MP6       1920x1200  25547   12674
Samsung SM-G900A     Snapdragon 801  Adreno 330          1920x1080  25178   11930
Samsung SM-G850F     Exynos 5430     Mali-T628 MP6       1280x720   21872   10666
Meizu MX4            MT6595          PowerVR G6200       1920x1200  17038    7817
Huawei MT7-TL10      Kirin 925       Mali-T624 MP4       1920x1080  15973    6802

* While Basemark X is independent of display resolution in terms of rendering, the
memory bandwidth used for screen refresh has some impact, giving lower-resolution
devices a small advantage.
Notes: SM-G900A is the Samsung Galaxy S5 (US version), Huawei MT7-TL10 is the Huawei Mate 7.

Looking at the ultra-high-end smartphone segment (mostly with a display resolution of 2560x1440), Exynos 7420 provides superior performance in Basemark X. Snapdragon 805 follows, a small distance ahead of Exynos 5433 as used in the Samsung Galaxy Note 4.

In the high-end tablet segment, Apple's iPad Air 2 with the Apple A8X leads, but the Nexus 9 with NVIDIA's Tegra K1 (64-bit version) comes fairly close. Apple's prior-generation A7 SoC also delivers good performance, while Intel's current Bay Trail SoC for the tablet market outperforms two high-end chips from established Chinese players in the tablet SoC market, Rockchip's RK3288 and Allwinner's A80 Octa.

Meanwhile, in the mainstream performance smartphone segment, Snapdragon 801 (in the past the performance leader in the market) still provides good performance, but is narrowly beaten by the 32-bit Exynos 5430 in the Meizu MX4 Pro. The Exynos 5430 is also used in the Galaxy Alpha (for which it provides higher-than-necessary performance given its relatively low screen resolution). The performance of MediaTek's MT6595 SoC, while not bad, falls short of most other high-end solutions. HiSilicon's Kirin 925 as implemented in the Huawei Mate 7 is just behind.

Conclusion


It appears that just concentrating on GFXBench may give a misleading picture with regard to 3D graphics performance of mobile SoCs. In particular it is apparent that Qualcomm's Snapdragon SoCs consistently do better in GFXBench than in other benchmarks such as Basemark X. This is particularly true for the lower-end Snapdragon 400 and higher-end Snapdragon 800 series; for Snapdragon 615, results are more consistent across different benchmarks.

Basemark X, which utilizes the Unity game engine commonly used in mobile games, may more accurately reflect real-world performance.

Sources: Rightware Power Board (Basemark X benchmark results), GFXBench results database

Updated 5 March 2015: Add Galaxy S6 Edge result for GFXBench 3.1.
Updated 15 March 2015.

Tuesday, March 3, 2015

A detailed comparison of Cortex-A53-based and other SoCs using Geekbench, and impact of AArch64

More Cortex-A53 CPU core-based SoCs have recently come to market and more benchmark results are now available, for example from the Geekbench results database. Firmware is also becoming more mature. This makes it possible to make better comparisons between different Cortex-A53-based SoCs (for example, octa-core SoCs) and compare the performance of the highest-performance chips with competitive chips that use more expensive CPU cores such as Krait 400 and Cortex-A57.

Overview of Cortex-A53-based SoCs


The following is a list of Cortex-A53 CPU core-based mobile SoCs that have appeared in the market or for which benchmark results have become available. All chips integrate 4G LTE modem functionality unless otherwise noted.

  • Snapdragon 410 (MSM8916), utilizing four early Cortex-A53r0p0 cores. Numerous cost-sensitive smartphones now use this chip. However, none of them appears to take any advantage at all of the new ARMv8 instruction set, with all of them running in ARMv7 compatibility mode. This is counter-intuitive because AArch32 (32-bit version of ARMv8), which is used by the other SoCs, already brings significant benefits. Snapdragon 410 generally performs significantly worse than other Cortex-A53-based SoCs, even when correcting for the low clock speed. This is also reflected in memory performance. The Adreno 306 GPU tends to be even a little slower than the Adreno 305 GPU in Snapdragon 400. The net result is a chip that is not much faster than Snapdragon 400 in many cases while having worse battery life.
  • Snapdragon 615 (MSM8939), equipped with an octa-core Cortex-A53r0p1 CPU configuration with four cores running (in practice) at 1.54 GHz or 1.50 GHz and four cores running at a lower maximum clock frequency (probably 1.0 GHz). This chip has appeared in an increasing number of new smartphone models. Runs in AArch32 mode. Performance is significantly lower than MediaTek's octa-core Cortex-A53-based SoCs, which can run all eight Cortex-A53 cores at the maximum frequency. Memory performance is improved from Snapdragon 410 but falls short of that of MediaTek's SoCs. The Adreno 405 GPU is fairly competitive, suitable for a mid-range SoC, although the 32-bit RAM interface of the SoC limits performance, especially at high resolutions. It is manufactured using TSMC's lower-performance 28LP process. There have been reports that the chip gets hot with intensive use and requires throttling.
  • MediaTek MT6732, with a quad-core Cortex-A53r0p2 CPU configuration running at a maximum clock speed of 1.5 GHz. Devices using the chip are starting to become available, and tablets with the tablet version of this chip (MT8732) have also been announced. Although it has only four CPU cores, it has good performance, beating Snapdragon 615 in single-core performance at a similar clock speed, and memory performance is significantly higher. The Mali-T760 MP2 GPU contributes to better GPU performance than previous MediaTek chips targeting cost-sensitive segments, although falling short of that of Snapdragon 615 and MT6752.
  • MediaTek MT6752, featuring an octa-core Cortex-A53r0p2 CPU configuration with a maximum clock frequency of 1.69 GHz. Several devices have come to market using this chip, including the Meizu M1 Note. Performance is excellent, with high scores in the Geekbench CPU benchmark, considerably higher than Snapdragon 615 and beating high-end SoCs such as Snapdragon 801 in several metrics. The Mali-T760 MP2 GPU is clocked higher than that of the MT6732, resulting in good GPU performance, comparable to that of Snapdragon 615, as measured with GFXBench, although the 32-bit memory interface will be a bottleneck at high resolutions. Manufactured using TSMC's high-performance 28HPM process. A tablet version of the chip exists as MT8752.
  • MediaTek MT6795, with an octa-core Cortex-A53r0p2 CPU with clock speed up to 2.16 GHz. With a dual-channel memory interface and high resolution support, this SoC targets a higher performance segment than the previously mentioned chips, for which it can potentially offer much better performance/dollar because of the small die size of Cortex-A53 cores. Originally announced to become available in commercial devices before the end of 2014, it was delayed, but competitive benchmark scores for what appear to be more mature versions of the chip have recently shown up. It appears to be configured with full AArch64 mode. Performance is excellent, with single-core performance closing much of the gap with the high-end Snapdragon 801, while multi-core performance is significantly higher. There appears to be a "Turbo" version running the CPU up to 2.16 GHz, while the regular version clocks at 1.95 GHz. At the MWC on 2 March 2015, MediaTek apparently rebranded the MT6795 as Helio X10.
  • MediaTek's MT6735 is a SoC for entry-level smartphones for which benchmark results have not yet become available. It has a quad-core Cortex-A53 CPU configuration and a Mali-T720 GPU, a downgrade from the Mali-T760 GPU in MT6732. The recently announced MT6753, with eight Cortex-A53 cores running up to 1.5 GHz, is compatible with the MT6735 and also has a Mali-T720 GPU (probably MP4). Other chips that have shown up in product announcements include the MT8161 (probably the equivalent of the MT6735 without modem) and MT8165 (equivalent to MT8732 without modem).
  • Qualcomm has announced additional octa-core Cortex-A53-based chips, Snapdragon 415 and Snapdragon 425. These probably utilize a symmetrical Cortex-A53 configuration with all cores running at the same maximum clock frequency, unlike Snapdragon 615. Otherwise, the new SoCs are similar to Snapdragon 615, with the same Adreno 405 GPU. According to Qualcomm, devices using these chips will become commercially available in the second half of 2015.
  • Kirin 620 (Hi6210) from HiSilicon (Huawei) is an octa-core Cortex-A53r0p3-based SoC running up to 1.2 GHz. The GPU is a Mali-450 MP4. Although performance (including single-core performance) is better than Snapdragon 410, it is not as optimized as chips such as MT6752 and runs at a relatively low clock speed. Multi-core performance scaling is less than expected.

Geekbench integer and memory scores comparison


The following table provides details about selected Geekbench integer and memory benchmark scores for different Cortex-A53-based SoCs, and also other smartphone SoCs from Qualcomm, MediaTek and Samsung for comparison.

                Arch    Max freq. JPEG C. IPC   JPEG C. Dijkstra      Stream Copy   Geekbench
                                  Single  x A7  Multi   Single Multi  Single Multi  Ref. number

Snapdragon 410  ARMv7     1.19      596   1.30   2384     810   2135   431   492    1551964
Snapdragon 615  AArch32 1.50/1.0    820   1.42   4979     886   3646   572   703    2015694
MT6732          AArch32   1.50      843   1.46   3357    1041   3002  1001  1199    1546611
MT6752          AArch32   1.69      952   1.46   7554    1144   4483  1071  1191    1583540
MT6795          AArch64   1.95     1026   1.37   8167     990   3802  1356  2068    2002894
MT6795T         AArch64   2.16     1128   1.36   8962    1064   4109  1350  2140    1984431
Hi6210          AArch32   1.20      660   1.43   3501     744   2772   602   900    1999304

Snapdragon 400  ARMv7     1.19      462   1.01   1860     700   2132   534   551    1938063
Snapdragon 801  ARMv7     2.46     1347   1.42   5437    1174   3586  1931  2144    1491681
Snapdragon 805  ARMv7     2.65     1475   1.45   4105    1230   4058  2117  2910    1502687
Snapdragon 810  AArch64  ?/1.55    1358          5972    1073   3584  1428  1838    2017257
MT6582          ARMv7     1.30      506   1.01   2027     748   2354   250   396    2017732
MT6592          ARMv7     1.66      643   1.01   5086     891   3327   261   388    2000008
MT6595          ARMv7   2.20/1.69  1350   1.59   6080    1844   5612  1652  1986    1591744
Exynos 5430     ARMv7   1.80/1.3   1056   1.52   5140    1102   3918  1457  1559    1556780
Exynos 5433     AArch32   1.89     1456   2.10   6209    1523   5728  1396  1458    2017193
Exynos 7420     AArch64  ?/1.50    1481          7168    1065   4596  1953  2579    2012972

The low performance of Snapdragon 410 is apparent in the scores, with normalized IPC (instructions per cycle to the equivalent of a 1.0 GHz Cortex-A7) for the CPU-speed sensitive single-core JPEG Compress benchmark being lower than that of other Cortex-A53-based SoCs, probably due to being limited to ARMv7. The Dijkstra benchmark even scores lower on Snapdragon 410 than on an equivalently clocked Snapdragon 400, and memory performance is also lower.
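As a rough sketch of how the normalized IPC column can be reproduced: dividing the single-core score by the clock frequency and then by a per-GHz baseline for Cortex-A7 gives values that match the table. The baseline of about 385 JPEG Compress points per GHz is an assumption inferred from the table rows, not a figure stated here, and the function name is made up for illustration.

```python
# Assumed baseline: JPEG Compress single-core points per GHz for a Cortex-A7.
# Inferred from the table rows above (an approximation, not an official figure).
A7_JPEG_POINTS_PER_GHZ = 385.0


def normalized_ipc(score, freq_ghz, baseline=A7_JPEG_POINTS_PER_GHZ):
    """Return score per GHz relative to the assumed Cortex-A7 baseline."""
    return score / freq_ghz / baseline


# (single-core JPEG Compress score, max frequency in GHz) from the table above
rows = {
    "Snapdragon 410": (596, 1.19),
    "MT6752": (952, 1.69),
    "MT6795T": (1128, 2.16),
}

for name, (score, freq) in rows.items():
    print(f"{name}: {normalized_ipc(score, freq):.2f}")
```

With this baseline the three rows come out at 1.30, 1.46 and 1.36, in line with the IPC column. For big.LITTLE chips the result depends on which clock frequency is assumed for the core that ran the test.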

Snapdragon 615, while improving on Snapdragon 410, also appears to be less optimized than MT6732/MT6752 in terms of single-core IPC, despite a very similar clock frequency. Looking at multi-core performance, MT6752 is significantly faster than Snapdragon 615, largely due to being able to run all eight cores at the maximum clock frequency. MT6732 and MT6752 also have significantly higher memory performance, reaching an impressive score for devices with a 32-bit memory interface.

The higher clock speed of MT6795 (Helio X10) brings benefits for integer performance, but due to the use of the AArch64 instruction set, normalized IPC is lower (1.36 vs 1.46 for JPEG Compress). This is especially true for the Dijkstra benchmark, where AArch64 mode imposes a significant penalty (this is also seen on other platforms utilizing AArch64).

Overall, a high-speed Cortex-A53 configuration such as implemented in the MT6795T comes fairly close to Snapdragon 801 for single-core performance, while being significantly faster for multi-core performance, at a significantly lower cost. Several metrics are also in the same ballpark as the current high-end leader Exynos 7420.

Analysis of the Geekbench Lua subtest


The Lua integer benchmark appears to be particularly sensitive to memory subsystem efficiency, including L2 cache size and memory bandwidth, as well as being dependent on CPU speed. It is the kind of code that may frequently occur in actual practice on a smartphone.

                Arch      Lua     IPC   Lua    CPU    #CPUs
                          Single  x A7  Multi  Par.

Snapdragon 410  ARMv7      603    1.23  2137   3.54   4
Snapdragon 615  AArch32    709    1.15  1644   2.32   4 + 4
MT6732          AArch32    753    1.22  2419   3.21   4
MT6752          AArch32    842    1.21  2361   2.80   8
MT6795          AArch64   1053    1.31  8203   7.79   8
MT6795T         AArch64   1173    1.32  8847   7.54   8
Hi6210          AArch32    587    1.19  1740   2.96   8

Snapdragon 400  ARMv7      476    0.97  1874   3.94   4
Snapdragon 801  ARMv7      980    0.97  2880   2.94   4
Snapdragon 805  ARMv7     1016    0.93  2917   2.87   4
Snapdragon 810  AArch64   1283          1065   0.83   4 + 4
MT6582          ARMv7      514    0.96  1644   3.20   4
MT6592          ARMv7      651    0.95  1344   2.06   8
MT6595          ARMv7     1509    1.67  2498   1.66   4 + 4
Exynos 5430     ARMv7      981    1.33  1861   1.90   4 + 4
Exynos 5433     AArch32   1397    1.89  5478   3.92   4 + 4
Exynos 7420     AArch64   1409          7088   5.03   4 + 4

In this test, Snapdragon 410 performs reasonably well. MT6752's multi-core performance seems limited by a bottleneck, probably external memory bandwidth. MT6795's performance is impressive; while single-core performance falls a little short of Cortex-A57 based SoCs, for multi-core performance it blows past them, with CPU parallelism fully exploited. It seems the bottleneck present with the MT6752 (presumably memory bandwidth and the L2 cache memory size available to each core) is not present with the MT6795.

Qualcomm's Snapdragon 810 consistently scores in the 1000-1200 range for both the single-core and multi-core test, while the multi-core test would have been expected to be significantly higher. This appears to reflect a serious deficiency in the memory subsystem of the SoC (which might not only be related to the LPDDR4 SDRAM controller, but also the on-chip L2 cache), which might also have negative implications for smoothness in everyday use.
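The "CPU Par." column in the Lua table appears to be simply the multi-core score divided by the single-core score. A minimal sketch under that assumption (the function name is illustrative):

```python
def parallelism(single, multi):
    """Multi-core score divided by single-core score."""
    return multi / single


# (single-core, multi-core) Lua scores taken from the table above
lua_scores = {
    "Snapdragon 410": (603, 2137),
    "MT6795": (1053, 8203),
    "Snapdragon 810": (1283, 1065),
}

for name, (single, multi) in lua_scores.items():
    print(f"{name}: {parallelism(single, multi):.2f}")
```

A value close to the number of cores, as for MT6795, means the extra cores are being used effectively. A value below 1.0, as for Snapdragon 810, means the multi-core run scored lower than the single-core run.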

Geekbench floating point subtests


Finally, let's look at floating point performance. The Mandelbrot subtest tests pure floating point performance, while the SGEMM and SFFT tests also significantly depend on memory performance.


                Arch      Mandelbrot                 SGEMM         SFFT
                          Single  IPC   Multi  Par.  Single Multi  Single Multi

Snapdragon 410  ARMv7      448    1.10  1794   4.00   245    489    317   1258
Snapdragon 615  AArch32    583    1.14  3611   6.19   303    688    426   2517
MT6732          AArch32    585    1.14  2336   3.99   337    653    430   1727
MT6752          AArch32    661    1.15  5257   7.95   384   1148    481   3870
MT6795          AArch64    823    1.24  6406   7.78   484   1542    618   4764
MT6795T         AArch64    912    1.24  7245   7.94   529   1659    694   5333
Hi6210          AArch32    467    1.14  3509   7.51   264    876    343   2178

Snapdragon 400  ARMv7      405    1.00  1620   4.00   203    634    285   1182
Snapdragon 801  ARMv7      788    0.94  3104   3.94   907   2816    992   3518
Snapdragon 805  ARMv7      848    0.94  3389   4.00  1011   2669   1130   4135
Snapdragon 810  AArch64   1100          5144   4.68   749   1828   1009   3643
MT6582          ARMv7      444    1.00  1765   3.98   230    512    328   1316
MT6592          ARMv7      568    1.00  4430   7.80   282    696    419   3397
MT6595          ARMv7     1284    1.71  5822   4.53   748   2337   1187   4255
Exynos 5430     ARMv7      990    1.61  4745   4.79   657   2491    896   3971
Exynos 5433     AArch32   1174    1.91  4883   4.16   751   2369   1044   4031
Exynos 7420     AArch64   1198          6129   5.12   945   2888   1313   4874

From these numbers it is clear that Cortex-A53 improves floating point performance somewhat when compared to Cortex-A7 at the same clock speed. When eight cores can run in parallel at high speed, multi-core floating point performance is impressive, as demonstrated by MT6752 and MT6795. Snapdragon 801 and 805 are looking a bit dated in this department.

In the memory-intensive SGEMM and SFFT tests, Snapdragon 400 comes close to Snapdragon 410, illustrating the lack of performance improvement by Snapdragon 410. In fact MediaTek's previous generation MT6582 matches the floating point performance of Snapdragon 410 across all tests.

The Cortex-A57 based SoCs have the highest single-core floating point performance, although the Cortex-A17-based MT6595 is also very strong. Exynos 5433 and Exynos 7420 beat Snapdragon 810 in most floating point tests, although the difference is not as large as it used to be with earlier results for Snapdragon 810.

Conclusion


It is clear that octa-core Cortex-A53-based SoCs can deliver strong performance at a relatively low cost, and this is particularly true for MediaTek's new chips, MT6752 and MT6795. The MT6795, with its higher clock speed and dual-channel memory interface, can match current high-end chips in most metrics, being not much slower in single-core performance while being superior in multi-core.

One open question is whether the high maximum clock frequency of the MT6795 and MT6795T, which deliver impressive performance per dollar, translates to acceptable power consumption and battery life. Power consumption for Cortex-A53 has been observed to increase quickly at higher frequencies on the Samsung-manufactured Exynos 5433, but MT6795 is manufactured on a different process at TSMC and probably makes use of specific design optimizations for high clock speeds (ARM POP IP core hardening technology) that make power consumption more acceptable.

Sources: Geekbench Browser

Updated 10 March 2015.

Sunday, March 1, 2015

Samsung announces Galaxy S6 with Exynos 7420 SoC manufactured on "14nm" FinFET process

At the Mobile World Congress today (Sunday 1 March), Samsung announced the Galaxy S6 and Galaxy S6 Edge, featuring numerous improvements over the previous generation Galaxy S5, including a SoC manufactured on Samsung's 14 nm FinFET-based process. The Galaxy S6 is planned to be available in 20 countries starting on April 10th, 2015.

New model implements several improvements


The improvements in the new model include the following:
  • Exynos 7420 SoC manufactured on 14 nm FinFET process with 20 nm interconnects. The CPU is a big.LITTLE configuration with four Cortex-A57 and four Cortex-A53 cores, similar to Exynos 5433. The maximum clock speeds are 2.1 GHz and 1.5 GHz, respectively. Samsung claims 20% better performance and 35% better efficiency for the new chip when compared to Exynos 5433, which is manufactured using Samsung's 20 nm HKMG process.
  • The GPU has been rumoured to be a faster version of the Exynos 5433's Mali-T760 MP6 (either a higher clock rate or an MP8 configuration).
  • Early benchmarks indicate a significant increase in CPU and memory performance combined with a measurable increase in GPU performance (which is required because of the higher screen resolution).
  • Runs in 64-bit AArch64 mode, which has several advantages, as well as some disadvantages.
  • Uses new LPDDR4 SDRAM (3 GB), which has higher memory bandwidth at a given memory bus width due to higher effective clock speeds.
  • The cameras have been improved, including greater light gathering capability.
  • The 5.1" AMOLED screen's resolution is QHD (2560x1440), which is about 78% more pixels than the FullHD (1920x1080) screen in Galaxy S5. The higher CPU, GPU and memory performance are essential to keep pace with increased demands caused by the higher resolution.
  • Utilizes the new UFS 2.0 interface for embedded flash memory, providing SSD-like performance according to Samsung.
  • Cat 6 LTE mode.
  • Touchwiz user-interface on top of 64-bit Android 5.0 is said to be more intuitive and less demanding in terms of processing requirements.
At the same time, Samsung has dropped the MicroSD slot and the battery is non-removable. The battery capacity is also slightly smaller than that of the Galaxy S5.
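The pixel-count comparison between QHD and FullHD in the list above can be checked with a few lines of arithmetic. The variable names are illustrative only.

```python
# Total pixel counts for the two resolutions mentioned above
qhd = 2560 * 1440
full_hd = 1920 * 1080

# Relative increase of QHD over FullHD, in percent
increase = (qhd / full_hd - 1) * 100

print(f"QHD: {qhd} pixels, FullHD: {full_hd} pixels")
print(f"Increase: {increase:.1f}%")
```

The ratio works out to 16/9, so QHD has a little under 78% more pixels for the GPU and memory subsystem to drive.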

The Galaxy S6 Edge, like the Galaxy Note 4 Edge, features a screen that curves around the edges. It is priced significantly higher than the Galaxy S6, which will not be cheap either.

Quick ramp of 14nm FinFET process brings challenges to Samsung


The initial 14 nm FinFET process used by Samsung has been reported to use 20 nm interconnects with a 14 nm feature size. As such it is more of an evolutionary step from 20 nm than a full-blooded 14 nm FinFET process would be, comparable to some degree with TSMC's 16FF process.

Still, Samsung will face a huge challenge ramping up the process in sufficient volume and at acceptable yield rates to equip the high volume of Galaxy S6 units expected. Rumours have mentioned low yield for the process in the recent past as Samsung started ramping up (test) production. Given the massive investment in the new process and non-optimal yield rates, it is unlikely that Samsung will significantly benefit financially from production of the chip in the near term in terms of gross margin and other chip production-related metrics.

However, the performance lead of the Galaxy S6 made possible by the new chip could have significant positive implications for the sales and financial performance of Samsung's smartphone division, allowing Samsung to recoup some of its investment.

A few months ago, Samsung already signed an agreement with Apple whereby Samsung would supply part of the production capacity for future Apple processors. If this bears fruit it would allow Samsung to recoup more of its investment in 14 nm FinFET technology in the future.

Early benchmark performance impressive


In early benchmark scores reported in Geekbench's result database, a device that probably is the Galaxy S6 shows impressive performance, well ahead of most existing SoCs and devices. In a direct comparison with an Exynos 5433-equipped Galaxy Note 4, the performance gain is fairly significant for most benchmarks (up to 30% for integer tests, higher for floating point), with a few negative outliers such as SHA2 and the Dijkstra integer subtest. The Dijkstra subtest also scores lower on other 64-bit AArch64 platforms, suggesting it suffers from particular AArch64 features such as the doubled size for pointer storage.

Memory performance is also significantly higher, aided by high clock rate and high amount of bandwidth delivered by the LPDDR4 memory interface, which unlike Qualcomm's Snapdragon 810 does not seem to have serious flaws.

Sources: AnandTech (Samsung announces the Galaxy S6 and Galaxy S6 Edge), AnandTech (Samsung Unpacked, MWC 2015 Live Blog), Geekbench Browser (Samsung SM-G925F)

Wednesday, February 25, 2015

Early benchmarks for MT6795 show high performance, suggest use of eight Cortex-A53 cores

MediaTek originally announced the MT6795, a SoC targeting the premium-level and performance segments of the smartphone market, in July 2014, with expectations of devices being commercially available to end users before the end of 2014. However, the chip was delayed (problems with the memory controller were reported) and competitive benchmark results are only now beginning to surface for the chip.

According to the announcement, the SoC was to have an octa-core CPU configuration with clock speeds up to 2.2 GHz, a strong dual-channel memory interface with support for LPDDR3 up to 933 MHz, and 2K (2560x1600) display support. Other reports and information have suggested that it uses a PowerVR G6200 GPU, similar to the one used in MediaTek's MT6595, which can be seen as the 32-bit predecessor of the new chip.

Confusion about processor cores, octa-core Cortex-A53 seems likely


The actual CPU cores used inside the MT6795 continue to be a source of confusion. The chip was initially understood to have an octa-core Cortex-A53 CPU configuration clocked at a high frequency, but later a purported leaked MediaTek product roadmap surfaced that described the MT6795 as a big.LITTLE design that includes Cortex-A57 cores. However, a recent new entry in the Geekbench database suggests that the chip actually has eight Cortex-A53 cores as originally suspected, as the IPC (instructions per cycle) of the integer and floating point subtests would be hard to reconcile with Cortex-A57 cores being present.

Geekbench results show mixed performance but high overall score


The Geekbench results show strong CPU performance, with the overall score being superior to that of available results for Snapdragon 810, which has a significantly higher cost design but has been plagued by performance issues, although it scores lower than Exynos 5433/Exynos 7 Octa with Cortex-A57 cores as used in the Galaxy Note 4. Note that MT6795 uses a less advanced 28 nm process compared to the 20 nm process used for Snapdragon 810 and Exynos 5433.

Single-core integer performance is not spectacular and below that of the previous generation high-end chips such as Snapdragon 801. Although this is compatible with the use of medium-performance Cortex-A53 cores, integer single-core performance is actually lower than the mid-range MT6752, despite the higher clock rate, pointing to continuing hardware performance problems with the chip. The Dijkstra benchmark result is particularly low. This benchmark has a lot of external memory access and likely branches a lot, taxing certain elements of the CPU and SoC that simpler CPU benchmarks do not. It may be affected by the doubled address size in AArch64 mode, either through the increased size of pointer storage or reduced efficiency of the branch prediction unit inside the processor core.
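To illustrate the pointer-size hypothesis, the sketch below compares the size of a hypothetical graph edge record with 32-bit and 64-bit pointers. The record layout (two pointers plus a 32-bit weight) and the 64-byte cache line are assumptions chosen for illustration; this is not Geekbench's actual data structure, and compiler padding is ignored.

```python
import struct

# Hypothetical edge record: pointer to next edge, pointer to target node,
# and a 32-bit weight. "<" selects standard sizes without alignment padding.
EDGE_32 = struct.calcsize("<IIi")  # 32-bit pointers
EDGE_64 = struct.calcsize("<QQi")  # 64-bit pointers

CACHE_LINE = 64  # bytes; assumed cache line size


def edges_per_line(edge_size, line=CACHE_LINE):
    """Number of whole edge records that fit in one cache line."""
    return line // edge_size


print(f"32-bit pointers: {EDGE_32} bytes, {edges_per_line(EDGE_32)} edges per line")
print(f"64-bit pointers: {EDGE_64} bytes, {edges_per_line(EDGE_64)} edges per line")
```

Under these assumptions the record grows from 12 to 20 bytes, so fewer edges fit in each cache line and a pointer-chasing workload touches more memory for the same graph. Whether this fully explains the Dijkstra result remains speculation.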

Single core floating point performance in the Mandelbrot benchmark is higher than the MT6752 and actually compatible with the Cortex-A53 core running at 2.1 GHz, close to the originally envisaged maximum clock speed for the MT6795. Multi-core performance in this subtest is impressive, with a score that is higher than most existing SoCs including Exynos 7 Octa, which employs faster Cortex-A57 cores.

Finally, the dual-channel memory interface seems to be working reasonably well in the tested revision of the chip/development board, with memory scores consistent with an optimized dual-channel interface, and higher, for example, than those of Exynos 5433. However, they are generally lower than those of the 32-bit MT6595.

One caveat is that the MT6795 entry is running in AArch64 mode, while the other devices were running in AArch32 (32-bit ARMv8) or 32-bit ARMv7 mode.

Average single-core CPU performance, strong multi-core performance


In a direct comparison with the MT6752, which has a comparable CPU configuration but clocked lower and has only a 32-bit memory interface, the MT6795 is only slightly faster, although the MT6795 uses a full 64-bit AArch64 instruction set model, while the tested MT6752 configurations use AArch32 with partial use of ARMv8 features. There are a few anomalous results, including a low score for the MT6795 in the single-core AES benchmark, and as mentioned it also scores significantly lower in the Dijkstra benchmark. Floating point performance is consistently higher for the MT6795 (more than the increase in clock rate would explain), which may be caused by the higher-performance memory subsystem of the MT6795 and/or the increased number of floating point registers available in AArch64 mode.

The MT6795 is clearly slower than its 32-bit predecessor MT6595 (which uses high-performance Cortex-A17 and Cortex-A7 cores in a big.LITTLE configuration) in most metrics, with only the heavy weighting and large performance gain for the AES and SHA1 cryptography tests (due to the new ARMv8 instruction set) shifting the advantage for the overall score towards the MT6795.

When making a comparison with a median entry for the high performance Exynos 5433 (Exynos 7 Octa) inside the Samsung Galaxy Note 4, the MT6795 fairly consistently shows clearly lower single-core performance but higher multi-core performance.

MT6795 likely to be most cost-effective performance segment processor on the market


The exclusive use of Cortex-A53 CPU cores, and not the much more expensive and die-space consuming Cortex-A57 (or, in a 32-bit comparison, Cortex-A15/A17 cores), has positive implications for the cost of the chip. Die space dedicated to the CPU cores will be relatively low, although L2 caches will take considerable space when configured with a size that matches the desired performance level and market segment. Overall, the chip is likely to be attractive in terms of performance/dollar for the performance segment.

In terms of SoC optimizations, the chip would probably work better with the employment of additional ARM IP such as a Mali-T760 or Mali-T800 series GPU, which offers advantages in combination with ARM cores such as Cortex-A53 in tandem with techniques such as AFBC, smart composition and transaction elimination, and new interconnect buses within the chip. SoCs like the MT6752 probably benefit from these optimizations, while the MT6795 cannot do so fully because of the non-ARM GPU. It seems likely that the MT6795 will be superseded in next generation products to be announced by MediaTek in the future by a similar SoC with an ARM Mali-T760 or T800 series GPU.

Update (2 March): Based on a closed-door presentation event at the MWC, MediaTek appears to have rebranded MT6795 as Helio X10 with future Helio P series products also being announced.

Sources: MediaTek (MT6795 announcement), Geekbench browser

Tuesday, December 30, 2014

Early benchmarks for Snapdragon 810 show performance flaws

Recently, reports have surfaced, including one from BusinessKorea published on December 4, about Qualcomm's new high-end chip, Snapdragon 810, being affected by performance issues related to heat production and issues with the memory controller. Subsequently, Geekbench results for some Samsung prototype devices using the SoC (MSM8994) have also appeared in the Geekbench results database. Detailed analysis of the Geekbench results seems to confirm the issues with thermal throttling and especially memory controller performance, at least in the early revision of SoC that was used to obtain the mentioned benchmark scores, resulting in sub-par performance for its segment.

Updated (January 5, 2015): A section has been added discussing new Geekbench results from a LG G Flex2 prototype using Snapdragon 810, which shows improvement in some areas.

Snapdragon 810: A departure from Qualcomm's in-house Krait cores


For a long time, Qualcomm has used its own ARM-compatible Krait cores (most recently Krait-400/450 in Snapdragon 801/805) for SoCs targeting the performance segment. However, with Snapdragon 810 (as well as Snapdragon 808 and to a certain extent Snapdragon 615), Qualcomm seems to be migrating to standard ARM cores for performance-oriented SoCs. Some time ago, Qualcomm already transitioned its cost-effective SoCs (such as the Snapdragon 200 and 400 series) to cost efficient ARM cores such as Cortex-A7 (and later Cortex-A53).

Snapdragon 810 contains four Cortex-A57 cores (clocked up to about 1.5 GHz based on current evidence) as well as four Cortex-A53 cores in a big.LITTLE configuration. In this respect the chip is similar to Samsung's Exynos 7 Octa (5433) that has already been shipping for several months in devices such as the Galaxy Note 4 and shows impressive CPU performance. However, Snapdragon 810 is the direct successor to Snapdragon 805 and has a similarly ambitious memory interface with high total bandwidth (pioneering the use of new LPDDR4 SDRAM), which puts it squarely in the very high end category, like Snapdragon 805.

Qualcomm also has a SoC in planning for the more mainstream part of the high-end performance segment, Snapdragon 808, which has two Cortex-A57 cores instead of four while retaining the four Cortex-A53 cores. Importantly, Snapdragon 808 also simplifies the memory interface to dual-channel 32-bit with more standard LPDDR3 memory instead of LPDDR4, reducing cost and being comparable to Snapdragon 801, the current high-end standard.

20nm process and LPDDR4 memory


Snapdragon 810 is Qualcomm's first SoC product to be manufactured using TSMC's 20nm process technology. 20nm, in theory, significantly increases performance and power efficiency when compared to the 28nm process technology that Qualcomm has been using recently for most of its chips.

The SoC also features an LPDDR4 external memory interface in a dual-channel 32-bit configuration, with a maximum clock speed of 1600 MHz according to Qualcomm's webpage, resulting in memory bandwidth of 25.6 GB/s, similar to Snapdragon 805, which achieves its bandwidth with a wide 64-bit dual-channel memory interface with LPDDR3. This is a very high amount of memory bandwidth for a mobile device, making the chip suitable for driving very high resolutions such as QHD. However, it also increases cost, and the apparent requirement of using higher-clocked LPDDR4 memory instead of mainstream LPDDR3 is also likely to increase cost, despite the reduction in memory bus width allowed by LPDDR4.

Snapdragon 808 likely to be more attractive for high-volume flagship devices


Meanwhile, Snapdragon 808 seems to provide a more practical performance-oriented platform by utilizing standard LPDDR3 in a dual-channel 32-bit configuration at a clock speed up to 933 MHz, resulting in maximum memory bandwidth of 14.9 GB/s. Overall, Snapdragon 808 seems to be much more attractive for high-volume high-end devices as a successor to Qualcomm's popular Snapdragon 801.
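The bandwidth figures quoted for Snapdragon 810 and Snapdragon 808 follow from the channel count, bus width and clock speed, given that DDR memory transfers data twice per clock cycle. A small sketch of that calculation (the function and parameter names are illustrative):

```python
def peak_bandwidth_gb_s(channels, bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = channels * bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_second / 1e9


# Snapdragon 810: dual-channel 32-bit LPDDR4 at 1600 MHz
print(f"Snapdragon 810: {peak_bandwidth_gb_s(2, 32, 1600):.1f} GB/s")

# Snapdragon 808: dual-channel 32-bit LPDDR3 at 933 MHz
print(f"Snapdragon 808: {peak_bandwidth_gb_s(2, 32, 933):.1f} GB/s")
```

These are theoretical peak values of 25.6 GB/s and 14.9 GB/s; the bandwidth actually achieved in benchmarks such as Stream Copy depends on the memory controller and is lower.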

Performance flaws evident in early Geekbench database entries


Early Geekbench results database entries show lower-than-expected CPU and memory performance, and detailed analysis of the results seems to confirm the reports about thermal throttling due to heat production as well as lower-than-expected memory performance. In practice, the version of Snapdragon 810 that was benchmarked seems to provide performance lower than even Snapdragon 801 in most respects.

Performance data for Snapdragon 810 in the Geekbench entries is clouded somewhat because of the use of 64-bit AArch64 mode in Android. Until now, most Cortex-A57 and Cortex-A53 based solutions have used AArch32 (32-bit ARMv8 mode, which takes advantage of some of the new features of ARMv8 but is not fully 64-bit). Android AArch64 support and performance has been a work in progress and is still likely to be not fully optimized. However, in the case of the Snapdragon 810 results, the performance deficit is of such magnitude that it is clear that it is caused by flaws in the chip implementation and not AArch64 mode.

In the table in the Appendix below, some Snapdragon 810 and 801 results have been highlighted in bold to show some of the performance differences and in particular the areas where Snapdragon 810 performance is much lower than expected.

There are several entries for the device in the database that show considerable variation between runs, providing evidence that performance throttling caused by heat production is a significant problem. For the analysis below, the best benchmark result among the various entries has been used. There is evidence that some of the later entries impose a CPU clock speed limit of about 1.0 GHz or perhaps only use the Cortex-A53 cores in some cases (these entries are also represented in the table).

Deficits in pure CPU performance, especially multi-core


Compared to Samsung's Exynos 7 Octa (5433), which has a similar CPU configuration, basic integer tests such as JPEG Compress already show somewhat lower than expected performance based on the reported clock speed, with multi-core performance scaling being considerably less than expected, and also clearly lower than Snapdragon 801. The Dijkstra benchmark, which has more external memory access and branching, is more heavily affected and is at least 35% slower than on Exynos 5433, despite a similar clock speed, and slower than Snapdragon 801 as well as Snapdragon 805. However, this may for a large part be due to running in AArch64 compared to the 32-bit mode used on the other chips, since the Dijkstra benchmark seems to be similarly affected on other platforms that use AArch64.

For floating point performance, pure single-core performance, as shown by the Mandelbrot subtest results, is relatively unaffected, but multi-core performance scaling is much lower than Exynos, resulting in performance comparable to Snapdragon 805 rather than the higher floating point performance expected from Cortex-A57 cores (such as in Exynos 5433).

Memory performance significantly impacted


Memory performance is clearly seriously affected, confirming reported issues with the memory controller. The raw throughput of the Stream Copy subtest is significantly lower than expected based on the 32-bit dual-channel memory interface with double-speed LPDDR4, being lower than Snapdragon 805 with a similar amount of memory bandwidth and even significantly lower than Snapdragon 801 with its 32-bit dual-channel LPDDR3 interface.

The flaws in memory performance are evident in the SGEMM subtest, which is a floating point test that is heavy on sequential memory access. Snapdragon 810 shows performance for this test barely more than half that of Snapdragon 801 and 805. It is even worse for the multi-core test, where Snapdragon 810 shows performance scaling worse than two times, while Snapdragon 801 and 805 have performance scaling more in line with the four CPU cores they possess.

Finally, in the SFFT test, which is a floating point test with heavy random memory access, Snapdragon 810 only shows roughly half the performance of Snapdragon 801, Snapdragon 805 and Exynos 5433. This seems to provide the clearest evidence of performance problems with the memory controller.

Snapdragon 810 likely to be too costly for mainstream high-end devices


On popular technology websites, Snapdragon 810 has recently been mentioned frequently as the likely chip for future high-end models from a diverse range of well-known manufacturers such as Samsung, HTC and LG. However, the high-bandwidth LPDDR4 memory interface (which increases device cost) and performance targets seem to put it clearly in the very high end category, comparable to Snapdragon 805, which does not make it ideal for high-volume performance devices that do not have an extremely high screen resolution such as QHD (2560x1440). Other new chips such as Snapdragon 808 and (for mid-range) Snapdragon 615 seem to be more suitable for performance-oriented mainstream devices, including several of the mainstream flagship devices from the mentioned manufacturers.

However, if the performance flaws that are evident in the current Snapdragon 810 are not fixed or if Qualcomm has significant inventory of flawed chips, it is possible that they will be unloaded onto the more mainstream performance segment for a discounted price. It seems likely however that Qualcomm, given its chip expertise, will be able to fix most of the performance issues with the Snapdragon 810 in a future revision of the chip.

Update (January 5): LG prototype shows better multi-core performance


A Geekbench test run was recorded on January 5 for a prototype LG G Flex2 with Snapdragon 810. This result shows some improvements, especially in the overall multi-core score, although it is still well below that of Exynos 7 Octa (5433), which has a similar CPU configuration.

A closer look reveals that integer benchmarks, especially the more memory-intensive Dijkstra subtest, have not materially improved over the prior results. Multi-core floating point performance has improved significantly and contributes to the higher total multi-core score.

However, memory tests show mixed results. The Stream Copy subtests are lower than the previous best results from last month, remaining significantly lower than Snapdragon 805 and even Snapdragon 801, suggesting that sequential memory access performance has not improved. This is corroborated by the SGEMM subtest results, which also depend on sequential memory access performance and show results that are very similar to the earlier scores.

Meanwhile, the SFFT scores show a significant uptick, especially for multi-core performance, suggesting that Qualcomm has been able to improve the random memory access performance of the chip. However, the subtest scores are still clearly below those of Exynos 5433, Snapdragon 805 and even Snapdragon 801.

Update (January 10): New prototype entry shows improvements in memory performance


A subsequent Geekbench result entry recorded on January 9 for an unknown device shows further improvements in memory performance, although still falling short of the memory performance of the more mainstream Snapdragon 801 (let alone Snapdragon 805). The single-core JPEG Compress subtest result is also improved, but overall the CPU performance results suggest that thermal throttling because of overheating is still likely to be a significant problem.

Appendix: Geekbench performance table


The table below is similar to the one published in my previous article. In the bottom half of the table, some relevant benchmark scores for Snapdragon 810 and Snapdragon 801/805 have been highlighted.

For a high-resolution version, view/copy/save the image above using the browser.

Sources: BusinessKorea, Geekbench browser (Samsung SM-N916S results), Qualcomm (Snapdragon 810 page), Wikipedia (Qualcomm Snapdragon)

Updated (January 5, 2015): Add discussion of recent LG prototype Geekbench test results, update performance table (also include Intel Atom results).
Updated (January 8, 2015): Correct DRAM interface of Snapdragon 810 (it is 32-bit dual-channel using LPDDR4, which can be clocked much higher than LPDDR3).
Updated (January 10, 2015): Add discussion of new Geekbench result entry, updated table.