Tuesday, December 30, 2014

Early benchmarks for Snapdragon 810 show performance flaws

Recently, reports have surfaced, including one from BusinessKorea published on December 4, that Qualcomm's new high-end chip, Snapdragon 810, is affected by performance issues related to heat production and to the memory controller. Subsequently, Geekbench results for some Samsung prototype devices using the SoC (MSM8994) have appeared in the Geekbench results database. Detailed analysis of these results seems to confirm the issues with thermal throttling and especially memory controller performance, at least in the early revision of the SoC that was used to obtain the mentioned benchmark scores, resulting in sub-par performance for its segment.

Updated (January 5, 2015): A section has been added discussing new Geekbench results from an LG G Flex2 prototype using Snapdragon 810, which show improvement in some areas.

Snapdragon 810: A departure from Qualcomm's in-house Krait cores


For a long time, Qualcomm has used its own ARM-compatible Krait cores (most recently Krait-400/450 in Snapdragon 801/805) for SoCs targeting the performance segment. However, with Snapdragon 810 (as well as Snapdragon 808 and to a certain extent Snapdragon 615), Qualcomm seems to be migrating to standard ARM cores for performance-oriented SoCs. Some time ago, Qualcomm already transitioned its cost-effective SoCs (such as the Snapdragon 200 and 400 series) to cost-efficient ARM cores such as Cortex-A7 (and later Cortex-A53).

Snapdragon 810 contains four Cortex-A57 cores (clocked up to about 1.5 GHz based on current evidence) as well as four Cortex-A53 cores in a big.LITTLE configuration. In this respect the chip is similar to Samsung's Exynos 7 Octa (5433) that has already been shipping for several months in devices such as the Galaxy Note 4 and shows impressive CPU performance. However, Snapdragon 810 is the direct successor to Snapdragon 805 and has a similarly ambitious memory interface with high total bandwidth (pioneering the use of new LPDDR4 SDRAM), which puts it squarely in the very high end category, like Snapdragon 805.

Qualcomm is also planning an SoC for the more mainstream part of the high-end performance segment, Snapdragon 808, which has two Cortex-A57 cores instead of four while retaining the four Cortex-A53 cores. Importantly, Snapdragon 808 also simplifies the memory interface to dual-channel 32-bit with more standard LPDDR3 memory instead of LPDDR4, which reduces cost and makes it comparable to Snapdragon 801, the current high-end standard.

20nm process and LPDDR4 memory


Snapdragon 810 is Qualcomm's first SoC product to be manufactured using TSMC's 20nm process technology. 20nm, in theory, significantly increases performance and power efficiency when compared to the 28nm process technology that Qualcomm has been using recently for most of its chips.

The SoC also features an LPDDR4 external memory interface in a dual-channel 32-bit configuration, with a maximum clock speed of 1600 MHz according to Qualcomm's webpage, resulting in memory bandwidth of 25.6 GB/s. This is similar to Snapdragon 805, which achieves its bandwidth with a wide 64-bit dual-channel memory interface with LPDDR3. It is a very high amount of memory bandwidth for a mobile device, making the chip suitable for driving very high resolutions such as QHD. However, such bandwidth comes at a cost, and the apparent requirement of using higher-clocked LPDDR4 memory instead of mainstream LPDDR3 is likely to raise cost further, despite the reduction in memory bus width allowed by LPDDR4.

Snapdragon 808 likely to be more attractive for high-volume flagship devices


Meanwhile, Snapdragon 808 seems to provide a more practical performance-oriented platform by utilizing standard LPDDR3 in a dual-channel 32-bit configuration at a clock speed of up to 933 MHz, resulting in maximum memory bandwidth of 14.9 GB/s. Overall, Snapdragon 808 seems to be much more attractive for high-volume high-end devices as a successor to Qualcomm's popular Snapdragon 801.
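The bandwidth figures quoted for Snapdragon 810 and Snapdragon 808 follow from simple arithmetic on channel count, channel width and memory clock. The Python sketch below reproduces them, assuming a double-data-rate interface (two transfers per clock cycle) and decimal gigabytes; the function and variable names are only illustrative.

```python
def peak_bandwidth_gbs(channels: int, width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s for a double-data-rate memory interface."""
    bytes_per_transfer = channels * width_bits / 8   # all channels combined
    transfers_per_second = clock_mhz * 1e6 * 2       # DDR: two transfers per clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Configurations as stated in the article text.
snapdragon_810 = peak_bandwidth_gbs(channels=2, width_bits=32, clock_mhz=1600)
snapdragon_808 = peak_bandwidth_gbs(channels=2, width_bits=32, clock_mhz=933)

print(f"Snapdragon 810 (LPDDR4): {snapdragon_810:.1f} GB/s")
print(f"Snapdragon 808 (LPDDR3): {snapdragon_808:.1f} GB/s")
```

These are theoretical peaks only; the throughput an application actually measures depends on the memory controller and will be lower.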

Performance flaws evident in early Geekbench database entries


Early Geekbench results database entries show lower-than-expected CPU and memory performance, and detailed analysis of the results seems to confirm the reports about thermal throttling due to heat production as well as lower-than-expected memory performance. In practice, the version of Snapdragon 810 that was benchmarked seems to provide performance lower than even Snapdragon 801 in most respects.

Performance data for Snapdragon 810 in the Geekbench entries is clouded somewhat by the use of 64-bit AArch64 mode in Android. Until now, most Cortex-A57 and Cortex-A53 based solutions have used AArch32 (32-bit ARMv8 mode, which takes advantage of some of the new features of ARMv8 but is not fully 64-bit). Android AArch64 support and performance have been a work in progress and are likely still not fully optimized. However, in the case of the Snapdragon 810 results, the performance deficit is of such magnitude that it is clear that it is caused by flaws in the chip implementation and not by AArch64 mode.

In the table in the Appendix below, some Snapdragon 810 and 801 results have been highlighted in bold to show some of the performance differences and in particular the areas where Snapdragon 810 performance is much lower than expected.

There are several entries for the device in the database that show considerable variation between runs, providing evidence that performance throttling caused by heat production is a significant problem. For the analysis below, the best benchmark result among the various entries has been used. There is evidence that some of the later entries impose a CPU clock speed limit of about 1.0 GHz or perhaps only use the Cortex-A53 cores in some cases (these entries are also represented in the table).
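The approach described above (take the best run, and treat a large gap between runs as a hint of throttling) can be sketched in a few lines of Python. The scores below are made-up placeholders, not values from the Geekbench database, and the function name is hypothetical.

```python
def summarize_runs(scores):
    """Return the best score and the best-to-worst spread as a percentage of the best."""
    best = max(scores)
    worst = min(scores)
    spread_pct = (best - worst) * 100 / best
    return best, spread_pct


# Placeholder scores for repeated runs on one device (NOT real benchmark data).
example_runs = [1000, 800, 600]

best, spread_pct = summarize_runs(example_runs)
print(f"best run: {best}, spread between runs: {spread_pct:.0f}%")
```

A spread of a few percent is ordinary run-to-run noise; a much larger spread on the same hardware is what points toward clock limits or throttling.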

Deficits in pure CPU performance, especially multi-core


Compared to Samsung's Exynos 7 Octa (5433), which has a similar CPU configuration, basic integer tests such as JPEG Compress already show somewhat lower than expected performance based on the reported clock speed, with multi-core performance scaling being considerably less than expected, and also clearly lower than Snapdragon 801. The Dijkstra benchmark, which has more external memory access and branching, is more heavily affected and is at least 35% slower than on Exynos 5433, despite a similar clock speed, and slower than Snapdragon 801 as well as Snapdragon 805. However, this may in large part be due to running in AArch64 mode rather than the 32-bit mode used on the other chips, since the Dijkstra benchmark seems to be similarly affected on other platforms that use AArch64.

For floating point performance, pure single-core performance, as shown by the Mandelbrot subtest results, is relatively unaffected, but multi-core performance scaling is much lower than Exynos, resulting in performance comparable to Snapdragon 805 rather than the higher floating point performance expected from Cortex-A57 cores (such as in Exynos 5433).

Memory performance significantly impacted


Memory performance is clearly seriously affected, confirming reported issues with the memory controller. The raw throughput of the Stream Copy subtest is significantly lower than expected based on the 32-bit dual-channel memory interface with double-speed LPDDR4, being lower than Snapdragon 805 with a similar amount of memory bandwidth and even significantly lower than Snapdragon 801 with its 32-bit dual-channel LPDDR3 interface.

The flaws in memory performance are evident in the SGEMM subtest, which is a floating point test that is heavy on sequential memory access. Snapdragon 810 shows performance for this test barely more than half that of Snapdragon 801 and 805. It is even worse for the multi-core test, where Snapdragon 810 shows performance scaling of less than two times, while Snapdragon 801 and 805 have performance scaling more in line with the four CPU cores they possess.
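Multi-core scaling as used here is simply the multi-core score divided by the single-core score, which can then be compared against the number of cores doing the work. A minimal Python sketch follows; the scores are placeholders chosen only to illustrate a scaling factor below two, not actual SGEMM results, and which core count to use for a big.LITTLE chip is left to the caller.

```python
def multicore_scaling(single_core: float, multi_core: float, cores: int):
    """Return (scaling factor, scaling efficiency relative to the core count)."""
    factor = multi_core / single_core
    efficiency = factor / cores
    return factor, efficiency


# Placeholder scores (NOT real benchmark data), compared against four cores.
factor, efficiency = multicore_scaling(single_core=500, multi_core=900, cores=4)
print(f"scaling: {factor:.2f}x, efficiency: {efficiency:.0%}")
```

An efficiency close to 100% means the subtest scales almost linearly with core count; a value near or below 50% on four cores corresponds to the "less than two times" scaling described above.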

Finally, in the SFFT test, which is a floating point test with heavy random memory access, Snapdragon 810 shows only roughly half the performance of Snapdragon 801, Snapdragon 805 and Exynos 5433. This seems to provide the clearest evidence of performance problems with the memory controller.

Snapdragon 810 likely to be too costly for mainstream high-end devices


On popular technology websites, Snapdragon 810 has recently been mentioned frequently as the likely chip for future high-end models from a diverse range of well-known manufacturers such as Samsung, HTC and LG. However, the high-bandwidth LPDDR4 memory interface (which increases device cost) and the performance targets seem to put it clearly in the very high end category, comparable to Snapdragon 805, which does not make it ideal for high-volume performance devices that do not have an extremely high screen resolution such as QHD (2560x1440). Other new chips such as Snapdragon 808 and (for mid-range) Snapdragon 615 seem to be more suitable for performance-oriented mainstream devices, including several of the mainstream flagship devices from the mentioned manufacturers.

However, if the performance flaws that are evident in the current Snapdragon 810 are not fixed, or if Qualcomm has significant inventory of flawed chips, it is possible that they will be unloaded onto the more mainstream performance segment at a discounted price. It seems likely, however, that Qualcomm, given its chip expertise, will be able to fix most of the performance issues with the Snapdragon 810 in a future revision of the chip.

Update (January 5): LG prototype shows better multi-core performance


A Geekbench test run was recorded on January 5 for a prototype LG G Flex2 with Snapdragon 810. This result shows some improvements, especially in the overall multi-core score, although it is still well below that of Exynos 7 Octa (5433), which has a similar CPU configuration.

A closer look reveals that integer benchmarks, especially the more memory-intensive Dijkstra subtest, have not materially improved over the prior results. Multi-core floating point performance has improved significantly and contributes to the higher total multi-core score.

However, memory tests show mixed results. The Stream Copy subtests are lower than the previous best results from last month, remaining significantly lower than Snapdragon 805 and even Snapdragon 801, suggesting that sequential memory access performance has not improved. This is corroborated by the SGEMM subtest results, which also depend on sequential memory access performance and show results that are very similar to the earlier scores.

Meanwhile, the SFFT scores show a significant uptick, especially for multi-core performance, suggesting that Qualcomm has been able to improve the random memory access performance of the chip. However, the subtest scores are still clearly below those of Exynos 5433, Snapdragon 805 and even Snapdragon 801.

Update (January 10): New prototype entry shows improvements in memory performance


A subsequent Geekbench result entry recorded on January 9 for an unknown device shows further improvements in memory performance, although still falling short of the memory performance of the more mainstream Snapdragon 801 (let alone Snapdragon 805). The single-core JPEG Compress subtest result is also improved, but overall the CPU performance results suggest that thermal throttling because of overheating is still likely to be a significant problem.

Appendix: Geekbench performance table


The table below is similar to the one published in my previous article. In the bottom half of the table, some relevant benchmark scores for Snapdragon 810 and Snapdragon 801/805 have been highlighted.

For a high-resolution version, view/copy/save the image above using the browser.

Sources: BusinessKorea, Geekbench browser (Samsung SM-N916S results), Qualcomm (Snapdragon 810 page), Wikipedia (Qualcomm Snapdragon)

Updated (January 5, 2015): Add discussion of recent LG prototype Geekbench test results, update performance table (also include Intel Atom results).
Updated (January 8, 2015): Correct DRAM interface of Snapdragon 810 (it is 32-bit dual-channel using LPDDR4, which can be clocked much higher than LPDDR3).
Updated (January 10, 2015): Add discussion of new Geekbench result entry, updated table.
