Showing posts with label Process nodes. Show all posts

Thursday, June 4, 2015

MediaTek announces Helio P10 and MT6753 arrives in shipping devices

MediaTek has announced Helio P10 (MT6755), a performance mid-range smartphone SoC that is the successor of MT6752. Featuring an octa-core Cortex-A53 configuration, Helio P10 improves upon MT6752 by using TSMC's new 28HPC+ manufacturing process, which delivers power efficiency and performance improvements while remaining relatively cost-effective. It can reach a higher maximum CPU clock speed up to 2 GHz and upgrades the GPU to a Mali-T860 MP2. It is expected to be commercially available in end devices by the end of 2015.

Features shared with Helio-X10


The new SoC incorporates a few features from Helio X10 (MT6795), MediaTek's current high-end offering, including dual ISPs with 21MP camera support and improved capture capability, as well as improved audio quality.

Otherwise, the SoC has significant similarities to MediaTek's MT6752 which it succeeds, most likely including a 32-bit external memory interface, which keeps SoC cost and phone PCB cost down. With MT6752, MediaTek already demonstrated the ability to achieve memory performance adequate for a 1080p device within the constraints of a 32-bit memory interface.

The 28HPC+ process is an upgrade of the existing 28HPC (high-performance compact) process (which is also relatively new, used by Allwinner's A83T and other SoCs), which improves performance and cost relative to the established 28HPM (high-performance mobile) process. Existing MediaTek chips like MT6752 and MT6795 most likely use 28HPM, which is established and has also been used for previous-generation SoCs such as MT6592 and Snapdragon 801/805.

MediaTek migrating to big.LITTLE CPU configurations in new SoCs


A significant departure from existing octa-core MediaTek SoCs such as MT6752 and Helio X10 (MT6795) is the pseudo-big.LITTLE CPU configuration, whereby one cluster of four Cortex-A53 cores is clocked at a higher frequency (up to 2 GHz in this case), while the second cluster of Cortex-A53 cores is optimized for lower frequencies, being clocked at a lower maximum frequency (1.1 GHz according to AnandTech).

Together with the previously announced high-end Helio X20 (MT6797) and tablet/Chromebook-oriented chips such as MT8173, Helio P10 marks a migration to (pseudo-)big.LITTLE, hierarchical CPU designs at MediaTek. While symmetrical octa-core designs such as MT6752 and MT6795 reach very high multi-core processing power by allowing all cores to run at the maximum frequency, there are signs that this configuration impacts power efficiency for tasks that require less CPU power, which can be run on power-optimized low-frequency cores.

In practice, this may be reflected in somewhat mediocre standby battery life for smartphones using MT6752 or MT6795, even though power efficiency for demanding tasks that utilize all cores is likely to be pretty good.

Budget mid-range MT6753 reaches end-market


Meanwhile, MediaTek's previously announced MT6753, which is a cost-effective budget mid-range SoC, has arrived in a commercially shipping device in the form of the Meizu M2 Note. Despite the name chosen by Meizu, the new model actually has lower performance than the existing Meizu M1 Note, because the MT6753 is a less costly, lower-end chip compared to the MT6752 inside the M1 Note, with considerably lower maximum CPU speeds for the eight CPU cores, as well as a lower-performance GPU. There are also signs that the memory interface and the actual memory frequency used by the M2 Note are slower. The lower cost of the MT6753 platform is reflected in the low selling price of the Meizu M2 Note.

MT6753 implements several cost-reducing features, including a lower maximum clock speed (reported to be 1.3 GHz for the M2 Note), most likely associated with a cheaper manufacturing process (either 28LP or 28HPC) than the 28HPM process of the MT6752. A significant factor for lower performance is likely to be a reduced size of the L2 CPU cache inside the MT6753. MT6753 is likely to become a significant volume driver in MediaTek's 4G product line.

However, early Geekbench entries for the Meizu M2 Note suggest that the CPU cores of the MT6753 SoC used in this model are mostly unable to reach the planned clock frequency. The Geekbench results are mostly consistent with an average maximum CPU clock speed of about 1.1 GHz, significantly lower than the 1.3 GHz reported by the OS and the 1.5 GHz mentioned when the MT6753 was originally announced a few months ago. My following blog article about the use of AArch64 provides more details on this subject.

MT6753 has lower-performance GPU than MT6752


MT6753 also has a significantly lower-performance and smaller GPU (Mali-T720 MP3), compared to the Mali-T760 MP2 inside MT6752. MT6753 marks the first Mali implementation with three pixel processing cores; previous Mali GPUs had one, two, four, six or eight pixel processing cores. Most likely, Mali-T720 does not have the memory bandwidth usage optimizations that are present in Mali-T760, which together with the more limited pixel processing throughput means that devices with a 1080p display such as the Meizu M2 Note may be impacted in terms of 1080p game performance and power efficiency for graphics-intensive operations.

World modem support in new MediaTek platforms


All new MediaTek SoCs (including Helio P10 (MT6755), MT6753, the low-end quad-core MT6735 and the announced high-end Helio X20 (MT6797)) have world-modem support, facilitating compatibility with more cellular networks used worldwide, including legacy CDMA networks in the US and other countries. This makes MediaTek SoCs more attractive to smartphone manufacturers targeting multiple or worldwide markets.

Sources: MediaTek (Helio P10 announcement), AnandTech (Helio P10 article)

Updated 6 June 2015.

Thursday, April 30, 2015

More details emerge about Cortex-A72 CPU core

Recently, more details have become available about the performance improvements implemented in ARM's Cortex-A72 core, which is a replacement for the high-performance Cortex-A57 core. Apart from the gains from using a more advanced process such as 14/16 nm FinFET, Cortex-A72 also implements fairly significant micro-architectural improvements affecting performance per cycle and power efficiency. AnandTech has published a detailed overview of these improvements.

Cortex-A57 based on Cortex-A15 and not fully optimized for power-efficiency


The Cortex-A57 CPU core, which was announced in 2012, has significant similarities to Cortex-A15, ARM's long-standing high-performance 32-bit CPU core, which has been known for relatively high power consumption. As such, it is not unexpected that improvements on the Cortex-A57 architecture (in the form of the Cortex-A72) have proven to be possible. Cortex-A57-based SoCs such as Snapdragon 810 have been known to throttle, being forced to reduce the clock speed due to excessive heat production and power use, resulting in reduced sustained performance. Apple's A7 and A8 processors use CPU cores that most likely have strong similarities with Cortex-A57, but which exhibit little throttling due to a lower maximum clock speed, a lower number of cores and other factors related to the chip design.

Increased level of sustained performance


ARM has made available a number of slides detailing the improvements in sustained performance and power efficiency in Cortex-A72 over Cortex-A57. On a 28 nm process and at a similar clock speed, ARM's charts indicate a roughly 20% reduction in power consumption.

Sustained performance is expected to be higher than Cortex-A57, implementations of which (such as Snapdragon 810 and Exynos 5433, and to a lesser degree Exynos 7420) have suffered from an inability to maintain high clock speeds and throttle back to a relatively low speed due to heat production and associated power consumption. ARM gives a figure of sustained 750 mW operation per core on a 16FF+ process with a clock speed around 2.5 GHz.

In terms of IPC (instructions per cycle), ARM's information shows improvements in all instruction-level performance segments, with a 1.16x improvement for "analytics", 1.38x for cryptography, 1.50x for memory, 1.26x for floating point and 1.16x for integer compute. The increase in memory performance appears to be significant.

Improved single-core performance evident in early Geekbench results


Early Geekbench results for the MT8173 SoC from MediaTek, which includes two Cortex-A72 cores, give an indication of the practical performance of the Cortex-A72 core, although the exact clock speed the Cortex-A72 cores are running at is hard to determine. The following table shows single-core performance from a recent MT8173 Geekbench entry, comparing it to Exynos 7420 as used in the Samsung Galaxy S6. Both use 64-bit AArch64 mode.

SoC                        JPEG   Dijkstra  Lua   Mandelb. Stream SGEMM SFFT
                           Compr.                          Copy
28nm? MT8173 (Cortex-A72)  1429    1287     1675  1750     2217    979  1345
14nm Exynos 7420           1475    1082     1409  1147     1993    954  1379

The MT8173 easily matches the single-core performance of Exynos 7420, while showing significant improvements in the Mandelbrot floating point subtest and the memory-intensive Dijkstra subtest, and also the Lua subtest. Memory subtest (Stream Copy) performance is also better than Exynos 7420, despite the likely much wider memory interface of the latter, providing clear evidence of the improved memory performance (largely due to smarter prefetching) in Cortex-A72. Overall, since the MT8173 results reflect a SoC using 28 nm or perhaps 20 nm process technology, while Exynos 7420 uses Samsung's leading-edge 14 nm FinFET process, the ability of the MT8173 to beat Exynos 7420 in single-core performance while using a less advanced process is impressive and illustrates the performance improvements in the Cortex-A72 core.
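
The relative differences in the table can be made explicit by dividing each MT8173 score by the corresponding Exynos 7420 score. The following is a minimal Python sketch that uses only the figures from the table above; the variable and function names are illustrative.

```python
# Geekbench single-core subtest scores copied from the table above.
SUBTESTS = ["JPEG Compress", "Dijkstra", "Lua", "Mandelbrot",
            "Stream Copy", "SGEMM", "SFFT"]
MT8173 = [1429, 1287, 1675, 1750, 2217, 979, 1345]
EXYNOS_7420 = [1475, 1082, 1409, 1147, 1993, 954, 1379]


def relative_scores(a, b):
    """Return the ratio a/b for each pair of subtest scores."""
    return [x / y for x, y in zip(a, b)]


ratios = dict(zip(SUBTESTS, relative_scores(MT8173, EXYNOS_7420)))
for name, ratio in ratios.items():
    print(f"{name:14s} {ratio:.2f}x")
```

With these inputs, Mandelbrot comes out at roughly 1.53x and Dijkstra and Lua at roughly 1.19x, while JPEG Compress and SFFT land a few percent below parity.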

Reduced silicon area results in lower cost


Cortex-A72 has a silicon area that is 10% smaller than Cortex-A57 on an equivalent process, while delivering improvements in performance and power efficiency. Already SoCs have been announced or described that utilize Cortex-A72 cores, such as MediaTek's MT8173 for tablets, Qualcomm's Snapdragon 618 and 620 for smartphones, and MediaTek's MT6797 (Helio-X20) for smartphones.

There seems to be a clear trend of using just two Cortex-A72 cores (instead of the four cores used in many Cortex-A57 implementations), reducing cost and maximum power consumption. These cores are augmented by low-power, small-area Cortex-A53 cores running at a lower frequency. MT8173, Snapdragon 618 and Helio-X20 all use such a configuration.

Use of Cortex-A72 may be more effective than high-clocked Cortex-A53 cores


There are indications that Cortex-A53 cores running at a high frequency (such as implemented in MediaTek's MT6752 and MT6795 (Helio-X10), HiSilicon's Kirin 930 and to a lesser degree in Snapdragon 615 and the announced Snapdragon 415 and 420) run into a power efficiency bottleneck at higher clock speeds, due to the relatively steep increase in power consumption as the clock speed of the Cortex-A53 core increases above 1.3-1.5 GHz. Solutions that combine a small number of Cortex-A72 cores with lower-clocked, power-efficient Cortex-A53 cores may prove to be a sweet spot in terms of practical performance and power efficiency for mid-range SoCs.

Source: AnandTech (Cortex-A72 Architecture Details article), Geekbench Browser

Thursday, April 23, 2015

Details surface about MediaTek's upcoming Helio-X20 SoC

Recently, details surfaced about MediaTek's upcoming Helio-X20 SoC, a high performance offering in the series of Helio-branded SoCs, of which the MT6795 (Helio-X10) is the first member. The deca-core Helio-X20, which has the model number MT6797, has a total of ten CPU cores and is the first mobile SoC with a hierarchy of three clusters of progressively less performance-oriented CPU cores: two ARM-Cortex-A72 cores, four high clocked ARM-Cortex-A53 cores and four lower clocked ARM-Cortex-A53 cores.

Three-cluster hierarchy extends the big.LITTLE principle


The SoC's ten CPU cores are organized as follows:
  • Two Cortex-A72 cores clocked up to 2.5 GHz to provide "extreme performance".
  • Four Cortex-A53 cores clocked up to 2.0 GHz for "best performance/power balance".
  • Four Cortex-A53 cores clocked up to 1.4 GHz for "best power efficiency".
The different clusters and their separate L2 caches are linked together using MediaTek's MCSI interconnect technology. MediaTek claims higher efficiency than big.LITTLE based designs, which have just two levels of cluster hierarchy.

The triple-level hierarchical design is a significant departure from the symmetric CPU configuration on current MediaTek smartphone SoCs such as MT6795 (Helio-X10) and MT6752, which have eight "equal" Cortex-A53 cores, although MediaTek does have experience with big.LITTLE, for example in the 32-bit MT6595 and some tablet processors.

Reports suggest the chip is manufactured using a 20 nm process at TSMC and will be in mass production as soon as July 2015. This marks MediaTek's first known product manufactured using a geometry below 28 nm.

Other features: ARM Mali-T880 MP4 GPU, dual-channel LPDDR3, world modem


Based on a recent report from Gizchina.com that gives more details about the specifications of the chip, other features include an ARM Mali-T880 MP4 GPU at 700 MHz and a dual-channel 32-bit LPDDR3 memory interface at 933 MHz. The maximum display resolution supported is 2560x1600. The integrated LTE modem has Cat. 6 capability and also supports CDMA2000/EVDO Rev. A (world modem support). The video processor supports decoding and encoding of the H.265 format up to 4K resolution.
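
As a rough illustration of what the reported memory configuration implies, the theoretical peak bandwidth can be estimated from the channel count, bus width and clock. This sketch assumes that the reported 933 MHz is the memory I/O clock and that LPDDR3 transfers data on both clock edges (two transfers per cycle). The function name is illustrative, and the result is an upper bound rather than a measured figure.

```python
def peak_bandwidth_gb_s(channels, bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return channels * bytes_per_transfer * transfers_per_second / 1e9


# Reported Helio-X20 configuration: dual-channel 32-bit LPDDR3 at 933 MHz.
helio_x20 = peak_bandwidth_gb_s(channels=2, bus_width_bits=32, clock_mhz=933)
print(f"{helio_x20:.1f} GB/s")
```

Under these assumptions the estimate is about 14.9 GB/s; a single 32-bit channel at the same clock would yield half of that.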

The report suggests the SoC will start shipping to manufacturers this summer with end products reaching stores by late autumn.

Execution issues at Qualcomm may help MediaTek's chances of success in high-end


Execution issues at Qualcomm regarding their high-end product roadmap may increase the chances of success of MediaTek's high-end product line. Qualcomm's Snapdragon 810 has some performance issues and has not been a great success, giving MediaTek the opportunity to capture more of the performance-oriented, premium level segment. MediaTek already has Helio-X10 (MT6795) in the market, which has gained design wins, but for which some key characteristics such as power efficiency are still unknown.

Meanwhile, MediaTek has come under pressure in the cost-sensitive smartphone SoC market, previously the bread-and-butter of the company, on which Qualcomm is encroaching by gaining market share for low-end devices in China. This is mainly the result of MediaTek's delayed introduction of cost-sensitive 4G SoC solutions.

MediaTek's sales performance under pressure


While MediaTek has made some progress penetrating the performance-oriented smartphone market with SoCs such as MT6752 and MT6795, it has lost ground in the cost-sensitive smartphone segment among Chinese manufacturers, which it previously dominated. Although MediaTek's March 2015 sales rebounded from the low level of February, for the second quarter its sales performance is not expected to reach the level of previous quarters (such as Q3 and Q4 of 2014). Indeed, the forecast given by MediaTek during its quarterly results presentation for Q1 2015 on April 30 sets sequential growth between -5% and +3% for Q2 2015, which represents a lower level of sales than the level MediaTek was accustomed to in 2014.

Because MediaTek's product mix has shifted toward a significantly lower volume of cost-sensitive SoCs, only partly offset by some traction for performance-oriented SoCs, its overall unit shipments and unit market share have declined compared to the previous year, despite likely higher shipments of performance-oriented chips.

Update: MediaTek has officially announced Helio-X20


On 12 May, MediaTek officially announced Helio-X20. Most of the previously known details are confirmed in the announcement. The chip utilizes MediaTek's new CorePilot 3.0 heterogeneous computing scheduling algorithm, which together with the tri-cluster architecture should provide up to 30% reduction in power consumption. The chip has advanced camera features and has an ARM Cortex-M4-based sensor hub processor for better battery efficiency.

According to AnandTech, quoting MediaTek, the GPU used is not the Mali-T880 but an as yet unannounced Mali-T8xx series GPU, similar to Mali-T880. Compared to Helio-X10's PowerVR G6200, MediaTek sees a 40% performance improvement with a 40% drop in power.

Sources: CNXSoftware (Helio-X20 article), DigiTimes (MediaTek Q2 sales projection), DigiTimes (MediaTek Q2 2015 quarterly results), Gizchina.com (Comparison of MT6797 with Snapdragon 810), MediaTek (Helio-X20 announcement), AnandTech (Helio-X20 article)

Updated 21 May 2015.

Thursday, April 16, 2015

HiSilicon introduces Kirin 930/935, a performance-oriented Cortex-A53-based SoC

Huawei has introduced the Huawei P8 and P8max smartphones, featuring the Kirin 930 and Kirin 935 SoCs from Huawei's HiSilicon semiconductor division. The octa-core Kirin 930 SoC is a performance-oriented SoC featuring only Cortex-A53 CPU cores. With a maximum clock frequency in excess of 2.0 GHz, it bears similarities to MediaTek's MT6795, but the use of a pseudo big.LITTLE configuration (four Cortex-A53 cores clocked up to 2.0 GHz and four Cortex-A53 cores clocked up to 1.5 GHz, for a total of eight cores) is reminiscent of Qualcomm's midrange Snapdragon 615 SoC, which runs at lower clock frequencies.

Huawei also introduced high-end models of both the P8 and P8max with larger storage capacity featuring the Kirin 935 SoC, which is a higher-clocked version of Kirin 930. The Huawei P8max is a smartphone with an unusually large 6.8" display.

SoC is targeted at performance-oriented devices


The Huawei P8 models are higher-priced performance-oriented smartphones, and the characteristics of the SoC match this segment. Apart from the high maximum clock speed of the Cortex-A53 cores, the external RAM interface is likely to be a dual-channel 32-bit configuration like previous performance-oriented SoCs from HiSilicon. Presentation materials from Huawei describe the Cortex-A53 cores in the faster cluster of four CPUs as being of a special, performance-enhanced type, which probably reflects the application of ARM's PoP core-hardening technology whereby the core is optimized for running at a specific frequency and a particular power profile, trading performance against die size. The process technology used is likely to be TSMC's proven 28HPM process.

The SoC is reminiscent of MediaTek's recently introduced MT6795 (Helio-X10), which also targets the performance segment with an octa-core Cortex-A53 CPU configuration. MediaTek's SoC has been reported to have been adopted by competitors of Huawei such as HTC and Xiaomi.

Previous generation Mali-T628 MP4 GPU used


Rather than using an updated current-generation GPU like Mali-T760, the specs sheet for the P8max indicates the Kirin 930/935 SoCs continue to use the Mali-T628 MP4 GPU that was previously used in the Kirin 920 SoC. This GPU core is not known for great power efficiency, although there are suggestions that the more efficient Mali-T760 (which features memory bandwidth optimizations) has a relatively high silicon area and cost.

HiSilicon's new SoC line-up uses only Cortex-A53 CPU cores


Apart from Kirin 930, HiSilicon has also introduced the Kirin 620 SoC, which is an octa-core Cortex-A53 based SoC for the cost-sensitive segment, clocked up to 1.2 GHz and with a single-channel memory interface. This means Huawei now has in-house Cortex-A53-based SoCs suitable for most of its smartphone product range.

Tuesday, March 24, 2015

TSMC's 16 nm FinFET sees adoption by Qualcomm and Apple, competes with Samsung

TSMC will receive majority of Apple A9 business


According to reports, TSMC will receive the majority of Apple A9 SoC orders, which includes the A9 for next-generation iPhones and A9X for iPads. According to sources quoted by EE Times, Apple had originally planned to give Samsung a majority of the Apple A9 orders, but has recently shifted orders to TSMC, most likely using a 16 nm FinFET process.

Because ramping up production of a similar chip from a second source with different foundry technology is challenging and complicated, I believe it is likely that A9 production will be overwhelmingly (and perhaps exclusively) concentrated at TSMC. A parallel can be drawn with various reports from last year, which for a long time continued to echo incorrect projections that Samsung would serve a significant portion of the production of Apple's A8 generation SoCs, which turned out not to be the case.

In the meantime, TSMC's revenues continue to be at a relatively high level despite Q1 usually being seasonally down, with strong demand for 20 nm production, most likely reflecting continuing demand from Apple, which is offsetting weakness from Qualcomm for leading-edge processes. There have been rumours about an upcoming iPhone 6S and a lower cost iPhone 6C model which may involve substantial unit volumes. Apple's iPhone unit shipments have also been boosted by strong demand in China.

Low yield at Samsung and Exynos ramp contribute to TSMC orders


According to a source quoting sources in South Korea, TSMC's yield rate for its 16 nm FinFET process is better than that of Samsung's 14 nm process. Moreover, Samsung is seeing strong upcoming demand for its flagship Galaxy S6 smartphone, which uses the Exynos 7420 SoC produced on its 14 nm FinFET process, and most likely needs all the capacity it can get to ramp up production of this SoC. Samsung also increasingly uses Exynos 7420 and other internally-developed SoCs for other product lines, such as other smartphone models as well as tablets.

Qualcomm said to have limited-time exclusive use of TSMC's 16FF+ technology


According to a report by EETimes from a semiconductor industry conference in January, Qualcomm is likely to have locked up exclusive use of TSMC's 16FF+ process technology for about six months. The article appears to quote sources affiliated with Qualcomm that state that Qualcomm feels competitors such as MediaTek took advantage of previous-generation process technology (28HPM) that Qualcomm helped develop at TSMC, without having made the development investment that Qualcomm made.

However, this policy would be contrary to the principles based on which TSMC has operated for a long time, although the initial ramp of 20 nm at TSMC last year also seemed to be locked up by another company (Apple). It seems corporate pressure from these giant companies, backed by billions of dollars of cash, is forcing TSMC into these kinds of commitments.

The article mentions that the later access to 16FF+ won't affect MediaTek's mainstream products serving the mid-range to entry-level segments, because 28 nm technologies will continue to be used for such products in the market.

Leaked power consumption graphs suggest increased power efficiency


Power consumption graphs of current and upcoming high-end Qualcomm SoCs running a 3D game at high detail settings suggest power consumption and heat production of Qualcomm's unannounced Snapdragon 815 processor will be considerably lower than that of the Snapdragon 801 and Snapdragon 810, with Snapdragon 810 showing particularly unfavourable characteristics, as confirmed by widespread reports and reviews of Snapdragon 810-based devices.

Snapdragon 815 is unannounced and few details are known about it, with some reports suggesting the use of a next-generation Krait CPU core. Use of ARM Cortex-A72 processor cores appears to be not unlikely, since this core seems to be close to actual production. Most likely, the decreased heat production, which is likely to be associated with lower power consumption, is made possible by the use of the next-generation 16 nm FinFET process at TSMC.

Similar improvements in power consumption were observed for Snapdragon 620, which uses Cortex-A72 cores, when compared to the mid-range Snapdragon 615 SoC, which is reported to also have heating issues. Snapdragon 620, which has been announced, is also likely to have significantly higher CPU performance than Snapdragon 615 due to the use of Cortex-A72 cores, versus Cortex-A53 for Snapdragon 615, while also likely being produced on a much more efficient process (possibly TSMC's 16FF+), since Snapdragon 615 is manufactured on a low-efficiency 28LP process.

Sources: EE Times (ISS 2015 conference report), EE Times (Apple A9 orders article), STJS Gadgets Portal (Snapdragon heat production graphs)

Updated 25 March 2015 (Add comments about 20 nm Apple production at TSMC).

Tuesday, March 10, 2015

Qualcomm's Snapdragon 808 fixes flaws of Snapdragon 810

Snapdragon 808 (MSM8992) is a performance-oriented SoC that Qualcomm announced last year together with Snapdragon 810. It has similarities to Snapdragon 810 (MSM8994), including the use of ARM Cortex-A57 CPU cores and Cortex-A53 cores in a big.LITTLE configuration. Snapdragon 808 appears to fix some of the performance flaws that are apparent in Snapdragon 810, especially the memory subsystem, while being significantly less costly.

Snapdragon 808 features


Features and differences with Snapdragon 810 include:

  • Snapdragon 808 has only two Cortex-A57 cores (revision r1p2) compared to four Cortex-A57 cores (revision r1p1) for Snapdragon 810. Both contain four Cortex-A53 cores.
  • Snapdragon 808 has a more economical dual-channel LPDDR3 memory interface, compared to the LPDDR4 interface of Snapdragon 810.
  • Snapdragon 808 has an Adreno 418 GPU, compared to Adreno 420 in Snapdragon 810, presumably with somewhat lower performance.
  • Manufactured on TSMC's 20 nm process, the same as Snapdragon 810.
  • 4K resolution video playback (H.264/H.265), on-device display resolution up to 2560x1600 (Snapdragon 810 theoretically supports 4K on-device display resolution, but all currently announced smartphones using Snapdragon 810 are limited to a resolution of 1920x1080).

 

Early benchmark results suggest Snapdragon 808 fixes performance flaws of Snapdragon 810


Early benchmarks for Snapdragon 808 have already appeared on the Geekbench Browser. We can compare Snapdragon 808's single-core performance with Snapdragon 810 and Exynos 7420, all of which run in AArch64 mode in the published benchmark results.

To reduce the impact of thermal throttling, the best Geekbench subtest results for a given device have been collected and combined in the table below. I have made an attempt to estimate the actual maximum clock speed of the Cortex-A57 cores during the benchmarks, partly based on the maximum frequency reported by Geekbench when it appears to apply to the "big" cores and not the "LITTLE" cores.

SoC          "big" CPU                     Arch     JPEG Compress (int)  Lua (int)     Mandelbrot (float)
                                                    Score  IPC           Score  IPC    Score  IPC

MSM8992      2 x 1.69? GHz Cortex-A57r1p2  AArch64  1257   1.96          1385   1.99   1031   1.79
MSM8994      4 x 1.8? GHz Cortex-A57r1p1   AArch64  1358   1.96          1283   1.73   1100   1.79
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0   AArch64  1486   1.96          1409   1.74   1198   1.78

MT6795       8 x 1.95 GHz Cortex-A53r0p2   AArch64  1026   1.37          1053   1.31    823   1.24
MT6795T      8 x 2.16 GHz Cortex-A53r0p2   AArch64  1128   1.36          1173   1.32    912   1.24

The IPC figures are calibrated on the Cortex-A7 core, whose IPC is fixed at 1.00. Fixing the maximum clock speed to 1.8 GHz for the MSM8994 (Snapdragon 810) results (based on HTC One M9 entries) and at 1.69 GHz for the MSM8992 (Snapdragon 808) produces similar IPC figures for the JPEG Compress integer test and the Mandelbrot floating point test, making them reasonably plausible. The best Lua subtest result for the MSM8992 shows a higher IPC, which may reflect improved L2 cache performance in the MSM8992, which uses a later revision of the Cortex-A57 core.

The single-core CPU performance results show no surprises, with Snapdragon 808 showing good performance that is slightly lower than Snapdragon 810, proportional to the lower maximum clock frequency in the tested devices. However, the Lua test shows higher performance with Snapdragon 808, which is especially true for the multi-core test (results not shown), where Snapdragon 810 seems to be limited to a score of about 1200 with little gain when compared to single-core performance, while Snapdragon 808 consistently scores in the region of 4000.
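
The per-clock comparison can be approximated directly from the table by dividing each score by the estimated clock speed. The sketch below does not reproduce the Cortex-A7 calibration, because the baseline Cortex-A7 score is not given here. Instead it normalizes against the MT6795 row, which is an assumption made purely for illustration. Clock speeds marked with a question mark in the table are estimates, and all names in the code are hypothetical.

```python
# Estimated clock (GHz) and single-core scores taken from the table above.
RESULTS = {
    "MSM8992":     {"clock_ghz": 1.69, "jpeg": 1257, "mandelbrot": 1031},
    "MSM8994":     {"clock_ghz": 1.80, "jpeg": 1358, "mandelbrot": 1100},
    "Exynos 7420": {"clock_ghz": 1.97, "jpeg": 1486, "mandelbrot": 1198},
    "MT6795":      {"clock_ghz": 1.95, "jpeg": 1026, "mandelbrot": 823},
}


def score_per_ghz(soc, subtest):
    """Score divided by clock speed, used as a proxy for per-clock performance."""
    entry = RESULTS[soc]
    return entry[subtest] / entry["clock_ghz"]


def relative_per_clock(soc, subtest, baseline="MT6795"):
    """Per-clock score of `soc` relative to `baseline` (an illustrative choice)."""
    return score_per_ghz(soc, subtest) / score_per_ghz(baseline, subtest)


for soc in RESULTS:
    print(f"{soc:12s} JPEG {relative_per_clock(soc, 'jpeg'):.2f}  "
          f"Mandelbrot {relative_per_clock(soc, 'mandelbrot'):.2f}")
```

For the JPEG Compress column this gives roughly 1.41 to 1.43 for the three Cortex-A57 SoCs relative to MT6795, broadly in line with the ratio of the IPC figures in the table (1.96 / 1.37, about 1.43).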

Memory subsystem performs much better than Snapdragon 810


The following table lists Geekbench scores for some memory-dependent tests. 

SoC          "big" CPU                     Arch     Stream Copy     SGEMM   SFFT    SGEMM   SFFT
                                                    Single  Multi   Single  Single  Multi   Multi
MSM8992      2 x 1.69? GHz Cortex-A57r1p2  AArch64  1527    1733     767    1126    1678    2946
MSM8994      4 x 1.8? GHz Cortex-A57r1p1   AArch64  1428    1838     741    1009    1870    3649
Exynos 7420  4 x 1.97 GHz Cortex-A57r1p0   AArch64  2003    2622     957    1363    2888    5014

MT6795       8 x 1.95 GHz Cortex-A53r0p2   AArch64  1356    2068     484     618    1542    4764
MT6795T      8 x 2.16 GHz Cortex-A53r0p2   AArch64  1350    2140     529     694    1659    5333

Notably, Snapdragon 808 delivers memory performance similar to Snapdragon 810 at much lower cost, despite using only a regular LPDDR3 memory interface, as compared to the Snapdragon 810's LPDDR4 memory interface which in theory delivers almost twice the bandwidth. This provides clear evidence that the Snapdragon 810's memory interface is still flawed, while that of Snapdragon 808 is much more optimized. Snapdragon 808 even beats Snapdragon 810 in the single-core SGEMM and SFFT test, despite running at a lower clock speed, which probably also reflects a more optimized and functional memory controller. Even in the multi-core SGEMM and SFFT tests, Snapdragon 808 is not much behind Snapdragon 810 despite having only half the number of CPU cores.
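
The comparison in this paragraph can be checked against the table by computing the ratio of Snapdragon 808 to Snapdragon 810 scores for each memory-dependent subtest. This short Python sketch uses only the table's figures; the column labels and variable names are illustrative.

```python
COLUMNS = ["Stream Copy (single)", "Stream Copy (multi)", "SGEMM (single)",
           "SFFT (single)", "SGEMM (multi)", "SFFT (multi)"]
MSM8992 = [1527, 1733, 767, 1126, 1678, 2946]  # Snapdragon 808
MSM8994 = [1428, 1838, 741, 1009, 1870, 3649]  # Snapdragon 810

# Ratio above 1.0 means Snapdragon 808 scored higher than Snapdragon 810.
ratio_808_vs_810 = {
    column: s808 / s810
    for column, s808, s810 in zip(COLUMNS, MSM8992, MSM8994)
}

for column, ratio in ratio_808_vs_810.items():
    print(f"{column:22s} {ratio:.2f}")
```

The single-core ratios come out above 1.0 (about 1.04 for SGEMM and 1.12 for SFFT), while the multi-core SGEMM and SFFT ratios are roughly 0.90 and 0.81.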

Comparison with MT6795


In the marketplace, Snapdragon 808 may compete with MediaTek's MT6795 (Helio X10), which is a cost-effective performance-segment SoC that only uses Cortex-A53 cores. Comparing Geekbench subtest results, MT6795 scores significantly lower than Cortex-A57-based SoCs such as Snapdragon 808 in single-core benchmarks, although the gap is not very large except in the SFFT benchmark. The MT6795 does relatively well in multi-core benchmarks, where it beats the Cortex-A57-based Snapdragon 808 and Snapdragon 810 in most cases by a considerable margin, especially in the JPEG Compress, Lua and Mandelbrot tests which are sensitive to the number of CPU cores (multi-core scores have not been listed for these tests in the tables above). As an example, MT6795 scores 8167 in the multi-core JPEG Compress test, twice the score of Snapdragon 808 and almost 40% higher than Snapdragon 810.

Conclusion


Snapdragon 808 appears to be a much more optimized, less flawed SoC product than Snapdragon 810 that may perform similarly to or even better than Snapdragon 810 in practical use cases due to the performance flaws present in Snapdragon 810. At the same time, Snapdragon 808 is likely to be considerably cheaper. The only caveat is the question of whether excessive heat production makes thermal throttling necessary to the same degree as Snapdragon 810. With only two Cortex-A57 cores, the SoC should be less problematic in this regard.

Source: Geekbench Browser (MSM8992 results), Geekbench Browser (MSM8994 results), Qualcomm (MSM8992 specifications)

Updated 15 March 2015.

Tuesday, March 3, 2015

A detailed comparison of Cortex-A53-based and other SoCs using Geekbench, and impact of AArch64

More Cortex-A53 CPU core-based SoCs have recently come to market and more benchmark results are now available, for example from the Geekbench results database. Firmware is also becoming more mature. This makes it possible to make better comparisons between different Cortex-A53-based SoCs (for example, octa-core SoCs) and compare the performance of the highest-performance chips with competitive chips that use more expensive CPU cores such as Krait 400 and Cortex-A57.

Overview of Cortex-A53-based SoCs


The following is a list of Cortex-A53 CPU core-based mobile SoCs that have appeared in the market or for which benchmark results have become available. All chips integrate 4G LTE modem functionality unless otherwise noted.

  • Snapdragon 410 (MSM8916), utilizing four early Cortex-A53r0p0 cores. Numerous cost-sensitive smartphones now use this chip. However, none of them appears to take any advantage at all of the new ARMv8 instruction set, with all of them running in ARMv7 compatibility mode. This is counter-intuitive because AArch32 (32-bit version of ARMv8), which is used by the other SoCs, already brings significant benefits. Snapdragon 410 generally performs significantly worse than other Cortex-A53-based SoCs, even when correcting for the low clock speed. This is also reflected in memory performance. The Adreno 306 GPU tends to be even a little slower than the Adreno 305 GPU in Snapdragon 400. The net result is a chip that is not much faster than Snapdragon 400 in many cases while having worse battery life.
  • Snapdragon 615 (MSM8939), equipped with an octa-core Cortex-A53r0p1 CPU configuration with four cores running (in practice) at 1.54 GHz or 1.50 GHz and four cores running at a lower maximum clock frequency (probably 1.0 GHz). This chip has appeared in an increasing number of new smartphone models. Runs in AArch32 mode. Performance is significantly lower than MediaTek's octa-core Cortex-A53-based SoCs, which can run all eight Cortex-A53 cores at the maximum frequency. Memory performance is improved from Snapdragon 410 but falls short of that of MediaTek's SoCs. The Adreno 405 GPU is fairly competitive, suitable for a mid-range SoC, although the 32-bit RAM interface of the SoC limits performance, especially at high resolutions. It is manufactured using TSMC's lower performance 28LP process. There have been reports that the chip gets hot with intensive use and requires throttling.
  • MediaTek MT6732, with a quad-core Cortex-A53r0p2 CPU configuration running at a maximum clock speed of 1.5 GHz. Devices using the chip are starting to become available, and tablets with the tablet version of this chip (MT8732) have also been announced. Although it has only four CPU cores, it has good performance, beating Snapdragon 615 in single core performance at a similar clock speed, and memory performance is significantly higher. The Mali-T760 MP2 GPU contributes to better GPU performance than previous MediaTek chips targeting cost-sensitive segments, although falling short of that of Snapdragon 615 and MT6752.
  • MediaTek MT6752, featuring an octa-core Cortex-A53r0p2 CPU configuration with a maximum clock frequency of 1.69 GHz. Several devices have come to market using this chip, including the Meizu M1 Note. Performance is excellent, with high scores in the Geekbench CPU benchmark, considerably higher than Snapdragon 615 and beating high-end SoCs such as Snapdragon 801 in several metrics. The Mali-T760 MP2 GPU is clocked higher than that of the MT6732, resulting in good GPU performance, comparable to that of Snapdragon 615, as measured with GFXBench, although the 32-bit memory interface will be a bottleneck at high resolutions. Manufactured using TSMC's high-performance 28HPM process. A tablet version of the chip exists as MT8752.
  • MediaTek MT6795, with an octa-core Cortex-A53r0p2 CPU with clock speed up to 2.16 GHz. With a dual-channel memory interface and high resolution support, this SoC targets a higher performance segment than the previously mentioned chips, for which it can potentially offer much better performance/dollar because of the small die size of Cortex-A53 cores. Originally announced to become available in commercial devices before the end of 2014, it was delayed, but competitive benchmark scores for what appear to be more mature versions of the chip have recently shown up. It appears to be configured with full AArch64 mode. Performance is excellent, with single-core performance closing much of the gap with the high-end Snapdragon 801, while multi-core performance is significantly higher. There appears to be a "Turbo" version running the CPU up to 2.16 GHz, while the regular version clocks at 1.95 GHz. At the MWC on 2 March 2015, MediaTek apparently rebranded the MT6795 as Helio X10.
  • MediaTek's MT6735 is a SoC for entry-level smartphones for which benchmark results have not yet become available. It has a quad-core Cortex-A53 CPU configuration and a Mali-T720 GPU, a downgrade from the Mali-T760 GPU in MT6732. The recently announced MT6753, with eight Cortex-A53 cores running up to 1.5 GHz, is compatible with the MT6735 and also has a Mali-T720 GPU (probably MP4). Other chips that have shown up in product announcements include the MT8161 (probably the equivalent of the MT6735 without modem) and MT8165 (equivalent to MT8732 without modem).
  • Qualcomm has announced additional octa-core Cortex-A53-based chips, Snapdragon 415 and Snapdragon 425. These probably utilize a symmetrical Cortex-A53 configuration with all cores running at the same maximum clock frequency, unlike Snapdragon 615. Otherwise, the new SoCs are similar to Snapdragon 615, with the same Adreno 405 GPU. According to Qualcomm, devices using these chips will become commercially available in the second half of 2015.
  • Kirin 620 (Hi6210) from HiSilicon (Huawei) is an octa-core Cortex-A53r0p3-based SoC running up to 1.2 GHz. The GPU is a Mali-450 MP4. Although performance (including single-core performance) is better than Snapdragon 410, it is not as optimized as chips such as MT6752 and runs at a relatively low clock speed. Multi-core performance scaling is less than expected.

Geekbench integer and memory scores comparison


The following table provides details about selected Geekbench integer and memory benchmark scores for different Cortex-A53-based SoCs, and also other smartphone SoCs from Qualcomm, MediaTek and Samsung for comparison.

                Arch    Max freq. JPEG C. IPC   JPEG C. Dijkstra      Stream Copy   Geekbench
                                  Single  x A7  Multi   Single Multi  Single Multi  Ref. number

Snapdragon 410  ARMv7     1.19      596   1.30   2384     810   2135   431   492    1551964
Snapdragon 615  AArch32 1.50/1.0    820   1.42   4979     886   3646   572   703    2015694
MT6732          AArch32   1.50      843   1.46   3357    1041   3002  1001  1199    1546611
MT6752          AArch32   1.69      952   1.46   7554    1144   4483  1071  1191    1583540
MT6795          AArch64   1.95     1026   1.37   8167     990   3802  1356  2068    2002894
MT6795T         AArch64   2.16     1128   1.36   8962    1064   4109  1350  2140    1984431
Hi6210          AArch32   1.20      660   1.43   3501     744   2772   602   900    1999304

Snapdragon 400  ARMv7     1.19      462   1.01   1860     700   2132   534   551    1938063
Snapdragon 801  ARMv7     2.46     1347   1.42   5437    1174   3586  1931  2144    1491681
Snapdragon 805  ARMv7     2.65     1475   1.45   4105    1230   4058  2117  2910    1502687
Snapdragon 810  AArch64  ?/1.55    1358          5972    1073   3584  1428  1838    2017257
MT6582          ARMv7     1.30      506   1.01   2027     748   2354   250   396    2017732
MT6592          ARMv7     1.66      643   1.01   5086     891   3327   261   388    2000008
MT6595          ARMv7   2.20/1.69  1350   1.59   6080    1844   5612  1652  1986    1591744
Exynos 5430     ARMv7   1.80/1.3   1056   1.52   5140    1102   3918  1457  1559    1556780
Exynos 5433     AArch32   1.89     1456   2.10   6209    1523   5728  1396  1458    2017193
Exynos 7420     AArch64  ?/1.50    1481          7168    1065   4596  1953  2579    2012972

The low performance of Snapdragon 410 is apparent in the scores, with normalized IPC (the score per GHz, scaled so that a Cortex-A7 rates approximately 1.0; shown in the "IPC x A7" column) for the CPU-speed sensitive single-core JPEG Compress benchmark being lower than that of other Cortex-A53-based SoCs, probably due to being limited to ARMv7. The Dijkstra benchmark even scores lower on Snapdragon 410 than on an equivalently clocked Snapdragon 400, and memory performance is also lower.
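
As a rough illustration of how the "IPC x A7" column can be reproduced, the sketch below divides a single-core score by the clock frequency and by a Cortex-A7 baseline. The baseline of 385 JPEG Compress points per GHz is an assumption inferred from the Cortex-A7 rows of the table (Snapdragon 400, MT6582, MT6592), not a figure published by Geekbench, and the function name is only illustrative.

```python
# Assumed baseline: single-core JPEG Compress points per GHz for Cortex-A7,
# inferred from the Cortex-A7 rows of the table (not an official figure).
A7_JPEG_POINTS_PER_GHZ = 385.0


def normalized_ipc(score, freq_ghz, baseline_per_ghz=A7_JPEG_POINTS_PER_GHZ):
    """Return the score per GHz relative to the assumed Cortex-A7 baseline."""
    return score / freq_ghz / baseline_per_ghz


# (single-core JPEG Compress score, maximum frequency in GHz) from the table
socs = {
    "Snapdragon 410": (596, 1.19),
    "MT6752": (952, 1.69),
    "MT6795": (1026, 1.95),
}

for name, (score, freq) in socs.items():
    print(f"{name}: {normalized_ipc(score, freq):.2f}")
```

With these inputs the sketch gives roughly 1.30, 1.46 and 1.37, in line with the table. Other subtests would need their own baseline, since the per-GHz score of Cortex-A7 differs per benchmark.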

Snapdragon 615, while improving on Snapdragon 410, also appears to be less optimized than MT6732/MT6752 in terms of single-core IPC, despite a very similar clock frequency. Looking at multi-core performance, MT6752 is significantly faster than Snapdragon 615, largely due to being able to run all eight cores at the maximum clock frequency. MT6732 and MT6752 also have significantly higher memory performance, reaching an impressive score for devices with a 32-bit memory interface.

The higher clock speed of MT6795 (Helio X10) brings benefits for integer performance, but due to the use of the AArch64 instruction set, normalized IPC is lower (1.36 vs 1.46 for JPEG Compress). This is especially true for the Dijkstra benchmark, where AArch64 mode imposes a significant penalty (this is also seen on other platforms utilizing AArch64).

Overall, a high-speed Cortex-A53 configuration such as implemented in the MT6795T comes fairly close to Snapdragon 801 for single-core performance, while being significantly faster for multi-core performance, at a significantly lower cost. Several metrics are also in the same ballpark as the current high-end leader Exynos 7420.

Analysis of the Geekbench Lua subtest


The Lua integer benchmark appears to be particularly sensitive to memory subsystem efficiency, including L2 cache size and memory bandwidth, as well as being dependent on CPU speed. It is the kind of code that may frequently occur in actual practice on a smartphone.

                Arch      Lua     IPC   Lua    CPU    #CPUs
                          Single  x A7  Multi  Par.

Snapdragon 410  ARMv7      603    1.23  2137   3.54   4
Snapdragon 615  AArch32    709    1.15  1644   2.32   4 + 4
MT6732          AArch32    753    1.22  2419   3.21   4
MT6752          AArch32    842    1.21  2361   2.80   8
MT6795          AArch64   1053    1.31  8203   7.79   8
MT6795T         AArch64   1173    1.32  8847   7.54   8
Hi6210          AArch32    587    1.19  1740   2.96   8

Snapdragon 400  ARMv7      476    0.97  1874   3.94   4
Snapdragon 801  ARMv7      980    0.97  2880   2.94   4
Snapdragon 805  ARMv7     1016    0.93  2917   2.87   4
Snapdragon 810  AArch64   1283          1065   0.83   4 + 4
MT6582          ARMv7      514    0.96  1644   3.20   4
MT6592          ARMv7      651    0.95  1344   2.06   8
MT6595          ARMv7     1509    1.67  2498   1.66   4 + 4
Exynos 5430     ARMv7      981    1.33  1861   1.90   4 + 4
Exynos 5433     AArch32   1397    1.89  5478   3.92   4 + 4
Exynos 7420     AArch64   1409          7088   5.03   4 + 4

In this test, Snapdragon 410 performs reasonably well. MT6752's multi-core performance seems limited by a bottleneck, probably external memory bandwidth. MT6795's performance is impressive; while single-core performance falls a little short of Cortex-A57 based SoCs, for multi-core performance it blows past them, with CPU parallelism fully exploited. It seems the bottleneck present with the MT6752 (presumably memory bandwidth and the L2 cache memory size available to each core) is not present with the MT6795.
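
The "CPU Par." column in the table appears to be the multi-core score divided by the single-core score. The sketch below recomputes it for a few rows and also expresses it as a fraction of ideal scaling. The efficiency figure assumes all cores are equally fast, which is a simplification for big.LITTLE designs such as Snapdragon 810; the function name is illustrative.

```python
def parallel_scaling(single_score, multi_score):
    """Ratio of multi-core to single-core score (the "CPU Par." column)."""
    return multi_score / single_score


# (single-core Lua score, multi-core Lua score, total CPU cores) from the table
lua_results = {
    "MT6752": (842, 2361, 8),
    "MT6795": (1053, 8203, 8),
    "Snapdragon 810": (1283, 1065, 8),
}

for name, (single, multi, cores) in lua_results.items():
    scaling = parallel_scaling(single, multi)
    # Fraction of ideal scaling, assuming every core is equally fast
    efficiency = scaling / cores
    print(f"{name}: scaling {scaling:.2f}, {efficiency:.0%} of ideal")
```

A scaling factor below 1.0, as for Snapdragon 810, means the multi-core run scored lower than a single core.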

Qualcomm's Snapdragon 810 consistently scores in the 1000-1200 range for both the single-core and multi-core test, while the multi-core test would have been expected to be significantly higher. This appears to reflect a serious deficiency in the memory subsystem of the SoC (which might not only be related to the LPDDR4 SDRAM controller, but also the on-chip L2 cache) which might also have negative implications for smoothness in everyday use.

Geekbench floating point subtests


Finally, let's look at floating point performance. The Mandelbrot subtest tests pure floating point performance, while the SGEMM and SFFT tests also significantly depend on memory performance.


                Arch      Mandelbrot                 SGEMM         SFFT
                          Single  IPC   Multi  Par.  Single Multi  Single Multi

Snapdragon 410  ARMv7      448    1.10  1794   4.00   245    489    317   1258
Snapdragon 615  AArch32    583    1.14  3611   6.19   303    688    426   2517
MT6732          AArch32    585    1.14  2336   3.99   337    653    430   1727
MT6752          AArch32    661    1.15  5257   7.95   384   1148    481   3870
MT6795          AArch64    823    1.24  6406   7.78   484   1542    618   4764
MT6795T         AArch64    912    1.24  7245   7.94   529   1659    694   5333
Hi6210          AArch32    467    1.14  3509   7.51   264    876    343   2178

Snapdragon 400  ARMv7      405    1.00  1620   4.00   203    634    285   1182
Snapdragon 801  ARMv7      788    0.94  3104   3.94   907   2816    992   3518
Snapdragon 805  ARMv7      848    0.94  3389   4.00  1011   2669   1130   4135
Snapdragon 810  AArch64   1100          5144   4.68   749   1828   1009   3643
MT6582          ARMv7      444    1.00  1765   3.98   230    512    328   1316
MT6592          ARMv7      568    1.00  4430   7.80   282    696    419   3397
MT6595          ARMv7     1284    1.71  5822   4.53   748   2337   1187   4255
Exynos 5430     ARMv7      990    1.61  4745   4.79   657   2491    896   3971
Exynos 5433     AArch32   1174    1.91  4883   4.16   751   2369   1044   4031
Exynos 7420     AArch64   1198          6129   5.12   945   2888   1313   4874

From these numbers it is clear that Cortex-A53 improves floating point performance somewhat when compared to Cortex-A7 at the same clock speed. When eight cores can run in parallel at high speed, multi-core floating point performance is impressive, as demonstrated by MT6752 and MT6795. Snapdragon 801 and 805 are looking a bit dated in this department.

In the memory-intensive SGEMM and SFFT tests, Snapdragon 400 comes close to Snapdragon 410, illustrating the lack of performance improvement by Snapdragon 410. In fact MediaTek's previous generation MT6582 matches the floating point performance of Snapdragon 410 across all tests.

The Cortex-A57 based SoCs have the highest single-core floating point performance, although the Cortex-A17-based MT6595 is also very strong. Exynos 5433 and Exynos 7420 beat Snapdragon 810 in most floating point tests, although the difference is not as large as it used to be with earlier results for Snapdragon 810.

Conclusion


It is clear that octa-core Cortex-A53-based SoCs can deliver strong performance at a relatively low cost, and this is particularly true for MediaTek's new chips, MT6752 and MT6795. The MT6795, with its higher clock speed and dual-channel memory interface, can match current high-end chips in most metrics, being not much slower in single-core performance while being superior in multi-core.

One open question is whether the high maximum clock frequency of the MT6795 and MT6795T, which delivers impressive performance/dollar, translates to acceptable power consumption and battery life. It has been observed that power consumption of Cortex-A53 can increase quickly at higher frequencies on the Samsung-manufactured Exynos 5433, but MT6795 is manufactured on a different process at TSMC and probably makes use of specific design optimizations for high clock speeds (ARM POP IP core hardening technology) that make power consumption more acceptable.

Sources: Geekbench Browser

Updated 10 March 2015.

Sunday, March 1, 2015

Samsung announces Galaxy S6 with Exynos 7420 SoC manufactured on "14nm" FinFET process

At the Mobile World Congress today (Sunday 1 March), Samsung announced the Galaxy S6 and Galaxy S6 Edge, featuring numerous improvements over the previous generation Galaxy S5, including a SoC manufactured on Samsung's 14 nm FinFET-based process. The Galaxy S6 is planned to be available in 20 countries starting on April 10th, 2015.

New model implements several improvements


The improvements in the new model include the following:
  • Exynos 7420 SoC manufactured on 14 nm FinFET process with 20 nm interconnects. The CPU is a big.LITTLE configuration with four Cortex-A57 and four Cortex-A53 cores, similar to Exynos 5433. The maximum clock speeds are 2.1 GHz and 1.5 GHz, respectively. Samsung claims 20% better performance and 35% better efficiency for the new chip when compared to Exynos 5433, which is manufactured using Samsung's 20 nm HKMG process.
  • The GPU has been rumoured to be a faster version of the Exynos 5433's Mali-T760 MP6 (either a higher clock rate or an MP8 configuration).
  • Early benchmarks indicate a significant increase in CPU and memory performance combined with a measurable increase in GPU performance (which is required because of the higher screen resolution).
  • Runs in 64-bit AArch64 mode, which has several advantages, as well as some disadvantages.
  • Uses new LPDDR4 SDRAM (3 GB), which has higher memory bandwidth at a given memory bus width due to higher effective clock speeds.
  • The cameras have been improved, including greater light gathering capability.
  • The 5.1" AMOLED screen's resolution is QHD (2560x1440), which is about 78% more pixels than the FullHD (1920x1080) screen in Galaxy S5. The higher CPU, GPU and memory performance are essential to keep pace with increased demands caused by the higher resolution.
  • Utilizes the new UFS 2.0 interface for embedded flash memory, providing SSD-like performance according to Samsung.
  • Cat 6 LTE mode.
  • Touchwiz user-interface on top of 64-bit Android 5.0 is said to be more intuitive and less demanding in terms of processing requirements.
At the same time, Samsung has dropped the MicroSD slot and the battery is non-removable. The battery capacity is also slightly smaller than that of the Galaxy S5.

The Galaxy S6 Edge, like the Galaxy Note 4 Edge, features a screen that curves around the edges. It is priced significantly higher than the Galaxy S6, which will not be cheap either.

Quick ramp of 14nm FinFET process brings challenges to Samsung


The initial 14 nm FinFET process used by Samsung has been reported to use 20 nm interconnects with a 14 nm feature size. As such it is more of an evolutionary step from 20 nm than full-blooded 14 nm FinFET would be, comparable to some degree with TSMC's 16FF process.

Still, Samsung will face a huge challenge ramping up the process at sufficient volume and with acceptable yield rates to equip the high volume of Galaxy S6 units expected. Rumours have mentioned low yield for the process in the recent past as Samsung started ramping up (test) production. Given the massive investment in the new process and non-optimal yield rates, it is unlikely that Samsung will significantly benefit financially from production of the chip in the near-term in terms of gross margin and other chip production-related metrics.

However, the performance lead of the Galaxy S6 made possible by the new chip could have significant positive implications for the sales and financial performance of Samsung's smartphone division, allowing Samsung to recoup some of its investment.

A few months ago, Samsung already signed an agreement with Apple whereby Samsung would supply part of the production capacity for future Apple processors. If this bears fruit it would allow Samsung to recoup more of its investment in 14 nm FinFET technology in the future.

Early benchmark performance impressive


In early benchmark scores reported in Geekbench's result database, a device that probably is the Galaxy S6 shows impressive performance, well ahead of most existing SoCs and devices. In a direct comparison with an Exynos 5433-equipped Galaxy Note 4, the performance gain is fairly significant for most benchmarks (up to 30% for integer tests, higher for floating point), with a few negative outliers such as SHA2 and the Dijkstra integer subtest. The Dijkstra subtest also scores lower on other 64-bit AArch64 platforms, suggesting it suffers from particular AArch64 features such as the doubled size for pointer storage.

Memory performance is also significantly higher, aided by high clock rate and high amount of bandwidth delivered by the LPDDR4 memory interface, which unlike Qualcomm's Snapdragon 810 does not seem to have serious flaws.

Sources: AnandTech (Samsung announces the Galaxy S6 and Galaxy S6 Edge), AnandTech (Samsung Unpacked, MWC 2015 Live Blog), Geekbench Browser (Samsung SM-G925F)

Wednesday, February 25, 2015

Qualcomm has announced new SoCs, uses new Cortex-A72 core

Recently, Qualcomm announced a number of new SoCs for the cost-sensitive and performance segments of the smartphone market, namely Snapdragon 415 and Snapdragon 425 in the 400 series, and Snapdragon 620 and Snapdragon 618 in the 600 series.

New Snapdragon 415 and 425 offer mid-range performance features


Qualcomm's product line has been somewhat inconsistent recently, with products from a series for a certain segment actually being used for a different segment. For example, although the Snapdragon 410 SoC is in the mid-range 400 series, it has actually been deployed in significant numbers of cost-sensitive entry-level 4G segment devices.

There used to be a gap in Qualcomm's product line, large in terms of performance level, between the lower mid-range Snapdragon 400 and the premium level Snapdragon 801. Not too long ago, Qualcomm addressed this gap with the mid-range Snapdragon 615, featuring a total of eight Cortex-A53 cores, four with maximum frequencies in the 1.5 - 1.7 GHz frequency range and four clocked lower (e.g. 1 GHz) for lower power consumption. With the new Snapdragon 415, Qualcomm is bringing a SoC similar to Snapdragon 615 to the cost-sensitive mid-range segment, largely replacing the Snapdragon 410 for that tier (as I have discussed previously, Snapdragon 410's performance is flawed in several ways).

Snapdragon 415 could be a rebranding of 615 to replace 410, or maybe not


In fact, it is possible that Snapdragon 415 is actually the same chip as Snapdragon 615, rebranded. Both Snapdragon 415 and Snapdragon 615 have a roughly similar CPU set-up (eight Cortex-A53 cores), an identical GPU (Adreno 405) and a Cat 4 LTE modem. Although Qualcomm mentions in its press release that commercial availability in end-user devices for the new chips will happen in the second half of the year, if it is the same chip it is likely that Snapdragon 415 will appear earlier (since it has essentially already been in production for some time as Snapdragon 615), replacing Snapdragon 410. However, in its specifications page for Snapdragon 415, Qualcomm does not mention any distinction in CPU speed between cores, making it likely that it can run all cores at the maximum clock frequency, similar to MediaTek chips already on the market.

Meanwhile, Snapdragon 425 has a CPU configuration similar to Snapdragon 415 with a higher maximum clock speed, and also the same GPU, but has a more advanced modem with Cat 7 LTE, and better ISP functionality for camera processing. A comparison can be made with MediaTek's MT6752 which also has a 1.7 GHz octa-core Cortex-A53 CPU (Snapdragon 425 probably also drops the pseudo-big.LITTLE design of Snapdragon 615). Given the clock speeds, it is likely that Snapdragon 425 is manufactured on a higher performance process than TSMC's 28LP, most likely TSMC's 28HPM, like MediaTek's chips.

Symmetric octa-core CPU configuration has advantages for multi-threaded applications


Although "LITTLE" cores in a big.LITTLE configuration can be taken advantage of in multi-threaded algorithms, most applications and algorithms are designed for and work best with processor cores running at a comparable speed, distributing the workload evenly between cores, favouring symmetric CPU configurations in which every core can run at the same maximum frequency. This shows in the very high multi-core benchmark scores of chips using such a configuration, such as MT6752. It looks like Qualcomm is quickly moving towards such a symmetrical octa-core configuration (pioneered by MediaTek, which already has comparable chips on the market) for the cost-sensitive part of the market, up to the mid-range segment.

Snapdragon 415 and 425 not likely to be cheap in terms of manufacturing cost


Although the eight Cortex-A53 cores are relatively small so their consumption of die space is relatively limited, as I discussed earlier the Adreno 405 GPU, with its medium-level performance, appears to have characteristics of a GPU targeted at higher-end segments (in terms of ALU/shader performance, for example) and is likely to have a relatively large die size in relation to the cost-sensitive segments it is addressing. Because of that, Snapdragon 415 seems to be somewhat of a stop-gap measure to replace the successful, but flawed in terms of performance, Snapdragon 410 SoC, as the gross margin on this chip could be relatively small.

The proliferation of chips such as Snapdragon 415 is likely to continue Qualcomm's heavy reliance on TSMC's 28LP process, which is a lower-performance process technology than 28HPM. Why Qualcomm would place such emphasis on this process for smartphone SoCs is unclear, since the advantages of the 28/20HPM process are very desirable for smartphone SoCs for everything but the entry-level segment, and competitor MediaTek has adopted this process for most of its range. Qualcomm has been using 28HPM and 20HPM for its Snapdragon 801, 805 and 810, and it is likely that Snapdragon 425, 618 and 620 will also use one of these processes.

Little heard from quad-core Cortex-A53 Snapdragon 610


It would have made sense if the Snapdragon 610 (announced with a quad-core Cortex-A53 CPU and the same Adreno 405 GPU as the products discussed above) had trickled down to the 400 series. The fact that this chip has barely appeared on the market and that it is not mentioned in the press release suggests it won't come to market at all, perhaps due to technical problems with the chip or as the result of a strategic decision. An updated quad-core Cortex-A53-based solution would certainly make sense in Qualcomm's product line.

Snapdragon 618 and 620 have premium-level characteristics


Qualcomm also announced two new performance segment processors, Snapdragon 618 and 620. These are the first announced mobile chips to feature the new Cortex-A72 processor core from ARM, which is an improved version of the high-performance Cortex-A57 processor core. Snapdragon 620 has four Cortex-A72 CPU cores clocked up to 1.8 GHz and four Cortex-A53 cores up to 1.2 GHz in a big.LITTLE configuration, while Snapdragon 618 reduces the number of Cortex-A72 cores to two to provide a better balance in terms of cost.

Although on the surface the model numbers of these new SoCs may seem close to Snapdragon 615, their specifications suggest that they are targeting a significantly higher performance segment. The memory interface is a dual-channel interface supporting LPDDR3 up to 933 MHz, clearly a defining feature for a high-end product, and making the support for QHD (2560x1600) displays a sensible feature. They also feature a new, "next-generation" GPU.
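
To put the dual-channel LPDDR3 interface in perspective, the following sketch estimates the theoretical peak bandwidth at 933 MHz. It assumes that each channel is 32 bits wide (the channel width is not stated in Qualcomm's announcement) and that data is transferred twice per clock, as with any double data rate memory; real-world throughput will be lower.

```python
def peak_bandwidth_gb_s(clock_mhz, channels, bus_width_bits=32,
                        transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    bus_width_bits=32 per channel is an assumption, not a confirmed spec.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer * channels / 1e9


single_channel = peak_bandwidth_gb_s(933, channels=1)
dual_channel = peak_bandwidth_gb_s(933, channels=2)

print(f"Single-channel LPDDR3-933: {single_channel:.1f} GB/s")
print(f"Dual-channel LPDDR3-933:   {dual_channel:.1f} GB/s")
```

Under these assumptions the dual-channel interface offers about 14.9 GB/s, twice what a single 32-bit channel at the same clock could deliver.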

In short, despite their model number, Snapdragon 618 and 620 have little to do with Snapdragon 615 and should be thought of as processors in the same segment as processors from the Snapdragon 800 series such as the Snapdragon 801 and Snapdragon 808. If and when Snapdragon 808 (with two Cortex-A57 cores and four Cortex-A53 cores) will appear on the market is unclear (some test results have appeared in the Geekbench database); the new announcement might suggest that it will quickly be superseded by Snapdragon 618.

Sources: Qualcomm (SoCs announcement)

Updated February 26, 2015 (edited and expanded to reflect the likelihood that Snapdragon 415 and 425 use a symmetrical CPU configuration, not pseudo-big.LITTLE as in Snapdragon 615).

Tuesday, February 17, 2015

Cortex-A53 not as power efficient as Cortex-A7

Recent detailed technical review articles published by AnandTech based on a comparison of Samsung Exynos SoCs have elucidated some of the details about the performance of the Cortex-A53 core, including processing performance, power consumption and die size. Overall, it appears that while Cortex-A53 is significantly faster than Cortex-A7 at the same clock speed, die size and power consumption on an equivalent manufacturing process has increased by a greater amount, leading to lower performance/Watt.

Direct comparison of Cortex-A7 and Cortex-A53 on the same process


In a recently published technical review article about the ARM Cortex-A53 and Cortex-A57 CPU cores and the Mali-T760 GPU core, based on Samsung's Exynos-based Galaxy Note 4 model, AnandTech has provided details about the performance, power consumption and die size of the 64-bit Cortex-A53 core relative to its 32-bit predecessor, Cortex-A7. It has done so by comparing measurements of the Cortex-A53 cores inside the Exynos 5433 used in the Note 4 with the Cortex-A7 cores inside the Exynos 5430 used in the Galaxy Alpha. Both SoCs are produced using a similar 20nm process at Samsung, making a direct comparison possible.

Cortex-A7 is an in-order pipeline CPU core with moderate performance but an extremely small die size and very low power consumption. The Cortex-A53 core has been designed by ARM as a logical extension of Cortex-A7 to ARM's 64-bit ARMv8 instruction set with higher performance. However, in doing so die size and power efficiency have suffered somewhat.

CPU performance increased in Cortex-A53


According to the designer of Cortex-A53 at ARM, Cortex-A53 increases SPECint-2000 performance from 0.35 SPEC/MHz to 0.50 SPEC/MHz when compared to the Cortex-A7 core. In Geekbench integer benchmarks, disregarding cryptography benchmarks which show a large increase, performance is still about 50% higher for Cortex-A53 when compared to Cortex-A7 at the same clock speed, with the biggest gains coming with multi-threaded performance (aided by the increased memory performance).

For floating point benchmarks the performance increase reported by AnandTech is dramatic, with most benchmarks showing a two to three times performance increase. However, there seems to be a discrepancy between these benchmark results and benchmark results available from the Geekbench results database for Cortex-A53 and Cortex-A7-based devices, showing only a moderate floating point performance increase for Cortex-A53 over Cortex-A7. Most likely, AnandTech is erroneously reporting Cortex-A57 core floating point performance in this case (this matches Geekbench results that I previously tabulated).

Memory performance benchmarks performed by AnandTech show a relative increase in latency for a Cortex-A53 cluster between transfer sizes of 256 KB and 512 KB when compared to a Cortex-A7 cluster, despite the fact that this should fit inside the 512 KB L2 cache. However, as I previously noted in earlier blog articles, the benchmarks show that memory bandwidth has significantly increased with Cortex-A53 when compared to Cortex-A7, virtually doubling. This most likely contributes to the Cortex-A53 core's greater multi-threading performance in practice.

Power consumption of Cortex-A7 greatly reduced with Samsung's 20 nm process


AnandTech has published a detailed chart showing estimates for power consumption of the previous generation 32-bit Cortex-A7 and Cortex-A15 cores on both 20 nm and 28 nm processes at Samsung, based on Samsung's Exynos 5422 (28 nm) and Exynos 5430 (20 nm) SoCs.

While the high-performance Cortex-A15 cores are seeing a power reduction of about 25%, power consumption of the Cortex-A7 cores sees a significant 40% reduction with a 56% reduction at the highest CPU frequency of 1300 MHz. This can be partly explained by Samsung optimizing the Cortex-A7 cores inside Exynos 5430 for low power consumption using ARM's POP IP optimization platform.

Ironically, the excellent power characteristics of the Cortex-A7 at the latest processes such as Samsung's 20 nm process have not been taken advantage of in the market except in Samsung's Exynos big.LITTLE 5430, since Cortex-A7 adoption is mostly limited to 40 and 28 nm and all announced 20 nm SoCs use Cortex-A57 and Cortex-A53 cores. There seems to be an opportunity for ultra-efficient 20 nm Cortex-A7-based SoCs for certain product segments, while there is also a significant opportunity for 20 nm Cortex-A53-only SoCs that should be more power efficient than their 28 nm equivalents.

One could envision a hypothetical octa-core Cortex-A7-based SoC manufactured on TSMC's 20nm HPM process delivering spectacular performance/Watt, with relatively high clock speeds being possible. AnandTech's article notes that TSMC's 28nm and 20 nm HPM processes are most likely significantly more efficient than Samsung's equivalent process technology because they allow CPUs to operate at a lower voltage level. A similar argument applies to Cortex-A53-based SoCs manufactured at 20 nm, albeit with lower performance/Watt.

In terms of die size, AnandTech reports a significant reduction of 45% for the Cortex-A7 cores and 64% for the Cortex-A15 cores in the 20 nm Exynos 5430 vs 28 nm Exynos 5422.

Cortex-A53 has significantly greater power consumption than Cortex-A7


AnandTech has published a detailed chart with power consumption characteristics of the Cortex-A53 cores inside Samsung's Exynos 5433 manufactured at 20nm. In their analysis, AnandTech notes a relatively large increase in power consumption when utilizing multiple Cortex-A53 cores at their highest frequency (1300 MHz on Exynos 5433), when compared to running at 1.0 GHz. This correlates with a voltage bump when going from 1.0 to 1.3 GHz.
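
The link between the voltage bump and the jump in power can be illustrated with the standard first-order approximation for dynamic CMOS power, P ≈ C·V²·f. The sketch below is illustrative only: the two frequencies come from the text, but the voltages are hypothetical placeholders rather than values measured by AnandTech, and static (leakage) power is ignored.

```python
def dynamic_power_ratio(f_low_mhz, v_low, f_high_mhz, v_high):
    """Ratio of dynamic power at the high operating point to the low one,
    using P ~ C * V^2 * f (the switched capacitance C cancels out)."""
    return (v_high / v_low) ** 2 * (f_high_mhz / f_low_mhz)

# Frequencies are from the text; the voltages are HYPOTHETICAL examples.
freq_only = dynamic_power_ratio(1000, 1.00, 1300, 1.00)
with_bump = dynamic_power_ratio(1000, 1.00, 1300, 1.15)

print(f"frequency increase only: {freq_only:.2f}x")
print(f"with a 15% voltage bump: {with_bump:.2f}x")
```

Because voltage enters as a square, even a modest bump makes power grow considerably faster than clock speed, which would be consistent with the steep rise seen at the top frequency.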

Based on this analysis, the article concludes the power consumption is more than twice as large for Cortex-A53 when compared to Cortex-A7 at an equivalent clock speed of 1300 MHz at a similar manufacturing process (Samsung's 20nm process). Although the Cortex-A53 core's CPU performance is greater, it is not twice as great, leading to clearly lower performance/Watt for Cortex-A53 when compared to Cortex-A7.
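
The performance/Watt argument can be made concrete with a small calculation. This is a sketch under stated assumptions: the performance ratio of 1.5 is the approximate Geekbench integer gain mentioned earlier in this article, and the power ratio of 2.0 is treated as a lower bound because the text says "more than twice".

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Performance/Watt of the new core relative to the old core."""
    return perf_ratio / power_ratio

perf_ratio = 1.5   # approx. integer gain at the same clock (assumption from the text)
power_ratio = 2.0  # lower bound; the text says "more than twice"

upper_bound = perf_per_watt_ratio(perf_ratio, power_ratio)
print(f"Cortex-A53 performance/Watt is at most {upper_bound:.2f}x that of Cortex-A7")
```

Under these assumptions, Cortex-A53 delivers no more than about three quarters of the performance/Watt of Cortex-A7 at 1300 MHz.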

It is possible that the chip errata (hardware bugs) in earlier revisions of Cortex-A53 that I mentioned in previous articles play a role in reducing the measured performance and power efficiency of Cortex-A53. Exynos 5433 uses Cortex-A53r0p1, which is affected by this. The chip errata require more frequent cache flushing as a work-around, which can potentially affect performance as well as power consumption. The non-optimal state of big.LITTLE kernel scheduling code may exacerbate these problems. There is potential for later revisions of Cortex-A53 such as r0p3 to deliver higher efficiency because they are not affected by these hardware problems. Chips with Cortex-A53 revision r0p3 have not yet appeared on the market.

Chip-specific core optimizations make comparisons more difficult


It should be noted that specific optimization of the processor cores for a particular higher clock frequency target (e.g. in chips like MediaTek's MT6752 and MT6795) or low power consumption at lower clock frequency (for example, in a big.LITTLE configuration), using ARM's POP core hardening technology, has the potential to skew the comparison between different chips. MediaTek's MT6752 has already been reported to have acceptable power consumption while running at relatively high maximum clock frequency, which would otherwise be incompatible with the steep rise in power consumption for clock speeds above 1.2 GHz observed in the charts for the Samsung chips.

Die size of Cortex-A53 increased compared to Cortex-A7


The die size of Cortex-A53 cores when compared to Cortex-A7 in Samsung's chips is about 1.75 times greater according to AnandTech, although it is still below one square millimeter, which is still low for a CPU. When looking at the total cluster size, which includes the L2 cache (the same amount of 512 KB for Cortex-A53 and Cortex-A7), the die size of the cluster is 1.38 times greater. The larger die size has consequences for cost-sensitive SoCs for low-end mobile devices and IoT applications, for which Cortex-A7 remains more attractive. Cortex-A7 can also be employed as an embedded CPU in a functional block such as a baseband processor,  just like Cortex-A5 is frequently used.

Consequences for mobile SoCs


The higher performance of Cortex-A53 when compared to Cortex-A7, especially memory bandwidth, makes high-clocked multi-core Cortex-A53-based SoCs suitable for mid-range performance segments. Examples of this are MediaTek's MT6752 and Qualcomm's Snapdragon 615 SoC. These SoCs also have higher GPU performance than that traditionally associated with Cortex-A7-based SoCs.

The increased power consumption and die size of Cortex-A53 cause Cortex-A7 to remain relevant, because it still delivers superior power efficiency, cost and die size, and consequently its performance/Watt and performance/dollar are better than those of Cortex-A53. Hypothetically, a 20nm octa-core Cortex-A7 based SoC would deliver excellent power efficiency with quite acceptable performance due to higher clock speeds, and there may be a market for such a solution for smartphones. The main drawback would be that OS ecosystems such as Android are moving towards 64-bit implementations and can also make use of new cryptography instructions in ARMv8.

Sources: AnandTech (technical Exynos Galaxy Note 4 review)

Updated 1 March 2015 (Add section about core-hardening).

Thursday, January 8, 2015

New mobile SoCs announced at CES

At the Consumer Electronics Show in Las Vegas, USA this week, a large number of new devices as well as chips for various kinds of multimedia devices are being announced, including mobile SoCs for smartphones and tablets. Several of the newly announced SoCs use Cortex-A53 CPU cores.

Rockchip announces octa-core Cortex-A53 tablet SoC


Rockchip announced the RK3368 at the show, which is a tablet processor with eight Cortex-A53 cores clocked up to 1.5 GHz and an unnamed GPU supporting OpenGL 3.1. Rockchip also claims 4Kx2K H.264/H.265 video playback capability and HDMI 2.0 display output supporting 4Kx2K resolution. Early information about this chip became available a few months ago, when it was codenamed "MayBach". Rockchip mentions support for Android Lollipop in its materials.

The quoted maximum clock speed of 1.5 GHz is not very high, but an up-to-date revision of the Cortex-A53 core should provide good CPU performance at that speed even for single-core, and the octa-core configuration will provide very good multi-core performance. At which foundry it is being produced is unclear; in the past Rockchip has been using the 28 nm SLP process at GlobalFoundries for its high-performance chips, although plans for chips produced at TSMC have been reported.

Most of the specifications suggest that the chip is targeted at the performance segment, more or less as a replacement for the RK3288 that is more suitable for tablets due to lower power consumption. Based on the fact that DirectX support up to 9.3 is claimed as well as OpenGL 3.1, the GPU is most likely a Mali-T760 GPU. The RK3288 already contains a performance-oriented Mali GPU, of which the exact nature is unclear. The memory interface is likely to be 32-bit dual-channel with support for LPDDR3, similar to the RK3288 and suitable for performance-oriented devices.

Allwinner announces low-cost quad-core Cortex-A53 tablet SoC


Meanwhile, Allwinner, Rockchip's archrival in the Chinese tablet processor market, announced the A64, a new low-cost tablet processor with four Cortex-A53 CPU cores. Allwinner quotes a price of $5 for the chip. The SoC appears to be the logical successor to the recently introduced A33 with Cortex-A7 cores, which is also a low-cost quad-core tablet processor that appears to have been less successful than anticipated. Allwinner also recently introduced an octa-core Cortex-A7-based SoC, the A83T.

The new SoC supports H.265/H.264 decoding in hardware, and is compatible with various types of DDR memory (presumably in a single channel 32-bit configuration). 4K HDMI output is also listed.

MediaTek announces Android TV and wearable device SoC platforms


Outside of the mobile space, MediaTek (which has long been prominent in the digital television SoC space, both through its internal division and through MStar, which it acquired not too long ago) announced a new digital television SoC, MT5595, with support for Android TV. Sony will be using the chip in new LCD TV models. The chip has a big.LITTLE-type CPU configuration with two Cortex-A17 cores and two Cortex-A7 cores, and has hardware support for HEVC (H.265) and VP9 for 4K2K content streaming at 60 frames per second. As shown by the MT6595 smartphone SoC, MediaTek's Cortex-A17 implementation can provide very high single-core CPU performance, which is probably helpful in providing good performance and response times on the Android TV platform.

MediaTek has also announced an optimized solution for wearable devices based on Google’s Android Wear software. The MT2601 is equipped with a dual-core Cortex-A7 CPU up to 1.2 GHz and a single-core Mali-400 MP GPU, with support for display resolutions up to qHD (960x540). In several respects, these specifications match those of MediaTek's existing low-cost MT6572 smartphone SoC. MediaTek is touting the small die size and power efficiency of the new chip. It can be paired with various external wireless connectivity chips including the recently introduced MT6630 for Bluetooth (MT6630 also integrates advanced WiFi, GPS and FM radio functionality).

Sources: CNX Software (Rockchip RK3368), CNX Software (Allwinner A64), MediaTek (MT5595 announcement), MediaTek (MT2601 announcement)

Tuesday, December 30, 2014

Early benchmarks for Snapdragon 810 show performance flaws

Recently, reports have surfaced, including one from BusinessKorea published on December 4, about Qualcomm's new high-end chip, Snapdragon 810, being affected by performance issues related to heat production and issues with the memory controller. Subsequently, Geekbench results for some Samsung prototype devices using the SoC (MSM8994) have also appeared in the Geekbench results database. Detailed analysis of the Geekbench results seems to confirm the issues with thermal throttling and especially memory controller performance, at least in the early revision of SoC that was used to obtain the mentioned benchmark scores, resulting in sub-par performance for its segment.

Updated (January 5, 2015): A section has been added discussing new Geekbench results from a LG G Flex2 prototype using Snapdragon 810, which shows improvement in some areas.

Snapdragon 810: A departure from Qualcomm's in-house Krait cores


For a long time, Qualcomm has used its own ARM-compatible Krait cores (most recently Krait-400/450 in Snapdragon 801/805) for SoCs targeting the performance segment. However, with Snapdragon 810 (as well as Snapdragon 808 and to a certain extent Snapdragon 615), Qualcomm seems to be migrating to standard ARM cores for performance-oriented SoCs. Some time ago, Qualcomm already transitioned its cost-effective SoCs (such as the Snapdragon 200 and 400 series) to cost-efficient ARM cores such as Cortex-A7 (and later Cortex-A53).

Snapdragon 810 contains four Cortex-A57 cores (clocked up to about 1.5 GHz based on current evidence) as well as four Cortex-A53 cores in a big.LITTLE configuration. In this respect the chip is similar to Samsung's Exynos 7 Octa (5433) that has already been shipping for several months in devices such as the Galaxy Note 4 and shows impressive CPU performance. However, Snapdragon 810 is the direct successor to Snapdragon 805 and has a similarly ambitious memory interface with high total bandwidth (pioneering the use of new LPDDR4 SDRAM), which puts it squarely in the very high end category, like Snapdragon 805.

Qualcomm also has a SoC in planning for the more mainstream part of the high-end performance segment, Snapdragon 808, which has two Cortex-A57 cores instead of four while retaining the four Cortex-A53 cores. Importantly, Snapdragon 808 also simplifies the memory interface to dual-channel 32-bit with more standard LPDDR3 memory instead of LPDDR4, reducing cost and being comparable to Snapdragon 801, the current high-end standard.

20nm process and LPDDR4 memory


Snapdragon 810 is Qualcomm's first SoC product to be manufactured using TSMC's 20nm process technology. 20nm, in theory, significantly increases performance and power efficiency when compared to the 28nm process technology that Qualcomm has been using recently for most of its chips.

The SoC also features a LPDDR4 external memory interface in a dual-channel 32-bit configuration, with maximum clock speed of 1600 MHz according to Qualcomm's webpage, resulting in memory bandwidth of 25.6 GB/s, similar to Snapdragon 805, which achieves its bandwidth with a wide 64-bit dual channel memory interface with LPDDR3. This is a very high amount of memory bandwidth for a mobile device, making the chip suitable for driving very high resolutions such as QHD. However, it also increases cost, and the apparent requirement of using higher-clocked LPDDR4 memory instead of mainstream LPDDR3 is also likely to increase cost, despite the reduction in memory bus width allowed by LPDDR4.

Snapdragon 808 likely to be more attractive for high-volume flagship devices


Meanwhile, Snapdragon 808 seems to provide a more practical performance-oriented platform by utilizing standard LPDDR3 in a dual-channel 32-bit at a clock speed up to 933 MHz, resulting in maximum memory bandwidth of 14.9 GB/s. Overall, Snapdragon 808 seems to be much more attractive for high-volume high-end devices as a successor to Qualcomm's popular Snapdragon 801.
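
The bandwidth figures quoted for these chips follow from the usual peak-bandwidth formula for double data rate memory: clock speed × 2 transfers per clock × total bus width in bytes. The sketch below reproduces them. The 800 MHz LPDDR3 clock for Snapdragon 805 is an assumption chosen to match the "similar" 25.6 GB/s figure and is not stated in the text; the function name is illustrative.

```python
def peak_bandwidth_gb_s(clock_mhz, channel_width_bits, channels, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s for a DDR-type memory interface."""
    bytes_per_transfer = channel_width_bits / 8 * channels
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9

configs = {
    # name: (clock in MHz, width per channel in bits, number of channels)
    "Snapdragon 810 (LPDDR4, 2 x 32-bit)": (1600, 32, 2),
    "Snapdragon 808 (LPDDR3, 2 x 32-bit)": (933, 32, 2),
    "Snapdragon 805 (LPDDR3, 2 x 64-bit)": (800, 64, 2),  # 800 MHz is an ASSUMPTION
}

for name, (clock, width, channels) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(clock, width, channels):.1f} GB/s")
```

These are theoretical maxima; as the benchmark analysis in this article shows, the throughput actually achieved depends heavily on the memory controller.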

Performance flaws evident in early Geekbench database entries


Early Geekbench results database entries show lower-than-expected CPU and memory performance, and detailed analysis of the results seems to confirm the reports about thermal throttling due to heat production as well as lower-than-expected memory performance. In practice, the version of Snapdragon 810 that was benchmarked seems to provide performance lower than even Snapdragon 801 in most respects.

Performance data for Snapdragon 810 in the Geekbench entries is clouded somewhat because of the use of 64-bit AArch64 mode in Android. Until now, most Cortex-A57 and Cortex-A53 based solutions use AArch32 (32-bit ARMv8 mode, which takes advantage of some of the new features of ARMv8 but is not fully 64-bit). Android AArch64 support and performance has been work in progress and is still likely to be not fully optimized. However, in the case of the Snapdragon 810 results, the performance deficit is of such magnitude that it is clear that they are caused by flaws in the chip implementation and not AArch64 mode.

In the table in the Appendix below, some Snapdragon 810 and 801 results have been highlighted in bold to show some of the performance differences and in particular the areas where Snapdragon 810 performance is much lower than expected.

There are several entries for the device in the database that show considerable variation between runs, providing evidence that performance throttling caused by heat production is a significant problem. For the analysis below, the best benchmark result among the various entries has been used. There is evidence that some of the later entries impose a CPU clock speed limit of about 1.0 GHz or perhaps only use the Cortex-A53 cores in some cases (these entries are also represented in the table).

Deficits in pure CPU performance, especially multi-core


Compared to Samsung's Exynos 7 Octa (5433), which has a similar CPU configuration, basic integer tests such as JPEG Compress already show somewhat lower than expected performance based on the reported clock speed, with multi-core performance scaling being considerably less than expected, and also clearly lower than Snapdragon 801. The Dijkstra benchmark, which has more external memory access and branching, is more heavily affected and is at least 35% slower than on Exynos 5433, despite a similar clock speed, and slower than Snapdragon 801 as well as Snapdragon 805. However, this may for a large part be due to running in AArch64 compared to 32-bit mode used on the other chips, since the Dijkstra benchmark seems to be similarly affected on other platforms that use AArch64.

For floating point performance, pure single-core performance, as shown by the Mandelbrot subtest results, is relatively unaffected, but multi-core performance scaling is much lower than Exynos, resulting in performance comparable to Snapdragon 805 rather than the higher floating point performance expected from Cortex-A57 cores (such as in Exynos 5433).

Memory performance significantly impacted


Memory performance is clearly seriously affected, confirming reported issues with the memory controller. The raw throughput of the Stream Copy subtest is significantly lower than expected based on the 32-bit dual-channel memory interface with double-speed LPDDR4, being lower than Snapdragon 805 with a similar amount of memory bandwidth and even significantly lower than Snapdragon 801 with its 32-bit dual-channel LPDDR3 interface.

The flaws in memory performance are evident in the SGEMM subtest, which is a floating point test that is heavy on sequential memory access. Snapdragon 810 shows performance for this test barely more than half that of Snapdragon 801 and 805. It is even worse for the multi-core test, where Snapdragon 810 shows performance scaling worse than two times, while Snapdragon 801 and 805 have performance scaling more in line with the four CPU cores they possess.

Finally, in the SFFT test, which is a floating point test with heavy random memory access, Snapdragon 810 shows only roughly half the performance of Snapdragon 801, Snapdragon 805 and Exynos 5433. This seems to provide the clearest evidence of performance problems with the memory controller.

Snapdragon 810 likely to be too costly for mainstream high-end devices


On popular technology websites, Snapdragon 810 has recently frequently been mentioned as the likely chip for future high-end models from a diverse range of well-known manufacturers such as Samsung, HTC and LG. However, the high-bandwidth LPDDR4 memory interface (which increases device cost) and performance targets seem to put it clearly in the very high end category, comparable to Snapdragon 805, which does not make it ideal for high-volume performance devices that do not have an extremely high screen resolution such as QHD (2560x1440). Other new chips such as Snapdragon 808 and (for mid-range) Snapdragon 615 seem to be more suitable for performance-oriented mainstream devices, including several of the mainstream flagship devices from the mentioned manufacturers.

However, if the performance flaws that are evident in the current Snapdragon 810 are not fixed or if Qualcomm has significant inventory of flawed chips, it is possible that they will be unloaded onto the more mainstream performance segment for a discounted price. It seems likely however that Qualcomm, given its chip expertise, will be able to fix most of the performance issues with the Snapdragon 810 in a future revision of the chip.

Update (January 5): LG prototype shows better multi-core performance


A Geekbench test run was recorded on January 5 for a prototype LG G Flex2 with Snapdragon 810. This result shows some improvements, especially in the overall multi-core score, although it is still well below that of Exynos 7 Octa (5433), which has a similar CPU configuration.

A closer look reveals that integer benchmarks, especially the more memory-intensive Dijkstra subtest, have not materially improved over the prior results. Multi-core floating point performance has improved significantly and contributes to the higher total multi-core score.

However, memory tests show mixed results. The Stream Copy subtests are lower than the previous best results from last month, remaining significantly lower than Snapdragon 805 and even Snapdragon 801, suggesting that sequential memory access performance has not improved. This is corroborated by the SGEMM subtest results, which also depend on sequential memory access performance and show results that are very similar to the earlier scores.

Meanwhile, the SFFT scores show a significant uptick, especially for multi-core performance, suggesting that Qualcomm has been able to improve the random memory access performance of the chip. However, the subtest scores are still clearly below those of Exynos 5433, Snapdragon 805 and even Snapdragon 801.

Update (January 10): New prototype entry shows improvements in memory performance


A subsequent Geekbench result entry recorded on January 9 for an unknown device shows further improvements in memory performance, although still falling short of the memory performance of the more mainstream Snapdragon 801 (let alone Snapdragon 805). The single-core JPEG Compress subtest result is also improved, but overall the CPU performance results still suggest that thermal throttling because of overheating is still likely to be a significant problem.

Appendix: Geekbench performance table


The table below is similar to the one published in my previous article. In the bottom half of the table, some relevant benchmark scores for Snapdragon 810 and Snapdragon 801/805 have been highlighted.

For a high-resolution version, view/copy/save the image above using the browser.

Sources: BusinessKorea, Geekbench browser (Samsung SM-N916S results), Qualcomm (Snapdragon 810 page), Wikipedia (Qualcomm Snapdragon)

Updated (January 5, 2015): Add discussion of recent LG prototype Geekbench test results, update performance table (also include Intel Atom results).
Updated (January 8, 2015): Correct DRAM interface of Snapdragon 810 (it is 32-bit dual-channel using LPDDR4, which can be clocked much higher than LPDDR3).
Updated (January 10, 2015): Add discussion of new Geekbench result entry, updated table.

Monday, December 29, 2014

Another look at Cortex-A53 CPU core performance

Several smartphone chips using ARM's new Cortex-A53 and Cortex-A57 CPU cores with the 64-bit-capable ARMv8 instruction set have arrived on the market recently. Cortex-A53-only based SoCs are especially attractive from a performance/dollar standpoint. However, as I described in earlier articles, there exist significant performance differences between different Cortex-A53 implementations, with some early revisions of the core being limited in performance, probably because of design bugs.

32-bit version of ARMv8 seems practical


Most of the Cortex-A57 and Cortex-A53-equipped SoCs currently seem to be running in what can be called "32-bit ARMv8 mode" (AArch32 in Geekbench, as opposed to ARMv7 for older 32-bit devices), taking advantage of some of the features of the ARMv8 instruction set (which is better suited to modern CPU chip architectures) while preventing some of the disadvantages of the full 64-bit model (such as doubled storage space for pointers and addresses).

Whether the full 64-bit instruction model (AArch64) will soon be attractive for Android devices, including lower-end ones such as Cortex-A53-based devices with limited amounts of CPU cache and RAM, is unclear. NVIDIA already uses AArch64 in conjunction with their latest Tegra K1 SoC. Optimizations for AArch64 seem to have been work in progress and early benchmarks for systems running in AArch64 mode were quite poor in comparison to 32-bit mode benchmarks, but progress is being made. Theoretically, more registers are available in AArch64 mode, also to the NEON SIMD unit, which should help performance in some important cases, and may mitigate the disadvantages of increased address storage size.

Snapdragon 410 has crippled first revision of Cortex-A53


Snapdragon 410 (MSM8916) is a SoC with quad-core Cortex-A53 that has been one of the first chips with Cortex-A53 cores to come to market and has already been adopted in significant volume for low-to-mid-range designs, replacing the older Cortex-A7-based Snapdragon 400.

However, it is obvious that the very first public revision of the Cortex-A53 core as used inside Snapdragon 410, Cortex-A53r0p0, is crippled in terms of performance, clearly scoring lower in CPU and memory-intensive benchmarks (even after making the significant correction for clock speed) than SoCs using later revisions of the Cortex-A53 core such as Snapdragon 615 and MediaTek's new chips. Coupled with Snapdragon 410's relatively low clock speed of 1.19 GHz, this results in significantly lower performance than the newer mid-range chips mentioned. Performance in complex benchmarks that simulate demanding, typical use such as complex browsing and gaming is even worse.

Advertising of Snapdragon 410 as having 64-bit support is very misleading


The lower performance seems to be partly associated with the fact that Snapdragon 410 (because of the r0p0 revision of Cortex-A53) is completely limited to ARMv7-compatibility mode and is unable to run in ARMv8 mode (32-bit or otherwise). I have yet to see evidence of a shipping Snapdragon 410 chip that is 64-bit or even ARMv8 capable. In effect, it functions as a chip with somewhat faster 32-bit ARMv7 Cortex-A7 cores and nothing more. In this sense, labeling the chip as being 64-bit or potentially having support for 64-bit ARMv8 in a future update is downright misleading or a blatant lie, depending on one's standpoint.

Memory performance seems most impacted


Based on Geekbench results, Snapdragon 410 has about 10% lower pure integer CPU performance per MHz when compared to chips such as Snapdragon 615 and MediaTek's MT6732/MT6752. For pure floating point performance, performance is about 5% lower. The biggest difference is in memory performance, where Snapdragon 410 is about 25% slower than Snapdragon 615 (with r0p1 Cortex-A53) and more than two times slower (even when correcting for clock or memory speed) than MT6732/MT6752 with Cortex-A53 r0p2. Another big difference is found in cryptography performance because of the extra ARMv8 instructions that apparently are not available to Snapdragon 410.

A large part of the lower performance of the Cortex-A53 cores inside Snapdragon 410 may be due to chip design bugs as evident from errata issued by ARM for earlier revisions of the Cortex-A53 core. Some details about these errata, which mostly involve memory coherency issues related to CPU cache memory, can be found when compiling a Linux kernel.

Snapdragon 410 shows poor scores in real-world benchmarks


While Snapdragon 410 delivers somewhat better scores than the Cortex-A7-based Snapdragon 400 at the same clockspeed in pure CPU-specific benchmarks such as Geekbench for single-core performance, multi-core performance does not show much benefit (which is unexpected based on the architectural advantage that the Cortex-A53-based Snapdragon 410 should have).

Even worse is the performance in practical benchmarks that measure performance for web browsing, gaming and other more complex, practical use cases. Based on benchmark results reported by GSMArena (1) (2), Basemark X, which is a gaming benchmark that uses the Unity engine and simulates throughput for a more demanding typical usage pattern, reports a significantly lower score than recent Snapdragon 400-based models such as the Moto G (2014), with the GPU score being similar, pointing to significant flaws in (multi-core) CPU and memory performance.

In Rightware's Browsermark 2.1, a browser benchmark with use of advanced web standards such as HTML 5, WebGL and advanced JavaScript, performance is downright disappointing, with a score less than half that of Snapdragon 400-based devices. Other browser benchmarks show similar results. Scores in Rightware's overall-use Basemark OS II benchmark are also typically relatively disappointing, not surpassing those of Snapdragon 400-based devices.

Hardware bugs likely cause of crippled performance


These lower than expected results for more complex, typical-use benchmarks are consistent with hardware bugs in the Cortex-A53 implementation of the Snapdragon 410 being a bottleneck and significantly degrading multi-core performance in particular. Work-arounds for cache consistency and coherency issues especially have the potential to significantly degrade performance, for example by forcing the kernel to frequently flush CPU caches.

The Linux kernel source shows commits to handle errata for Cortex-A53 up to r0p2 relating to cache clean operations, with the work-around being to promote cache clean to cache clean and invalidate. This could mean that revision r0p3 of Cortex-A53 may see further improvements. These commits do not explain the performance difference between r0p0, r0p1 and r0p2, since the work-around is the same for all three revisions.

Third revision of Cortex-A53 (r0p2) seems to improve memory performance


Some of the hardware or performance bugs that plagued especially the first version of the core (r0p0 as used in Snapdragon 410) have most likely been fixed in later revisions, contributing to a significant performance increase at the same clock speed.

SoCs with the third revision (r0p2) of the Cortex-A53 core seem to have much better memory performance as shown by Geekbench results, especially impressive given the bandwidth limitations of a 32-bit external memory interface. Most likely, this improvement is derived synergistically with ARM IP such as the Mali-T760 GPU as well as other IP blocks, which are implemented inside chips such as MT6732 and MT6752.

Since a SoC such as MT6732 is on the surface essentially comparable to Snapdragon 410 in the sense of having four Cortex-A53 CPU cores, there seem to be major performance improvements in the later revisions of the Cortex-A53 core and associated system architecture, especially with regard to memory performance. The difference is made more pronounced by the fact that the MT6732 is manufactured using TSMC's higher performance 28HPM process rather than 28LP and also clocked significantly higher.

Octa-core Cortex-A53 configurations provide impressive multi-core performance


Octa-core Cortex-A53-based SoCs such as MT6752 and to a lesser extent Snapdragon 615 are already showing impressive multi-core CPU performance, while single-core performance has also improved considerably over prior cost-effective CPU architectures. Multi-core performance, both in terms of pure CPU integer and floating point performance, for the MT6752 significantly surpasses (by tens of percent in many benchmarks) the much more expensive Snapdragon 801, while single-core performance is catching up, being about 30% slower for integer operations and 15% slower for floating point. This high level of performance comes at a fraction of the cost (primarily because of the small die size and low power consumption of the Cortex-A53 cores).

Memory bandwidth still a bottleneck


However, when the memory subsystem truly comes into play, high-end chips such as Snapdragon 801 still show much greater performance because of their much higher external memory bandwidth (because of the wider memory interface) as well as larger CPU cache. This is apparent in the Geekbench subtest SGEMM (which is heavy on sequential memory access), for which high-end SoCs such as Snapdragon 801 are more than twice as fast.

In practice, memory performance is important for how fast a device feels, impacting response times and also being very important for GPU performance. High screen resolutions also put heavy demands on the memory subsystem. In that sense, SoCs such as MT6752 and Snapdragon 615 still perform best at a resolution like 1280x720, with the best performance at 1920x1080 and higher still reserved for high-end SoCs.

There seems to be great potential for performance-oriented Cortex-A53 SoCs with a memory interface wider than 32-bit, comparable with other performance-oriented SoCs. This would be the "best of both worlds" in several respects (lower cost because of small die size of the CPU cores, low power consumption, while still having the memory bandwidth to drive high resolutions). MediaTek has announced a chip that was expected to have such a configuration, the MT6795, but it has not quite appeared on the market yet and might be delayed. However, similar solutions certainly look likely to become popular for performance-oriented devices in the not too distant future.

Appendix: Table with detailed Geekbench CPU benchmark results


Presented here is a table with detailed benchmark results for the SoCs mentioned, as well as several other SoCs on the market. It lists the CPU cores used, their clock speed, the smartphone model and Geekbench result entry used as a reference, and scores for several benchmarks. Indexed results (relative to a 1.0 GHz Cortex-A7) are shown for several of the benchmarks, as well as multi-core performance scaling indices. Results relevant to the discussion above have been highlighted in bold. The following Geekbench subtests have been included:

  • JPEG Compression (single/multi-core). A useful integer benchmark that seems to strongly depend on pure CPU performance (CPU core type and clock speed) with less dependence on the memory subsystem (including L2 CPU cache).
  • Dijkstra (single/multi-core). A more complex integer benchmark that probably includes more memory access and may branch a lot. Notable for this benchmark is that Cortex-A53 performs better than Cortex-A15 at the same clock speed, with both Cortex-A17 (MT6595) and Cortex-A57 being significantly faster still.
  • Mandelbrot (single/multi-core). A pure floating point benchmark, highly dependent on the combination of CPU core type and clock speed.
  • Stream copy (single/multi-core). An important metric for memory performance (especially sequential external RAM performance).
  • SGEMM. A floating point matrix multiplication benchmark that heavily depends on sequential memory access. The memory bandwidth available to the SoC makes a critical difference for this benchmark.
  • SFFT. A floating point benchmark that heavily uses random memory access.
For a high-resolution version, view/copy/save the image above using the browser.
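
The indexed results and scaling indices mentioned above can be sketched as simple ratios. The exact formulas used for the table are not stated, so the definitions below are assumptions, and all scores are hypothetical placeholders rather than actual Geekbench results.

```python
def performance_index(score, reference_score):
    """Score relative to a reference (assumed: a 1.0 GHz Cortex-A7 result)."""
    return score / reference_score


def scaling_index(multi_core_score, single_core_score):
    """How many times faster the multi-core run is than the single-core run."""
    return multi_core_score / single_core_score


# Hypothetical scores, for illustration only.
reference_single = 500.0
soc_single = 900.0
soc_multi = 6300.0

print(f"single-core index:  {performance_index(soc_single, reference_single):.2f}")
print(f"multi-core scaling: {scaling_index(soc_multi, soc_single):.2f}")
```

A scaling index close to the number of cores suggests a subtest limited by pure CPU throughput, while a much lower value hints at a shared bottleneck such as memory bandwidth.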

Sources: Geekbench browser, Primate Labs website

Updated January 2, 2015 (Add section of low typical-use benchmark scores for Snapdragon 410).
Updated January 5, 2015 (Update Geekbench performance table).
Updated January 10, 2015 (Update performance table).
Updated February 11, 2015 (mention and link Linux kernel Cortex-A53 errata).