Saturday, October 18, 2014

Samsung's 64-bit Exynos 5433 SoC renamed to Exynos 7 Octa, used in some Galaxy Note 4 models

Recently, Samsung renamed its Exynos 5433 SoC to Exynos 7 Octa. Samsung uses the new Exynos chip in the new Galaxy Note 4 smartphone, although it has been unclear how significant actual shipments are, because most regions were initially served primarily by Qualcomm Snapdragon 805-based versions of the Galaxy Note 4. However, evidence from the Geekbench result database suggests that roughly a quarter of the units currently sold are Exynos versions.

Signs of actual adoption of Exynos 7 Octa in high volume becoming apparent

Samsung has frequently announced the use of Exynos SoCs in prominent smartphones in the past, but shipments were often limited to very low volumes for smaller regions such as Korea, with the vast majority of units using Snapdragon SoCs. During the last two years, only Samsung's tablets have seen widespread use of its high-performance mobile SoCs. Although Samsung has recently ramped mid-range chips such as the Exynos 3470 in presumably high volume for the Galaxy S5 Mini, strong evidence would be required to establish that the situation is different this time, with a high-profile Exynos SoC (Exynos 7 Octa) actually being used in smartphones in high volume.

However, searching for Galaxy Note 4 models on the Geekbench Browser provides evidence that at least one quarter of the units currently sold contain the new Exynos chip, with the remaining three quarters or so using the Snapdragon 805. Exynos versions are primarily represented by the SM-N910C, SM-N910S and SM-N910K models, while Snapdragon versions are mainly represented by the SM-N910A, SM-N910T, SM-N910F and several other models.

Number of Geekbench entries for each Samsung Galaxy Note 4 variant as of 24 October:
  • SM-N9100: Snapdragon 805, 7 entries
  • SM-N9109W: Snapdragon 805, 4 entries
  • SM-N910A: Snapdragon 805, 635 entries
  • SM-N910C: Exynos 5433, 425 entries
  • SM-N910F: Snapdragon 805, 496 entries
  • SM-N910H: Exynos 5433, 24 entries
  • SM-N910K: Exynos 5433, 73 entries
  • SM-N910L: Exynos 5433, 33 entries
  • SM-N910R4: Snapdragon 805, 23 entries
  • SM-N910P: Snapdragon 805, 238 entries
  • SM-N910S: Exynos 5433, 197 entries
  • SM-N910T: Snapdragon 805, 559 entries
  • SM-N910V: Snapdragon 805, 69 entries
  • SM-N910W8: Snapdragon 805, 10 entries

For the listed models, the total count is 752 Exynos and 2041 Snapdragon, representing an Exynos proportion of about 27%.
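The share calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the counts are copied from the list, and, like the article, it treats each Geekbench entry as a rough proxy for one unit sold, which assumes that owners of both variants run Geekbench at similar rates.

```python
# Geekbench entry counts per Galaxy Note 4 model as of 24 October 2014,
# copied from the list above: model -> (SoC, entries).
OCTOBER_COUNTS = {
    "SM-N9100": ("Snapdragon 805", 7),
    "SM-N9109W": ("Snapdragon 805", 4),
    "SM-N910A": ("Snapdragon 805", 635),
    "SM-N910C": ("Exynos 5433", 425),
    "SM-N910F": ("Snapdragon 805", 496),
    "SM-N910H": ("Exynos 5433", 24),
    "SM-N910K": ("Exynos 5433", 73),
    "SM-N910L": ("Exynos 5433", 33),
    "SM-N910R4": ("Snapdragon 805", 23),
    "SM-N910P": ("Snapdragon 805", 238),
    "SM-N910S": ("Exynos 5433", 197),
    "SM-N910T": ("Snapdragon 805", 559),
    "SM-N910V": ("Snapdragon 805", 69),
    "SM-N910W8": ("Snapdragon 805", 10),
}


def totals_by_soc(counts):
    """Sum the entry counts per SoC family."""
    totals = {}
    for soc, entries in counts.values():
        totals[soc] = totals.get(soc, 0) + entries
    return totals


def exynos_share(totals):
    """Fraction of all entries that report the Exynos 5433."""
    return totals.get("Exynos 5433", 0) / sum(totals.values())


totals = totals_by_soc(OCTOBER_COUNTS)
print(totals)                          # {'Snapdragon 805': 2041, 'Exynos 5433': 752}
print(f"{exynos_share(totals):.1%}")   # 26.9%
```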

All other things being equal, one would expect Samsung to prefer its internally manufactured Exynos chipset if enough supply is available, although with four Cortex-A57 cores the SoC is likely to be relatively expensive to manufacture. On the other hand, there are significant performance differences: the Exynos platform is clearly faster in CPU processing but carries a question mark over power efficiency, while the Snapdragon 805 can be regarded as mature, stable technology. Qualcomm may also be able to enforce a certain quota of Snapdragon chips through its leverage over patent royalties and licensing fees (which are considerable for a high-end smartphone).

Some anomalies are evident in the chips reported for certain models. For example, a number of SM-N910S results in the Geekbench database (a model that officially uses the Exynos 5433) show an APQ8064 (Snapdragon 600) SoC clocked at 1.89 GHz, which is significantly slower than the Exynos 5433 (or the Snapdragon 805). Similarly, starting from October 30, a non-negligible number of results labelled as SM-N910C show the aging Exynos 4412 SoC (also used in older models such as the Galaxy S III), with four Cortex-A9 cores clocked at 2.0 GHz, much slower than the Exynos 5433. These anomalies probably represent counterfeit devices made by Chinese manufacturers (both the APQ8064 and the Exynos 4412 have been common in the supply chain in the past). For models that officially use the Snapdragon 805, no anomalies are evident.

Update as of December 5, 2014

Reassessing the share of Exynos 5433 vs. Snapdragon 805 in the Geekbench database after a few months of production should indicate whether Samsung is really serious about ramping Exynos production for smartphones. The following is apparent:
  • The Exynos-based SM-N910C count has increased from 425 to 4390.
  • The Exynos-based SM-N910S count has increased from 197 to 578.
  • The Exynos-based SM-N910K count has increased from 73 to 212.
  • The Exynos-based SM-N910H count has increased from 24 to 757, while the SM-N910L count has increased from 33 to 91.
  • The new Exynos-based SM-N910U shows a count of 1062.
  • The Snapdragon 805-based SM-N910A count has increased from 635 to 2258.
  • The Snapdragon 805-based SM-N910T count has increased from 559 to 2089.
  • The Snapdragon 805-based SM-N910F count has increased from 496 to 3857.
  • The Snapdragon 805-based SM-N910P count has increased from 238 to 1162.
  • The Snapdragon 805-based SM-N910R4 count has increased from 23 to 61, SM-N9100 from 7 to 58, SM-N9109W from 4 to 20, SM-N910V from 69 to 1685, and SM-N910W8 from 10 to 636.
  • The new Snapdragon 805-based SM-N910G shows a count of 903, SM-N9106W shows 22, and SM-N9108V shows 1.

For the listed models, the total count is 7090 Exynos and 12752 Snapdragon, which raises the share of Exynos-based models in the Geekbench database from about 27% to about 36%. This clearly suggests that the Exynos share is increasing, and recent production may already include a much greater proportion of Exynos-based models.
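The December totals can be checked the same way. The sketch below again treats Geekbench entries as a proxy for units, and additionally computes the share among only the entries added between the two snapshots; models absent from the 24 October list are assumed to have had no entries at that time. Two snapshots only yield an average over the whole interval, so a finer view of the most recent weeks would need more frequent snapshots.

```python
# Totals from the 24 October snapshot, taken from the earlier list.
OCT_EXYNOS, OCT_SNAPDRAGON = 752, 2041

# 5 December 2014 entry counts, copied from the update above.
DEC_EXYNOS = {"SM-N910C": 4390, "SM-N910S": 578, "SM-N910K": 212,
              "SM-N910H": 757, "SM-N910L": 91, "SM-N910U": 1062}
DEC_SNAPDRAGON = {"SM-N910A": 2258, "SM-N910T": 2089, "SM-N910F": 3857,
                  "SM-N910P": 1162, "SM-N910R4": 61, "SM-N9100": 58,
                  "SM-N9109W": 20, "SM-N910V": 1685, "SM-N910W8": 636,
                  "SM-N910G": 903, "SM-N9106W": 22, "SM-N9108V": 1}

dec_exynos = sum(DEC_EXYNOS.values())
dec_snapdragon = sum(DEC_SNAPDRAGON.values())

# Cumulative share: every entry in the database up to 5 December.
cumulative_share = dec_exynos / (dec_exynos + dec_snapdragon)

# Incremental share: only entries added between the two snapshots,
# which weights devices activated after 24 October more heavily.
new_exynos = dec_exynos - OCT_EXYNOS
new_snapdragon = dec_snapdragon - OCT_SNAPDRAGON
incremental_share = new_exynos / (new_exynos + new_snapdragon)

print(dec_exynos, dec_snapdragon)   # 7090 12752
print(f"{cumulative_share:.1%}")    # 35.7%
print(f"{incremental_share:.1%}")   # 37.2%
```

With these counts, the entries added since 24 October are about 37% Exynos, slightly above the cumulative figure of about 36%.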

First 20nm ARMv8 SoC targeting Android

One of the first smartphone SoCs manufactured using a 20nm process (at Samsung's own fabs), the Exynos 7 Octa is the first chip on the market featuring ARM's Cortex-A57 and Cortex-A53 cores in a big.LITTLE configuration. The Cortex-A5x cores support the 64-bit ARMv8 instruction set, but even the 32-bit variant of ARMv8 appears to bring benefits, while avoiding the performance degradation associated with going fully 64-bit (related to the increased memory used by pointers and addresses).
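To illustrate the pointer-size effect, the sketch below computes the size of two hypothetical C structs under the usual data models: ILP32 for AArch32 (4-byte pointers) and LP64 for AArch64 on Linux/Android (8-byte pointers), assuming natural alignment as in the standard ARM procedure call standards. The struct layouts are illustrative and not taken from any particular program.

```python
# Field sizes in bytes under the assumed data models:
# AArch32 uses ILP32 (int and pointer are 4 bytes), AArch64 Linux/Android uses LP64.
ABIS = {"AArch32 (ILP32)": {"int": 4, "ptr": 4},
        "AArch64 (LP64)": {"int": 4, "ptr": 8}}


def struct_size(field_types, abi):
    """Size of a C struct laid out with natural alignment (alignment == size)."""
    offset = 0
    max_align = 1
    for t in field_types:
        size = ABIS[abi][t]
        align = size
        max_align = max(max_align, align)
        offset = -(-offset // align) * align  # round up to the field's alignment
        offset += size
    # Tail padding keeps every element of an array of this struct aligned.
    return -(-offset // max_align) * max_align


# Hypothetical: struct node { int value; struct node *next; };
LIST_NODE = ["int", "ptr"]
# Hypothetical: an int key plus left and right child pointers.
TREE_NODE = ["int", "ptr", "ptr"]

for abi in ABIS:
    print(abi, struct_size(LIST_NODE, abi), struct_size(TREE_NODE, abi))
# AArch32 (ILP32) 8 12
# AArch64 (LP64) 16 24
```

Pointer-heavy structures such as linked lists and trees can double in size under LP64 because of the wider pointers plus alignment padding, which means more cache and memory traffic for the same data.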

It is not the first 20nm SoC to support the ARMv8 instruction set, since Apple's A8 chip has already been in high-volume production at TSMC for much of the year for use in the iPhone 6 models. Apple had already introduced the first ARMv8 chip, the A7, in 2013. As I have explained in an earlier article, there are reasons to believe the CPU cores in the Apple A7/A8 may have strong similarities to ARM's Cortex-A57 CPU core, and in that sense the Exynos 7 Octa may technically not be the first SoC with Cortex-A57-like cores to reach the market.

Fast, but power efficiency may be a problem

Reviews of Exynos 7 Octa-based devices such as the Galaxy Note 4 are still scarce. Several months ago, early benchmark results already showed the Exynos 5433 (as it was then known) delivering the highest performance in the mobile space, significantly outscoring the Snapdragon 805 in most benchmarks. This is not unexpected given the use of high-performance Cortex-A57 cores at a fairly high clock frequency.

However, there are signs that maintaining power efficiency with higher-clocked Cortex-A57 cores may be a challenge. Some early hands-on previews have suggested relatively high power consumption and mediocre battery life for the Exynos 5433-based Galaxy Note 4. More definitive test results should clarify the situation.

Setting maximum clock frequency creates dilemma

Software techniques, such as efficient Global Task Scheduling (GTS) with a preference for the economical Cortex-A53 cores, and throttling down the clock frequency, may be vital to maintaining acceptable battery life. Analysis of Geekbench results for the Exynos 5433-based SM-N910C shows a multi-core performance scaling factor of about 4.45 for the largely CPU-bound JPEG Compress test. Since four Cortex-A57 cores alone could deliver a factor of at most about 4, this suggests that GTS is implemented so that not just the Cortex-A57 cores but also the Cortex-A53 cores are utilized when high CPU performance is required.
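The scaling argument can be made explicit with a simple, idealised model: if each Cortex-A57 core contributes at most its single-core score, four of them give a factor of at most 4.0, so anything above that must come from the Cortex-A53 cores. The function below is a hypothetical helper that computes this lower bound; it assumes the single-core score was measured on a Cortex-A57 core. Real scaling is usually imperfect, which would only increase the implied LITTLE-core contribution.

```python
def min_little_core_contribution(scaling, n_big=4, n_little=4):
    """Lower bound on the throughput added by the LITTLE cores.

    Idealised model (an assumption): each big core contributes at most 1.0
    times the single-core score, so any multi-core/single-core ratio above
    n_big must come from the LITTLE cores. Returns the total excess and the
    implied per-LITTLE-core throughput, both in units of one big core.
    """
    total_excess = max(0.0, scaling - n_big)
    per_little_core = total_excess / n_little
    return total_excess, per_little_core


# Scaling factor of about 4.45 reported for JPEG Compress on the SM-N910C.
total, per_core = min_little_core_contribution(4.45)
print(f"LITTLE cores add at least {total:.2f} big-core equivalents, "
      f"about {per_core:.2f} each")
```

A factor at or below 4.0 would be consistent with the big cores working alone, so the helper returns zero in that case.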

High-performance CPU cores such as the Cortex-A57 tend to have relatively high power consumption, which rises as the clock frequency increases. This creates a dilemma for a manufacturer: for acceptable power consumption in practical use there is little reason to set the maximum clock speed at the relatively high level that is desirable for marketing purposes. A speed similar to that of Apple's Cyclone cores (e.g. 1.4 GHz) provides more than enough performance for most applications while avoiding the excessive power consumption (and potential stability problems) associated with higher frequencies. A similar dilemma often arises with SoCs using Cortex-A15 CPU cores (such as Samsung's Exynos 5430 in the Galaxy Alpha), whose performance characteristics (high performance, but low performance per watt) are comparable to those of the Cortex-A57, although the Cortex-A57 is likely to be more efficient.
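Why power rises faster than performance can be sketched with the textbook model of CMOS dynamic power, P = alpha * C * V^2 * f: a higher clock usually needs a higher supply voltage, and power grows with the square of that voltage. The helpers below are a simplified illustration; the voltage and frequency ratios in the example are hypothetical, not measured Exynos or Apple values, and static (leakage) power, which also grows with voltage, is ignored.

```python
def relative_dynamic_power(f_ratio, v_ratio):
    """Dynamic power relative to a baseline, from P = alpha * C * V**2 * f.

    The activity factor alpha and switched capacitance C are assumed
    unchanged, so they cancel and only the voltage and frequency ratios remain.
    """
    return v_ratio ** 2 * f_ratio


def relative_perf_per_watt(f_ratio, v_ratio):
    """Performance per watt relative to the baseline.

    Assumes performance scales linearly with clock frequency (a CPU-bound load).
    """
    return f_ratio / relative_dynamic_power(f_ratio, v_ratio)


# Hypothetical example: a 30% higher clock that needs a 15% higher voltage.
power = relative_dynamic_power(1.30, 1.15)
efficiency = relative_perf_per_watt(1.30, 1.15)
print(f"power x{power:.2f}, performance per watt x{efficiency:.2f}")
# power x1.72, performance per watt x0.76
```

In this model the 30% speed-up costs about 72% more dynamic power, so every unit of work becomes roughly a quarter more expensive in energy.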

Superior synthetic benchmark performance can be a matter of such prestige for a company and its marketing department that an unbalanced, high maximum clock frequency may still be used in shipping devices, to the detriment of the user experience. Associated with this dilemma is the temptation to "cheat" on benchmarks by detecting when a synthetic benchmark is running and then switching to higher sustained clock frequencies with reduced thermal throttling, a practice that websites such as AnandTech have shown to be widespread in the past.

Evidence suggests Exynos 5433's Cortex-A57 cores are already clocked at a relatively low but efficient speed of about 1.4 GHz

The Exynos 5433 may in practice already be clocked at a relatively low maximum speed to conserve power. Geekbench consistently reports 1.3 GHz as the clock frequency for all Exynos 5433 devices. However, for some devices, including Samsung's big.LITTLE Exynos 5430 with Cortex-A15 cores, Geekbench seems to report the maximum clock speed of the slower LITTLE cores, so the Cortex-A57 cores are probably clocked higher. Even so, for the Cortex-A57 cores in the Exynos 5433, which have dramatically higher performance per cycle than the LITTLE Cortex-A53 cores, a relatively limited maximum speed in the range of 1.3 GHz would by no means be inappropriate for a smartphone platform.

Looking more closely at cross-platform Geekbench results for the Exynos-based Note 4, the iPhone 5S and the iPhone 6, and assuming that Apple's Cyclone and the Cortex-A57 have similar performance characteristics at a given clock speed (the little available evidence puts metrics such as IPC and DMIPS/MHz in the same ballpark), there are indications that the Exynos 5433 may on average actually run at an effective 1.4 GHz, comparable to the 1.4 GHz of the iPhone 6. However, it cannot be ruled out that for the Exynos 5433 this frequency is an average resulting from thermal throttling (variation of the CPU speed based on power consumption and heat production).

Apple's SoC architecture is also different: it is a dual-core design, compared with the big.LITTLE configuration of the Exynos 5433 with four Cortex-A57 cores and four Cortex-A53 cores. Apple's cache architecture is also very different, with a large L3 cache and a likely highly optimized but smaller L2 cache, and the Apple device has higher external RAM performance. Additionally, the software model (64-bit AArch64 on the Apple devices vs. 32-bit ARMv8 AArch32 on the Exynos 5433) complicates matters, but some conclusions may still be drawn by looking at specific benchmarks.

Comparison with Apple A7 and A8 benchmarks provides clues

A detailed comparison of representative results for an SM-N910C and an iPhone 6 with the Apple A8 on the Geekbench Browser provides interesting information. At first glance the results are all over the place, with some benchmarks (including single-core ones) faster on the Exynos and others on the Apple A8, while the Exynos obviously has an advantage in multi-core tests.

However, one can look for sub-benchmarks that are less likely to be affected by the large L3 cache on the Apple device: specifically, benchmarks that neither have a large working set and source data nor constantly perform random reads across a large data set, but do perform a lot of processing, possibly writing (but not reading) a lot of data. Some stream-type algorithms, such as common data and image compression and decompression benchmarks, fit the bill, because they generally stream the source data sequentially, perform a relatively large amount of CPU processing on a relatively limited working set (a small part of the stream or file), and write the resulting data sequentially.
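The stream-type access pattern described above can be illustrated with Python's standard zlib module, used here only as a stand-in: the source does not describe Geekbench's own compression code, which may differ. The sketch reads its input in sequential chunks, keeps only a bounded amount of history, and writes its output sequentially; the input generator is hypothetical.

```python
import zlib


def stream_compress(chunks, level=6):
    """Compress an iterable of byte chunks, yielding compressed pieces as they appear.

    DEFLATE (the algorithm behind zlib and PNG) only refers back to a sliding
    window of at most 32 KiB, so the working set stays small however long the
    input stream is.
    """
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)  # may buffer internally and return b""
        if out:
            yield out
    yield compressor.flush()  # emit whatever is still buffered


def stream_decompress(chunks):
    """Inverse of stream_compress: read compressed pieces, write output in order."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    yield decompressor.flush()


def sample_source(n_chunks=64, chunk_size=64 * 1024):
    """Hypothetical input: n_chunks sequential blocks of repetitive data."""
    block = bytes(range(256)) * (chunk_size // 256)
    for _ in range(n_chunks):
        yield block


# Round trip: the input is read once, front to back, and output is produced in order.
restored_bytes = sum(len(piece) for piece in
                     stream_decompress(stream_compress(sample_source())))
print(restored_bytes)  # 4194304 (64 chunks of 64 KiB)
```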

By this type of benchmark, the Exynos 5433 is somewhat slower than, but fairly close to, the Apple A8 in single-core CPU performance. Further information can be gained from iPhone 5S (Apple A7) results.

Single-core Geekbench results for the Galaxy Note 4 (SM-N910C), iPhone 5S and iPhone 6 (percentages give each iPhone's speed advantage relative to the SM-N910C):
Test name                           SM-N910C  iPhone 5S       iPhone 6
BZip2 Compress:                     1187      1109 ( -6.6%)   1187 ( +8.5%)
BZip2 Decompress:                   1366      1394 ( +2.0%)   1538 (+12.6%)
JPEG Compress:                      1378      1196 (-13.2%)   1372 ( -0.4%)
JPEG Decompress:                    1598      1583 ( -0.9%)   1855 (+16.1%)
PNG Compress:                       1391      1427 ( +2.6%)   1577 (+13.4%)
PNG Decompress:                     1490      1301 (-12.7%)   1498 ( +0.5%)
Sobel (image local edge detection): 1701      1584 ( -6.9%)   1922 (+13.0%)

The Apple A8 chip in the iPhone 6 scores somewhat higher than the Exynos 5433 in most tests, while the Exynos 5433 is on average faster than the Apple A7 in the iPhone 5S. All of this is consistent with the CPU cores in all of these devices having comparable single-core performance. If one further assumes that the Cortex-A57 and Cyclone (which seems to have many architectural similarities with the Cortex-A57) have comparable performance per cycle, the results are consistent with a clock frequency for the Exynos 5433 similar to that of the Apple devices (around 1.3 to 1.4 GHz).
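The percentages in the table follow from the scores as (iPhone score / SM-N910C score - 1) x 100. The sketch below recomputes them from the listed scores and counts the tests each iPhone wins, which is a quick way to check a hand-built comparison table.

```python
# Single-core Geekbench scores from the table above:
# test -> (SM-N910C, iPhone 5S, iPhone 6)
SCORES = {
    "BZip2 Compress": (1187, 1109, 1187),
    "BZip2 Decompress": (1366, 1394, 1538),
    "JPEG Compress": (1378, 1196, 1372),
    "JPEG Decompress": (1598, 1583, 1855),
    "PNG Compress": (1391, 1427, 1577),
    "PNG Decompress": (1490, 1301, 1498),
    "Sobel": (1701, 1584, 1922),
}


def advantage_pct(score, baseline):
    """Relative speed advantage of `score` over `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0


for test, (note4, iphone5s, iphone6) in SCORES.items():
    print(f"{test:<17} iPhone 5S {advantage_pct(iphone5s, note4):+6.1f}%  "
          f"iPhone 6 {advantage_pct(iphone6, note4):+6.1f}%")

# Number of tests in which each iPhone beats the SM-N910C.
a7_wins = sum(a7 > n4 for n4, a7, _ in SCORES.values())
a8_wins = sum(a8 > n4 for n4, _, a8 in SCORES.values())
print(a7_wins, a8_wins)  # 2 5
```

Recomputed this way, the figures agree with the table to within half a percentage point, except for the iPhone 6 BZip2 Compress entry: with identical listed scores of 1187 it comes out at 0.0% rather than +8.5%, so either that score or its percentage appears to have been transcribed incorrectly.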

The largely CPU-bound JPEG Compress test, which on other chip platforms appears to be closely tied to clock speed and to depend little on factors outside the CPU core, suggests that the isolated single-core CPU performance of the Exynos 5433 may be close to that of the Apple A8 in the iPhone 6, consistent with a similar effective clock frequency of about 1.4 GHz. To what extent thermal throttling plays a role for the Exynos is not entirely clear. Most of the Geekbench results for the SM-N910C in the JPEG Compress test are very close to each other (scores around 1375), suggesting that at least for this test the maximum clock speed is generally maintained, which would be compatible with that speed being about 1.4 GHz.
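Under the working assumption that Cyclone and the Cortex-A57 do about the same work per cycle, an effective clock for the Exynos 5433 can be estimated as f_ref x (Exynos score / reference score). The sketch below applies this to the comparison table's scores, using the iPhone 5S (A7, 1.3 GHz) and iPhone 6 (A8, 1.4 GHz) as references. It inherits every caveat about caches, memory and AArch32 vs. AArch64 discussed in this article, so the results are rough indications only.

```python
# Nominal CPU clocks of the reference devices, in GHz.
REF_CLOCK_GHZ = {"iPhone 5S": 1.3, "iPhone 6": 1.4}

# Single-core Geekbench scores from the comparison table:
# test -> (SM-N910C, iPhone 5S, iPhone 6)
SCORES = {
    "BZip2 Compress": (1187, 1109, 1187),
    "BZip2 Decompress": (1366, 1394, 1538),
    "JPEG Compress": (1378, 1196, 1372),
    "JPEG Decompress": (1598, 1583, 1855),
    "PNG Compress": (1391, 1427, 1577),
    "PNG Decompress": (1490, 1301, 1498),
    "Sobel": (1701, 1584, 1922),
}


def implied_clock_ghz(exynos_score, ref_score, ref_clock_ghz):
    """Clock at which the Exynos core would match the reference core,
    assuming equal performance per cycle (an assumption, not a measurement)."""
    return ref_clock_ghz * exynos_score / ref_score


def estimates(ref_index, ref_name):
    """Per-test implied Exynos clocks against one reference device."""
    return {test: implied_clock_ghz(s[0], s[ref_index], REF_CLOCK_GHZ[ref_name])
            for test, s in SCORES.items()}


vs_a7 = estimates(1, "iPhone 5S")
vs_a8 = estimates(2, "iPhone 6")
mean_vs_a7 = sum(vs_a7.values()) / len(vs_a7)
mean_vs_a8 = sum(vs_a8.values()) / len(vs_a8)

print(f"JPEG Compress vs A8: {vs_a8['JPEG Compress']:.2f} GHz")  # 1.41 GHz
print(f"Mean over all tests vs A8: {mean_vs_a8:.2f} GHz")        # 1.30 GHz
print(f"Mean over all tests vs A7: {mean_vs_a7:.2f} GHz")        # 1.38 GHz
```

Both averages fall in the 1.3 to 1.4 GHz range, and the JPEG Compress estimate against the A8 lands at about 1.4 GHz; a genuinely different performance per cycle for the two core designs would shift all of these numbers proportionally.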

PNG Decompress seems to be somewhat of a negative outlier for the Apple A7 and A8, but it is consistent across different iPhone results and is probably related to the large amount of memory writes (decompressed image data) in this benchmark, which can be affected by the extra layer in the memory subsystem represented by the L3 cache.

One significant caveat for the comparison above is that the Apple devices run in AArch64 mode, while the Exynos 5433 in the Note 4 runs in AArch32 mode (the 32-bit version of the ARMv8 instruction set). AArch64 can take advantage of more instructions, in particular instructions operating on 64-bit registers, while the larger pointer and address size can decrease performance somewhat. However, the source code of the Geekbench tests is likely to be identical for AArch32 and AArch64 (without extensive use of 64-bit integer variables) and unlikely to be specifically optimized, so any AArch64-specific optimizations in the generated code depend on the compiler.
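As an illustration of what instructions operating on 64-bit registers can save, the sketch below emulates a 64-bit addition using only 32-bit halves, mirroring the ADDS/ADC instruction pair that AArch32 code typically needs, whereas AArch64 performs the same operation with a single ADD on 64-bit X registers. This is a Python model of the instruction semantics, not compiler output; if the Geekbench tests make little use of 64-bit integers, this particular advantage would rarely come into play.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32bit_halves(a, b):
    """Add two unsigned 64-bit values using only 32-bit operations.

    Step 1 models ADDS on the low words (which sets the carry flag);
    step 2 models ADC on the high words (which adds that carry in).
    The result wraps modulo 2**64, like a 64-bit register.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 if the low-word addition overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # high-word add including the carry
    return (hi << 32) | lo


# The carry propagates from the low word into the high word.
print(hex(add64_via_32bit_halves(0x1_FFFF_FFFF, 1)))  # 0x200000000
```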

Sources: Samsung (Exynos 7 Octa), Geekbench Browser

Updated (24 October 2014): Update with information about proportion of Exynos models based on Geekbench database, and provide performance comparisons with Apple processors.
Updated (30 October 2014): Language tweaks, improve Geekbench comparison table and fix PNG Decompress score for iPhone 5S.
Updated (2 November 2014): Update discussion about clock speed of Exynos 5433, expand description of use of GTS, make note of counterfeit models in Geekbench database.
Updated (5 December 2014): Update Exynos model share statistics for Galaxy Note 4.
