Saturday, August 30, 2014

Potential signs of loosening of tight wafer supply?

Loosening of supply for non-leading-edge wafers?

Recently, DigiTimes, which often provides reasonably accurate information from sources close to the supply chain but sometimes also publishes inaccurate information from vague sources, published two partly contradictory articles pointing to a potential loosening of the wafer supply situation at foundry houses in Taiwan.

On 28 August, it posted an article titled "Production schedules at 12-inch fabs begin to loosen", saying that "while most 8-inch fabs of major foundries are expected to run at full capacity until the end of 2014, production schedules at 12-inch fabs, particularly those of second-tier foundry houses, are said to have begun to loosen as some IC players have been reducing their wafer start orders, according to industry sources". However, it also added that TSMC's 12-inch fabs continue to run at full capacity due to production of Apple's A8 processor.

The fact that TSMC continues to run its 12-inch fabs at full capacity puts any loosening of supply into perspective. As it happens, all major smartphone SoC vendors (Qualcomm, MediaTek and Apple) are currently single-sourcing the overwhelming majority of their smartphone chips from precisely TSMC's 28/20nm 12-inch fabs. So there are few signs yet that the tight capacity for smartphone SoCs, and for other chips depending on advanced processes at TSMC such as NVIDIA's GPUs, will be resolved in the near future.

UMC likely seeing reduced demand for 40nm and above

UMC, the other, smaller foundry in Taiwan, until recently (Q2 2014) had a 28nm wafer capacity proportion in the low single digits, meaning that its production capacity for 28nm was until recently roughly 30 to 50 times lower than TSMC's, a negligible amount, although it is currently attempting to increase that capacity. The DigiTimes article may refer to a reduction in demand for non-leading-edge process nodes produced at UMC's 12-inch fabs, such as 40nm and 55nm.

This reduction could stem from so-called "second-tier" design houses in China and Taiwan targeting consumer electronics, such as low-end chips from Chinese chip designers targeting tablets and other devices, and Taiwanese companies like Sunplus and Realtek. One example of a product segment for which demand is likely to be decreasing is stand-alone WiFi chips used in tablets and smartphones. As SoCs integrating much of the required WiFi functionality (such as the digital processing part) from vendors such as MediaTek become dominant, stand-alone WiFi chips for applications such as tablets are seeing much lower demand. Reduction in demand for older or mature product lines from MediaTek (involving segments such as DVD players, optical storage and feature phones) and Qualcomm still produced at UMC could also be involved. TSMC's 40nm and above 12-inch capacity could also be seeing lower demand, but is continuing to be converted to 28nm.

The article specifically notes that trailing-edge 8-inch fab capacity, where products such as LCD driver ICs and power ICs are produced, remains tight across the board.

ASPs said to be trending down due to increased competition

In another article on 29 August, DigiTimes reported that average selling prices of 3G/4G smartphone SoCs, touch controllers and LCD driver ICs are trending down in 3Q14, due to increased competition. The article also mentions that abundant supply of wafers from foundry houses contributes to price reductions. This statement seems to partly contradict other recent articles by DigiTimes, such as the one referenced above that notes continuing tightness of 8-inch fab production (used for many LCD driver ICs), as well as 12-inch production at TSMC that continues to be fully utilized, and an earlier article saying that TSMC's production capacity was already fully booked for the year (also see my earlier blog post about this). However, more available capacity at UMC's 12-inch fabs (involving nodes such as 40nm) could be a reason for some of the noted trends.

The article also notes a slow-down in domestic demand for handsets in China in Q3, that has led Qualcomm, MediaTek, Marvell and Intel to reduce pricing. It also notes that new low-cost 3G smartphone SoCs from Spreadtrum that are rolling out are potentially contributing to competitive pressures, especially for MediaTek which dominates the segment.

Potential reduction of leading edge wafer requirements at TSMC from more economical chip designs

Even for TSMC's leading-edge (28/20nm) capacity, for which there is likely to still be a significant shortage, there are developments that could reduce capacity requirements and thus eventually largely resolve the shortage of capacity, and it is possible that the DigiTimes articles are hinting at this development.

One factor is that Apple's A8 SoC production volume is likely to peak around now, in order to build inventory in time for the expected September launch of the iPhone 6 and the end-of-the-year peak selling season, and may be somewhat reduced (but still significant) going towards the end of 2014. That in itself can reduce capacity tightness as the year progresses.

Qualcomm transitioning away from uneconomical Snapdragon 800/801 platform

At the same time, Qualcomm appears to be aggressively transitioning a large part of its higher-end production from the high volume Snapdragon 800/801 platform using Krait-400 CPUs to more economical platforms such as the Snapdragon 610 and 615, which use ARM Cortex-A53 CPU cores, and later the high-end Cortex-A57 + A53 based Snapdragon 808/810. Snapdragon 800/801 series SoCs have a notably large die size (118 square mm for MSM8974, even at 28nm HPM), mainly due to the large size of the Krait-400 cores, but also due to other design features such as the GPU. Because Qualcomm has had a virtual monopoly in the high-end smartphone SoC segment, the high manufacturing cost of Snapdragon 800/801 SoCs has not been a major issue for Qualcomm, since it has been able to sell them at a very high ASP.

Cortex-A53 CPU cores, even in an octa-core configuration, have a die size that is significantly smaller than that of a quad-core Krait-400. With the size of other components such as the GPU likely reduced as well, Snapdragon 610/615 SoCs are likely to be significantly smaller than Snapdragon 800/801, as small as half the size or smaller. In this way the transition away from high volume, wafer-consuming Snapdragon 800/801 SoCs can immediately reduce Qualcomm's wafer requirements significantly, assuming a similar level of unit shipments, effectively reducing the severity of the shortage of capacity at TSMC.
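
To get a feel for how die size translates into wafer demand, here is a rough sketch in Python. It uses a common first-order approximation for gross dies per 300mm (12-inch) wafer and ignores yield, scribe lines and edge exclusion. The 118 square mm figure comes from the text above; the "half size" die and the shipment volume of 10 million units are purely hypothetical values chosen for illustration.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate: wafer area divided by die area, minus an edge-loss term.

    Ignores yield, scribe lines and edge exclusion, so it is an upper bound.
    """
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(area_term - edge_term)


def wafers_needed(units: int, die_area_mm2: float) -> int:
    """Number of whole wafers needed to produce `units` dies (no yield loss)."""
    return math.ceil(units / gross_dies_per_wafer(die_area_mm2))


LARGE_DIE_MM2 = 118.0              # MSM8974 die size quoted in the text
SMALL_DIE_MM2 = LARGE_DIE_MM2 / 2  # assumption: a die half that size
UNITS = 10_000_000                 # hypothetical shipment volume

large = wafers_needed(UNITS, LARGE_DIE_MM2)
small = wafers_needed(UNITS, SMALL_DIE_MM2)

print(f"{LARGE_DIE_MM2:.0f} mm2 die: {large} wafers")
print(f"{SMALL_DIE_MM2:.0f} mm2 die: {small} wafers")
print(f"wafer demand ratio: {small / large:.2f}")
```

Under these assumptions, halving the die size slightly more than halves the number of wafers needed for the same unit volume, because smaller dies also waste less area at the wafer edge.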

Some time ago, Qualcomm already executed a similar transition by successfully transitioning its mid-range Snapdragon 400 platform from dual-core Krait-300 to quad-core Cortex-A7 CPUs, likely significantly reducing cost while improving performance. Qualcomm is currently further transitioning its mid-range platform to a quad-core Cortex-A53 configuration with the production ramp of the Snapdragon 410.

Market share gains by MediaTek for mid-range and high-end segments could also effectively reduce overall wafer capacity requirements for smartphone SoCs, because of the tendency of MediaTek SoCs to have a significantly smaller die size and cost compared to competing solutions.

Sources: DigiTimes, MEPTEC

Thursday, August 28, 2014

MediaTek well positioned for smartphones, widely adopts Cortex-A53 for upcoming platforms

MediaTek has announced several new smartphone chips this year, most of them exclusively using ARM's efficient new 64-bit Cortex-A53 core in various configurations. This includes major platforms targeting the entry-level and mainstream segments scheduled to be shipping this year, and a new high-end platform scheduled to be shipping near the end of the year. These are also the first MediaTek SoCs to integrate a 4G LTE baseband.

Existing low-end 3G platforms

MediaTek currently dominates the worldwide cost-sensitive smartphone market, shipping cost-effective, power-efficient SoCs that integrate 3G baseband and other functionality. These solutions will reach shipments of hundreds of millions this year.

The MT6572 is a dual-core Cortex-A7 SoC targeting ultra-low-cost smartphones that started shipping towards the end of 2013. The combination of Cortex-A7 CPU cores and 28nm process technology ensures acceptable performance and power efficiency. Cost is reduced by incorporating a relatively small 256 KB L2 cache, cost-effective Mali-400 MP1 GPU and integrating additional digital connectivity functionality relating to WiFi and other standards. The MT6572M is a lower-clocked version of the MT6572. In 2014, MediaTek released the MT6571, a cost-reduced platform based on the MT6572 targeting low-cost 2G/EDGE devices in emerging markets.

The MT6582 is a quad-core Cortex-A7 SoC targeting low-cost smartphones that started shipping towards the end of 2013. Manufactured at 28nm, it is power-efficient with good performance for its segment. Apart from the higher number of CPU cores, the MT6582 also has a larger L2 cache (512 KB) and more GPU cores (Mali-400 MP2) when compared to the MT6572. It also integrates connectivity functionality such as WiFi. The MT6582M is a popular lower-clocked but cheaper version of this chip.

First "true" octa-core CPU

The MT6592 is an octa-core Cortex-A7 SoC clocked at a fairly high frequency (1.7 GHz) targeting the higher segments, which is a significant departure from MediaTek's previous focus on lower segment devices. It started shipping in the beginning of 2014 and has seen widespread adoption among Chinese manufacturers. The power efficiency and small die size of the Cortex-A7 keeps cost and power consumption in check despite the presence of eight cores.

Compared to octa-core chips following ARM's big.LITTLE design that existed at the time, in which the number of active cores was limited to four, the MT6592 was the first chip allowing concurrent use of all eight cores. big.LITTLE-based chips at the time also had problematic power consumption (which to a large degree continues to be the case today) that limited their viability, especially for smartphones. Although the use of eight CPU cores was ridiculed by some competitors and observers at the time, subsequent developments in smartphone SoC CPU technology suggest that this chip was in fact revolutionary and a sign of things to come.

Like the low-end SoCs, the MT6592 uses a 32-bit DRAM interface, clocked up to 666 MHz, which limits memory bandwidth and is a bottleneck for performance given the otherwise higher performance characteristics of the chip, such as the octa-core CPU and Mali-450 MP4 GPU. It has been reported that, probably due to the impact of limited memory bandwidth, MT6592-based devices suffer from reduced performance and power-efficiency when used with a 1080p display because of the associated higher demands on the memory subsystem, meaning that a 720p or lower display results in much more balanced performance. The commonly used MT6592M is a lower-clocked version of the MT6592.
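
The extra load of a 1080p display can be illustrated with a small back-of-the-envelope calculation in Python. It only counts the memory traffic needed to scan out a single full-screen layer, assuming 32 bits per pixel and a 60 Hz refresh rate (both are assumptions, not figures from MediaTek). Real traffic is higher, since the GPU also has to render and compose the frame.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA frame buffer
REFRESH_HZ = 60      # assumption: 60 Hz display refresh


def scanout_bytes_per_second(width: int, height: int, layers: int = 1) -> int:
    """Bytes read per second just to send `layers` full-screen layers to the display."""
    return width * height * BYTES_PER_PIXEL * REFRESH_HZ * layers


hd = scanout_bytes_per_second(1280, 720)
full_hd = scanout_bytes_per_second(1920, 1080)

print(f"720p:  {hd / 1e6:.1f} MB/s per layer")
print(f"1080p: {full_hd / 1e6:.1f} MB/s per layer")
print(f"ratio: {full_hd / hd:.2f}x")
```

The ratio of 2.25 is simply the ratio in pixel count, and it applies to GPU rendering and composition traffic as well, which is why a move from 720p to 1080p weighs so heavily on a narrow memory interface.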

MediaTek currently shipping 4G using stand-alone baseband

MediaTek has been late with the introduction of basebands with 4G network support, and is still not shipping a SoC with integrated 4G baseband, a feat that competitor Qualcomm already achieved almost a year ago. However, it already supports 4G smartphones with an alternative solution, a separate baseband/modem chip.

MediaTek is currently shipping in 4G LTE-enabled platforms using its stand-alone MT6290 2G/3G/4G baseband with support for LTE R9. This chip is used in conjunction with MediaTek's existing cost-effective quad-core Cortex-A7 MT6582/MT6582M SoC platform and its mid-range octa-core Cortex-A7 MT6592/MT6592M platform. The use of two chips is inherently more costly than a SoC with an integrated 4G baseband, and essentially means the chipset contains duplicated 2G/3G baseband functionality, although the MT6582 is a chip with low manufacturing cost which means that MediaTek may still achieve acceptable margins even with the additional chip. Judging from early reviews, the use of two chips does not seem to have a significant impact on power efficiency.

Update (September 7, 2014): New models introduced by smartphone vendors at the IFA trade show reveal widespread adoption of Snapdragon 400 and 410 SoCs with integrated 4G in lower-end models, including models from manufacturers targeting lower-end segments (such as Alcatel) where MediaTek previously supplied chips for the vast majority of models. This suggests that the two-chip solution with the MT6290 is not very appealing to the market for reasons such as cost, as well as being a bad match for the very tight capacity situation at TSMC.

Cortex-A53-based platforms targeting entry-level to mainstream

The MT6732, with a quad-core Cortex-A53 CPU at 1.5 GHz, is targeted at the higher regions of the entry-level segment with display sizes up to 720p. Manufactured using TSMC's 28nm HPM process, it includes a Mali-T760 MP2 GPU and integrates MediaTek's new 4G baseband supporting LTE R9 Cat 4. The SoC uses a single-channel 32-bit LPDDR3 DRAM interface clocked up to 800 MHz, which is similar to the DRAM configuration used in existing cost-sensitive platforms, although the maximum clock frequency has been increased. Volume availability is expected in Q4 2014.

The MT6752 is a mid-range platform with a symmetric octa-core Cortex-A53 CPU at 1.7 GHz. MediaTek already has experience with a similar CPU configuration from its MT6592 with octa-core Cortex-A7. It is targeting the mainstream segment with a display size up to 1080p. Also manufactured using TSMC's 28nm HPM process, it includes a Mali-T760 MP4 GPU, with twice the number of cores compared to the MT6732, and integrates MediaTek's new LTE R9 Cat 4 baseband. It has the same DRAM interface as the MT6732, keeping cost down, but limiting memory bandwidth, which potentially impacts performance, especially for larger display sizes such as 1080p. Volume availability is expected in Q4 2014.

It is possible that the MT6732 and/or MT6752 already make use of the frame buffer compression or smart composition technologies offered by ARM, which are part of new platforms associated with recent CPU cores such as Cortex-A53, existing and upcoming Mali GPUs, and newly introduced ARM video (VPU) and 2D graphics (DPU) cores. These techniques can significantly reduce memory bandwidth requirements, which is important given the limitations on memory bandwidth in the MT6732 and especially the higher-performance MT6752.

Finally, the MT6735 is a derivative of the MT6732 that adds EVDO network technology, making it suitable for markets such as the US where some carriers continue to use network technology with an origin in CDMA. This is a market that Qualcomm has historically dominated. This development was reportedly made possible by MediaTek obtaining licenses from VIA Technologies. It is scheduled for shipment in 2015.

High-end Cortex-A53-based platform

The MT6795 is a high-end platform with a symmetric octa-core Cortex-A53 CPU up to 2.2 GHz, targeting the high-end segment with display size up to 2560x1600. Manufactured using the 28nm HPM process, it reportedly has a high-performance PowerVR G6200 GPU, and also integrates a 4G LTE baseband. A critical feature of this chip is the use of a dual-channel LPDDR3 DRAM interface, clocked up to 933 MHz. The dual-channel interface doubles memory bandwidth compared to other MediaTek platforms, greatly improving performance potential, in line with current high-end SoCs such as Qualcomm's Snapdragon 800/801/805 platforms. MediaTek has announced that devices with the MT6795 will be commercially available by the end of 2014.
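
The difference between the DRAM configurations mentioned in this article can be made concrete with a simple peak bandwidth estimate in Python. The clock frequencies and the 32-bit single-channel width come from the text; that each LPDDR3 channel transfers data on both clock edges is standard for this memory type, while the assumption that each of the two MT6795 channels is also 32 bits wide is mine. These are theoretical peak figures, and sustained bandwidth in practice is lower.

```python
def peak_bandwidth_gbs(clock_mhz: float, bus_width_bits: int = 32, channels: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for a double-data-rate interface."""
    transfers_per_second = clock_mhz * 1e6 * 2  # two transfers per clock (DDR)
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer * channels / 1e9


configs = {
    "MT6592 (1 x 32-bit @ 666 MHz)": peak_bandwidth_gbs(666),
    "MT6732/MT6752 (1 x 32-bit @ 800 MHz)": peak_bandwidth_gbs(800),
    # assumption: two 32-bit channels
    "MT6795 (2 x 32-bit @ 933 MHz)": peak_bandwidth_gbs(933, channels=2),
}

for name, gbs in configs.items():
    print(f"{name}: {gbs:.2f} GB/s")
```

With these assumptions the dual-channel configuration ends up at somewhat more than twice the peak bandwidth of the single-channel 800 MHz interface, because the clock frequency is also higher.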

It does not look like a coincidence that the model number of this chip is similar to the big.LITTLE-based MT6595 using Cortex-A17 and Cortex-A7 cores that was announced much earlier, which has a similar GPU and DRAM interface, strongly suggesting that the MT6595 will have relatively limited viability and that the MT6795 is in fact the smarter, cheaper and more power-efficient replacement for it. This provides strong evidence that MediaTek, despite earlier prominent announcements about its heterogeneous multi-processing architecture dating back to 2013, is currently de-emphasizing big.LITTLE architectures in favor of less complicated, much more power-efficient and more cost-effective multi-core configurations.

Update (January 9, 2015):

The MT6595 has been somewhat more successful than estimated in this article, with at least two high volume smartphone models adopting it (Lenovo Vibe X2 and Meizu MX4), although it has not been very widely adopted. Meanwhile, not much further information has yet been heard about the MT6795, which could point to possible delays or could reflect NDAs and a policy of secrecy regarding this product on the part of MediaTek. MediaTek continues to use big.LITTLE CPU designs for smart TV and some tablet products, but I believe for the mobile market (smartphones and tablets), Cortex-A53-based SoCs have a much greater potential for success.


MediaTek's early adoption of power-efficient, in-order pipeline Cortex-A7 cores means it has significant experience with the type of CPU architecture (four or eight extremely low-power, medium performance cores) that is becoming increasingly important even for high-end platforms. The expertise gained from the octa-core MT6592 is likely to serve MediaTek well for upcoming platforms. Even Qualcomm is quickly moving in this direction, but MediaTek is still likely to have some competitive advantage due to its previous experience. This will give MediaTek the opportunity to further extend its reach into higher-end segments.

At the same time, MediaTek has faced challenges. It is obvious that its prominent investment in big.LITTLE heterogeneous multi-processing architectures (such as the high-end MT6595 smartphone SoC and the MT8135 tablet SoC announced earlier) has brought a limited amount of benefit in the mobile space, with somewhat limited shipment potential due to cost, relatively high power consumption and other issues.

The late introduction of SoCs with an integrated 4G baseband has already affected MediaTek's market share in the upper part of the low-end segment, which is being encroached upon by Qualcomm's Snapdragon 400 platform. That platform has offered an integrated 4G baseband for some time, and has been further updated with the Cortex-A53-based Snapdragon 410.

MediaTek faces wafer capacity shortages at TSMC, where a large part of advanced chip manufacturing capacity has been taken up by Apple. Apple, as well as Qualcomm, are large TSMC clients with very deep pockets and can invest many billions of dollars in purchase commitments, and have done so, potentially shutting out MediaTek to a certain degree. The tight chip supply situation is likely to affect MediaTek's sales and flexibility of production.

The smartphone SoC market is also complicated by Qualcomm's large patent/IP position, based on which it strives to levy high royalties on smartphones, especially those using chips from competitors. This situation has resulted in Qualcomm obtaining an overwhelmingly large share of the high-end smartphone SoC/baseband market and the failure of many traditional competitors. However, recent developments in China have undermined Qualcomm's ability to enforce royalties and licensing fees, and Samsung also seems to be moving away from its uneasy strong dependence on Qualcomm. This may already have resulted in increased demand for MediaTek chips, although this demand may be higher than MediaTek's current ability to supply.

Sources: MediaTek website, MediaTek (MT6795 press release), Wikipedia (MediaTek), ARM (Cortex-A53 page), ARM (ARM Frame Buffer Compression), ARM (Smart Composition)

Updated (January 9, 2015):  Update status MT6595 and other big.LITTLE designs from MediaTek, and status of MT6795.

Wednesday, August 27, 2014

ARM Cortex-A53 core emerges as a viable solution for a wide range of performance targets

Cortex-A53 seeing strong adoption for upcoming SoCs

We have already seen that ARM's Cortex-A7, an extremely low cost in-order pipeline CPU core optimized for power efficiency, has been a perfect match for current high-performance mainstream 28nm process technology, and multi-core Cortex-A7 CPUs currently drive the vast majority of smartphones in the market, except the premium segment.

At the same time, the new Cortex-A53 is seeing very strong adoption for new SoCs for which production is currently ramping up as well as upcoming platforms. The Cortex-A53, an in-order pipeline CPU that is basically an extension of the Cortex-A7 to ARM's 64-bit ARMv8 instruction set architecture with somewhat higher performance, has recently been adopted for mainstream platforms by leading smartphone SoC providers Qualcomm and MediaTek, in products spanning virtually the entire spectrum from entry-level to premium-level.

Issues remain with ARM's 32-bit performance cores such as Cortex-A17

Meanwhile, the latest revision of ARM's established high-performance 32-bit core, the Cortex-A15 r3p3, seems to have finally become mature enough to be used in a leading smartphone platform, the Samsung Galaxy Alpha, in the form of the big.LITTLE-based Exynos 5430 manufactured using Samsung's leading-edge 20nm process. The more efficient heterogeneous multi-processing (Global Task Scheduling) implementation of big.LITTLE also seems to have reached maturity and actual usability in practice. However, the power efficiency of any Cortex-A15 core, even after numerous optimizations, is still likely to be mediocre.

Doubts still abound about the newer 32-bit ARM cores that were supposed to fix the high power consumption of the Cortex-A15, namely the Cortex-A12 and Cortex-A17. Few new chips using these cores have reached stable volume production yet, with suggestions of continuing power consumption issues and a learning curve to achieve stable volume production. Leading SoC companies like MediaTek, Allwinner and Rockchip earlier announced SoCs with these cores in a big.LITTLE configuration or as a straightforward quad-core, but the path to market arrival already appears to be longer than expected for most of these platforms, with several of them not likely to ship commercially at all or only into low volume non-mobile applications where power consumption is less of an issue.

Cortex-A53 applicable to a wide range of performance points

The issues with cores such as the Cortex-A17 seem to have accelerated the move towards the Cortex-A53, ironically not because of the Cortex-A53's support for the 64-bit ARMv8 architecture, but much more because it is conveniently positioned as a faster version of the proven Cortex-A7, well suited to the latest process nodes, allowing it to be clocked higher than the Cortex-A7 in addition to being intrinsically faster. The Cortex-A53 is likely to be used mainly as a 32-bit CPU in the short term (being fully compatible with the 32-bit ARMv7-A instruction set architecture).

It can provide high performance with relatively low power consumption in configurations with a substantial number of cores (such as eight), making it suitable even for premium devices, while being suitable for lower segments in configurations with a smaller number of cores (such as four). Clock frequency targets can be adjusted for a particular segment (ARM offers specific support for optimizing a particular core either for more speed or for better power efficiency). At the same time, because the die size is not much greater than that of the Cortex-A7 (which has a very small die size), and several times smaller than that of more performance-oriented CPU cores, a high number of Cortex-A53 cores can be used without serious implications for manufacturing cost. In effect, the move to Cortex-A53-based architectures for performance-oriented SoCs is likely to dramatically lower cost while also greatly improving power efficiency.

Leading platforms using Cortex-A53 targeting volume production in 2H 2014

Already, several configuration types of SoC using only Cortex-A53 cores have been announced, spanning much of the performance spectrum:

Quad-core Cortex-A53:
  • MediaTek MT6732, speed quoted as 1.5 GHz, targeting entry-level devices, expected to arrive this year.
  • Qualcomm Snapdragon 410 (MSM8916), 1.4 or 1.2 GHz, was scheduled to be sampling in Q2 and likely already in production. As of the IFA trade show early September, numerous new smartphones using this platform have already been announced and are starting to become commercially available.
  • Qualcomm Snapdragon 610 (MSM8936), 1.8 GHz, sampling planned for Q3 2014.
Octa-core Cortex-A53 (symmetric, all cores can clock up to same maximum speed):
  • MediaTek MT6752, 1.7 GHz, targeting mainstream devices, also expected to arrive this year.
  • MediaTek MT6795, speed quoted as 2.2 GHz, targeting premium devices, scheduled for a Q4 2014 introduction. It does not look like a coincidence that the model number of this chip is similar to the big.LITTLE MT6595 using Cortex-A17 and Cortex-A7 cores that was announced much earlier, strongly suggesting that the MT6595 will have relatively limited viability and that the MT6795 is in fact the smarter, cheaper and more power-efficient replacement for it.
Octa-core Cortex-A53 in a pseudo-big.LITTLE configuration (four cores clocked higher, four clocked lower):
  • Qualcomm Snapdragon 615 (MSM8939), 1.8 GHz x4 + 1.0 GHz x4, sampling planned for Q3 2014.
Qualcomm has also announced SoCs using the Cortex-A53 in combination with higher-performance Cortex-A57 cores in a big.LITTLE configuration:
  • Qualcomm Snapdragon 810 (MSM8994), quad-core Cortex-A57 + quad-core Cortex-A53, sampling 2H 2014.
  • Qualcomm Snapdragon 808 (MSM8992), dual-core Cortex-A57 + quad-core Cortex-A53, sampling 1H 2015.
However, it remains to be seen whether the Cortex-A57 will improve upon the performance characteristics and suitability for leading-edge processes of cores such as the Cortex-A17, on which it is likely to be based. If we make the assumption that the two CPU cores inside the Apple A7 (manufactured at 28nm) have characteristics that are close to the Cortex-A57, which is not at all illogical, then that would mean that the Cortex-A57 is unlikely to be significantly more power efficient than preceding cores like Cortex-A15 and A17, although the jury would still be out for the potential improvement on a more advanced process such as 20nm.

Impact on multi-threading in major mobile operating systems

It is already becoming clear that many-core (for example octa-core) CPU configurations consisting of low-to-medium-performance, but very power-efficient cores like the Cortex-A7 or Cortex-A53 are the most cost-effective and power-efficient way to pursue higher performance levels in modern mobile devices. In that respect, MediaTek's MT6592 octa-core Cortex-A7-based SoC released at the end of 2013, far from deserving some of the scolding it received from certain competitors and other observers, was in fact a revolutionary design and a sign of things to come.

In the Android OS, the presence of more CPU cores has a significant positive effect on performance and usability, without increasing power consumption and in practice actually facilitating low-power devices. To what extent an increase in the number of cores from quad-core to octa-core contributes to performance has been debated.

There are suggestions that symmetrical (not big.LITTLE) octa-core CPU configurations, such as implemented in the MT6592, can in fact provide significant performance benefits for common use-cases. For example, although SoCs typically have a specific, proprietary video decoding/encoding core (VPU) to accelerate video playback with minimal power consumption by the CPU, the set of video standards supported by the VPU is often limited, and variations within a specific media format or the playback window configuration may require a full or partial fall-back to software decoding by the CPU. On the Android platform, the ffmpeg/libav libraries used for software video decoding can readily take advantage of the extra cores, essentially doubling processing capacity, which can easily make the difference between smooth and unacceptably stuttering video playback. Another example is a multi-window UI as offered with certain Android platforms, allowing multiple applications to run in the "foreground" concurrently, each potentially using several threads/cores. Finally, the ubiquitous Chrome browser, a very common use-case, is inherently multi-threaded.
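
How much an octa-core CPU gains over a quad-core depends strongly on how much of the workload can actually run in parallel. A simple way to reason about this is Amdahl's law, sketched below in Python. The parallel fractions used here are hypothetical examples, not measurements of any particular Android workload, and the model ignores memory bandwidth limits, thermal throttling and scheduling overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions: poorly threaded, well threaded, perfectly parallel
for p in (0.5, 0.9, 1.0):
    quad = amdahl_speedup(p, 4)
    octa = amdahl_speedup(p, 8)
    print(f"parallel fraction {p:.1f}: quad {quad:.2f}x, octa {octa:.2f}x, "
          f"octa over quad {octa / quad:.2f}x")
```

Only a perfectly parallel workload sees the full doubling when going from four to eight cores. Well-threaded tasks such as software video decoding can get fairly close to that, while poorly threaded applications gain little, which helps explain why the benefit of octa-core configurations remains a matter of debate.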

For other operating systems, the situation may be less clear. For example, there is likely to be much less emphasis on multi-threaded applications in Apple's iOS, based on a long legacy of application processors limited to one or two cores with increasing performance, the latest of which was the Apple A7. If the Apple A8 in the upcoming iPhone 6 and other upcoming Apple devices in fact uses Cortex-A53 "class" CPU cores in a many-core configuration (which I would not rule out at all), then that would have repercussions for iOS application development by stimulating the use of a much higher degree of multi-threadedness to take better advantage of the new processor.

Sources: Wikipedia (Snapdragon (system on chip)), Wikipedia (MediaTek), ARM (Cortex-A53), ARM (Cortex-A7), ARM (Cortex-A17), ARM (POP IP), DigiTimes (64-bit AP shipments growing fast in 2H 2014)

Updated September 19, 2014.
Updated January 9, 2015 (rephrase statement about viability of MT6595).

Tuesday, August 26, 2014

Overview of the tablet SoC market, with a focus on China

Price pressure on tablet application processor SoCs

When introduced in 2012, Allwinner's single-core A10 tablet SoC offered a higher level of integration than previous tablet application processors, enabling cheaper tablets, also helped by the fact that the A10 was reported to be priced as low as $10, which was revolutionary at the time. However, as of mid-2014, Allwinner, which is based in China, is ramping up its quad-core A33 tablet SoC and selling it for $4. The much lower chip price contrasts with the increased processing power of the new chip, illustrating both technological progress and intense competition (leading to commoditization) in the tablet processor space.
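
The scale of this price decline is easy to quantify with the figures quoted above. The short Python snippet below does the arithmetic. The price per CPU core is only a crude illustrative metric (chips are not priced per core, and a Cortex-A7 core in the A33 is not directly comparable to the core in the A10).

```python
# Figures quoted in the text
a10_price_usd, a10_cores = 10.0, 1  # Allwinner A10 (2012)
a33_price_usd, a33_cores = 4.0, 4   # Allwinner A33 (mid-2014)

price_drop = 1 - a33_price_usd / a10_price_usd
per_core_ratio = (a10_price_usd / a10_cores) / (a33_price_usd / a33_cores)

print(f"chip price decline: {price_drop:.0%}")
print(f"price per CPU core: ${a10_price_usd / a10_cores:.2f} -> "
      f"${a33_price_usd / a33_cores:.2f} ({per_core_ratio:.0f}x lower)")
```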

Rise of cellular data-enabled tablets

Part of the reason for the steep price decline is that traditional tablet application processor companies serving Chinese manufacturers, such as Allwinner and Rockchip, are being squeezed by the increasing worldwide demand for 3G (cellular)-enabled tablets, for which they cannot offer cost-effective solutions. The share of cellular network-enabled tablets has been rising quickly and currently represents between one quarter and one half of total unit shipments.

MediaTek takes advantage

This has allowed competitors that do have cost-effective integrated cellular modem technology, particularly Taiwan-based MediaTek, to take market share. MediaTek also generally has well-optimized, power-efficient solutions that can enhance the user experience (even for non-3G tablets, MediaTek chips are generally well optimized and very cost-effective).

MediaTek's product offerings for 3G-enabled tablets are mostly based on very similar chips used in smartphones. For example, the dual-core MT8312 corresponds to the MT6572 smartphone chip, while the quad-core MT8382 corresponds to the MT6582. In many cases MediaTek may be selling the same physical chip, just with a different number on it. It is also common practice in China to procure MediaTek smartphone chips for use in 3G tablets, for example the octa-core MT6592 is currently popular for use in tablets.

Even for WiFi-only tablets, MediaTek can offer more cost-effective solutions, because of its ability to integrate WiFi (and also Bluetooth and GPS) processing (at least the digital part) into the SoC, which reduces the cost of the external RF chip implementation and the cost of the PCB.

However, MediaTek is likely to be severely affected by the shortage of 28nm production capacity at TSMC on which it depends, limiting its ability to take more market share.


Chinese tablet SoC providers facing challenges

Allwinner has been in decline since its A10 family of single-core chips became uncompetitive in the first half of 2013. It had a moderate new product success with the A31/A31s in the first half of 2013, which was targeted at a higher-end segment with lower volume. Mainly because Allwinner's intended successor products for low-end tablets (such as the dual-core A20 and A23) encountered severe issues, Allwinner lost market share, having been passed by Rockchip at the end of 2013 and MediaTek as of Q2 2014. Allwinner hopes to recover some of its market position with its new A33 chip.

Although Rockchip led the market in terms of unit market share from Q4 2013 up to at least Q2 2014, it is not immune to the price pressures on tablet processors. In particular, Rockchip has continued to rely fairly heavily on its more performance-oriented RK3188(T) chip, which is not likely to have a production cost that allows it to compete with, for example, the price quoted by Allwinner for the A33. Rockchip also has a low-end dual-core chip, the RK3168, that was delayed for quite a while until it started appearing in end devices last quarter. However, it is not likely to be a very cost-effective or power-efficient chip, and will immediately be pressured by Allwinner's A33 (apart from the pressure already exerted by MediaTek and other players due to the adoption of 3G connectivity). Part of Rockchip's problem may be that it is using the aging Cortex-A9 core and not the Cortex-A7 CPU core, which has become by far the most cost-effective and power-efficient CPU core for mobile applications. Rockchip may also have been affected by the quality/technology issues reported at its foundry partner GlobalFoundries this year, although the production of the RK3188 using 28nm HKMG seemed to go fairly smoothly starting from 2013. Nevertheless, Rockchip has continued to ship high volumes of the RK3188 and lower-end chips, all of which use Cortex-A9 cores.

Additionally, over the last year both Rockchip and Allwinner seem to have diverted a lot of attention to ambitious higher-end products (RK3288 and A80, respectively) that have proven to be a challenge to bring to market in working order, apart from being unsuitable for most of the tablet market due to high power consumption. At the moment neither product appears to have much potential for success because of unsuitability for the tablet market.

Other Chinese players

Other players continue to be active in the Chinese tablet processor market. Actions Semiconductor, which has a relatively long history and was in the past a successful supplier of MP3 player chips (it incubated the engineers that would later start Allwinner), has been competing at the bottom of the tablet market with generally lower-performing chips. Their ATM7029 was the first cost-effective quad-core processor for Chinese tablets in 2013, but it uses low-performance Cortex-A5 cores. This created controversy, as Actions for a long time maintained that it was using Cortex-A9 (or later "Cortex A9-class") cores in this chip, a claim clearly at odds with performance and closer examination of the product, and Actions went as far as modifying the kernel/OS to cover their tracks. Nevertheless, the ATM7029 sold in fairly high volume. Actions later replaced the Vivante GPU with a somewhat less problematic PowerVR SGX540 in their ATM7029B, and no longer seems to deny the presence of Cortex-A5 cores. However, their new product, the ATM7039, still hasn't quite appeared on the market. The reduced chip prices are likely to make it difficult for Actions to make a profit on their chips.

There are also additional Chinese companies with expertise in the smartphone space, such as Spreadtrum (long established as a supplier of significant volumes of mobile phone chipsets) and Leadcore Technology (an emerging player), that are targeting the tablet space (especially cellular-enabled tablets). Because they possess the cellular modem technology that traditional players such as Rockchip and Allwinner lack, they may put additional pressure on them.

Intel renews efforts to penetrate Chinese tablets

One additional development is the significant investments Intel is making to gain a foothold in the mobile space. Using a "contra-revenue" strategy, Intel is targeting tablets with subsidized Atom chip offerings to gain market share. Although Intel's products, thanks to an advanced fabrication process, generally have excellent performance (both CPU and GPU, as well as battery life), the Android software ecosystem is still more geared towards ARM-based platforms, with potential incompatibilities. Although earlier expectations that Intel would heavily target the low-cost white-box tablet market in China (affecting the above-mentioned companies) have not quite come true yet, Intel has already gained design wins for more lucrative platforms for brand name tablets in China as well as Taiwan (the likes of Lenovo, Asus and Acer). At this point, market share figures do not present evidence that Intel is already bulldozing its way into low-cost Chinese tablets.

However, Intel has recently increased its focus on Chinese white-box tablet manufacturers, striving to ship a total of 25 million tablet processors in the second half of 2014. Design wins based on new, more cost-effective platforms, such as the quad-core Atom Z3735 series, which includes models with a more economical 32-bit DRAM interface and other cost improvements, are expected to be in production by October. Intel will also push its SoFIA platform towards the end of 2014, extending its offerings for 3G-enabled tablets and smartphones. Separately, Intel also has an agreement with Rockchip involving Rockchip's integration of an Atom processor and Intel 3G modem for a SoC product targeting the tablet space in 2015.

High-end and captive players: Qualcomm, Samsung, and Apple

Qualcomm, the dominant provider of smartphone SoCs for mid to high-end platforms, also targets tablets, and its chips have recently been used in models from major brands such as Samsung, Sony and Amazon, especially 4G cellular data-enabled models. Additionally, both its Snapdragon 800 and 801 series include SoC versions without cellular modems, specifically targeting WiFi-only tablets.

Samsung has also been developing tablet processors for some time. Over the years, its Exynos SoCs have been used in several of its own tablet models, and Samsung has a history of offering selected models (such as the Exynos 4412 and the recent Exynos 5260) to Chinese manufacturers.

Finally, Apple has been using custom-designed application processors in its iPads, as well as iPhones, for some time. Generally these have been high-end designs, targeting performance more than lower chip production cost, because the margins on Apple devices are of such magnitude that a higher cost chip has little influence. Up until Apple's A7 used in the iPhone 5S, Apple concentrated on application processors manufactured at Samsung, with the cellular modem functionality typically provided by a separate Qualcomm baseband/RF chipset. As of 2014, Apple has been ramping its new Apple A8 processor at TSMC using a leading 20nm process, and there has been speculation that this new SoC contains a baseband modem as well.


Sources: EE Times, DigiTimes (1H 2013 shipments), DigiTimes (Q4 2013 shipments), DigiTimes (Q1 2014 shipments), DigiTimes (Q2 2014 shipments), DigiTimes (MediaTek benefits from move to cellular functions in tablets), MEPTEC (integration of WiFi/Bluetooth/GPS reduces cost), DigiTimes (Intel push in 2H 2014), AnandTech (Intel SoFIA platform)

Updated September 19, 2014.

Monday, August 25, 2014

Rockchip and Allwinner: Between a rock and a hard place

Origins of chip design companies in China

Starting from about 2004, an increasing number of chip design companies have been appearing in China, mostly targeting the high-volume consumer devices for which China has a dominant role in terms of manufacturing capacity. Additionally, some Chinese device companies have started to gain brand recognition and have grown very large, with several of the largest smartphone brands now being based in China, to a large extent on the strength of export sales.

Some of the earliest Chinese chip companies that were successful targeted earlier product trends that have since waned, such as the MP3 players that peaked around 2006. Actions Semiconductor was one of those companies. Several companies that now target newer markets such as tablet processors have their origins among the older MP3 player chip companies. For example, Rockchip and Actions Semiconductor, which still exists, both started as MP3 player chip companies, and Allwinner Technology was started by former Actions employees. The three mentioned companies have been among the largest providers of tablet processors for Chinese manufacturers in recent years, covering a significant amount of worldwide unit shipments of tablet processors.

Another area where Chinese chip companies achieved success is basic and feature phones. Spreadtrum and RDA Microelectronics achieved significant sales levels with complete chipsets, including cellular baseband and radio, for the numerous low-cost mobile phones manufactured in China, with many of them being exported to other parts of the world. Both Spreadtrum and RDA Microelectronics have since been acquired by the government-affiliated Tsinghua Unigroup. Spreadtrum has also developed smartphone SoCs, while another company in this segment is Leadcore Technology. HiSilicon, Huawei's chip division, is a well-funded chip designer that, amongst several other product areas, designs SoC chips incorporated into Huawei smartphones.

Allwinner and Rockchip have dominated Chinese tablets

Allwinner and Rockchip have traded places a few times as the top supplier of SoCs for Chinese tablets (and indeed worldwide in terms of unit volume). Rockchip was prominent up to the beginning of 2012, when volumes were still low. In 2012 Allwinner gained the upper hand with its cost-effective, integrated A10 series of processors, targeting mostly the lower-priced segment of the market. Subsequently, building from its foothold in higher-performing processors, Rockchip regained lost ground and was reported to be the largest provider of tablet processors from Q4 2013 until recently.

A few years ago, the tablet processor market was profitable. However, competition has increased and profit margins have come under pressure. Apart from being affected by inventory cycles among Chinese tablet manufacturers (the white-box tablet market stalled significantly at the end of 2013, with significant inventories), competition between tablet processor providers has eroded the selling price of the chips.


Cellular data functionality transforms market

Relatively recent competitors, most prominently Taiwanese company MediaTek, which dominates the smartphone chip market in China, have provided further pressure because of their ability to integrate functionality into a single chip that reduces external components (such as WiFi functionality and 3G modems), while also having advantages in power consumption and performance (also through better device drivers and firmware). Other Chinese companies, such as Leadcore and Spreadtrum, have also introduced tablet processors with integrated 3G modems. Increased demand from various parts of the world has made tablets with 2G/3G cellular data or voice capability an increasingly large part of the market (between 25% and 50% currently).

Crucially, Rockchip and Allwinner do not possess wireless baseband technology, or are only in the early stages of licensing or obtaining it (for example, Rockchip has made a deal to integrate an Intel 3G modem and Atom processor into a chip targeting tablets). Even for WiFi-only tablets, the ability by competitors to integrate WiFi functionality results in a cost disadvantage for Rockchip and Allwinner. This puts great pressure on their ASPs, so it is not surprising that Allwinner announced that their new quad-core A33 chip targeting low-cost tablets has a selling price of only $4.

Rockchip unlikely to make much profit on chip sales

In this context, you can ask whether Rockchip will continue to be able to reap any profit from selling chips such as its existing RK3188(T), which is manufactured at GlobalFoundries using a 28nm HKMG process and uses Cortex A9 cores that are relatively expensive in terms of manufacturing cost (as well as non-optimal for power consumption). Rockchip's lower-end products (such as RK3168 and RK3026) also use Cortex-A9 cores. Nevertheless, Rockchip continues to ship very high volumes, leading to questions about how it is able to continue to do so despite the potential for significant negative cash flow on its product sales.

However, based on a reported roadmap of upcoming chips, even Rockchip is finally moving to more cost-effective and less power-consuming chip architectures. Rockchip's RK3126 and RK3218 are low-cost quad-core Cortex-A7-based tablet processors with Mali GPU, closely matching Allwinner's new A33 chip. Additionally, the future "MayBach" SoC will contain an octa-core ARM Cortex-A53 CPU, the type of configuration that has already started to show very impressive results in early benchmarks, likely at relatively low power consumption, and that is likely to be much more viable than big.LITTLE for the higher-performance segment. However, it will still take time for these new designs to appear in the market and fully replace existing Rockchip tablet processors, during which Rockchip will continue to have limited profit margins on its chips and moderate power efficiency when compared to chips from competitors.

Intel threatening to conquer Chinese white-box tablets

An additional challenge may be provided by Intel, which is increasing its focus on the Chinese white-box tablet market with more cost-effective products, with significant investments being made according to its "contra-revenue" strategy. Largely because Intel manufactures these SoCs using an advanced 22nm process, they have considerably higher CPU and GPU performance than competitors. Meanwhile, the competitive disadvantage within the Android software ecosystem of being a non-ARM platform is diminishing. Low-cost products such as the Atom Z3735E and Z3735G with a 32-bit memory interface, which Intel is currently pushing in China and which have lower but probably still respectable performance, may be attractive for manufacturers even without significant "contra-revenue" subsidies.

Lack of focus, failed product introductions

Allwinner, to its credit, has been showing more interest in other applications apart from Android tablets, such as development boards used by the open source community as well as companies targeting specific applications. Allwinner chips were already relatively popular for this type of application due to reasonable driver support, source code availability, and flexible system boot ability of Allwinner-based devices, and a grass-roots volunteer development community for Allwinner chips had already existed for some time. However, outside of the Raspberry Pi, the chip sales volume for this segment is negligible compared to even one successful high-volume tablet model from a large Chinese manufacturer.

Meanwhile, both Allwinner and Rockchip have seen several major planned product introductions go pretty badly. Allwinner's dual-core A20 processor introduced in 2013, presumably manufactured at 55nm in order to be pin-compatible with the older A10, encountered hardware and software issues that prevented it from being a success in the tablet market. The use of a trailing-edge 55nm process for Cortex-A7 cores was unusual, and may have contributed to the problems. Their A23 also wasn't very successful. Rockchip for much of the last year has been relying on its higher end quad-core RK3188 and older devices, as its lower-end dual-core RK3168 was severely delayed (it has started appearing in devices in the last few months).

Risky, unfocused investment in high-end products

Both companies have also been ambitious in their design for new higher-performance chips for this year. Allwinner has been planning to introduce the A80, an ARM big.LITTLE configuration with four Cortex-A15 and four Cortex-A7 cores and PowerVR Rogue GPU, while Rockchip has been trying to ready their RK3288, which according to most reports is using a quad-core Cortex-A17 CPU and Mali-T764 GPU. I would be inclined to have some doubts about that because of previous indications (1) (2) from Rockchip that the cores could be Cortex-A12, which would fit well with an announcement from Rockchip's foundry partner GlobalFoundries a few months ago that it was the first foundry to have started manufacturing chips using the Cortex-A12. Both chips have CPU cores and GPU cores that are much larger and more power-hungry than cores previously used by both companies, and power consumption has already been reported to be a major issue. This is problematic, because it may make the products unsuitable for the high-volume tablet market that is the bread and butter of both companies, while causing problems even for lower-volume, less power-sensitive applications such as media boxes.

Both chips were supposed to be already broadly established in the market by now, but instead have been continuously delayed, with more careful examination suggesting that these chips may have had some hardware issues requiring redesigns, apart from problematic power consumption. On the internet, information on the availability status of these chips is chaotic since the definite availability or arrival of these chips is frequently trumpeted based on the fact that a pre-order of a vaguely described device with conflicting specs using the new chip can be made using some unreliable Chinese webshop. Rockchip appears to have significant numbers of half-functional RK3288 chips in the hands of customers, who have made limited production runs of devices, sometimes with attempted work-arounds for hardware bugs, many of which also appear for sale. However, a true chip respin and subsequent ramp of volume production can take several additional months, and the financial risk is even higher when material volume production has already occurred of a chip with defects apparently missed during earlier verification, because those chips can be effectively useless.


In summary, these Chinese tablet processor companies have seen the market for tablet processor SoCs targeting WiFi-only tablets commoditize, with greatly reduced profit margins. They currently lack the technology to effectively compete with competitors that already offer higher amounts of integration. They have also seen challenges with key product introductions, and are currently having problems introducing ambitious, higher-performance chips that were announced some time ago but have proven hard to bring to market in fully working order. When these chips eventually come to market, high power consumption/heat or other issues related to their complexity are likely to constrain their success in terms of sales volume. Allwinner appears to be doing the sensible thing with the new A33, a quad-core processor designed for low production cost. If the ramp goes smoothly, it will be popular and help Allwinner recover its market position to some extent, although the high profit margins of earlier days are long gone.

Sources: Digitimes, Wikipedia (Rockchip article), Wikipedia (Allwinner Technology article), Wikipedia (Actions Semiconductor article), Wikipedia (Spreadtrum article), CNXSoft, CNXSoft (Rockchip roadmap)

Updated September 28, 2014 (Add information about upcoming, more efficient Rockchip SoCs).

Sunday, August 24, 2014

ARM Mali-400 more successful than ever, dominating the cost-sensitive GPU segment

Long history of adoption, increasing success

The ARM Mali-400 MP GPU core was introduced many years ago as the world's first OpenGL ES 2.0 conformant multi-core GPU, and was the first ARM-developed GPU core to see widespread adoption, particularly in configurations of multiple Mali-400 MP cores. The longevity of the Mali-400 MP continues to be remarkable. The scalability through the number of cores has allowed it to be targeted at different segments, ranging from low-end to mid-range, and it continues to be used for much of that range. As of 2014, many more Mali-400 MP cores are shipping than at any point in its history.

Although introduced well before the emergence of current mainstream process technology such as 28nm, adoption of the Mali-400 MP at the 28nm process node has been very strong. Small die size and low power consumption have allowed the use of multi-core configurations clocked much higher than early implementations of the Mali-400, and the ability to increase the size of the L2 cache inside the GPU (although limited to 256KB) has provided further performance flexibility. The GPU has also benefited from increases in memory bandwidth in modern devices.

Features of Mali-400 MP

The Mali-400 MP GPU core uses ARM's first-generation Utgard GPU architecture. Like other mobile GPU architectures, it uses a tile-based rendering architecture which reduces memory bandwidth requirement and power consumption. It allows good quality full-scene anti-aliasing (FSAA) without a significant effect on performance. The Mali-400 cuts some corners in shader floating point precision compared to competing solutions, supporting only the minimum precision required by the OpenGL ES 2.0 standard, although this is unlikely to be highly visible on the relatively small displays used in most mobile devices.

Pixel fill-rate has been its strongest point, while historically having lower triangle throughput than competing GPUs from Imagination's PowerVR series. The pixel fill-rate scales with the number of cores used, while maximum triangle throughput depends only on the clock frequency of the GPU. The increase in display resolutions in mobile devices, causing increasing pixel fill-rate requirements, can be addressed by increasing the number of cores. Typical clock frequencies used for the Mali-400 MP include 250MHz for 40nm and 500MHz for 28nm HPM.
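
The scaling behaviour described above can be sketched with some simple arithmetic. The example below uses the typical clock frequencies mentioned in the text (250MHz for 40nm and 500MHz for 28nm HPM). The figure of one pixel per clock per core is an assumption made for illustration, and the function name is hypothetical; real-world fill-rate also depends on memory bandwidth and the workload.

```python
# ASSUMPTION (for illustration only): each Mali-400 MP pixel core writes
# one pixel per clock cycle. Real throughput depends on memory bandwidth
# and the workload, so these are theoretical peak numbers.
PIXELS_PER_CLOCK_PER_CORE = 1

def peak_fill_rate_mpix(cores: int, clock_mhz: int) -> int:
    """Theoretical peak pixel fill-rate in megapixels per second."""
    return cores * clock_mhz * PIXELS_PER_CLOCK_PER_CORE

# Typical clock frequencies from the text: 250MHz (40nm), 500MHz (28nm HPM).
configs = {
    "MP2 @ 250MHz (40nm)": peak_fill_rate_mpix(2, 250),
    "MP4 @ 250MHz (40nm)": peak_fill_rate_mpix(4, 250),
    "MP2 @ 500MHz (28nm)": peak_fill_rate_mpix(2, 500),
    "MP4 @ 500MHz (28nm)": peak_fill_rate_mpix(4, 500),
}

for name, mpix in configs.items():
    print(f"{name}: {mpix} Mpixels/s peak")
```

Under this assumption, a dual-core configuration at 500MHz has the same peak fill-rate as a quad-core configuration at 250MHz. Triangle throughput, which depends only on the clock frequency, would still be higher for the faster-clocked dual-core configuration.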

Displacement of competitors

MediaTek's shift from mainly PowerVR GPUs to mainly Mali-400 GPUs in the second half of 2013 and their success in the cost-sensitive smartphone market and subsequent penetration of the tablet market has significantly increased the unit market share of ARM's Mali GPUs, while impacting the market share of Imagination's PowerVR. Another company using Mali-400 cores in significant volume for smartphones is Spreadtrum, which targets low-end smartphones.

Apart from smartphones, an increasing adoption of Mali-400 can also be observed in the high-volume tablet market, especially for low-end platforms, at the expense of mainly Imagination's PowerVR, while Vivante's GPU cores have increasingly been marginalized. Most of the highest volume tablet SoCs over the last few years have been equipped with Mali-400, such as Allwinner's A1x and the currently ramping Allwinner A33, Rockchip's RK3066, RK3188 as well as Rockchip's low-end platforms.

Other Chinese chip companies that are well-funded or have potential for growth, such as HiSilicon and Leadcore Technology, currently also concentrate on Mali-400 series cores.

Mali-450: A faster, more efficient Mali-400

The Mali-450 MP is a more recent GPU core which, while remaining mostly limited to the feature set of the Mali-400 MP (such as no support for OpenGL ES 3.x), is significantly faster than the Mali-400 MP and is likely to be relatively efficient. Vertex processing throughput (triangles) is doubled compared to an identically-clocked Mali-400 MP, and although pixel fill-rate per core is similar to that of the Mali-400 MP, Mali-450 MP cores can generally be clocked higher. It also includes additional architectural optimizations designed to minimize power use and memory bandwidth requirements. Compared to the Mali-400 MP, the Mali-450 MP increases the maximum number of cores from four to eight.

MediaTek has adopted this core for some smartphone and tablet chips. Due to continuing reliance by mobile graphics applications on OpenGL ES 2.0 as the de-facto standard, the Mali-450 MP GPUs are likely to remain viable as above-average performance GPUs and have benefits in terms of power consumption and cost, while avoiding the inherent overhead associated with the need to support OpenGL ES 3.x and other APIs in modern GPUs such as Mali-T6xx, Mali-T7xx and the PowerVR Rogue series.

However, ARM's Mali-T7xx series, and particularly the upcoming lower-end Mali-T72x series, are designed to work in tandem with other new IP cores from ARM (including CPU cores, video processing cores and 2D graphics cores) and provide potentially significant power saving and performance improvement from the use of framebuffer data compression techniques, which reduces unnecessary memory access associated with unchanged regions of the screen or framebuffer, reducing memory bandwidth requirements. This may increase performance in low-end devices without requiring any costly upgrade from the typically used 32-bit DRAM interface with memory clocked at power-friendly frequencies.

Limited memory bandwidth is already likely to be a bottleneck for the Mali-450 GPU as implemented in chips such as MediaTek's MT6592 and MT8127, which have a 32-bit memory interface, similar to the interface used in lower end chips with a Mali-400 GPU. Chips using next-generation GPUs such as Mali-T6xx/T7xx or PowerVR Rogue typically have a dual-channel or 64-bit DRAM interface. It would be interesting to see to what extent a Mali-450 implementation could benefit from a similar higher-performance memory interface.
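
To give a feel for the difference between the two interface widths, the theoretical peak bandwidth of a DRAM interface can be estimated as the bus width in bytes multiplied by the data rate. In the sketch below, the data rate of 1066 MT/s is an assumed example value, not a figure for any specific chip mentioned here, and the function name is hypothetical.

```python
# ASSUMPTION: 1066 MT/s is an example data rate chosen for illustration;
# it is not a specification of any particular SoC named in the text.
ASSUMED_DATA_RATE_MT_S = 1066

def peak_dram_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s * 1e6 / 1e9

bw_32bit = peak_dram_bandwidth_gb_s(32, ASSUMED_DATA_RATE_MT_S)
bw_64bit = peak_dram_bandwidth_gb_s(64, ASSUMED_DATA_RATE_MT_S)

print(f"32-bit interface: {bw_32bit:.2f} GB/s peak")
print(f"64-bit interface: {bw_64bit:.2f} GB/s peak")
```

At the same memory clock, a 64-bit (or dual-channel 32-bit) interface doubles the theoretical peak bandwidth that has to be shared between the CPU, GPU, display and video blocks.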

Overview of notable SoCs using Mali-400 or Mali-450


(Click on image to enlarge)

Sources: ARM (Mali-400 MP page), ARM (Mali-450 MP page), GPU GFLOPS

ARM Cortex-A7 dominates cost-sensitive mobile CPUs from low-end to mid-range

Introduction and features of Cortex-A7

ARM's Cortex-A7 core is described by ARM as the most power-efficient processor it has ever developed, which has led the multi-core revolution for entry-level and mid-range smartphones, shipping in huge volumes. Examination of the market shows that this is not an exaggeration by ARM.

The Cortex-A7 is related to the older Cortex-A5, which also targets power-efficiency, building on the latter's efficient 8-stage pipeline. It is an in-order, non-symmetric dual-issue processor with a pipeline length between 8 and 10 stages. L1 instruction and data caches are configurable from 8KB to 64KB, and an L2 cache up to 1MB is supported. Cortex-A7 cores are typically configured with ARM's NEON SIMD engine.

When introduced a number of years ago, the Cortex-A7 was described by ARM as a power-efficient CPU core with a focus on its use in a big.LITTLE configuration with both power-efficient Cortex-A7 cores and high-performance, but less power efficient Cortex-A15 cores, with which the Cortex-A7 is architecturally compatible. Although significantly slower than the Cortex-A15, the Cortex-A7 is described as being 2.3x to 3.8x more power-efficient than the Cortex-A15 on a performance/Watt basis. ARM also mentioned that the Cortex-A7 itself has considerable performance potential, being faster than the older Cortex-A8 core for a fraction of the power.

Another critical feature of the Cortex-A7 is its extremely small die size (area used for the core within a chip), being as small as 10-20% of the size of previous generation Cortex-A8 or Cortex-A9 cores, with significant implications for chip cost, as well as contributing to its power efficiency, and allowing configuration such as octa-core without major cost implications.

Success in multi-core Cortex-A7-only CPU configurations

Somewhat unexpectedly, its greatest success by far has come from the adoption of symmetric Cortex-A7-only configurations (such as dual-core, quad-core or even octa-core) for cost-sensitive, low-power applications. The Android OS and its applications benefit from the presence of multiple very power-efficient CPU cores, which generally can also be powered or clocked down as needed, and the performance limit on single-thread performance has not proven to be a bottleneck in practical terms.

As more advanced processes such as low power 28nm came into production, the competitive advantages of the Cortex-A7 became very apparent, with its very small die size (0.45mm^2 per core) and low power consumption (< 100 mW) enabling very cost-effective and power-efficient application processors for use in SoCs targeting smartphones and other devices. Clock frequencies at 28nm have so far ranged from 1.2GHz to 1.7GHz, with an increase over time due to process and design improvements. The advantages of multi-core Cortex-A7 configurations in terms of die size and power-efficiency have significantly advanced the capabilities and lowered the cost of low-end smartphone SoCs, while greatly decreasing the cost of smartphone SoCs targeted at the mid-range segment. The cost of chip platforms for such segments has also been reduced because of the increased potential to integrate external components and functionality into the SoC, and can also be associated with significant increases in performance within a given chip cost budget (for example, use of the Cortex-A7 may allow a larger L2 cache, larger/faster GPU, or better video decoding core).
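
The per-core figures quoted above (0.45mm^2 and less than 100 mW at 28nm) make it easy to see why even octa-core configurations remain cheap. The sketch below simply multiplies those figures by the core count. It is a rough estimate that assumes area and power scale linearly with the number of cores and that ignores the L2 cache and interconnect; the function name is hypothetical.

```python
# Per-core figures as quoted in the text for a 28nm Cortex-A7.
CORE_AREA_MM2 = 0.45       # die area per core
CORE_POWER_MW_MAX = 100    # upper bound on power per core ("< 100 mW")

def cluster_estimate(cores: int) -> tuple[float, int]:
    """Return (core area in mm^2, upper bound on core power in mW).

    ASSUMPTION: linear scaling with core count; L2 cache, interconnect
    and other shared logic are not included.
    """
    return cores * CORE_AREA_MM2, cores * CORE_POWER_MW_MAX

for cores in (2, 4, 8):
    area, power = cluster_estimate(cores)
    print(f"{cores} cores: about {area:.1f} mm^2, under {power} mW")
```

Even the octa-core case comes to only a few square millimetres of core area, a small part of a typical smartphone SoC die, which leaves room in the cost budget for a larger L2 cache or GPU.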

More efficient and less problematic than other ARM cores

A few years ago, the potential of multi-core Cortex-A7-only processor configurations was not immediately obvious, and several chip companies gave it little attention, instead continuing to focus on the older Cortex-A9 core, using the Cortex-A7 exclusively in big.LITTLE configurations with Cortex-A15 cores, or designing chips using only Cortex-A15 cores, even for mobile applications. In general, companies that did not adopt the Cortex-A7 were significantly less successful in the marketplace, as well as suffering higher product cost.

The high power consumption of the Cortex-A15 has proven to be particularly problematic. ARM's big.LITTLE system has taken considerable time to mature and become viable in the market, with significant volumes largely limited to Samsung's Exynos product line, and until recently very limited use in the highest volume smartphone segment. ARM later attempted to address the power-efficiency concerns associated with the Cortex-A15 with the Cortex-A17 and Cortex-A12 cores.

Competitive advantage for early adopters of Cortex-A7

Companies that benefitted from timely adoption of the Cortex-A7 include MediaTek, which from 2013 significantly improved the usability and performance of low-cost smartphones with power-efficient and cost-effective Cortex-A7 based SoCs. MediaTek also pioneered octa-core configurations of the Cortex-A7, allowing it to address higher-priced segments. Qualcomm, the leading smartphone chip company dominating the mid-to-high-end segment, largely abandoned its proprietary Krait cores for the more cost-sensitive part of the market, including mid-range, in preference for low-cost, power-efficient SoCs with quad-core Cortex-A7, as implemented in the Snapdragon 400 series.

Some notable companies using the Cortex-A7:
  • Prominent Chinese tablet processor supplier Allwinner Technology introduced the A31/A31s (with quad-core Cortex-A7) in 2012. Manufactured at 40nm for the high-end of the Chinese tablet market (it also contained an unusually powerful GPU), this chip saw some success in the first half of 2013 and was one of the first Cortex-A7-based chips to come to market. The 40nm process limited performance and power-efficiency, and Allwinner had more serious problems when they attempted to adopt the Cortex-A7 in their dual-core A20 processor manufactured at 55nm to be pin-compatible with their older A1x line. With the recently announced A33 processor, Allwinner has finally arrived at the proven low-cost combination of a quad-core Cortex-A7 with Mali-400MP2 GPU that has already been successful for MediaTek and others, although Allwinner's chip is still manufactured at 40nm (which could be an advantage in the current tight capacity environment for 28nm).
  • MediaTek widely adopted the Cortex-A7 in 2013, first in the quad-core MT6589, then in the dual-core MT6572 and quad-core MT6582, which currently dominate the low-end smartphone market, shipping hundreds of millions of units this year. The octa-core Cortex-A7-based MT6592 has seen success in the higher-priced segment in China. MediaTek has also penetrated the tablet market with Cortex-A7-based SoCs.
  • Qualcomm has widely adopted the Cortex-A7 in its volume-driving Snapdragon 400 series, in preference over its own Krait cores.
  • Samsung first adopted Cortex-A7 as part of the big.LITTLE configuration in its Exynos SoCs, primarily shipping in tablets. Recently, Samsung has also started adopting a Cortex-A7-only architecture for high-volume smartphone applications.
Examples of companies not adopting Cortex-A7:
  • NVIDIA did not adopt the Cortex-A7, instead focusing on the more performance-oriented Cortex-A9, Cortex-A15 and the development of its own ARMv8 implementation. Despite significant investment, NVIDIA has failed to gain traction in the smartphone market.
  • Rockchip, a Chinese company that until recently led the market for tablet processors in China, did not adopt the Cortex-A7. While its Cortex-A9-based RK3188T (manufactured using a relatively advanced 28nm HKMG process), and to a lesser extent its RK3168, have shipped in significant volume, price pressure and the relatively high production cost of its processors are likely to have impacted its profit margins.
  • Texas Instruments developed the OMAP 5 series using the Cortex-A15 before virtually discontinuing the product line.


Cortex-A53 promises to extend ARM's lead

The logical successor of the Cortex-A7 for ARM's 64-bit AArch64 architecture, the power-efficient Cortex-A53, has strong similarities in design and has seen widespread adoption for upcoming SoCs, also promising to further extend the domination of standard ARM cores into the high-end segment.
Sources: Wikipedia (ARM Cortex-A7), ARM (Cortex-A7 page), ARM (big.LITTLE white paper)

Samsung ships Galaxy S5 Mini in volume using Exynos 3470

S5 Mini shipping with lower performance SoC

Not long ago, Samsung announced the Galaxy S5 Mini, a smaller and somewhat cheaper version of the Galaxy S5 with similar external design and software features, which is now starting to ship in volume in Europe.

Although marketed not far below the Galaxy S5's spot in the higher-priced segment, the Galaxy S5 Mini contains Samsung's Exynos 3470 SoC, a chip with a CPU and GPU configuration more reminiscent of lower-priced devices, with considerably less performance potential than the Qualcomm Snapdragon 800 series chips used in the Galaxy S5, and potentially lower than competing mid-range devices utilizing chips from Qualcomm's Snapdragon 400 series.

Whether this will affect its sales performance is debatable, because most end users are not likely to notice lesser maximum performance in demanding games or benchmarks, as long as performance, smoothness and battery life remain appealing for every-day use, while the Samsung Galaxy S5 branding and design is likely to be attractive in the market. Some early reviews seem to be positive regarding user experience.

Features of Exynos 3470

Although the Exynos 3470 is labeled as being part of the Exynos 3 series, which also includes the single-core Exynos 3110 application processor SoC that is several years old, it is in fact a newly designed chip manufactured using Samsung's 28nm process (probably not much different from the one used for the much more complex Apple A7 chip). It is the first Exynos chip to exclusively use a quad-core ARM Cortex-A7 CPU configuration (clocked at 1.4 GHz), which, due to its favorable characteristics (low cost and die area, low power consumption and adequate performance), has already been widely adopted by MediaTek and Qualcomm and currently dominates the entire entry-level and mainstream segments of the smartphone market.

Current information points to the use of an integrated 2G/3G/4G baseband inside the Exynos 3470, which is a first for a mainstream mobile Samsung SoC. In fact, Samsung's web page describing its new Exynos ModAP series, although very general, seems to match the specifications of the Exynos 3470 and S5 Mini. Because this is a new modem implementation of unknown origin, closer examination of the cellular radio performance of the Exynos 3470 and S5 Mini (such as voice and data reception) will be enlightening. Along with Samsung's use of an Intel modem chip in the Galaxy Alpha, the S5 Mini seems to confirm a clear trend away from Qualcomm baseband (as well as application processor) technology at Samsung. The timing will be helpful for Samsung and the industry because it can alleviate the current shortage of smartphone SoCs produced at TSMC.

S5 Mini design choices balancing performance with cost

The use of a single 1.5GB LPDDR3 DRAM package (presumably containing one 12Gbit chip) is economical, approaching the current sweet spot for gains in Android system performance at lower cost than 2GB DRAM. The maximum 6.4GB/s memory bandwidth mentioned by Samsung points to the use of a single-channel 32-bit LPDDR3 memory interface at a maximum clock of 800 MHz (1600 MT/s effective, since LPDDR3 transfers data on both clock edges), lower than high-end dual-channel platforms such as Snapdragon 800, but probably somewhat higher (depending on the actual memory clock used in the S5 Mini) than typical configurations used by Snapdragon 400 and MediaTek platforms.
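The arithmetic behind these figures can be checked with a short Python sketch. The function names are illustrative only, the 32-bit bus width and 800 MHz clock are the inferences made above rather than published specifications, and the 64-bit case is a hypothetical comparison at the same clock, not the configuration of any particular chip.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 models double data rate memory such as LPDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


def capacity_gbytes(die_density_gbit, dies=1):
    """Package capacity in GB for a number of dies of a given density in Gbit."""
    return die_density_gbit * dies / 8


# Assumed single-channel 32-bit interface at 800 MHz (inferred, not confirmed)
single_channel = peak_bandwidth_gb_s(32, 800)
# Hypothetical 64-bit (dual-channel) interface at the same clock, for comparison
dual_channel = peak_bandwidth_gb_s(64, 800)
# One 12Gbit die, as presumed in the text
package_gb = capacity_gbytes(12)

print(f"32-bit @ 800 MHz: {single_channel:.1f} GB/s")
print(f"64-bit @ 800 MHz: {dual_channel:.1f} GB/s")
print(f"12Gbit die: {package_gb:.1f} GB")
```

Under these assumptions the single-channel result matches the 6.4GB/s quoted by Samsung, and a single 12Gbit die accounts for the 1.5GB capacity.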

By using a lower-resolution 720p display instead of the 1080p display used in the Galaxy S5, Samsung significantly reduces the burden on the CPU, GPU and memory subsystem, which is highly important given the much lower performance headroom of the Exynos 3470 platform, especially the GPU and memory subsystem.

Mali-400 MP GPU still viable, even for mid-segment

The new chip illustrates the remarkable longevity of ARM's Mali-400 MP GPU core, many years after its introduction. With its ARM Mali-400 MP4 GPU (a four-core configuration), the Exynos 3470 uses the same GPU architecture as much older Exynos chips that were already used in devices such as the Galaxy S II several years ago, although at the higher clock speed permitted by the current 28nm process. The Mali-400 MP GPU architecture currently has a dominant position in the SoC market for entry-level smartphones through chips such as MediaTek's MT6572 (single Mali-400 core) and MT6582 (two cores), as well as in most of the tablet market through Rockchip's RK3188(T) (four cores) and various other products from companies such as MediaTek and Allwinner. It will probably continue to be used for some time, and is still being designed into new chips such as Allwinner's A33 with Mali-400 MP2 targeting low-end tablets.

Update of December 5, 2014

Making a count of Geekbench entries for Exynos 3470 vs Snapdragon 400-based models after a number of months of production should give an indication of the level of production of the Exynos 3470 for the Samsung Galaxy S5 Mini. The following is apparent:
  • The Exynos 3470-based SM-G800R4 has a count of 4, SM-G800M a count of 17, SM-G800Y a count of 7, SM-G800F a count of 383.
  • The Snapdragon 400-based SM-G800A has a count of 8, SM-G800H a count of 210.
Although Exynos 3470-based entries hold the clear majority, the overall number of entries is relatively low, many times lower than that of the Galaxy Note 4 and also significantly lower than the number of entries for the Galaxy Alpha. However, the average buyer of a Galaxy S5 Mini is probably much less likely to run Geekbench on their device, so the actual number sold in comparison with the mentioned high-end models may be greater than it seems.
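As a rough illustration, the entry counts listed above can be tallied in a few lines of Python. The variable names are illustrative, and the result only describes the share of Geekbench entries, which is at best a loose proxy for units shipped.

```python
# Geekbench entry counts per model, as listed in the update above
exynos_3470 = {"SM-G800R4": 4, "SM-G800M": 17, "SM-G800Y": 7, "SM-G800F": 383}
snapdragon_400 = {"SM-G800A": 8, "SM-G800H": 210}

exynos_total = sum(exynos_3470.values())
snapdragon_total = sum(snapdragon_400.values())
all_entries = exynos_total + snapdragon_total

# Share of entries, not of devices sold
exynos_share = exynos_total / all_entries

print(f"Exynos 3470 entries:    {exynos_total}")
print(f"Snapdragon 400 entries: {snapdragon_total}")
print(f"Exynos 3470 share:      {exynos_share:.0%}")
```

With these counts the Exynos 3470 accounts for 411 of 629 entries, or roughly 65%.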

Sources: Wikipedia, iFixit, Samsung, CHIP (German)

Updated December 5, 2014.

Saturday, August 23, 2014

Samsung announces 20nm Exynos 5430, used in Galaxy Alpha

Exynos 5430 built with improved process, higher efficiency

It was recently reported that Samsung has announced the Exynos 5430 SoC, manufactured using a new 20nm process. This chip, which has strong similarities with previous generation Exynos chips such as Exynos 5420 and 5422 manufactured at 28nm, is likely to be an application processor that integrates CPU, GPU and other processing cores, but does not integrate a baseband and other RF-related interfaces like modern smartphone SoCs from Qualcomm and others.

Like Exynos 542x, the chip utilizes ARM's big.LITTLE architecture for the CPU with heterogeneous multi-processing capability (HMP), also called Global Task Switching (GTS), containing four fast but relatively power-consuming Cortex-A15 cores and four slower but power-efficient Cortex-A7 cores. HMP allows all eight cores to run simultaneously without significant restrictions (previous big.LITTLE implementations use cluster migration, which only allows the use of either the A15 or the A7 cores at any one time and is less efficient). Compared with previous Exynos chips such as the 542x, practical use of HMP is reported to be more feasible with the new chip.

On the GPU side, Samsung continues to use a Mali T6xx series GPU, more specifically the Mali-T628 MP6 GPU. Although reasonably fast and supporting OpenGL ES 3.x according to the specifications, this GPU is likely to be less economical (more die space) and less power-efficient than next generation Mali T7xx cores, and has seen little use outside of Samsung Exynos chips. Additionally, because this GPU core came to market before the finalization of the OpenGL ES 3.1 specification, support or performance for this standard may not be optimal. However, as long as OpenGL ES 2.0 remains the dominant API for mobile devices, this may not be very apparent to most end users.

Samsung strongly motivated to use more Exynos SoCs

Samsung has several reasons to try to increase internal SoC production for devices such as smartphones and tablets, including excess capacity in their logic fabs due to the loss of Apple application processor orders, and tight supply of Qualcomm SoCs that have recently been used in the vast majority of Samsung's smartphones.

Additionally, relying less on Qualcomm chips may reduce the effective amount of patent royalties that Samsung has to pay to Qualcomm. Although Qualcomm strives to collect significant royalties on all smartphones, its licensing policy has recently come under pressure in China and other countries, and not using Qualcomm chips may make it easier to contest royalty payments. And finally, the internal production cost of the Exynos chip (assuming reasonable yield rates) added to the selling price of Intel's modem chip, may still be lower than the price that Qualcomm asks for integrated SoCs such as the Snapdragon 800 series, since it has a virtual monopoly in that segment.

Several, but not all models of Galaxy Alpha use Exynos 5430

Samsung is reportedly using the 5430 in at least some variations of the new Galaxy Alpha smartphone. Other variations, such as models targeted at the US market, still use a Qualcomm processor. In the Exynos 5430-based models, LTE modem functionality is provided by a separate Intel XMM7260 modem chip, which is a significant departure from Samsung's almost exclusive use of integrated Qualcomm basebands/modems recently, and marks one of the rare occasions of Intel achieving a prominent design win for chips in smartphones.

Although Samsung in the past also announced use of Exynos chips in selected models of the Galaxy S4 and S5, in practice these were little more than token announcements, involving negligible unit volume, and you have to go back to the Galaxy S3 and S2 for more material use of Exynos chips. However, it is likely that this time, for several reasons, Samsung is again, as it needs to be, more serious about shipping material amounts of Exynos chips in smartphones.

Technological improvements over Exynos 542x

Inside the 20nm Exynos 5430, some improvements over the previous Exynos 542x chips have been reported, other than the process feature size reduction from 28 to 20nm. The Cortex-A15 cores, infamous for relatively high power consumption, have been improved from the previous r2p4 to the more recent r3p3 revision, improving power characteristics, and likely facilitating the use of HMP. ARM already offers inherently more power-efficient developments of the A15 core, such as the Cortex-A17 and Cortex-A12, but these are relatively new and still largely untested for high volume production.

Also mentioned are additional process technology improvements in Samsung's 20nm HKMG process over 28nm HKMG, including the use of gate-last instead of gate-first for high-K metal gate formation and other improvements. Samsung is quoting a 25% reduction in power consumption from the process shrink alone, and the additional improvements may further reduce power consumption. To what extent these improvements result in good power consumption characteristics and battery life in devices such as the Exynos-powered Galaxy Alpha remains to be seen.

Risks involved, and Samsung still behind on integration

Actual yield rate, process maturity and ability for high-volume production are still unclear, with a limited planned production of one million Galaxy Alphas being reported. It is obvious that Samsung is well behind TSMC in 20nm process timing, capacity ramp, process maturity and quality, since TSMC has been ramping 20nm for several months in increasingly high volume for Apple, for highly integrated chips that are more complex than Exynos.

SoCs manufactured by Samsung for smartphones and tablets have generally been limited to application processors, while TSMC has already been producing more complex and technologically challenging SoCs incorporating basebands and other RF-related interfaces for quite some time in high volume for customers such as Qualcomm and MediaTek, spanning the low end to the high end. Although Samsung has been successful with production of application processors like the A6 and A7 for Apple, the higher level of integration offered by TSMC improves cost and power consumption significantly, and is probably the major reason for Apple's recent move away from Samsung to TSMC. That said, Samsung is now shipping a new cost-effective SoC, the Exynos 3470, that likely has an integrated baseband.

Update: Global Task Switching utilizes all cores, Cortex-A15 clocked up to 1.8 GHz

Analysis of Geekbench results for the Exynos 5430-based SM-G850F (Galaxy Alpha) shows a multi-core performance scaling factor of about 4.66 for the largely CPU-bound JPEG Compress test, suggesting that Global Task Switching is indeed implemented so that not just the Cortex-A15 cores are utilized but the Cortex-A7 cores as well when high CPU performance is required. The results (in particular the JPEG Compress subtest, when compared with other devices using Cortex-A15) are also consistent with a maximum CPU frequency for the Cortex-A15 cores of 1.8 GHz.

Update of December 5, 2014

Making a count of Geekbench entries for Exynos 5430 vs Snapdragon 801-based models after a number of months of production should give an indication of the level of production of the Exynos 5430. The following is apparent:
  • The Exynos 5430-based SM-G850F has a count of 1202, SM-G850M a count of 6, SM-G850Y a count of 50, SM-G850FQ a count of 39, SM-G850L a count of 85, SM-G850K a count of 26, SM-G850S a count of 72.
  • The Snapdragon 801-based SM-G850A has a count of 139, SM-G850W a count of 38, SM-G8508S a count of 3.
Based on these statistics, Exynos 5430-based models dominate with a share of 89% of Geekbench entries. However, the Galaxy Alpha has shipped in far smaller numbers than the Galaxy Note 4, which has at least ten times as many Geekbench entries.
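The 89% figure can be reproduced from the counts listed above with a short Python sketch. The names are illustrative, and the calculation says nothing about absolute shipment volumes.

```python
# Geekbench entry counts for Galaxy Alpha variants, as listed above
exynos_5430 = {
    "SM-G850F": 1202, "SM-G850M": 6, "SM-G850Y": 50, "SM-G850FQ": 39,
    "SM-G850L": 85, "SM-G850K": 26, "SM-G850S": 72,
}
snapdragon_801 = {"SM-G850A": 139, "SM-G850W": 38, "SM-G8508S": 3}

exynos_entries = sum(exynos_5430.values())
snapdragon_entries = sum(snapdragon_801.values())

# Fraction of all Galaxy Alpha entries that come from Exynos 5430-based models
share_5430 = exynos_entries / (exynos_entries + snapdragon_entries)

print(f"Exynos 5430: {exynos_entries} entries")
print(f"Snapdragon 801: {snapdragon_entries} entries")
print(f"Exynos 5430 share: {share_5430:.0%}")
```

The totals are 1480 Exynos 5430-based entries against 180 Snapdragon 801-based entries, which gives the 89% share mentioned above.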

Sources: AnandTech

Updated (November 2, 2014): Provide update based on Geekbench results (GTS, CPU clock speed).
Updated (December 5, 2014): Fix Geekbench result link (was pointing to Exynos 7 Octa entry), add Exynos share analysis based on Geekbench database.

Shortage of capacity for smartphone chip production at TSMC getting worse

TSMC fully booked for the year, Apple major factor

Last week, DigiTimes reported that TSMC's production capacity for the fourth quarter of 2014 was already almost fully booked, a scenario that had not occurred for many years. The tight capacity was described as being caused by a ripple-effect due to TSMC landing CPU orders from Apple, which has also brought in other peripheral IC orders for iPhone, iPad and iWatch devices. This has forced chip suppliers for mobile devices to scramble for more wafer production capacity.

TSMC dominates low power smartphone chip production

TSMC, based in Taiwan, is the leading independent semiconductor foundry in the world, and currently dominates the production of advanced low-power, integrated smartphone SoCs with its 28nm process technology. A significant lead with respect to process technology and production quality (especially the ability to bring leading-edge processes to high volume production well before competitors, and with higher levels of integration) has caused the proportion of smartphone SoCs manufactured at TSMC, which was already high, to increase significantly in 2014, primarily because Apple has shifted production of the SoCs used in iPhones and iPads (such as the A8 used in upcoming iPhone 6 models) from Samsung Semiconductor to TSMC.

TSMC has already for some time produced the vast majority of SoC chips for leading smartphone chip companies Qualcomm and MediaTek, who have a virtual duopoly for the supply of smartphone chips for the Android platform that dominates the industry (with Qualcomm serving primarily the high-end and MediaTek primarily the low-end), and with the addition of Apple now manufactures the overwhelming majority of smartphone chips worldwide.

Shortage has already affected industry, high stakes involved

As early as April 2014, however, it was reported that the lead time for advanced 28nm processes (used for the overwhelming majority of Qualcomm's and MediaTek's smartphone chips) had extended to several months. The allocation of significant capacity to TSMC's new 20nm process (which is an extension of the existing 28nm process technology used by TSMC) for Apple's new chips has made the shortage of capacity even more acute.

Behind the scenes, it is likely that a high-stakes power struggle for production capacity at TSMC has been unfolding between the leading smartphone chip companies Qualcomm, Apple and MediaTek, as well as other parties depending on TSMC capacity such as NVIDIA, Marvell and Broadcom. Apple and Qualcomm are two of the richest technology companies in the world, with tens of billions of dollars of cash, and can commit billions of dollars for securing capacity (and have done so). Meanwhile, MediaTek is a close neighbour of TSMC in Taiwan with intricate ties at several levels, so it is not likely to be displaced.

Sources: DigiTimes