To make cores faster, more execution units are added, cache and buffer sizes are increased, clock frequency is raised, and so on. But as chips grow, they contain more transistors and require more power to switch or to maintain state. Although the newer Cortex A15 improves device performance, at peak utilization it draws considerably more power than the Cortex A7. The idea of big.LITTLE is to pair a tiny, ultra-low-power Cortex A7 core with a faster Cortex A15 core. When a background task or other light task runs, the A7 can handle it alone, without requiring the muscle of the A15 core. The A7 and A15 cores are architecturally compatible. As shown in Figure 4, both types of cores access the same memory sub-system, but each individually retains its internal cache memory, allowing tasks to be hot-swapped between cores. A switch takes approximately 20,000 cycles, which is negligible given that the device can execute one or two billion cycles per second. Classic Dynamic Voltage and Frequency Scaling (DVFS) is used to decide when to migrate tasks. With the big.LITTLE architecture, over 50% energy savings were measured for common activities like Web browsing and music playback (from ARM's own test results). For these tasks, the A15/A7 pair reaches the same level of performance as the Cortex A15 alone, but with 50% energy savings. Note that this data assumes no use of the graphics processor (GPU); some Web browsers now use the GPU to accelerate graphics workloads.

Back in 2013, Huawei, Samsung and others launched octa-core (8-core) chips in smartphones. These chips do not have 8 individual cores that work together on a computing task. Instead, they follow the big.LITTLE architecture and have two sets of four cores, some big and some small, that take turns executing tasks so as to stay within the most efficient power envelope. So, although technically correct, four Cortex A7/A15 pairs should not really be called octa-core (8-core).
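The migration decision described above can be sketched as a simple threshold policy. This is only a minimal illustration of the idea, not ARM's or Linux's actual governor logic; the function name, thresholds, and workload trace below are all hypothetical:

```python
# Toy model of a big.LITTLE migration policy (illustrative only).
# Thresholds are invented; only the ~20,000-cycle switch cost comes from the text.

MIGRATION_COST_CYCLES = 20_000   # approximate cost of one core switch

def choose_core(load, current_core, up_threshold=0.8, down_threshold=0.3):
    """Pick 'A15' (big) or 'A7' (LITTLE) from a 0..1 utilization sample.

    Two thresholds (hysteresis) keep the policy from bouncing between
    cores when the load hovers near a single cut-off point, which would
    waste the migration cost repeatedly.
    """
    if current_core == "A7" and load > up_threshold:
        return "A15"            # demand is high: migrate up to the big core
    if current_core == "A15" and load < down_threshold:
        return "A7"             # demand is low: fall back to the LITTLE core
    return current_core         # otherwise stay put to amortize switch cost

# A bursty workload: mostly idle with one heavy spike in the middle.
core = "A7"
trace = []
for load in [0.1, 0.2, 0.95, 0.9, 0.4, 0.2, 0.1]:
    core = choose_core(load, core)
    trace.append(core)

print(trace)  # the A15 is only engaged for the spike
```

Run on the bursty trace above, the policy keeps the A7 active for the light samples and only engages the A15 around the spike, which is exactly the usage pattern big.LITTLE targets.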
In current implementations, only four of them can be active at a single point of execution. In effective terms, when it comes down to real workloads, those 8-core processors are really just 4-core processors.

The benchmark results above indicate that there is still plenty of optimization left to be done for both runtimes. At the moment, ART provides slightly better battery life and performance than Dalvik, but the test results make it clear that we will not see massive gains. Moreover, while many Android apps have been optimized to work with ART, some do not work at all; the runtime has to be switched back to Dalvik to use them on KitKat.

To conclude the power-efficiency segment: with the current usage pattern, where the device sits idle most of the time, the big.LITTLE architecture should help average battery life. Chip makers may choose to enable all 8 cores at once, but this has its own trade-offs. In the near future, this model is likely to be avoided for end-user applications, though it may be used in enterprise applications. Nevertheless, "8-core" is likely to be the central marketing message for mobile processors in the coming years.
A new runtime was introduced with Android 4.4 (KitKat) which should eventually replace the Dalvik runtime. ART (Android Runtime), like Dalvik, executes the Dalvik Executable (DEX) bytecode format. In other words, when an app runs on Android, it goes through a runtime. Previously, Android's runtime was Dalvik. While it performed well, it was still a bottleneck, since it compiled code only at the moment it was needed, using a Just-in-Time (JIT) compiler. ART instead follows an Ahead-of-Time (AOT) compilation paradigm, processing application instructions before they are even required. In the sections that follow, the background of compilation on various architectures is described first; ART is then introduced with its new features and tested against the traditional Dalvik runtime; finally, before the conclusion, the ARM big.LITTLE architecture is described with some results.

The first-generation translators, which converted assembly code to machine code, were assemblers. Since the translation involved no intermediate step, assemblers were fast. Then came the generation of compilers, which translate high-level code into assembly code and then use an assembler to translate that assembly code into machine code. Although the execution of the compiled program was almost as fast as hand-written assembly, the compiler itself was slower than an assembler, for obvious reasons; the C compiler belongs to this generation. The problem with this approach was that the compiled code was not cross-platform. The next generation was interpreters, which translate the code while executing it: an interpreter reads a statement, converts it into a binary command, executes it, and then jumps to the next statement. Execution was slow, since the translation happens at runtime.
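The contrast between interpretation (translate while executing, as Dalvik's JIT-era model leaned on) and ahead-of-time compilation (translate once, up front, as ART does) can be illustrated with a toy example. This is a hedged sketch: the mini-language of `ADD`/`MUL` lines and the function names are invented purely for illustration, not anything from the Android runtimes:

```python
# Toy illustration of interpretation vs ahead-of-time compilation.
# The mini-language (lines like "ADD 2") is invented for this example.

def interpret(program):
    """Interpreter: re-translate each line every time it is executed."""
    acc = 0
    for line in program.splitlines():
        op, arg = line.split()          # translation cost paid at runtime...
        if op == "ADD":
            acc += int(arg)
        elif op == "MUL":
            acc *= int(arg)
    return acc

def compile_ahead(program):
    """AOT 'compiler': translate once, up front, into Python closures."""
    ops = []
    for line in program.splitlines():   # ...whereas here the parsing/translation
        op, arg = line.split()          # happens exactly once, before execution
        n = int(arg)
        if op == "ADD":
            ops.append(lambda acc, n=n: acc + n)
        elif op == "MUL":
            ops.append(lambda acc, n=n: acc * n)

    def run():                          # the compiled program: pure execution,
        acc = 0                         # no translation work remaining
        for f in ops:
            acc = f(acc)
        return acc
    return run

src = "ADD 2\nMUL 3\nADD 4"
run = compile_ahead(src)
print(interpret(src), run())  # both compute 10
```

Both paths produce the same result; the difference is *when* the translation cost is paid. The interpreter pays it on every execution, while the ahead-of-time path pays it once at "install time" and then executes for free, which is the same trade ART makes against Dalvik.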