
Benchmarks : Intel Android Phone (Xolo X900 / San Diego): Beating ARM at its own game?


What is it?

The Lava Xolo X900 is the first mass-market smart-phone with an Intel x86 SoC, designed to compete against the ARM SoCs that have comprehensively dominated the smart-phone and tablet segment for a very long time. The "Medfield" SoC is based on the Atom core (Z-Series) that has powered low-cost netbooks for some years now.

As netbooks have largely disappeared, it makes sense for Intel to target the still growing smart-phone and tablet segment. This is x86's first, but major, push into this segment, just as ARM pushes into the notebook/laptop segment; two architectures collide after decades of dominating their own segments: two enter, only one leaves?

The X900 runs the popular Android OS (2.3.x aka "Gingerbread", with 4.0.x "Ice Cream Sandwich" expected soon) that has seen tremendous growth in the past few years - in both installed base and number of apps. It is entirely based on Intel's reference device, thus there will be other re-branded phones (e.g. Orange's San Diego (ex-Santa Clara)) that are exactly the same. The price (approx. GBP 200) seems pretty competitive, though prices of the competition have come down quite a lot since their respective launches.

In this article we test CPU, GPU and memory performance, against popular dual-core ARM SoCs. We are *not* comparing the phones themselves (e.g. screen, battery life, etc.) except where they directly affect the scores. As Atom is a single-core CPU - albeit supporting 2 threads - it would not be fair to compare it to recently launched quad-core ARM SoCs; we will test those against future dual-core Atoms.

Phone Specifications

We are comparing the Intel reference device (aka Lava Xolo X900 / Orange San Diego) with popular dual-core handsets and tablets.

Devices compared (Phone/Tablet): Lava Xolo X900 / Orange San Diego, HTC One S, Sony XPeria S (reference), Motorola Xoom (tablet).

CPU (Core / Threads @ Speed)
  Xolo X900 / San Diego: Intel Atom Z2460 (Penwell) 1C 2T @ 1.6GHz
  HTC One S: Qualcomm MSM8960 ("Krait", Snapdragon S4) 2C @ 1.54GHz
  Sony XPeria S: Qualcomm MSM8660 (Snapdragon S3) 2C @ 1.54GHz
  Motorola Xoom: nVidia Tegra 2 2C @ 1GHz
  Comments: Atom is the only single-core CPU here, but with HT it also supports 2 threads. The Snapdragons are really ASMP (asymmetric multi-processor) designs with 1 unit (CPU) generally powered down; they do not use a standard Cortex A8/A9 core but a modified Qualcomm design. Tegra 2 is a conventional Cortex A9 dual-core.

GPU (Core / Threads @ Speed) / OpenGL ES
  Xolo X900 / San Diego: PowerVR SGX 540 @ 400MHz / OGL ES 2.0
  HTC One S: Adreno 230 / OGL ES 2.0
  Sony XPeria S: Adreno 220 / OGL ES 2.0
  Motorola Xoom: nVidia Tegra 2 / OGL ES 2.0
  Comments: None of the GPUs are GPGPU capable, but they all support OpenGL ES 2.0 and thus programmable shaders, which we can use for compute purposes until OpenCL ES (or CUDA ES?) comes to Android.

Display
  Xolo X900 / San Diego: 600x1024 (614kPix) 240dpi 68Hz 16-bit
  HTC One S: 540x960 (518kPix) 240dpi 60Hz 32-bit
  Sony XPeria S: 720x1280 (921kPix) 320dpi 60Hz 32-bit
  Motorola Xoom: 800x1280 (1Mpix) 160dpi 60Hz 32-bit
  Comments: The display has great resolution and density but cannot be expected to match the latest (AMOLED) high-resolution dense (320dpi aka "retina" in Apple speak) displays on the latest Android phones.

Memory (Size / Width / Speed)
  Xolo X900 / San Diego: 1GB (LP)DDR2, 64-bit @ 400MHz
  HTC One S: 1GB (LP)DDR2?, 64-bit @ ?
  Sony XPeria S: 1GB (LP)DDR2, 32-bit @ ?
  Motorola Xoom: 1GB (LP)DDR2, 32-bit @ 400MHz
  Comments: Both Atom and Krait use "dual-channel" memory interfaces that should help with multi-threaded loads. It's good to see all devices with 1GB memory as standard.

Android OS / Linux Kernel
  Xolo X900 / San Diego: 2.3.7 / 2.6.35.3
  HTC One S: 4.0.3 / 3.0.16
  Sony XPeria S: 2.3.6 / 2.6.35.11
  Motorola Xoom: 4.0.4 / 2.6.39
  Comments: One S and Xoom should benefit from ICS (Android 4.x) but Xolo and XPeria S run pretty much the same version of Android. Both of them should get ICS soon and we will re-run the tests for all. HTC is the only one using the latest 3.x Linux kernel rather than the 2.6.x branch. Xoom, being a Google experience device (GES), has the very latest Android version.

Best Price UK, Pay-as-you-Go/Sim-Free / May 2012
  Xolo X900 / San Diego: GBP 200 (Orange San Diego)
  HTC One S: GBP 380
  Sony XPeria S: GBP 380
  Motorola Xoom: GBP 250-280
  Comments: The Xoom was launched 1 year ago and has thus gone down in price a lot; both One S and XPeria S are new phones and have yet to drop significantly. While XPeria S is the flagship Sony phone, HTC One S is bested by the quad-core One X. San Diego will launch soon at this price.

Hardware Specifications

We are comparing the Atom Z2460 against Qualcomm's Snapdragons, nVidia's Tegra 2 and other popular dual-core SoCs.

Processor (CPU) Specifications for: Intel Atom Z2460 (Penwell), Qualcomm MSM8960 ("Krait", Snapdragon S4), Qualcomm MSM8660 (Snapdragon S3), nVidia Tegra 2.

Cores (CU) / Threads (SP)
  Atom Z2460: 1C / 2T
  MSM8960 (S4): 2C (Asymmetric SMP), Cortex A9-compatible
  MSM8660 (S3): 2C (Asymmetric SMP), Cortex A9-compatible
  Tegra 2: 2C, Cortex A9 core
  Comments: While they all support 2 threads, Atom is the only single-core design - albeit with HT. While HT improves core resource utilisation - sometimes a lot for "in-order" designs - it cannot match an extra physical core, but is a "bonus".

Speed (Min / Max)
  Atom Z2460: 1.6GHz (600-1600MHz)
  MSM8960 (S4): 1.54GHz (245-1530)
  MSM8660 (S3): 1.54GHz (245-1530)
  Tegra 2: 1GHz (-1000)
  Comments: While Atom is clocked 4% higher than the Snapdragons, this is unlikely to make a huge difference, so we effectively have a clock-for-clock comparison. Tegra is clocked about 38% lower than Atom (1GHz vs. 1.6GHz) but has a higher power envelope, being optimised for tablets; thus it should sustain maximum performance states for longer, unlike phone SoCs which are optimised for very low power consumption.

SIMD instructions / Width
  Atom Z2460: MMX, SSE, SSE2, SSE3, SSSE3 / 128-bit
  MSM8960 (S4): NEON / 128-bit
  MSM8660 (S3): NEON / 128-bit
  Tegra 2: none
  Comments: Atom supports all x86 SIMD instructions up to Core 2, just like notebook versions; while SSE 4.x would be nice, it is not essential. Snapdragons support ARM's NEON while Tegra 2 has to use non-SIMD VFP; fortunately nVidia added support for NEON in Tegra 3 but all the existing Tegra tablets are out of luck.

Cache sizes (L1 / L2)
  Atom Z2460: 24kB / 512kB
  MSM8960 (S4): 2x 32kB / 512kB
  MSM8660 (S3): 2x 32kB / 1MB
  Tegra 2: 2x 32kB / 1MB
  Comments: L2 is twice as big for Krait and Tegra; Atom has the smallest caches, as the L1D/I caches are shared by the two threads. This may matter in virtualised (i.e. Java) tests as the JVM itself takes up resources, including cache blocks.
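
The clock-speed percentages quoted in the comments depend on which device is taken as the baseline. A minimal Python sketch of that arithmetic, using only the maximum clocks listed in the table (the helper name `percent_diff` is our own, for illustration):

```python
def percent_diff(value, baseline):
    """Signed difference of value relative to baseline, in percent."""
    return (value / baseline - 1.0) * 100.0

# Maximum clocks in GHz, taken from the "Speed (Min / Max)" row.
atom, snapdragon, tegra2 = 1.6, 1.54, 1.0

print(f"Atom vs Snapdragon: {percent_diff(atom, snapdragon):+.1f}%")
print(f"Atom vs Tegra 2:    {percent_diff(atom, tegra2):+.1f}%")
print(f"Tegra 2 vs Atom:    {percent_diff(tegra2, atom):+.1f}%")
```

The same 600MHz gap reads as +60% when Atom is measured against Tegra 2, but as roughly -38% when Tegra 2 is measured against Atom.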

Native Processing Performance

We are testing native arithmetic and SIMD performance using the highest performing instruction sets (NEON, SSE2, etc.).

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Native Benchmarks for: Intel Atom Z2460 2T @ 1.6GHz, Qualcomm MSM8660 2C @ 1.54GHz (baseline), Qualcomm MSM8960 2C @ 1.54GHz, nVidia Tegra 2 2C @ 1GHz.

Native CPU Arithmetic (Integer) Dhrystone
  Atom Z2460: 4004 MIPS [-20%]
  MSM8660 (S3): 5035 MIPS (baseline)
  MSM8960 (S4): 7674 MIPS [+52%]
  Tegra 2: 3913 MIPS [-22%]
  Comments: Even with HT, Atom cannot beat its dual-core rivals (20% slower) but still scores respectably; with future SSE4.x (string manipulation) support, performance should improve. S4/Krait does amazingly well, 52% faster than its older S3 brother.

Native CPU Arithmetic (Double FP64) Whetstone
  Atom Z2460: 33 MFLOPS [+57%] SSE2
  MSM8660 (S3): 21 MFLOPS (baseline) VFP
  MSM8960 (S4): 30 MFLOPS [+42%] VFP
  Tegra 2: 25 MFLOPS [+19%] VFP
  Comments: Atom does very well here (57% faster), beating even S4/Krait into 2nd place, which itself improved 42% over its S3/MSM8660 sibling. The x86 architecture has always included relatively powerful FPU/SIMD units while for ARM they are still a relatively new addition. For FPU workloads, x86 looks good. Tegra's score shows that the standard math libraries are not NEON optimised.

Native CPU Multi-Media SIMD Integer
  Atom Z2460: 4130 MPix/s [+13%] SSE2
  MSM8660 (S3): 3667 MPix/s (baseline) NEON
  MSM8960 (S4): 3807 MPix/s [+4%] NEON
  Tegra 2: 1868 MPix/s [-49%]
  Comments: SSE2 helps Atom beat Qualcomm's NEON by ~10%, helped by S4/Krait's lack of significant performance improvement (+4%) over its older brother. Without any SIMD, Tegra's cores, fast as they are, cannot keep up. While on x86 SSE/SIMD code can be mixed with "normal" code, NEON on ARM incurs significant latencies transferring data from FPU to CPU registers, and thus is best used for long SIMD calculations.

Native CPU Multi-Media SIMD Float (FP32)
  Atom Z2460: 6632 MPix/s [+72%] SSE2
  MSM8660 (S3): 3846 MPix/s (baseline) NEON
  MSM8960 (S4): 4467 MPix/s [+16%] NEON
  Tegra 2: 1295 MPix/s [-66%] VFP
  Comments: SSE2 helps even more with floating-point, beating both Qualcomm CPUs decisively (48-72%) even though they are equipped with NEON. S4/Krait improves a bit more here (+16%) but not enough. Tegra is left in the dust, 66% slower; nVidia saw sense and added NEON to Tegra 3. It is still surprising that even with long SIMD calculations NEON does not help as much as SSE does on x86.

Native CPU Multi-Media SIMD Double (FP64)
  Atom Z2460: 1215 MPix/s [-8%] SSE2
  MSM8660 (S3): 1326 MPix/s (baseline) VFP
  MSM8960 (S4): 1348 MPix/s VFP
  Tegra 2: 1204 MPix/s [-9%] VFP
  Comments: NEON does not support FP64, thus VFP is used for ARM. SSE2 does support FP64 but it does not seem to help Atom overtake its ARM competition; Tegra's lack of NEON does not matter here and it roughly ties with everybody else.

Native Crypto: AES-256 Encrypt/Decrypt
  Atom Z2460: 31 MB/s [-52%]
  MSM8660 (S3): 64 MB/s (baseline)
  MSM8960 (S4): ?
  Tegra 2: 27 MB/s [-57%]
  Comments: HT cannot help Atom make up for a physical core: it is 52% slower than the older S3 and thus presumably even slower than the new S4 (no result available). Atom could do with the AES hardware acceleration instructions from its desktop Core brothers. Tegra loses due to its low memory bandwidth (see later).

Native Crypto: SHA2-256 Hashing
  Atom Z2460: 39 MB/s [-54%]
  MSM8660 (S3): 85 MB/s (baseline)
  MSM8960 (S4): ?
  Tegra 2: 45 MB/s [-47%]
  Comments: HT cannot help Atom here either; it is 54% slower than S3. It could do with SSE4.x here, as we use it to accelerate SHA hashing on its Core brothers. Tegra is faster than Atom but no match for Qualcomm's finest.

Native Multi-Core Efficiency
  Atom Z2460: 837MB/s - 294ns
  MSM8660 (S3): 27MB/s - 536ns (baseline)
  MSM8960 (S4): ?
  Tegra 2: 78MB/s - 409ns
  Comments: Atom's HT means both threads share both the L1D and L2 caches, thus inter-thread data transfer is very fast. In contrast, Qualcomm's ASMP design means data must go to memory and back, which is very slow; Tegra's true dual-core design does much better but is still 10 times slower than Atom. It is likely that Atom's L1D and L2 caches are also faster, see the Cache Benchmarks / Cache Latencies.
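
The bracketed percentages in the table are each device's score relative to the MSM8660 baseline. The following Python sketch recomputes them from the published scores and shows how the same helper can re-base a row on another device (the function names are ours, chosen for illustration):

```python
# Native scores copied from the table: (Atom Z2460, MSM8660, MSM8960, Tegra 2).
scores = {
    "Dhrystone (MIPS)": (4004, 5035, 7674, 3913),
    "SIMD Integer (MPix/s)": (4130, 3667, 3807, 1868),
    "SIMD Float FP32 (MPix/s)": (6632, 3846, 4467, 1295),
}
devices = ("Atom Z2460", "MSM8660", "MSM8960", "Tegra 2")

def relative_percent(score, baseline):
    """Whole-percent difference of score against baseline."""
    return round((score / baseline - 1.0) * 100.0)

def versus_baseline(row, baseline_index=1):
    """Re-express every score in a row relative to the chosen column."""
    base = row[baseline_index]
    return [relative_percent(s, base) for s in row]

for name, row in scores.items():
    deltas = versus_baseline(row)  # column 1 (MSM8660) is the article's baseline
    line = ", ".join(f"{d} {p:+d}%" for d, p in zip(devices, deltas))
    print(f"{name}: {line}")
```

Passing `baseline_index=2` compares everything against the MSM8960 (S4/Krait) instead, which is useful when judging Atom against the newer Snapdragon rather than the older one.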

The scores of the native code benchmarks show the actual CPU differences rather than the JVM performance that most other benchmarks show: we can see where Atom's strengths and weaknesses lie and what Intel might use in the future to try to beat ARM on its own turf:

  • More Cores Needed: One Atom core cannot beat modern dual-ARM-cores but with HT it is competitive. However, Intel would need to release dual-core Atoms to compete with the latest quad-ARM-core SoCs.

  • Hyper-Threading Helps: As Atom is an in-order design, HT helps fully utilise CPU resources (we will study the effect of HT in a future article). However, non-HT Atoms will thus not be able to keep up with the dual-ARM-core competition that just about all modern phones and tablets have.

  • SSE beats NEON: The plethora of SSE instruction sets (SSSE3, SSE3, SSE2, SSE and old MMX) provides greater flexibility as well as lower ALU<>SIMD unit latencies than NEON. A huge number of SSE optimised libraries can be deployed against the smaller set of NEON optimisations. Atom clearly beat its competitors where SSE2 was deployed against NEON and this is where x86 shines.

  • AVX may kill future NEON: Intel can deploy already established instruction sets (SSE 4.2, 4.1, AVX, FMA, AES) in future Atom CPUs and blow NEON2 out of the water. AVX brought huge gains on the desktop and if these gains were replicated on the Atom, even quad-ARM-core CPUs would have a hard time competing against it. "NEON2" (on Cortex A15) is not out yet and thus it will take some time to write and optimise code for it.

  • Efficient inter-thread data transfer: As both threads share both L1D and L2 on Atom, transferring small data blocks between threads is extremely fast. No ARM design can compete here, with ASMP designs (Qualcomm) doing especially badly. Multi-threaded apps transferring lots of data between threads should especially fly.

Java Processing Performance

We are testing Java JVM (Dalvik) virtualisation performance as this is what Android apps run on; how well the JVM is optimised and able to take advantage of the latest instruction sets matters a lot.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Java Benchmarks for: Intel Atom Z2460 2T @ 1.6GHz, Qualcomm MSM8660 2C @ 1.54GHz (baseline), Qualcomm MSM8960 2C @ 1.54GHz, nVidia Tegra 2 2C @ 1GHz.

Java CPU Arithmetic (Integer) Dhrystone
  Atom Z2460: 757 MIPS [-38%]
  MSM8660 (S3): 1235 MIPS (baseline)
  MSM8960 (S4): 1734 MIPS [+40%]
  Tegra 2: 1101 MIPS [-11%]
  Comments: Atom loses badly here, being almost 40% slower; with S4/Krait being 40% faster this is not good news; even Tegra is only 11% slower. HT does not seem to help as much as in native code. It could be that the JVM is not as optimised on x86 as on ARM, we shall see.

Java CPU Arithmetic (Double FP64) Whetstone
  Atom Z2460: 9 MFLOPS [-10%]
  MSM8660 (S3): 10 MFLOPS (baseline)
  MSM8960 (S4): 18 MFLOPS [+80%]
  Tegra 2: 9 MFLOPS [-10%]
  Comments: Atom does better here, most likely due to the powerful FPU - well, at least compared to ARM's VFP. With the new S4/Krait being 80% faster, things are not looking so rosy. It could be a JVM optimisation issue.

Java CPU Multi-Media Integer
  Atom Z2460: 851 MPix/s [-37%]
  MSM8660 (S3): 1368 MPix/s (baseline)
  MSM8960 (S4): 1437 MPix/s [+5%]
  Tegra 2: 1019 MPix/s [-25%]
  Comments: Atom is slower again by about the same amount (almost 40%) as in Dhrystone; as the S3/MSM8660 device runs the same Android version (Gingerbread) this is not good news. Integer workloads on Java do not do well on Atom.

Java CPU Multi-Media Float (FP32)
  Atom Z2460: 596 MPix/s [+6%]
  MSM8660 (S3): 561 MPix/s (baseline)
  MSM8960 (S4): 585 MPix/s [+4%]
  Tegra 2: 673 MPix/s [+20%]
  Comments: Atom's powerful FPU helps with floating-point workloads again, being 6% faster than its direct competition and matching even S4/Krait. Apps and games doing maths in Java code - rather than using native SIMD code - should still fly on Atom.

Java CPU Multi-Media Double (FP64)
  Atom Z2460: 432 MPix/s [-30%]
  MSM8660 (S3): 618 MPix/s (baseline)
  MSM8960 (S4): 680 MPix/s [+10%]
  Tegra 2: 621 MPix/s
  Comments: FP64 is slower than FP32 on Atom and Tegra but, surprisingly, faster on Qualcomm's ARM VFP. Mobile apps generally do not use FP64 as it is generally not supported by hardware and is either cast down to FP32 or emulated in software. On desktop, the reverse happens with FP32 cast up to FP64.

Java code is Atom's Achilles' heel: integer performance is almost 40% slower than its direct competitor (S3/MSM8660) - and this is the kind of code apps use for the most part. Floating-point (FP32) does much better - thanks to the relatively more powerful FPU x86 brings.

The Gingerbread (2.3.x) Dalvik JVM (1.4.0) may not yet be optimised on x86. Let's hope the updated Dalvik (1.6.0) in ICS (4.0.x) x86 brings much needed performance improvements. The ICS upgrade is rumoured to arrive soon, but it is disappointing that the Xolo did not ship with it by default, as just about all 2012 Android phones and tablets (except Sony as always) ship with ICS. Then again, there is no guarantee when it will ship or what performance improvements it might bring.

Memory Performance

We are testing the memory bandwidth, cache bandwidth (L1D, L2) as well as cache and memory latencies using all the access patterns supported by Sandra (in-page random, full random and sequential/linear access patterns).
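
The three access patterns can be pictured as "pointer chains", where each element holds the index of the next element to visit. The Python sketch below is a conceptual illustration only, not Sandra's implementation: the function names are ours and the page size (in elements) is a parameter we made up. Timing such a walk in Python would mostly measure interpreter overhead, so the sketch only builds the chains and walks them.

```python
import random

def sequential_chain(n):
    """Element i points to i + 1 (wrapping): easy for hardware prefetchers."""
    return [(i + 1) % n for i in range(n)]

def full_random_chain(n, rng):
    """One random cycle over all elements: defeats prefetchers and stresses the TLB."""
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for k, idx in enumerate(order):
        nxt[idx] = order[(k + 1) % n]
    return nxt

def in_page_random_chain(n, page, rng):
    """Random order inside each page; pages are visited one after another."""
    nxt = [0] * n
    ends = []  # (first, last) element visited in each page
    for start in range(0, n, page):
        order = list(range(start, min(start + page, n)))
        rng.shuffle(order)
        for a, b in zip(order, order[1:]):
            nxt[a] = b
        ends.append((order[0], order[-1]))
    # The last element of each page jumps to the first element of the next page.
    for k, (_, last) in enumerate(ends):
        nxt[last] = ends[(k + 1) % len(ends)][0]
    return nxt

def walk(chain, start=0):
    """Follow the chain for len(chain) steps and return the visit order."""
    seen, i = [], start
    for _ in range(len(chain)):
        seen.append(i)
        i = chain[i]
    return seen

rng = random.Random(0)
print(walk(sequential_chain(8)))
print(walk(in_page_random_chain(8, 4, rng)))
print(walk(full_random_chain(8, rng)))
```

In a native benchmark the same chains would be walked over a buffer larger than the cache level under test, so that each step costs one dependent load.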

Results Interpretation: Higher values (MB/s) mean better performance; for latencies (clocks, ns), lower values are better.

Base 2 Multipliers: 1MB/s = 1024kB/s, 1kB/s = 1024bytes/s, 1byte = 8bits, etc.

Memory/Cache Benchmarks for: Intel Atom Z2460 2T @ 1.6GHz, Qualcomm MSM8660 2C @ 1.54GHz (baseline), Qualcomm MSM8960 2C @ 1.54GHz, nVidia Tegra 2 2C @ 1GHz.

Memory Bandwidth
  Atom Z2460: 3.3 GB/s SSE2 [+113%]
  MSM8660 (S3): 1.5 GB/s NEON (baseline)
  MSM8960 (S4): 3.3 GB/s NEON [+113%]
  Tegra 2: 0.4 GB/s [-74%]
  Comments: Atom and S4/Krait score the same here, both using 2x32-bit memory interfaces; the older S3 uses a 32-bit interface and thus has less than half the bandwidth. Tegra does especially badly here, with very low memory bandwidth, which would explain why tests using large memory blocks are that slow.

Cache Bandwidth (L1D / L2)
  Atom Z2460: 15GB/s - 9.2GB/s SSE2 [+50%]
  MSM8660 (S3): 10GB/s - 6.2GB/s NEON (baseline)
  MSM8960 (S4): ? NEON
  Tegra 2: 3.9GB/s - 2.2GB/s
  Comments: Atom's caches are about 50% faster than S3's and much faster than Tegra's which, without NEON, must transfer data in smaller blocks. This goes some way towards explaining the fast inter-thread data transfers observed in the Multi-Core Efficiency benchmark.

L1D Cache Latency (In-Page, Full Page, Sequential)
  Atom Z2460: 3 - 3 - 3 clocks [-25%]
  MSM8660 (S3): 4 - 4 - 4 clocks (baseline)
  MSM8960 (S4): ?
  Tegra 2: 4 - 4 - 4 clocks
  Comments: L1D cache seems 1 clock faster on Atom. It may not seem much but that's 25% lower latency (lower is better).

L2 Cache Latency (In-Page, Full Page, Sequential)
  Atom Z2460: 10 - 10 - 18 clocks
  MSM8660 (S3): 40 - 40 - 40 clocks (baseline)
  MSM8960 (S4): ?
  Tegra 2: 31 - 44 - 55 clocks
  Comments: L2 seems 3-4 times faster on Atom, which would explain the fast inter-thread data transfers. Intel has a thing or two to teach ARM about making caches at least.

Memory Latency (In-Page, Full Page, Sequential)
  Atom Z2460: 82 - 243 - 17ns
  MSM8660 (S3): 113 - 265 - 109ns (baseline)
  MSM8960 (S4): ?
  Tegra 2: 175 - 225 - 162ns
  Comments: Atom's memory prefetchers do a great job, with sequential accesses being up to 10 times faster. In-page latency is also lower, which points to better TLB performance, while out-of-page (full random) latency is about the same as the competitors'.
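
As a rough cross-check of the memory bandwidth result, the theoretical peak of a DDR interface can be estimated from its width and clock. The sketch below assumes that the 400MHz figure in the phone specifications is the memory clock and that data moves on both clock edges (double data rate); both are our assumptions rather than statements from the benchmark. The result is converted with the base-2 multipliers used in this article.

```python
def peak_bandwidth_bytes(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Theoretical peak of a DDR-style memory interface in bytes per second."""
    return bus_width_bits // 8 * clock_hz * transfers_per_clock

GB = 1024 ** 3  # base-2 multiplier: 1GB = 1024MB, 1MB = 1024kB, 1kB = 1024 bytes

# Atom Z2460: 64-bit (2x 32-bit) LPDDR2 @ 400MHz, as listed in the specifications.
# Treating 400MHz as the clock (not the transfer rate) is an assumption.
peak = peak_bandwidth_bytes(64, 400_000_000) / GB
measured = 3.3  # GB/s, from the Memory Bandwidth row

print(f"peak {peak:.2f} GB/s, measured {measured} GB/s, "
      f"efficiency {measured / peak:.0%}")
```

If the 400MHz figure were instead the transfer rate, the peak would halve and the measured result would sit above it, which is why the clock interpretation looks the more plausible of the two.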

Atom's memory and cache performance is exemplary: bandwidths are higher (only S4/Krait matches them) and latencies much lower. Any apps dealing with large amounts of data should really fly. Its memory prefetchers handle non-random access patterns (e.g. sequential access) with ease, reducing latency by up to 10 times!
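
Cache latencies are reported in clocks while memory latencies are in nanoseconds; converting clocks to time makes the two comparable. The sketch below assumes the caches run at each CPU's maximum core clock during the test, which the results themselves do not state.

```python
def clocks_to_ns(clocks, core_ghz):
    """Convert a latency in core clocks to nanoseconds (1GHz = 1 clock per ns)."""
    return clocks / core_ghz

# (L1D clocks, L2 in-page clocks, max core clock in GHz) from the tables.
# Assumption: caches are clocked at the maximum core speed while measured.
cpus = {
    "Atom Z2460": (3, 10, 1.6),
    "MSM8660 (S3)": (4, 40, 1.54),
    "Tegra 2": (4, 31, 1.0),
}

for name, (l1d, l2, ghz) in cpus.items():
    print(f"{name}: L1D {clocks_to_ns(l1d, ghz):.2f}ns, "
          f"L2 {clocks_to_ns(l2, ghz):.2f}ns")
```

Under this assumption Atom's in-page L2 latency works out at 6.25ns, less than a tenth of its 82ns in-page memory latency.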

Graphics (GPGPU) Performance

None of the GPU cores here have GPGPU capabilities. Google's RenderScript C-like language that was supposed to run on GPUs or CPUs - thus similar to OpenCL - has not taken advantage of any GPUs yet. Even nVidia's latest Tegra 3 does not support CUDA (ES?) - naturally Tegra 2 does not either.

The very latest ARM SoCs do have GPUs with GPGPU support and that should be enabled in future Android (5.0?) or Windows 8 for ARM (WARM). Until then, OpenGL ES 2's programmable shaders can be used to execute compute workloads just like OpenGL was used in the past before DirectCompute and OpenCL brought compute support to the mainstream. Let's hope that Android can use some GPGPU goodness soon!

Results Interpretation: Higher values (MB/s) mean better performance.

Base 2 Multipliers: 1MB/s = 1024kB/s, 1kB/s = 1024bytes/s, 1byte = 8bits, etc.

GPU Benchmarks for: PowerVR SGX 540 (Atom), Adreno 220 (S3/MSM8660, baseline), Adreno 230 (S4/Krait), nVidia Tegra 2.

Video OpenGL Shading: Float FP32
  PowerVR SGX 540: 2436 MPix/s [+10%]
  Adreno 220: 2224 MPix/s (baseline)
  Adreno 230: 4590 MPix/s [+106%]
  Tegra 2: ?
  Comments: The SGX 540 GPU core (Atom) is competitive against Adreno 220 (10% faster) but not a match for Adreno 230 (S4/Krait), which is twice as fast.

Video OpenGL Shading: Float FP64 (emulated)
  PowerVR SGX 540: 176 MPix/s [-50%]
  Adreno 220: 351 MPix/s (baseline)
  Adreno 230: 617MPix/s [+76%]
  Tegra 2: 140kPix/s
  Comments: Complex shaders - emulating FP64 in FP32 is somewhat complex - do not do as well on SGX 540: it runs at half the speed of Adreno 220, while Adreno 230 is about 3.5 times faster than the SGX 540! Thankfully most games neither use such complex shaders nor require FP64 precision, thus this result is not that important - mobile apps don't generally use or need FP64 in CPU code either.

The SGX 540 GPU used by Atom holds its own against its direct competitor, the Adreno 220; Adreno 230 (S4/Krait) is approx. twice as fast which is not good news. Future Atom releases should switch to a better SGX GPU core (550+) in order to match modern ARM competition.
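
To give an idea of why the emulated FP64 shading test is so much heavier than the FP32 one, here is a sketch of "double-single" addition, a common way of representing one higher-precision value as a pair of FP32 numbers. This is a general illustration in Python, not the shader used by the benchmark; the helper names are ours, and every intermediate result is rounded to FP32 to mimic hardware that only has FP32 arithmetic.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free FP32 addition: returns the rounded sum and its rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

def ds_from_double(v):
    """Split a value into a (high, low) pair of FP32 numbers."""
    hi = f32(v)
    return hi, f32(v - hi)

def ds_add(x, y):
    """Add two (high, low) pairs using FP32 operations only."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo

a = ds_from_double(1.0 + 2.0 ** -30)  # not representable in a single FP32
total = ds_add(a, a)
print("plain FP32 sum:   ", f32(a[0] + a[0]))
print("double-single sum:", total[0] + total[1])
```

A single emulated addition takes roughly ten FP32 operations here, and multiplication is more involved still, so a large drop from the FP32 scores is to be expected on any of these GPUs.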

Final Thoughts / Conclusions

For a brand-new platform (Lava Xolo X900) on a new architecture (for Android), Intel's Atom SoC impresses. A single-core CPU, albeit with HT, can go toe-to-toe in most (native code) benchmarks with modern dual-ARM-core competitors. In SSE SIMD tests it even beats them by a large margin. Memory, cache and inter-thread bandwidths are much higher and latencies much lower - nothing comes close.

Java performance under Dalvik's JVM is lacking for normal, integer code, though floating-point (FP32) performance is fine. Whether this is due to Gingerbread's Dalvik (1.4) not being fully optimised on x86 remains to be seen. Lack of ICS at launch is a disappointment for a 2012 phone - whether an update will be available soon, and what it will bring, is also unknown.

With top-of-the-range smart-phones now sporting quad-ARM-cores (e.g. Tegra 3), Atom will need dual-core - preferably with HT - to match them. Whether this can be achieved within the same TDP remains to be seen; tablet versions are the most likely candidates.

With the future ARM Cortex A15 and future NEON not yet released and thus not optimised for, Intel has the opportunity to deploy proven technologies from its Core platform (e.g. AVX, SSE 4.x, etc.). While Intel has not improved Atom for netbooks much since launch - no doubt happy to let it die in order to sell more expensive Core ULV laptops - it will have to do so here if it wants to be competitive.

The price (at least in the UK) is very competitive (GBP 200) though dual-core competitors (e.g. Qualcomm S3 MSM8660 based like HTC Evo 3D) have come down in price (approx. GBP 200-250); for users it is good news as newer dual-core phones (e.g. S4/Krait based like HTC One S) should come down in price (currently approx. GBP 400) as their high premium is no longer justified.

While apps using native code will need to be updated for x86, ISVs should find it easier to port high-performance code to Android x86, as well as to use the same - Intel - tools to optimise and debug apps on Windows, OS X, Linux and now Android. We (SiSoftware) were able to easily port all our assembler x86 code to Android/Linux x86, while ARM support had to be written from scratch.

All in all, it is a great first effort from Intel.
