iPad Pro Wolfram Player update

By: Maynard Handley (name99.delete@this.name99.org), January 6, 2019 2:56 pm
Room: Moderated Discussions
Now that I have an iPad Pro, I reran my benchmarks on the new iPad and an iMac Pro.
Recall that this is a test of Wolfram Player on the iPad, and Mathematica on x86.
As before, the most prominent immediate feature is that Wolfram are very definitely
- not using an efficient bignum library (for ints or reals)
- not parallelizing a lot of code that is parallelized on x86 (though they parallelize some, and it's kinda weird what is and is not)
- not providing vector support for some stuff that is vectorized on x86

It's not clear that any of this has improved on a large scale over the past year (and three Wolfram Player updates). There is one strange thing, in that when I ran the large integer multiplication tests, almost always I got the slow results I would expect, but on one occasion (repeatable within that kernel launch session) the numbers were about 4x faster. So maybe Wolfram is randomly (on launch) A/B testing a better bignum library?
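For anyone who wants to check for this kind of per-session shift on their own machine, here is a minimal sketch of a big-integer multiplication timer. It is written in Python rather than Wolfram Language, it is not the benchmark used for the numbers in this post, and the function name, operand size and repeat count are all made up for illustration. Taking the median of several runs makes it easier to tell a genuine 4x change between sessions from ordinary timing noise.

```python
import statistics
import time

def time_bigint_multiply(bits=200_000, repeats=7):
    """Median wall-clock seconds to multiply two integers of roughly `bits` bits.

    Hypothetical helper for illustration; operand size and repeat count are arbitrary.
    """
    a = (1 << bits) - 12345   # arbitrary large operands
    b = (1 << bits) - 67891
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a * b             # the operation being timed
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

if __name__ == "__main__":
    print(f"median multiply time: {time_bigint_multiply():.6f} s")
```

Running it once per fresh session and comparing the medians is the rough equivalent of relaunching the kernel and rerunning the test.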

Another thing that is strange, and performance-relevant, is that for the few places where Wolfram does automatically parallelize, they seem to parallelize more often than not to three rather than four cores. (I would not expect them to waste time trying to ship work to the small cores.) Who knows what's going on, but it seems like some of the code is set up to correctly query the OS [or some hardwired table?] for the number of cores while the rest of the code is locked to a max of three cores from the old iPad?

So given all that weirdness, same as before, has anything improved?
The app is definitely a lot more stable, and on the A12X iPad Pro, it's even more stable. (It could be hitting memory limits on the A10X, but it doesn't look like that --- the memory monitor doesn't show a wild spike to 100% RAM. So it's unclear. Perhaps the byte code is optimized differently in the app store for the two different cores, and there's a very specific bug in the A10X optimization path that's rarely triggered, and that Apple hasn't yet fixed [or had not fixed by the time Wolfram submitted the latest release]?)
Anyway it's still slightly irritating working with Player, and trying to benchmark, on the A12X, but much better than before.

Apart from that, is there anything specifically interesting to report?
Not really, the conclusion is much the same as last year.

Summarizing things to a single number, the A12X is about 50% faster than the A10X, and most of the time this is not the result of the extra core. Frequency jumped very little, so this is mostly better IPC, presumably mostly from a better memory system.

The new iMac is likewise maybe 50% faster than its predecessor (so that's 4.2GHz Xeon W 2017 vs 3.6GHz Haswell 2012), about the same sort of jump as the A10X to A12X. But that's being very specific in limiting code to non-vectorized non-parallelized. Obviously for parallel code and dense linear algebra it does rather better, more like 2x as fast overall. (Good yes, but not the 3x or more you would expect from naive frequency/IPC scaling plus cores plus AVX512 --- thermals really kick in and limit the extent to which everything can crank up to 11 along every dimension...)

Finally we get the infamous relative IPC (or if you prefer "work/cycle"). Yeah, yeah, feel free to throw out the usual lectures about how you can't compare work at 4.2GHz to 2.5GHz, whatever. You know my feelings on the subject, and why it's interesting.
For the relevant benchmarks (i.e. strip out those with bignums, parallelization or vectorization, etc) I'd eyeball it at about 2x now (up from about 1.5x when I did this last year comparing the A10X to the i7 Haswell).
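To make the "work/cycle" figure concrete, here is a minimal sketch of the arithmetic. The clock speeds are the 2.5GHz and 4.2GHz mentioned above; the timings are hypothetical placeholders, not measured results, and the function name is invented for illustration. It also assumes each chip actually runs at its nominal clock for the whole test, which is a simplification.

```python
def relative_work_per_cycle(time_a_s, freq_a_ghz, time_b_s, freq_b_ghz):
    """How many times more work per cycle machine A does than machine B.

    Both machines run the same task; cycles spent = run time * clock frequency.
    """
    cycles_a = time_a_s * freq_a_ghz   # in units of 1e9 cycles
    cycles_b = time_b_s * freq_b_ghz
    return cycles_b / cycles_a

if __name__ == "__main__":
    # Hypothetical timings: A (2.5GHz) finishes in 0.84s, B (4.2GHz) in 1.0s.
    ratio = relative_work_per_cycle(0.84, 2.5, 1.0, 4.2)
    print(round(ratio, 2))
```

With these clocks, simply matching the x86 run time already works out to 4.2/2.5 = 1.68x the work per cycle; a 2x ratio means the 2.5GHz part finishes in about 84% of the x86 time.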

In terms of absolute performance, A12X is frequently ahead of the iMac Pro, and one suspects it would be ahead on pretty much every benchmark (except maybe dense linear algebra) if identical algorithms were in place in Player and Mathematica. Even the dense linear algebra advantage is not as great as you might expect given the core slowdown, then the AVX512 slowdown. Certainly we've come a long way from the initial "ARM will never match Intel" to "Maybe it can match Intel for mobile" to "Well, sure it can match for IPC, but not for absolute performance" to flat out superiority.
Yes yes, I'm sure I can buy a K x86 system with liquid nitrogen cooling that will still be faster than this iPad Pro for this type of single threaded code. Console yourself with that if you like.

(It's an interesting question as to why this Wolfram algorithm disparity continues. Maybe Wolfram have business reasons for not making Player work especially well --- or if you prefer, not bothering to optimize beyond where they are?)

Certainly to me it feels like anyone using Mathematica doesn't have to be worried that they'll be going backwards with an ARM Mac, even at the high end. (This assumes, of course, that Wolfram WILL implement a proper bignum library and full parallelization for such a target; and one can't see why they wouldn't.)

As for the other aspects of the new iPad.
It's remarkable how much smaller it feels, even with the same screen size.
The case is a better, tighter fit in a good way. (You think you don't care, but it feels a lot better in the hand.)
FaceID is great, as magical as on the iPhone.
BUT there is one huge flaw:

The iPhone uses its motion core to wake on raise -- you pick it up, it wakes up, it looks at your face, you're ready to go.
iPad does not offer wake on raise (I don't know why? I'd assume the HW can handle it, so it's just that no-one ever thought to hook up the software?)

Apple seems to expect the usage pattern to be that you lift the cover, that wakes it, FaceID kicks in, ready for use. And that works great when you start a session. But you switch to something else, you go to pee, talk to someone, whatever, and the iPad goes to sleep with the cover already behind it. It just feels natural that picking it up should wake it, like an iPhone, and it doesn't. Yeah, yeah, zero'th world problems, I know. But when you pay a lot for all this magic, it's disappointing the small ways in which it fails. Like I say, hopefully fixed in iOS13?

To be fair, the iMac Pro is also a really nice machine compared to my 2012 iMac. A single thread may not be THAT much faster (like I said, maybe 50%) but having enough more of them (8 rather than 4 cores) and (most likely) having an insanely fast SSD and a much better GPU, really make it feel like a much nicer machine, more like an iPad, less like an older Mac, in terms of less random waiting for tiny disk delays. And it is silent! I was worried about fan noise, but I don't think I've ever heard the fans. They are there, and are spinning (right now at about 1000rpm) but they're not noticeable.

So there we are. Hopefully the next time I do this, I'll be comparing an ARM Mac as one of the contenders!