By: Anon (no.delete@this.email.com), October 21, 2017 9:56 pm

Room: Moderated Discussions

Maynard Handley (name99.delete@this.name99.org) on October 21, 2017 5:51 pm wrote:

> Anon (no.delete@this.thanks.com) on October 21, 2017 4:20 pm wrote:

> > Maynard Handley (name99.delete@this.name99.org) on October 20, 2017 1:34 am wrote:

> > > I mentioned some weeks ago that Wolfram was going to ship a Mathematica Player for iPad,

> > > and that it would be an interesting performance comparison against x86 of a "serious"

> > > app. Well the Player has been released and I've spent a few hours playing with it.

> > >

> >

> > Sorry, not clipping the rest for any reason other than readability; all quite interesting.

> >

> > However, I just wonder.

> > What makes you think that you will ever measure anything other than 'how much does Wolfram feel

> > like spending on different platforms'? I think your early measurements bear this out strongly.

> >

> > These systems are so high level that there is no 'porting over' between systems, as almost certainly they

> > use 3rd party libraries for matrix work, etc, and the quality, and even availability, of those will vary hugely.

>

> I think you have a drastically diminished idea of just how large and impressive Mathematica

> is, what it does, and how it does it, and the rest of what you say stems from that.

>

> It is not correct to say that "These systems are so high level that ..."

> What is high level is the code I write to run on Mathematica, but Mathematica itself exploits as much low-level

> knowledge of the machine as possible. The analogy is that my code is like Java, Mathematica is like the JVM.

> A JVM is not high level code; it is code that requires intimate knowledge of the CPU on which it runs.

That's funny. Having used Mathematica since, from memory, 1989 - somewhere in the 1.x series, I think - I don't believe you are even close to the mark. But what would I know, apparently.

>

> Mathematica is vastly larger, and vastly more sophisticated than something like Matlab

> or Octave. This is not the place to educate you in that, but remember a few points

> - it has been around (though obviously started off a lot smaller) since 1988

> - it aspires to cover all of mathematics (though, again,

> obviously that can only be approached, not achieved)

Please educate me, oh please, as you are obviously SO much more knowledgeable.

>

> (You can get a feel for the size here:

> http://blog.wolfram.com/2016/08/08/today-we-launch-version-11/)

>

> This means that your fundamental data structures are not something as limited as vectors

> and matrices. Even if you consider only that class of objects, you're dealing with arbitrary

> rank tensors, whose elements can each be anything from machine-precision doubles to integers

> to arbitrary precision floats/integers/rationals to variables to symbolic expressions. And

> these mixed objects can be appropriately added, contracted and suchlike.

So, you don't think they use the readily available and highly optimised matrix libraries that exist for their basic core functions? Care to explain why, in any way deeper than 'it's big and complicated'? I would suggest that they may, on the simple basis that you see some pretty big performance steps when you introduce structures that do not readily map to such libraries (at least on x86). That certainly indicates to me that they drop back to more generic algorithms for corner cases which do not map. I have spent many, many hours straightening out cases where I was falling into such performance holes, and once I had the structure back to something that mapped over better, performance improved hugely.

It is of course very impressive that it often CAN achieve such a mapping - and nothing I say should detract from the capabilities of this system.

> This means in turn that there are not trivial single points in the system where you just slot in "high performance

> matrix multiplication" or whatever. Rather what has happened is, like I said, there are EXTREMELY generic (written

> by Wolfram) algorithms throughout the system and, over time, these are each specialized in an on-going fashion.

> Sometimes this specialization is CPU-generic (for example specializing off a fully general matrix addition routine

> down to a specialized routine for adding together two matrices of machine-precision doubles); other times it

> moves to the more CPU-specific (this would include, for example, bignum handling).

I'm sorry, perhaps you could point out where I was implying they just 'slot in' performance libraries at 'trivial single points'? I certainly claim they are quite probably using them; however, I am far from implying that they are some kind of thin layer over the top of them.

However, now I am confused, as what you describe in your second part above appears to be almost EXACTLY what I was suggesting in the first place, and which you seem to take such offense at.

Or do you think that 'adding together two matrices of machine-precision doubles' is not a good point to hand off to, say, Intel's optimised routines?
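The distinction being argued over here - a fully generic element-by-element routine versus a specialised one for packed machine-precision doubles - can be sketched in a few lines of C. To be clear, this is purely illustrative and not Wolfram's actual code; the function names and the callback-based generic path are my own invention:

```c
#include <stddef.h>

/* Hypothetical sketch: a generic element-wise add that dispatches through
 * a callback per element, and a specialised routine for packed
 * machine-precision doubles.  The specialised routine is exactly the kind
 * of point where a vendor kernel (MKL on x86, Accelerate on Apple
 * platforms) could be handed the work. */

typedef void (*add_elem_fn)(const void *a, const void *b, void *out);

/* Generic path: arbitrary element type, one indirect call per element.
 * Correct for anything, fast for nothing. */
static void add_generic(const void *a, const void *b, void *out,
                        size_t n, size_t elem_size, add_elem_fn add_elem)
{
    const char *pa = (const char *)a, *pb = (const char *)b;
    char *po = (char *)out;
    for (size_t i = 0; i < n; i++)
        add_elem(pa + i * elem_size, pb + i * elem_size, po + i * elem_size);
}

/* Specialised path: packed doubles in a tight loop; in a real system this
 * body would be replaced by, or hand off to, an optimised library call. */
static void add_doubles(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Example element callback for the generic path, so the two paths can be
 * compared on the same data. */
static void add_one_double(const void *a, const void *b, void *out)
{
    *(double *)out = *(const double *)a + *(const double *)b;
}
```

The point of the sketch: `add_doubles` is the natural hand-off point to an optimised vendor routine, while `add_generic` is what you fall back to the moment a single element stops being a machine double - which is exactly the performance hole I described.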

> So I don't think your starting point ("Wolfram is ignoring someone or other's high performance

> BLAS routines") is correct; that doesn't accurately model how Mathematica works as a codebase,

1a - I'm impressed; when did you get access to their codebase?

2 - Please point out exactly where I said "Wolfram is ignoring someone or other's high performance BLAS routines" or, in fact, anything even slightly similar?

> or how Wolfram works as a company. There are other, more subtle things going on.

1b - And to their internal development process; that's quite an impressive level of knowledge about what they do.

> For example we know that some code has been specialized to use

> parallel routines on the iPad, but only a limited set.

And I'm not surprised they do, as I have little doubt that some of their existing work is reasonably easy to move over, and some is not. I would suggest that a lot of what is not is down to the huge effort required in reimplementing large sections of hand-optimised code and/or the availability of external libraries - hence I pointed out that what you will be measuring is that commitment, rather than anything to do with the CPU itself.

> One's natural expectation would be that there'd be a fairly high-level flag in the codebase that you could

> flip to have this kick in for all routines that have been so specialized, but we're clearly not seeing

> that. It seems unlikely that this is because all those routines are engaged in intricate RCU algorithms

> that only work on the precise x86 memory model; more likely I'm guessing is something like the flag was

> flipped, various things failed (someone's bug, maybe Wolfram's, maybe Apple's, maybe the compilers?) but

> anyway the flag was flipped back and a few experimental routines had the flag flipped back on again to

> test out that it works in these cases, and after we've shipped we'll figure out the generic case.

Have you worked on many large and old software development projects? I only ask as that seems a particularly unique view of such situations. I have a lot of due respect for Wolfram's codebase; however, I doubt much of it is anywhere near that clean.

> Likewise there appears to be absolutely no use of vectorization, which again suggests something like "we

> tried it, tests failed, we're working with XCode/LLVM to fix it, but for now it's switched off".

Or, just perhaps, they used hand-optimised assembly routines and/or external acceleration libraries in places.

I would suggest it is very naive to assume that they rely on compiler vectorisation heavily.
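For what it's worth, the usual alternative to leaning on compiler autovectorisation is the runtime-dispatch pattern that hand-optimised libraries use: probe the CPU once, then call the best available kernel through a function pointer. A hypothetical sketch - `has_simd()` stands in for real feature detection (CPUID on x86, HWCAP/sysctl on ARM), and the "optimised" kernel here is just portable C:

```c
#include <stddef.h>

/* Runtime kernel dispatch: detect CPU features once, then route every
 * call through a function pointer to the best kernel available. */

typedef void (*axpy_fn)(size_t n, double alpha, const double *x, double *y);

/* Portable scalar fallback: always correct, leaves speed to the compiler. */
static void axpy_scalar(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Stand-in for a hand-written NEON/AVX kernel: unrolled by four, the kind
 * of loop shape an assembly author would start from. */
static void axpy_unrolled(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    for (; i < n; i++)          /* remainder elements */
        y[i] += alpha * x[i];
}

/* Placeholder for real feature detection. */
static int has_simd(void)
{
    return 1;  /* pretend detection succeeded */
}

/* Selected once at startup; callers never care which kernel they got. */
static axpy_fn select_axpy(void)
{
    return has_simd() ? axpy_unrolled : axpy_scalar;
}
```

The compiler never has to prove anything about aliasing or alignment for the dispatch to work - which is precisely why libraries that care ship their own kernels rather than hoping autovectorisation kicks in.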

> Wolfram appear to be taking this seriously

> http://blog.wolfram.com/2017/10/04/notebooks-in-your-pocket-wolfram-player-for-ios-is-now-shipping/

> and have also been frustrated at how long it has taken them. My guess is that the issue

> is very much the sorts of things I have described, rather than your sort of analysis.

Good to know your guess is so much more valuable.

> > so you end up with one of two situations.

> > Either Wolfram does, rather surprisingly, ignore existing high performance numerical

> > libraries and roll their own, in which case huge and long effort will be required to

> > maximise performance on each platform (and each cpu generation..), so you are just measuring

> > their commitment to those platforms in the form of developer investment.

> > Or Wolfram use the existing libraries, and you are measuring the relative quality of

> > whichever they choose, and how well it maps to the requirements of Mathematica.

> >

> > this seems to me to fall into the same trap as the inclusion of encryption runs using acceleration

> > (if present) in certain 'benchmarks', only even more so. It measures little of actual use.

> >

> > If your intention is to measure some form of relative cpu performance for anything

> > other than Mathematica, I wonder what other applications you feel it would map to?

> >

> > Wouldn't it be much more sensible to test more controllable kernels if you were actually looking

> > for some form of numerical benchmarking? However good luck with even that, it is more slippery

> > than an eel, as every single application tends to have very different requirements.

> >

> > I suspect the most interesting part of this would be some insight

> > as to what features Wolfram actually gets around to using..

>

Topic | Posted By | Date
---|---|---
Mathematica on iPad | Maynard Handley | 2017/10/20 01:34 AM
Mathematica on iPad | dmcq | 2017/10/20 07:26 AM
Mathematica on iPad | Maynard Handley | 2017/10/20 01:41 PM
Mathematica on iPad | Maynard Handley | 2017/10/20 08:16 PM
Does this give better formatting? | Maynard Handley | 2017/10/20 08:20 PM
Does this give better formatting? | anon | 2017/10/20 09:37 PM
Does this give better formatting? | Maynard Handley | 2017/10/20 10:29 PM
Does this give better formatting? | anon | 2017/10/21 12:52 AM
Does this give better formatting? | Maynard Handley | 2017/10/21 09:48 AM
Does this give better formatting? | anon | 2017/10/21 10:01 AM
Mathematica on iPad | Adrian | 2017/10/21 01:49 AM
Sorry for the typo | Adrian | 2017/10/21 01:51 AM
Mathematica on iPad | dmcq | 2017/10/21 07:03 AM
Mathematica on iPad | Maynard Handley | 2017/10/21 09:58 AM
Mathematica on iPad | Wilco | 2017/10/21 07:16 AM
Mathematica on iPad | Doug S | 2017/10/21 09:02 AM
Mathematica on iPad | Megol | 2017/10/22 05:24 AM
clang __builtin_addcll | Michael S | 2017/10/21 11:05 AM
Mathematica on iPad | Maynard Handley | 2017/10/21 09:55 AM
Mathematica on iPad | Anon | 2017/10/21 04:20 PM
Mathematica on iPad | Maynard Handley | 2017/10/21 05:51 PM
Mathematica on iPad | Anon | 2017/10/21 09:56 PM
Mathematica on iPad | Maynard Handley | 2017/10/22 12:23 AM
A quick search shows that Mathematica is using Intel MKL | Gabriele Svelto | 2017/10/21 11:38 PM
A quick search shows that Mathematica is using Intel MKL | Anon | 2017/10/22 05:12 PM
A quick search shows that Mathematica is using Intel MKL | Maynard Handley | 2017/10/22 06:08 PM
A quick search shows that Mathematica is using Intel MKL | Doug S | 2017/10/22 10:40 PM
A quick search shows that Mathematica is using Intel MKL | Michael S | 2017/10/23 05:32 AM
Mathematica on iPad | none | 2017/10/22 06:06 AM
Mathematica on iPad | dmcq | 2017/10/23 03:43 AM