By: Michael S (already5chosen.delete@this.yahoo.com), October 17, 2012 6:56 pm

Room: Moderated Discussions

Robert Myers (rbmyersusa.delete@this.gmail.com) on October 17, 2012 4:34 am wrote:

> anon (anon.delete@this.anon.com) on October 17, 2012 1:17 am wrote:
> > Exactly. This is why low bandwidth, high latency memory and
> > communications is not the problem, but the *solution*. Together with
> > caches and changed software assumptions, of course.
>
> Not to mention changed physics. If you can find an advanced physics text that
> ultimately does not lean heavily on the ability to go back and forth with
> facility between physical and momentum space (or whatever you choose to call it)
> using the transform that diagonalizes the momentum operator (the derivative),
> I'll be impressed. Since you seem to think that caches and changed software
> assumptions can address all problems of importance, you may have to be told
> explicitly that the transform in question is the Fourier transform. The last
> time I was paying close attention, Blue Gene could use all of 512 of its tens of
> thousands of processors effectively in doing a volumetric FFT. I'll say more
> later in the day.

As I understand it, you are talking about an IBM research paper from 9 years ago that investigated calculation of a relatively tiny volumetric FFT (N=128, total dataset = 32 MB).
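A quick sanity check of that 32 MB figure (my own arithmetic, assuming double-precision complex data; the paper may have used a different element type):

```python
# 128^3 grid of double-precision complex values:
# each element is two 8-byte doubles (real + imaginary).
N = 128
bytes_per_element = 16
total_bytes = N**3 * bytes_per_element
print(total_bytes // 2**20, "MiB")  # prints: 32 MiB
```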

The BG/Q of today is a very different machine from the BG/L of 2003. Today's tightly coupled 32-node "compute drawer" is almost as big, measured by FLOPs, caches, or memory, as a 512-node BG/L was then. But the question is: why bother parallelizing such a small data set over so many loosely coupled computing elements?

Is it in any way similar to what you want to do? From one of our previous discussions on comp.arch I got the impression that you are interested in much bigger cubes, which likely have very different scaling characteristics on BlueGene-type machines. And it's not obvious to me that their scaling characteristics are worse than those of a small cube.
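To make that concrete, here is a crude back-of-the-envelope model (entirely my own sketch; it assumes a 1-D slab decomposition and ignores the torus topology) of why per-message sizes in the all-to-all transpose, and hence latency-sensitivity, change so much with cube size:

```python
# Crude model of the all-to-all transpose phase of a distributed
# 3-D FFT with a 1-D (slab) decomposition over P nodes: each node
# exchanges essentially its whole local slab, split into ~P messages.
def transpose_message_bytes(N, P, bytes_per_element=16):
    local_bytes = N**3 * bytes_per_element // P  # slab held by one node
    return local_bytes // P                      # size of each of its P messages

print(transpose_message_bytes(128, 512))   # 128 bytes   -> latency-dominated
print(transpose_message_bytes(1024, 512))  # 65536 bytes -> bandwidth-dominated
```

With N=128 on 512 nodes the messages are so small that network latency dominates; with a big cube the same node count moves 64 KiB per message, which is a very different regime.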
