By: Alberto (git.delete@this.git.it), August 26, 2015 1:13 pm
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on August 26, 2015 11:23 am wrote:
> Alberto (git.delete@this.git.it) on August 26, 2015 6:26 am wrote:
> > juanrga (nospam.delete@this.juanrga.com) on August 26, 2015 5:49 am wrote:
> > > Anon (nope.delete@this.nope.com) on August 26, 2015 1:23 am wrote:
> > > > juanrga (nospam.delete@this.juanrga.com) on August 25, 2015 6:01 pm wrote:
> > > > > Nvidia talk at ISC2015 was much more interesting. They compared
> > > > > two KNL CPUs against Power+CUDA using Amdahl's
> > > > > law. At 98% parallel the KNL was competitive. At 90% parallel
> > > > > work the KNL system was about two times slower than
> > > > > the Power+CUDA system: ~2 min vs 4.5 min. At 70% parallel work, the KNL system was more than 3x slower.
> > > > >
> > > > > Wider vector units and less cores had worked better.
> > > >
> > > > KNL will still be available as a PCIe card- SKX & KNL combo systems
> > > > are entirely feasible, for workloads that fit that paradigm.
> > >
> > > The card version is for legacy customers. New systems will favor the CPU version.
> > > Nvidia was comparing Summit and Aurora configurations and how Aurora will require
> > > one order of magnitude more nodes to achieve similar performance.
> >
> > Even the Host version of KNL can be paired with Xeons in the node :),
>
> Sure, and then you come with an inefficient system that goes against the philosophy
> of the self-boot approach. Moreover, I am not sure if the increased single
> thread performance will compensate for the interconnect penalty.
I don't know. It all depends on the low-level custom routines Intel uses to glue everything together. There is surely a reason to integrate a fabric into the upcoming Xeons for HPC: this approach could address some or all of the problems shown in the NVIDIA talk at ISC for low levels of parallelism. This (Cray) fabric is really fast and low latency, and it is feasible both on silicon and over cable.
As usual, only a third-party analysis can give the answer. We'll see. I don't put much trust in these "well studied" competitive comparisons that are built only to show the adversary's weakness.
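
For anyone who wants to play with the Amdahl's law argument quoted above, here is a minimal sketch of the arithmetic. The rates are invented placeholders, not figures from the NVIDIA talk or from any real KNL or Power+CUDA system. The only thing it models is that both nodes get the same parallel throughput while one has faster serial execution; `run_time`, `MANY_WEAK_CORES` and `FAST_HOST_PLUS_ACCEL` are hypothetical names.

```python
def run_time(parallel_fraction, serial_rate, parallel_rate, total_work=1.0):
    """Amdahl-style run time: the serial part and the parallel part run back to back."""
    serial_work = (1.0 - parallel_fraction) * total_work
    parallel_work = parallel_fraction * total_work
    return serial_work / serial_rate + parallel_work / parallel_rate


# Hypothetical nodes (made-up rates, arbitrary units):
# same parallel throughput, but the second one runs serial code 3x faster.
MANY_WEAK_CORES = dict(serial_rate=1.0, parallel_rate=100.0)
FAST_HOST_PLUS_ACCEL = dict(serial_rate=3.0, parallel_rate=100.0)


def slowdown(parallel_fraction):
    """How many times slower the weak-serial node is at a given parallel fraction."""
    slow = run_time(parallel_fraction, **MANY_WEAK_CORES)
    fast = run_time(parallel_fraction, **FAST_HOST_PLUS_ACCEL)
    return slow / fast


if __name__ == "__main__":
    for p in (0.98, 0.90, 0.70):
        print(f"{p:.0%} parallel: weak-serial node is {slowdown(p):.2f}x slower")
```

With these placeholder rates the gap widens as the parallel fraction drops, which is the shape of the claim in the talk. Whether a fast fabric plus Xeons in the node closes that gap depends on real serial rates and interconnect costs, which this sketch does not model.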