By: Emil Briggs (me.delete@this.nowherespam.com), October 20, 2012 10:04 am

Room: Moderated Discussions


> No, you don't believe in

> perpetual motion. You're apparently doing what I suggested was possible: you

> are using a computationally clumsy algorithm to match the work done by the

> processors to the wimpy bandwidth available. I wasn't quite certain about what

> N you were referring to in the N^3 scaling. If you really are referring to the

> number of atoms (which itself goes as M^3, where M is the number of atoms in a

> line on a cubic lattice), then you are (and you do refer to it as a DFT and not

> as an FFT) forgoing the advantages of divide and conquer. Ummm...

>

> Robert.

>

Robert, I don't know exactly what sort of background you have in these types of calculations, so please forgive me if I start at too low a level here. The DFT acronym refers to **Density Functional Theory** and is not related to Fourier transforms of any kind.

DFT calculations are accurate because they originate from quantum mechanical first principles. QM is non-local, and the wavefunctions associated with the electrons extend over all space. In practice we can truncate them at some point or use periodic boundary conditions to limit the region of interest, but as we increase the number of atoms, the size of the region we have to address grows roughly linearly. So we pick up one factor of N. As the number of atoms increases, the number of electrons increases as well, and we pick up another factor of N. Finally, the orthogonality constraint on the wavefunctions gives us another factor, and we're at N^3.
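To make that counting concrete, here is a minimal numpy sketch (the sizes and random trial orbitals are purely illustrative, not from any particular code): with N atoms, both the number of orbitals and the number of grid points grow linearly with N, so just forming the overlap matrix needed to enforce orthogonality already costs O(N) orbitals × O(N) orbitals × O(N) grid points = O(N^3) operations.

```python
import numpy as np

# Hypothetical sizes: n_orbitals and n_grid both grow linearly with
# the number of atoms, so the overlap matrix alone costs O(N^3).
n_orbitals = 8    # one factor of N (electrons/orbitals)
n_grid = 64       # another factor of N (size of the region)

rng = np.random.default_rng(0)
psi = rng.standard_normal((n_orbitals, n_grid))  # trial wavefunctions

# Overlap (Gram) matrix: n_orbitals^2 entries, each an O(n_grid) dot
# product -- this is where the orthogonality constraint's cost lives.
S = psi @ psi.T

# Orthonormalize via QR; this too is O(n_orbitals^2 * n_grid).
q, _ = np.linalg.qr(psi.T)
psi_ortho = q.T

# After orthonormalization the overlap matrix is the identity.
S_ortho = psi_ortho @ psi_ortho.T
print(np.allclose(S_ortho, np.eye(n_orbitals)))  # True
```

Doubling the atom count doubles both matrix dimensions and the dot-product length, hence the factor-of-eight growth in work.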

There are a variety of techniques for solving the resulting equations (referred to as the Kohn-Sham equations). Some of them use FFTs; others use finite-difference methods. The FFT-based methods have scaling issues on computer systems with poor interconnect bandwidth, while the finite-difference methods can do better on such systems.
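One way to see why the two discretizations stress the interconnect so differently is to apply each to the same function on a 1-D grid (a toy sketch, not drawn from any production DFT code): the spectral operator needs the entire grid at once, while the stencil touches only nearest neighbors, so a distributed finite-difference code only exchanges thin boundary layers between nodes.

```python
import numpy as np

# Periodic 1-D grid and a test function with a known second derivative.
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
h = x[1] - x[0]
f = np.sin(3 * x)                 # d^2f/dx^2 = -9 sin(3x)

# Spectral (FFT) Laplacian: a global operation -- every output point
# depends on every input point, hence all-to-all communication when
# the grid is split across nodes.
k = 2 * np.pi * np.fft.fftfreq(n, d=h)
lap_fft = np.real(np.fft.ifft(-(k ** 2) * np.fft.fft(f)))

# Finite-difference Laplacian: a local 3-point stencil -- each output
# point needs only its nearest neighbors.
lap_fd = (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / h ** 2

exact = -9 * np.sin(3 * x)
err_fft = np.max(np.abs(lap_fft - exact))  # near machine precision
err_fd = np.max(np.abs(lap_fd - exact))    # second-order error, much larger
print(err_fft, err_fd)
```

The trade-off the post describes falls out directly: the spectral method is far more accurate per grid point but communication-heavy, while the stencil is communication-light and scales better on weak interconnects.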

In either case, though, the N^3 scaling of the algorithm with the number of atoms is the real fundamental limit. Even a system with enough interconnect bandwidth to provide good FFT performance on thousands of nodes would still run into this problem. Hence the work on finding algorithms that don't have that limit.
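A back-of-the-envelope illustration of why that limit bites (the atom counts here are arbitrary): under O(N^3) scaling, doubling the number of atoms multiplies the work by eight, and a ten-fold larger system costs a thousand times more, no matter how fast the interconnect is.

```python
def cost_ratio(n_small, n_large, exponent=3):
    """Relative work for two system sizes under O(N^exponent) scaling."""
    return (n_large / n_small) ** exponent

# Cubic scaling: doubling the atom count costs 8x the work,
# and 10x the atoms costs 1000x the work...
print(cost_ratio(1000, 2000))    # 8.0
print(cost_ratio(1000, 10000))   # 1000.0

# ...while a linear-scaling ("order-N") method would cost only 2x and 10x.
print(cost_ratio(1000, 2000, exponent=1))    # 2.0
print(cost_ratio(1000, 10000, exponent=1))   # 10.0
```

That gap between 1000x and 10x is the whole motivation for the reduced-scaling algorithms mentioned above.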