number of sockets is wrong metric (was: New article: 8 socket commodity servers)

Article: 8-Socket Commodity Servers: Flourish or Perish?
By: Vincent Diepeveen (diep.delete@this.xs4all.nl), March 15, 2010 12:36 pm
Room: Moderated Discussions
longtimelurker (rwt@nospam.maibaums.net) on 3/14/10 wrote:
---------------------------
>Vincent Diepeveen (diep@xs4all.nl) on 3/14/10 wrote:
>---------------------------
>
>>>but were still significantly faster. If you just care about number of cores and
>>>can parallelize well, you would build a cluster of 1S or 2S machines anyway.
>>
>>This is dead wrong in case of my software.
>>
>
>
>
>let me rephrase that into: "If you just care about number of cores and your parallelization
>approach tolerates communication latency well enough, you would just build a cluster..."
>
>excuse my laziness of not making that sentence long enough in the first place.
>Other than that, you're preaching to the choir, and as far as I understand we agree on my main point anyway?

No, no, you have a complete misconception of what parallelization of software means.

Nowadays in science, and every year a few more fields get added, brilliant people invent better algorithms to advance. These algorithms are not seldom very difficult to parallelize really well.

The basic thing most of them share is that each thread contributes more to the whole when it can reference results that other threads have cached in shared memory.

In game tree search this is called a 'hashtable', but each field of science has a different name for it.
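To illustrate why such a shared table matters, here is a minimal Python sketch. It is not Diep's code: the toy game, the names `search` and `count_nodes`, and the leaf evaluation are all made up for illustration, and the two "workers" run one after the other so that the node counts are deterministic. It only counts how many positions get searched when both workers use one table versus when each keeps its own.

```python
def search(pos, depth, table, stats):
    """Toy negamax over a made-up game; `table` caches one value per position."""
    if pos in table:              # transposition: position already searched
        return table[pos]
    stats["nodes"] += 1           # count only positions actually searched
    a, b = pos
    if depth == 0:
        value = a - b             # arbitrary leaf evaluation (illustration only)
    else:
        # Two moves per position; different move orders reach the same
        # position, which is what makes a shared table pay off.
        value = max(-search((a + 1, b), depth - 1, table, stats),
                    -search((a, b + 1), depth - 1, table, stats))
    table[pos] = value
    return value


def count_nodes(shared, depth=10):
    """Search both root moves, one per 'worker', and return positions searched."""
    stats = {"nodes": 0}
    shared_table = {}
    for root_child in [(1, 0), (0, 1)]:
        # Shared: both workers see each other's results. Private: each starts empty.
        table = shared_table if shared else {}
        search(root_child, depth - 1, table, stats)
    return stats["nodes"]


if __name__ == "__main__":
    print("private tables:", count_nodes(shared=False))
    print("shared table:  ", count_nodes(shared=True))
```

In this toy tree the two workers together search 110 positions with private tables and 65 with a shared one. The second worker skips everything the first has already stored, which is the kind of contribution between threads described above.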

Generally speaking, it is really difficult to parallelize this well. In a quantum mechanics calculation I in fact had to find a trick to parallelize it over a quad-core, but the algorithm itself is very hard to parallelize over more than, say, 4 cores. There are plenty of examples.

This is where it would be really cool if a simulation didn't take 10 years to complete.

So parallelizing code really well and embarrassingly parallel software are two totally different concepts; I get the impression most here are totally ignorant of that fact.

In all those parallelized workloads, of course an 8-socket machine with shared memory is always going to outperform a cluster.

Please note that a lot of different types of workloads are still not parallelized really well. We already had a discussion on compression here at RWT, but think also of a plan I have for a new, parallelized type of MD5 sum.

Right now it is very difficult to parallelize MD5; basically the only parallelized workload there is cracking: modifying a file such that it still has the same MD5 sum after a change you had to make in it (correcting a spelling mistake, say).

It's a matter of time until such workloads also parallelize a lot better. Probably it is best to design them directly for a GPU, yet that will require some sort of shared memory, as otherwise you can already prove that cracking it becomes too easy.

Now a cracked md5sum for everyone up to secret level is not so interesting. What is interesting is that taking the MD5 of, for example, an entire hard drive runs far slower than the actual bandwidth a hard drive or RAID partition can deliver.

So right now it takes nearly forever to do all this, and it only works in embarrassingly parallel form. That simply isn't acceptable anymore in a future where the GHz hardly increases yet the number of cores really explodes.
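For reference, the embarrassingly parallel form mentioned above can be sketched in Python as below. This is not the new MD5 design I have in mind, and it does not produce the same digest as a plain MD5 over the whole input: it hashes fixed-size chunks independently and then hashes the concatenated chunk digests. The names `two_level_md5` and `chunk_digest`, the 1 MiB chunk size, and the worker count are assumptions chosen for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary choice for this sketch


def chunk_digest(chunk: bytes) -> bytes:
    """MD5 of one chunk; chunks are independent, so this step runs in parallel."""
    return hashlib.md5(chunk).digest()


def two_level_md5(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> str:
    """Hash chunks independently, then hash the concatenation of their digests.

    NOTE: the result differs from hashlib.md5(data); it is a different function.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the digest is deterministic.
        digests = list(pool.map(chunk_digest, chunks))
    return hashlib.md5(b"".join(digests)).hexdigest()


if __name__ == "__main__":
    sample = bytes(range(256)) * 4096  # 1 MiB of sample data
    print(two_level_md5(sample, chunk_size=64 * 1024))
```

Each chunk is processed without any knowledge of the others, which is exactly why this form scales trivially and also why it says nothing about parallelizing the hash of a single stream.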

If we look at the latest Intel chip posted now, that's 12 logical cores and it kicks butt: 1.8 million nps for Diep.

http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=79&Itemid=1&limit=1&limitstart=17

Imagine if that still ran embarrassingly parallel; that wouldn't look serious, would it?

Diep is the best-scaling game tree search software on the planet, but it is a lot harder to get the same speedup out of a cluster with 10 microsecond latency than out of a shared memory machine.

The difference is really big.
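A back-of-the-envelope calculation in Python shows the scale of the problem. It uses only the figures from this post (10 microsecond latency, 12 logical cores, 1.8 million nps) plus one loud assumption that is not how Diep necessarily works: every searched node does one blocking remote table lookup and nothing overlaps with it.

```python
LATENCY_NS = 10_000        # 10 microsecond cluster latency (from the post)
THREADS = 12               # logical cores of the chip mentioned above
REPORTED_NPS = 1_800_000   # nodes per second reported for Diep on that chip

# ASSUMPTION (illustration only): one blocking remote lookup per node, no overlap.
per_thread_cap = 1_000_000_000 // LATENCY_NS   # lookups per second per thread
cluster_cap = per_thread_cap * THREADS         # ceiling for 12 such threads
reported_per_thread = REPORTED_NPS // THREADS  # what shared memory delivers today

if __name__ == "__main__":
    print("latency-bound ceiling per thread:", per_thread_cap)
    print("latency-bound ceiling, 12 threads:", cluster_cap)
    print("reported per thread on shared memory:", reported_per_thread)
```

Under that assumption the lookups alone cap each thread at 100,000 nodes per second, below the 150,000 per thread already reached on shared memory, and that is before any actual search work is done.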

You cannot say that 10 years of work, a big part of which went into the parallel algorithm, means it works 'badly' in parallel. It doesn't. It's the best parallel scaling and parallel speedup you can get in modern game tree search. There is nothing that's even *close* in performance.

Yet it is *not* going to work embarrassingly parallel like that on a cluster.

So far a lot of software could get away with not going parallel, or with using some sort of trick to get parallel; that will however disappear rapidly. When the average user has 12 logical cores in his hobby room, he will also want software that can use them in a nice manner.

If you have something that scales 2 out of 4 on a quad-core, people do not really notice. At 12 logical cores that's not a sellable story anymore.
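To put a number on that, here is a small Python sketch that assumes Amdahl's law describes the program, which is an assumption made for illustration and not something measured. The function names `serial_fraction` and `amdahl_speedup` are made up. It derives the serial fraction implied by a speedup of 2 on 4 cores and then projects the speedup on 12 cores.

```python
def serial_fraction(speedup: float, cores: int) -> float:
    """Solve speedup = 1 / (s + (1 - s) / cores) for the serial fraction s."""
    return (cores / speedup - 1.0) / (cores - 1.0)


def amdahl_speedup(s: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / cores)


if __name__ == "__main__":
    s = serial_fraction(speedup=2.0, cores=4)  # "2 out of 4" on a quad-core
    print(f"implied serial fraction: {s:.3f}")
    print(f"projected speedup on 12 cores: {amdahl_speedup(s, 12):.2f}")
```

Under this model a speedup of 2 on 4 cores implies a serial fraction of 1/3, and 12 cores then give a speedup of only 18/7, about 2.6 out of 12.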

So the pressure will soon be there for every type of software to work very well in parallel.

What doesn't scale well will have to leave the market. This means that more and more software will also work great on 4- and 8-socket machines, but not on a cluster of n nodes, as it isn't written for MPI, let alone able to handle those latencies.

Whether this means an increase in sales for those machines is not easy for me to figure out. The future will tell.

Vincent