Engineering Roundtable I: Newisys’ HORUS Chipset


David Kanter (Sep 22, 2004 2:31:49 PM)
I just released it

David Kanter (Sep 22, 2004 2:32:40 PM)
the problem is conversation flow is sort of one-way

David Kanter (Sep 22, 2004 2:33:07 PM)
So I am hoping that we’ll have around 25-35 people here

srsingh (Sep 22, 2004 2:33:11 PM)
how do we ask questions?

David Kanter (Sep 22, 2004 2:33:25 PM)
Well, you just type it in as if you were chatting, and then I can forward it to the guest

David Kanter (Sep 22, 2004 2:33:29 PM)
or make it public

David Kanter (Sep 22, 2004 2:34:10 PM)
I can send the messages back to the sender, guest or public

David Kanter (Sep 22, 2004 2:34:15 PM)
and I think I can modify them too

David Kanter (Sep 22, 2004 2:35:37 PM)
if you want to go to the Foyer, we can talk freely there

Rajesh Kota (Sep 22, 2004 2:55:14 PM)
I am in engg. roundtable.

David Wang (Sep 22, 2004 2:55:35 PM)
yes, we are ignoring you.

David Wang (Sep 22, 2004 2:56:01 PM)
How’s it going?

Rajesh Kota (Sep 22, 2004 2:56:14 PM)
I am fine. How are you guys doing?

David Wang (Sep 22, 2004 2:56:45 PM)
Weather is nice in Maryland

David Wang (Sep 22, 2004 2:56:54 PM)
How’s it down in Texas?

srsingh (Sep 22, 2004 2:57:07 PM)
warm and sunny in Canada

Dean Kent (Sep 22, 2004 2:57:29 PM)
FYI – if you send a message, it may not display on the chat immediately – a moderator must ‘release’ it so that the Guest can see it. This is how we prevent… um… unacceptable questions. :-).

mas (Sep 22, 2004 2:57:38 PM)
cold and dark 8pm in London ;-)

David Kanter (Sep 22, 2004 2:58:02 PM)
oh, you live near Rob Thorpe then…

David Wang (Sep 22, 2004 2:58:29 PM)
Shall we banter on for a few more minutes until the anointed hour?

Dean Kent (Sep 22, 2004 2:59:21 PM)
I think a David should give some general information on how the chat works for visitors, so they don’t get confused about where messages go, and how moderation works (I am good at delegating, eh?)

Rajesh Kota (Sep 22, 2004 2:59:34 PM)
Guys it is 2pm here. I am ready when you are.

David Wang (Sep 22, 2004 3:01:15 PM)
Newisys’s CTO said that the directory and the large node cache are like a “belt and suspender” approach to scaling MP performance.

David Kanter (Sep 22, 2004 3:01:18 PM)
Ok, right now we are in a moderated room. In a moderated room, not all users can speak aloud.

David Kanter (Sep 22, 2004 3:01:24 PM)
The general format is that once you

David Kanter (Sep 22, 2004 3:01:49 PM)
‘speak’ a message it goes into a queue, which is then reviewed by David Wang, Dean, Rajesh and myself

David Wang (Sep 22, 2004 3:02:08 PM)
Could Newisys release a “belt only” or “suspender only” glue chip?

David Kanter (Sep 22, 2004 3:02:25 PM)
we then pick a message out of the queue, and then choose to direct it to the guest (Rajesh), or the public

David Kanter (Sep 22, 2004 3:03:01 PM)
This helps to keep the signal-to-noise ratio high and avoid inappropriate questions

Rajesh Kota (Sep 22, 2004 3:03:26 PM)
Answering David’s question regarding “belt and suspender”: the object of HORUS, apart from extending the SMP from 8 to 32, is also to reduce latency for transactions. The RDC helps with caching, and the DIR helps by removing unnecessary probes (which reduces bandwidth usage).

Rajesh Kota (Sep 22, 2004 3:04:19 PM)
Yes, it is possible to have only DIR, or both DIR and RDC. We can also support only RDC, but then we can’t support the exclusive state in the RDC, because the DIR doesn’t exist.
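
To illustrate the point about the DIR removing unnecessary probes, here is a minimal sketch of the general directory-filtering idea. It is not the HORUS implementation: the dictionary-based `directory`, the quad numbering, and both function names are assumptions made purely for illustration.

```python
def probes_without_directory(remote_quads):
    """Without a directory, every remote quad must be probed on each request."""
    return set(remote_quads)


def probes_with_directory(directory, address):
    """With a directory, only quads recorded as holding the line are probed.

    `directory` is a hypothetical mapping: line address -> set of remote
    quads that may hold a cached copy. A missing entry means no remote
    quad holds the line, so no probes are sent at all.
    """
    return set(directory.get(address, ()))


# Hypothetical state: line 0x1000 is cached only in remote quad 2.
directory = {0x1000: {2}}
remote_quads = {1, 2, 3}

broadcast = probes_without_directory(remote_quads)   # probes quads 1, 2 and 3
filtered = probes_with_directory(directory, 0x1000)  # probes quad 2 only
untracked = probes_with_directory(directory, 0x2000) # probes nobody
```

Every probe that is filtered out is traffic that never crosses a remote link, which is where the bandwidth saving comes from.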

Rajesh Kota (Sep 22, 2004 3:05:47 PM)
No hardware support for message passing is built into HORUS currently.

Groo (Sep 22, 2004 3:05:47 PM)
The next big thing in servers appears to be RAS. The Opteron is notably light there. Does Horus bring anything to the table that improves reliability and availability of the Opteron platform? Does the onboard memory controller hamper this?

Rajesh Kota (Sep 22, 2004 3:07:16 PM)
From the start of HORUS we focused on RAS extensively. We improve the reliability and availability significantly.

mas (Sep 22, 2004 3:07:56 PM)
What sort of TPC-C numbers are you expecting for the 8/16/32-ways? Higher than a Superdome but less than Power 5 servers?

Rajesh Kota (Sep 22, 2004 3:08:44 PM)
Wrt Opterons: we don’t work around the Opterons to increase their reliability or availability. But with HORUS and system management running on our service processor, we have lots of RAS features built in and supported.

srsingh (Sep 22, 2004 3:08:49 PM)
Regarding reliability, can you state what fraction of non-scan latches & flip-flops are protected by ECC or parity?

Rajesh Kota (Sep 22, 2004 3:10:18 PM)
Wrt mas’s question on TPC-C: you can check out slide #13 of the Hot Chips presentation on our scaling of OLTP-type applications. These are performance projections. I can’t give you the actual TPC-C number since we don’t have the chips back yet from TSMC.

Rajesh Kota (Sep 22, 2004 3:10:39 PM)
Also, you should be able to find TPC-C numbers for Opteron systems.

Rajesh Kota (Sep 22, 2004 3:12:18 PM)
Wrt srsingh’s question on RAS: all arrays are ECC protected, with single-bit correct and double-bit detect. Our crossbar buses are not parity or ECC protected, but we have error checking that will detect invalid encodings and cause machine check events to the service processor.
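
For readers unfamiliar with "single-bit correct, double-bit detect" (SECDED), the sketch below shows the idea on a toy extended Hamming(8,4) code. The code width and bit layout are illustrative assumptions; the transcript does not say which ECC code the HORUS arrays use.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    parity_error = sum(code) % 2 == 1

    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # An odd number of flips: treat as a single-bit error and repair it.
        # A zero syndrome here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[6] ^= 1                      # inject a single-bit error
single = decode(word)             # repaired, original data recovered
word[2] ^= 1                      # inject a second error
double = decode(word)             # detected, but not correctable
```

A detected-but-uncorrectable result is the kind of event that would typically be escalated as a machine check rather than silently consumed.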

Rajesh Kota (Sep 22, 2004 3:12:31 PM)
next question.

qin1 (Sep 22, 2004 3:13:01 PM)
Does Newisys have a plan to market the Horus chip itself?

Rajesh Kota (Sep 22, 2004 3:13:51 PM)
I think right now our business approach has been to sell complete system solutions to OEMs. I don’t know what approach marketing will take with the Horus chip.

Rajesh Kota (Sep 22, 2004 3:14:19 PM)
next question.

Groo (Sep 22, 2004 3:14:36 PM)
Can you give us some idea of the type of clients, if any, who have expressed interest in horus systems? Are they mainly small server customers looking to move up, people cross shopping superdomes, impulse buyers with a few hundred K$s to spend?

Rajesh Kota (Sep 22, 2004 3:14:38 PM)
Please let me know if I haven’t completely answered your question.

Rajesh Kota (Sep 22, 2004 3:15:24 PM)
Can’t give you client details. Sorry.

Rajesh Kota (Sep 22, 2004 3:16:02 PM)
But you can imagine the market for a system that supports 32 horus sockets.

Rajesh Kota (Sep 22, 2004 3:16:22 PM)
next q.

mas (Sep 22, 2004 3:16:24 PM)
If I read that right, your 8-way is going to be worse than a glueless 8-way, with a scaling factor of about 4?

Rajesh Kota (Sep 22, 2004 3:19:52 PM)
If you are referring to slide 13 of my Hot Chips presentation, you will notice the relative scaling between 8 single-core sockets and 2-quad Horus systems. Our performance simulations show that we are almost neck and neck. From the beginning, our performance target was for the 2-quad 8-way to be better than, or no worse than, hooking up 8 Opterons together. But we pull ahead significantly when we go to dual core.

Rajesh Kota (Sep 22, 2004 3:20:32 PM)
next question.

David Wang (Sep 22, 2004 3:20:42 PM)
Wait a sec.

David Wang (Sep 22, 2004 3:21:06 PM)
Have you had a chance to look into the odd shape of that curve on slide 13?

David Wang (Sep 22, 2004 3:21:24 PM)
That is, scaling from 8 to 16 was better than from 4 to 8?

Dean Kent (Sep 22, 2004 3:23:35 PM)
FYI – if someone has a comment intended for general consumption, rather than a question, mark your message as such and it will be released immediately for others to read. All questions for Mr. Kota are queued up and released one at a time.

Rajesh Kota (Sep 22, 2004 3:23:53 PM)
Yes. At a lower number of outstanding transactions, the added latency of HORUS has its impact. HORUS is built for higher bandwidth. That is why you see that we scale better when there are more outstanding transactions.
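
One way to see why added latency matters less as outstanding transactions increase is Little's law: sustained throughput is roughly the number of outstanding transactions divided by latency, capped by link bandwidth. The sketch below is a generic model, not the Newisys performance model, and every number in it is a made-up placeholder.

```python
def throughput(outstanding, latency_ns, bandwidth_cap):
    """Transactions per ns under Little's law, limited by a bandwidth cap.

    All parameters are hypothetical: `outstanding` is the number of
    transactions in flight, `latency_ns` the round-trip latency, and
    `bandwidth_cap` the most transactions per ns the links can carry.
    """
    return min(outstanding / latency_ns, bandwidth_cap)


# Placeholder latencies: a shorter path vs. a path with an extra hop.
short_path_ns, long_path_ns = 100.0, 150.0
cap = 0.1  # placeholder link limit, transactions per ns

# With one transaction in flight, the extra latency costs a third of throughput.
low_short = throughput(1, short_path_ns, cap)   # 0.01
low_long = throughput(1, long_path_ns, cap)

# With many in flight, both paths are limited by bandwidth, not latency.
high_short = throughput(32, short_path_ns, cap)
high_long = throughput(32, long_path_ns, cap)
```

In this toy model the latency penalty disappears once enough transactions are in flight to saturate the links, at which point the design with more bandwidth wins.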

Rajesh Kota (Sep 22, 2004 3:24:36 PM)
next q.

qin1 (Sep 22, 2004 3:24:53 PM)
How do we contact you for more info on the chip and collaboration on building a system using the chip?

Rajesh Kota (Sep 22, 2004 3:25:16 PM)
qin1. You can contact me at

Groo (Sep 22, 2004 3:25:28 PM)
Since you can’t talk about clients, let me somewhat rephrase that. Can you tell me what type of market these systems are aimed at? Where do you hope to see uptake?

Rajesh Kota (Sep 22, 2004 3:28:46 PM)
Two areas, primarily: transaction processing and high performance computing. They differ somewhat. One requires low latency and the other requires high bandwidth. Both of them are addressed in our design. Multiple protocol engines and wide remote links are for bandwidth, and the RDC, the DIR, and the 500MHz core are for latency reduction.

Rajesh Kota (Sep 22, 2004 3:29:05 PM)
next q.

srsingh (Sep 22, 2004 3:29:07 PM)
In a discussion with Rich Oehler (and I believe the email was forwarded to you), Mr. Oehler stated that loaded latency for two quads using Horus is better than the latency of a glueless 8-way system. My question is whether this is solely due to the directory/RDC of Horus, or does Horus also have a strategy to ensure a *stable*, loaded latency?

srsingh (Sep 22, 2004 3:29:09 PM)
That is to say, with many systems, as throughput approaches a maximum, latency explodes almost exponentially. Does Horus have mechanisms to ensure a stable, maximum latency?

Rajesh Kota (Sep 22, 2004 3:35:04 PM)
If you refer to slide 21 of my Hot Chips presentation, you will see that an eight-way system has two pairs of remote links for connections. So we believe that from a raw bandwidth point of view (without RDC and DIR) we have sufficient resources there. There is a max limit to how many outstanding transactions an Opteron can have at a time. Most commercial applications don’t come to that max limit. So we believe we have enough raw bandwidth to handle all outstanding transactions in an eight-way without RDC and DIR. With RDC and DIR we obviously will do much better. Of course, there isn’t anything preventing one from writing a simple loop that will bang the heck out of the coherent HT links. Currently we don’t see much usage of coherent HT links by commercial applications.
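
The latency blow-up near maximum throughput that srsingh describes can be illustrated with the textbook M/M/1 queueing formula, where mean latency is the service time divided by (1 - utilization). This is a generic queueing model and an assumption on the editor's part, not a description of how HORUS or the coherent HT links behave; the 100 ns service time is a placeholder.

```python
def mean_latency(service_time_ns, utilization):
    """Mean time in an M/M/1 queue: service_time / (1 - utilization).

    `utilization` is offered load divided by capacity and must be in [0, 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ns / (1.0 - utilization)


# Placeholder service time; latency at increasing load levels.
latencies = {u: mean_latency(100.0, u) for u in (0.0, 0.5, 0.9, 0.99)}
```

In this model the growth is hyperbolic rather than strictly exponential, but the practical effect is the same: the last few percent of utilization cost far more latency than the first fifty. Capping the number of outstanding transactions per processor, as mentioned above, is one way a system stays away from that region.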

Rajesh Kota (Sep 22, 2004 3:35:30 PM)
next q.

David Kanter (Sep 22, 2004 3:35:40 PM)
It seems like both Opteron and Horus were designed to be multi-purpose, do you think that there are any design changes you might have made if you were going to specifically target HPC workloads? What about commercial server workloads (OLTP, batch processing, etc.)?

Rajesh Kota (Sep 22, 2004 3:38:23 PM)
I can always trade one of the two (bandwidth vs latency) and get better performance for one or the other.

Rajesh Kota (Sep 22, 2004 3:39:15 PM)
But we wanted to maximise both from the beginning. In the end I believe we ended up with more bandwidth than we might need for OLTP.

Rajesh Kota (Sep 22, 2004 3:39:27 PM)
next q.

srsingh (Sep 22, 2004 3:39:49 PM)
Page 22 of the Hot Chips Horus slides states, “[U]sing Horus and IB cables…” — what are these IB cables? Infiniband?

Rajesh Kota (Sep 22, 2004 3:40:02 PM)

Rajesh Kota (Sep 22, 2004 3:40:12 PM)
next q.

David Kanter (Sep 22, 2004 3:40:38 PM)
Can a partition be made with non-integer resources (i.e. 2/3 of an I/O link, 1.5 MPUs, etc.)?

Rajesh Kota (Sep 22, 2004 3:41:21 PM)
Please elaborate more. What do you mean by 2/3 of an I/O link?

David Kanter (Sep 22, 2004 3:42:52 PM)
If you have, say, RAID controllers through PCI-X using an I/O link in a quad, could you share a RAID controller between two partitions?

Rajesh Kota (Sep 22, 2004 3:42:59 PM)
If you mean can we have transactions from two different partitions share one physical I/O link, then the answer would be no.

David Kanter (Sep 22, 2004 3:43:34 PM)
Do you see that as a future feature you might be able to add?

Rajesh Kota (Sep 22, 2004 3:45:14 PM)
The reason is that a partition is a concept supported by HORUS, but the Opterons and the chipsets don’t have a clue how to keep transactions from two separate partitions logically separate. I don’t think Newisys would go down the path of trying to use one physical link to support two partitions. We solve this in our system design by providing two I/O links in each box.

Rajesh Kota (Sep 22, 2004 3:45:43 PM)
next q.

David Wang (Sep 22, 2004 3:45:45 PM)
In regards to my “belt and suspender” comment earlier, I was thinking that since Horus supports cache coherency of local nodes through it, perhaps you can have a low-cost Horus that glues together lower-cost 2xx series Opterons and enables cheap larger-scale MP boxes that are still “cache coherent” (the cost of cache coherency would be proportionally greater).

Rajesh Kota (Sep 22, 2004 3:46:40 PM)
See the right side of page 5 of my Hot Chips presentation.

Rajesh Kota (Sep 22, 2004 3:47:01 PM)
next q.

kadickey (Sep 22, 2004 3:48:10 PM)
Sorry if this is a repeat, but what are your post-silicon validation plans? What tests will you run to ensure Horus and Opteron work as expected?

Rajesh Kota (Sep 22, 2004 3:53:41 PM)
We have verified HORUS and Opteron compatibility pre-silicon against Opteron RTL and also using an FPGA (HORUS). Post-silicon validation plans are split along the following major lines: function/feature validation, OS bring-up, application testing, and performance testing. We have a suite of stress tests that our system folks use for the 2P and 4P systems that we build. We have a validation team and a performance team that have their own suites of tests that they are working on. Once we have the systems verified, we will also be shipping some boxes to customers. Sorry, I can’t go into detailed verification test plans here since they are large in number.

Rajesh Kota (Sep 22, 2004 3:54:12 PM)
next q.

jforbes (Sep 22, 2004 3:54:19 PM)
Question in regards to Horus and the Linux kernel: this design will add another level of NUMA-ness, so to speak. While sched_domains can map it appropriately, is anyone at Newisys working with kernel developers to ensure that we are scheduling accordingly, taking node affinity and subnode affinity into account?

Rajesh Kota (Sep 22, 2004 3:57:15 PM)
We have a software group here at Newisys. I don’t know whether they are currently actively working on Linux kernel tuning for Horus. I believe that we have Linux modified locally for HORUS. I can put you in touch with them if you give me your contact info.

Rajesh Kota (Sep 22, 2004 3:58:10 PM)
next q.

dkanter (Sep 22, 2004 3:58:37 PM)
Are there any plans to incorporate lock-stepping, voting or other fault-tolerant features in future versions of Horus?

Rajesh Kota (Sep 22, 2004 4:00:18 PM)
No such plans for Horus currently.

Groo (Sep 22, 2004 4:00:33 PM)
There are three solutions for breaking the 8-way limit on the Opterons right now: Horus, Cray/Red Storm and Cray/Octigabay. Compare and contrast those, i.e. why are you better than an Octigabay setup?

Rajesh Kota (Sep 22, 2004 4:04:16 PM)
I don’t know there implementation details. I don’t think they support 32-way SMP. I thought they were more cluster based. I might be wrong. In short, I can’t compare since I don’t know there implementation.

Rajesh Kota (Sep 22, 2004 4:04:50 PM)
next q.

Rajesh Kota (Sep 22, 2004 4:05:26 PM)
Sorry, I meant their and not there in my previous reply.

David Wang (Sep 22, 2004 4:05:41 PM)
Red Storm relies on message passing by its special co-processors, so it’s not really tightly coupled cc, IIRC.

mas (Sep 22, 2004 4:05:53 PM)
What sort of data will be cached in Horus’s 64MB cache? Will it be cache copies of Opterons in other quads? Will it also be used as a conventional L3, containing victim cache copies from local/external L2s, or will it be inclusive like Intel’s?

srsingh (Sep 22, 2004 4:07:48 PM)
Page 23 of the Horus presentation states that “only data whose home is located in remote quads is cached in RDC”

Rajesh Kota (Sep 22, 2004 4:08:26 PM)
HORUS RDC will cache data whose home (memory controller) is in remote quads. It will fill the cache both on a read request and also on victim writes.

Rajesh Kota (Sep 22, 2004 4:09:03 PM)
So it is more like an inclusive cache.

Rajesh Kota (Sep 22, 2004 4:09:21 PM)
next q.

David Wang (Sep 22, 2004 4:10:29 PM)
Would it be difficult to extend the design to larger scale ccMP boxes? 128/256?

Rajesh Kota (Sep 22, 2004 4:12:29 PM)
Yes. The difficulty is more in feasibility/implementation issues: keeping the latency down and providing sufficient bandwidth. It will require more links, faster links, and bigger RDCs and directories, because the more quads you add, the less sparse the directory will become.

mas (Sep 22, 2004 4:13:00 PM)
So are we talking just about cache, or data in memory too, in remote quads? If it’s just cache then you will need 64 1Mb remote Opterons to fill it up?

Rajesh Kota (Sep 22, 2004 4:14:19 PM)
I don’t understand the 64 1Mb remote thingy. Please elaborate.

davidboles (Sep 22, 2004 4:14:28 PM)
Hi Rajesh, just stopped by to see what people are asking.

Rajesh Kota (Sep 22, 2004 4:15:26 PM)
mas, could you please explain your question? I think we are on different threads of discussion.

David Wang (Sep 22, 2004 4:15:41 PM)
Let’s come back to mas’s Qn.

dkanter (Sep 22, 2004 4:15:44 PM)
Do you think it will be feasible eventually to put the RDC on-die with Horus? Any guesses as to what process node this might be at?

Rajesh Kota (Sep 22, 2004 4:19:22 PM)
Can’t put 64MB of data on 0.13u tech. I don’t think anybody (maybe MoSys?) supports a 1T memory cell for the TSMC 0.13u logic process. I believe IBM does. I don’t know how much you can stick on die. For us, getting the tags on-die is itself a big challenge. They take up most of our die. I can’t comment on future process nodes. I believe Intel’s new version of Itanium supports something like 27MB of cache on 90nm.

David Wang (Sep 22, 2004 4:19:33 PM)
General comment: Montecito is up to 24 MB L3 at the 90nm node… Maybe at the 45nm node? But keeping it off die would probably be “better”. That is, hitting an on-die RDC may be 50ns, and an off-die RDC may be 70ns. Paying the die cost for lowering the latency in that way may not be economical.

mas (Sep 22, 2004 4:19:55 PM)
Well, is the data that is stored in the L3 just copies of remote Opterons’ L2s?

David Wang (Sep 22, 2004 4:20:03 PM)
back to mas.

David Wang (Sep 22, 2004 4:20:35 PM)
Couldn’t Horus cache data in remote memory as well?

Rajesh Kota (Sep 22, 2004 4:21:06 PM)
Terminology difference. The RDC is not exactly an L3. The RDC doesn’t cache data whose home (memory controller) is in the local quad. It caches data that local CPUs request from remote quads.

David Wang (Sep 22, 2004 4:21:43 PM)
So that data may be presently cached in the L2 of a remote quad, or in the DRAM of that remote quad?

Rajesh Kota (Sep 22, 2004 4:22:54 PM)
The purpose of the RDC is to allow as many transactions as possible to be completed locally without going to remote quads. There is little performance benefit if they have to go over remote links to a remote quad.

David Wang (Sep 22, 2004 4:23:39 PM)
mas, is that satisfactory?

Rajesh Kota (Sep 22, 2004 4:23:41 PM)
The data that is cached in the RDC is in the shared or exclusive state. When it is in the shared state, that same memory line could be cached in local Opterons, remote Opterons, and other remote RDCs.
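
The RDC behaviour described in this exchange (cache only lines whose home is in a remote quad, filling on both read requests and victim writes) can be sketched as a toy model. The address-to-quad mapping, the LRU replacement policy, the capacity, and all names below are assumptions for illustration; coherence states (shared/exclusive) are deliberately not modelled.

```python
from collections import OrderedDict

QUAD_MEMORY_BYTES = 1 << 30  # hypothetical: each quad is home to 1 GiB


def home_quad(address):
    """Hypothetical mapping from a physical address to its home quad."""
    return address // QUAD_MEMORY_BYTES


class RemoteDataCache:
    """Toy RDC: holds only lines whose home memory is in a remote quad."""

    def __init__(self, local_quad, capacity_lines):
        self.local_quad = local_quad
        self.capacity_lines = capacity_lines
        self.lines = OrderedDict()  # address -> None, kept in LRU order (assumed)

    def _fill(self, address):
        if home_quad(address) == self.local_quad:
            return False  # local-home lines are never placed in the RDC
        self.lines[address] = None
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        return True

    def read(self, address):
        """Return where the read is satisfied: 'local', 'rdc_hit' or 'remote'."""
        if home_quad(address) == self.local_quad:
            return "local"
        if address in self.lines:
            self.lines.move_to_end(address)
            return "rdc_hit"
        self._fill(address)  # fill on a read request
        return "remote"

    def victim_write(self, address):
        """A line evicted from a local CPU cache; fills the RDC if remote-home."""
        return self._fill(address)


rdc = RemoteDataCache(local_quad=0, capacity_lines=2)
remote_line = QUAD_MEMORY_BYTES + 64      # home is quad 1
first = rdc.read(remote_line)             # goes over the remote links
second = rdc.read(remote_line)            # now served from the RDC
local = rdc.read(0x40)                    # home is quad 0, RDC not involved
```

The second read of the same remote line is the case the RDC exists for: it completes inside the local quad instead of crossing a remote link.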

Rajesh Kota (Sep 22, 2004 4:24:01 PM)
next q.

dkanter (Sep 22, 2004 4:24:09 PM)
What was your experience like using System C? Any particular advantages and disadvantages?

Rajesh Kota (Sep 22, 2004 4:26:12 PM)
We started our first model of HORUS using SystemC. This went on to become our performance model, and it is still in SystemC. But our verification behavioural model is in Vera and our RTL is in Verilog. The primary reason for moving away from SystemC was the support available from EDA vendors 3+ years ago.

Rajesh Kota (Sep 22, 2004 4:26:33 PM)
next q.

Will (Sep 22, 2004 4:26:34 PM)
Were there any particularly interesting (difficult) issues you had to deal with when designing around the glueless nature of the Opteron?

Rajesh Kota (Sep 22, 2004 4:28:16 PM)
A whole lot of issues. Most of these were due to undocumented things that Opterons do in the coherent HT domain. Some of these, when found, caused extensive redesign of our architecture. Yes, lots of difficult issues were found and resolved.

Rajesh Kota (Sep 22, 2004 4:28:31 PM)
next q.

dkanter (Sep 22, 2004 4:28:40 PM)
The RDC achieves significant (~5x) improvements in performance. Are there exact figures for the latency for the RDC, order of magnitude figures?

Rajesh Kota (Sep 22, 2004 4:30:33 PM)
Yes. We have accurate modeling of latencies and resources here in our performance model. We have very low latency when we hit in the RDC. Also, we have some support provided to us by AMD in the Opterons that enables transactions to be completed earlier when we hit in the RDC.

David Wang (Sep 22, 2004 4:31:17 PM)
Okay, let’s finish up with one last Qn.

Rajesh Kota (Sep 22, 2004 4:31:26 PM)
next q.

Groo (Sep 22, 2004 4:31:29 PM)
Expanding on the scaling question, with Horus winding up, what is the next step? Anything on the wish list for Horus II that you can talk about?

Rajesh Kota (Sep 22, 2004 4:32:19 PM)
Can’t talk about it. But yes we are working on next steps.

Rajesh Kota (Sep 22, 2004 4:32:56 PM)
next q.

David Kanter (Sep 22, 2004 4:33:15 PM)
We are actually ready to wrap up

David Kanter (Sep 22, 2004 4:33:24 PM)
I think it’s about 3:30 now

srsingh (Sep 22, 2004 4:33:58 PM)
Thank you, Mr. Kota, for attending this round table and answering our questions. Very much appreciated.

Groo (Sep 22, 2004 4:34:21 PM)
Thanks for your time.

David Kanter (Sep 22, 2004 4:34:27 PM)
Indeed, I enjoyed this immensely

Rajesh Kota (Sep 22, 2004 4:34:33 PM)
If you have any more questions, please use the forum at RWT. Thanks, everyone, for the opportunity.

mas (Sep 22, 2004 4:34:42 PM)
Yes, thanks to Mr. Kota and to RWT

David Kanter (Sep 22, 2004 4:34:49 PM)
Absolutely, thanks to everyone for showing up

David Kanter (Sep 22, 2004 4:35:10 PM)
We need an audience just as much as moderators, and I think you guys had great questions

David Kanter (Sep 22, 2004 4:35:16 PM)
Bye, Rajesh

David Kanter (Sep 22, 2004 4:35:49 PM)
We will be making a cleaned up transcript of this Roundtable available at RWT in the future, so keep your eyes peeled

David Kanter (Sep 22, 2004 4:36:03 PM)
In the meantime, have a great day, wherever you are

David Kanter (Sep 22, 2004 4:36:41 PM)
With that, I believe we are finished

David Kanter (Sep 22, 2004 4:39:54 PM)
One last announcement, since this was our first Roundtable, we would welcome any feedback from users

David Kanter (Sep 22, 2004 4:40:11 PM)
If you have any comments, email, or
