By: David Kanter (dkanter.delete@this.realworldtech.com), February 2, 2011 12:08 pm
Room: Moderated Discussions
Nicholas,
Due to the length, I need to trim a fair number of comments, especially those related to SW rasterization techniques (I'll respond to those in a separate post).
>David Kanter (dkanter@realworldtech.com) on 1/27/11 wrote:
>---------------------------
>>>Doom 3 on a GTX 460 at 2560x1600 4xAA runs at 53 FPS at Ultra High Detail, and
>>>at 56 FPS at High Detail. I was being generous when I said 10%.
>>
>>That shows nothing about compression; it merely tells you about the change in performance
>>due to larger textures. It's also an older game that isn't designed for 2560x1600.
>
>The change in performance is the whole point.
No it's not. You made a specific claim about texture compression. Comparing large vs. small textures says NOTHING about the absolute importance of compression, only about the relative effect of texture size.
>Doom 3's uncompressed textures are equal in dimensions to the compressed ones.
I didn't say dimensions, I said size and bandwidth. Again, you claimed that texture compression does not save meaningful amounts of bandwidth. I see no proof of that claim.
>>What I'd want to see is for a number of MODERN games:
>>
>>1. Texture size (uncompressed vs. compressed)
>>2. Bandwidth usage (uncompressed vs. compressed)
>
>Texture size is irrelevant. They can be kept in compressed form when unused. Only
>a fraction of all the texture levels is needed during a frame.
>And I've tested 3DMark06 with SwiftShader
3DMark06 is not reflective of modern games. It's 5 years old now!
> while forcing the mipmap LOD down one
>level (which is equivalent to reducing the texture bandwidth by a factor of 4), and
>the SM3.0 score went from 250 to 249. Yes, due to some statistical variance the
>score was actually lower. If texture bandwidth was of great significance, you'd expect a much higher score.
Yes, but by reducing the texture size you have skewed the compute:bandwidth ratio. Compressing textures will always improve that ratio while preserving the workload; simply using smaller textures changes the workload substantially.
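(For reference, my arithmetic: each mipmap level halves both dimensions, so level n+1 holds (w/2) x (h/2) = wh/4 texels. Biasing the LOD down one level therefore cuts texel traffic to roughly a quarter, which is where the factor-of-4 figure comes from.)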
>>You're claiming that for #2 the difference is 10% and I don't see any real evidence
>>of that. Compression should be vastly more effective.
>
>Texture compression rates of 1:2 and 1:4 are very common, but that doesn't translate
>into big performance improvements. Most of the time there's sufficient bandwidth
>headroom to allow uncompressed textures without an impact on performance.
And what about power? If I can transfer 2X or 4X fewer bytes, I can use a smaller (or slower) memory interface.
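To put rough numbers on it (standard S3TC figures): DXT1 packs a 4x4 texel block into 64 bits, i.e. 4 bits per texel versus 32 for RGBA8, an 8:1 reduction; DXT5 uses 128 bits per block, for 4:1. Even the conservative 2:1 ratio halves the bytes that texture fetches pull across the memory interface.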
>And even
>in bandwidth limited situations, there's already a large chunk of it used by color,
>z and geometry. So the performance won't drop by much.
I don't believe any numbers you've provided on this. The simulator results you posted were very old and the author of the simulator basically indicated that they weren't valid or useful.
[snip]
>>I happen to know the author of that study in question. The data is INCREDIBLY
>>OLD. It's from a simulator that did not use any Z or color compression, so the results cannot be taken seriously.
>
>Yes, it's from a simulator called ATTILA. And for the record it did use z compression to generate that graph.
I am familiar with the tool and with the author. The data in question came from an old version of Attila, and as per my discussion with Victor Moya (the author)...the data is pretty much useless.
>And even though it's old, this data is still very relevant. UT2004 has a very high
>TEX:ALU ratio, meaning that contemporary games are far less likely to become bandwidth
>limited. UT2004 also has simple geometry and no pre-z pass, again meaning that texture
>bandwidth has only become less relevant with newer games.
I'd have to see a fair number of modern games to know that.
[snip]
>>I think you underestimate the cost of adding pins to your memory controller.
>
>I'm not suggesting adding extra pins to make software rendering viable. It's already viable bandwidth-wise.
You're suggesting getting rid of texture compression. That will increase bandwidth usage.
>Multi-core is driving the bandwidth needs up for all applications, so I'm confident
>that it will be increased anyway in due time. But there's no need for additional
>dedicated hardware. The caches do an excellent job at reducing the overall bandwidth needs.
On modern games, with large textures, etc.?
[snip]
>>Actually for the mobile space it does have to come close to dedicated hardware. Battery life matters, a lot.
>
>For laptops we see the graphics solution range from IGPs to lower clocked high-end
>desktop GPUs. So while battery life is important, it doesn't mean the TDP has to
>be the lowest possible. It just has to be acceptable. A cheaper IGP which consumes
>more power is likely to sell better than a more expensive efficient one.
Power consumption != TDP.
And I strongly suspect that even a cheap IGP is going to be more efficient (power-wise) than software rendering on the CPU.
>Also note
>that today's GPUs have far more features than the average consumer will really use,
>meaning they are less energy efficient than they could have been. But the TDP is
>still acceptable for a good battery life.
No offense, but you don't seem to understand the difference between TDP and dynamic power consumption. They are only loosely related. Both are important, but I suspect SW rendering falls flat on its face for the latter.
>Furthermore, nobody expects a long battery life during intense gaming. Even with
>dedicated graphics hardware the power consumption during gaming is relatively high.
>So instead of a multi-core CPU with an IGP you might as well have a CPU with a couple
>more cores. As long as the TDP is the same, it's acceptable.
No it's not. Again, you don't seem to understand the difference between TDP and dynamic power consumption.
>>>Most likely the bandwidth will just steadily keep increasing, helping all (high
>>>bandwidth) applications equally. DDR3 is standard now across the entire market,
>>>and it's evolving toward higher frequencies and lower voltage. Next up is DDR4,
>>>and if necessary the number of memory lanes can be increased.
>>
>>More memory lanes substantially increase cost, which is something that everyone wants to avoid.
>
>They'll only avoid it till it's the cheapest solution. Just like dual-channel and
>DDR3 became standard after some time, things are still evolving toward higher bandwidth
>technologies. Five years ago the bandwidth achieved by today's budget CPUs was unthinkable.
Actually it was quite predictable. We're still using the same basic memory architecture - dual channel DDRx.
>So frankly I don't care how they'll do it in the future, but CPUs reaching 100s
>of GB/s of bandwidth will sooner or later be perfectly normal.
You had better care, because it could easily drive up costs. The more you spend on your I/O the less area and power you have for compute.
>>>And again CPU technology is not at a standstill. With T-RAM just around the corner
>>>we're looking at 20+ MB of cache for mainstream CPUs in the not too distant future.
>>
>>T-RAM is not just around the corner.
>
>This news item suggests otherwise: http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
Please stop making me repeat myself. How about we make a gentleman's wager?
You're apparently confident that T-RAM will be shipping in products soon. I'm confident it won't.
So let's define what 'soon' means, and then we can step back and see who's right and who's wrong.
For me, soon means a year.
>But even if it does take longer, it doesn't really matter to the long-term viability
>of software rendering. There will be a breakthrough at some point and it will advance
>the convergence by making dedicated decompression hardware totally unnecessary (if
>it even has any relevance left today).
It totally matters. If you are expecting a magical 2-4X increase in cache density that is iso-process, then you might as well just give up. And yes, it seems to me that much of what you are claiming is predicated on a magical increase in cache density.
>>>And while the expectations for 'adequate' graphics go up as well, it's only a slow
>>>moving target. First we saw the appearance of IGPs as an adequate solution for a
>>>large portion of the market, and now things are evolving in favor of software rendering.
>>
>>I think if you look at the improvement in IGPs, that's a very FAST improving target.
>
>The hardware isn't the target.
Yes it is. To be competitive, your solution needs to have comparable power and adequate performance to a hardware solution.
>Consumer expectation is the target. Sandy Bridge
>leaves a hole in the market for consumers who want a powerful CPU but are content with minimal graphics support.
How many consumers want that? 10? 100? 1K? 1M?
>>SwiftShader is SwiftShader - other SW rendering systems work differently. They may (or may not) see similar benefits.
>
>Again, why does that make SwiftShader's results only "meaningful" for SwiftShader?
That's easy. Because other SW renderers may work *differently* from SwiftShader. If they work *differently*, they will have *different* performance and *different* bottlenecks and *different* performance gains from the changes in Sandy Bridge.
Do you see a theme in the above statement?
>All reviews only include a select number of benchmark applications. Does that mean
>the results are meaningless for other applications? Of course not.
It means that you need to look at a large number of samples (i.e. benchmarks) to ascertain overall performance.
Let me give you a hypothetical example here. Say GCC runs 20% faster on Sandy Bridge than Nehalem. Now what does that tell you about the performance gain for LLVM or MSVC on Sandy Bridge?
>Unless you can give me any sort of indication how another software renderer with
>Shader Model 3.0 support could be bandwidth limited, 30% higher performance with
>55% of the bandwidth is extremely meaningful for any such software renderer.
>
>You seem to be suggesting that SwiftShader is doing something wrong which makes
>it 30% faster with 55% of the bandwidth. If that's the case, great! It means that things can get much faster still.
I'm saying it may work differently from other SW rendering approaches, and therefore have different bottlenecks.
[snip]
>Anyway, to meet you in the middle I downclocked my i7-920's memory from 1600 MHz
>to 960 MHz, and the 3DMark06 SM3.0 score went from 250 to 247. So once again, reducing
>the bandwidth to 60% has no significant impact on performance.
Alright...now we are getting somewhere, thanks for taking the effort to look into this! So you definitely have shown that for 3DMark06, there is not a big bandwidth bottleneck.
However, I care about modern games. Could you try to run something like Crysis, Civ 5, or 3DMark Vantage?
>>Reducing the number of loads and stores isn't really relevant. It's the number
>>of operations that matters. If you are gathering 16 different addresses, you are
>>really doing 16 different load operations.
>
>Not with Larrabee's implementation. It only takes as many uops as the number of
>cache lines that are needed to collect all elements, per load unit.
Yes, that's exactly what I said. It's the access pattern that's a problem.
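To illustrate the access-pattern point, here is a rough sketch of my own (assuming 64-byte lines and a 16-wide vector; this is not actual LRBni semantics): one cache line is serviced per iteration, so the cost is 1 iteration when all lanes hit the same line and 16 when every lane touches a different one.

#include <stdint.h>

void gather16(float dst[16], const float *base, const uint32_t idx[16])
{
    uint32_t pending = 0xFFFFu;                      /* lanes still to load */
    while (pending) {
        int i = __builtin_ctz(pending);              /* pick any pending lane (GCC/Clang builtin) */
        uintptr_t line = (uintptr_t)(base + idx[i]) >> 6;
        for (int j = 0; j < 16; j++) {               /* finish every lane that */
            if ((pending & (1u << j)) &&             /* lives on that same line */
                ((uintptr_t)(base + idx[j]) >> 6) == line) {
                dst[j] = base[idx[j]];
                pending &= ~(1u << j);
            }
        }
    }
}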
[snip]
>>No, that's totally different. With unified shaders you can easily use more or
>>less geometry...until you run out of physical shaders to execute on. You should
>>look at Kayvon's work on micro-polygon rendering.
>
>Easily? I think you're seriously underestimating the complexity of adapting your
>software to the hardware. Checking whether you're vertex or pixel processing limited
>wasn't feasible in actual games ten years ago, and it still isn't.
Sure you can, there are profiling tools for that. Nvidia and ATI both have them.
[snip]
>>>It's clear that telling software developers what (not) to do doesn't result in
>>>a successful next generation hardware architecture. With non-unified architectures,
>>>there were numerous applications which were vertex processing limited, and numerous
>>>ones which were pixel processing limited. And even those in the middle have a fluctuating workload.
>>
>>Yes, except a unified shader architecture doesn't really preclude that many options.
>
>That's what I'm saying.
Um no. You said that programmable shaders are too limited, and you want programmable rasterization and texturing.
I am not convinced that those two should be programmable, and I'm not convinced that fixed-function (FF) hardware is really that restrictive.
Moreover, even if it is limiting...it's not obvious to me when you want to make a shift from FF to SW. Except that the next 2-3 years don't seem promising.
[snip]
>>First, you do need to count the increases in resolution. Pixels have definitely
>>increased over time.
>
>Yes the resolution has increased but everything else scaled accordingly. More pixels
>doesn't mean a higher benefit from texture compression. In fact TEX:ALU is going down,
>meaning pixel shaders are more compute limited than bandwidth limited.
I'm skeptical. Available bandwidth always grows more slowly than compute...
>>>In ten more years caches could be around 256 MB, and that's without taking revolutionary
>>>new technologies like T-RAM into account. So it's really hard to imagine that this
>>>won't suffice to compensate for the texture bandwidth needs of the low-end graphics market.
>>
>>Because you are imagining that the low-end market stays put. It won't.
>
>I didn't say it stays put. I said it's a slow moving target. Evidence of this is
>the ever growing gap between high-end and low-end graphics hardware. IGPs were born
>out of the demand for really cheap but adequate 3D graphics support. They cover the majority of the market:
>http://unity3d.com/webplayer/hwstats/pages/web-2011Q1-gfxvendor.html
>
>This massive market must obviously have a further division in price and performance
>expectations. Some people want a more powerful CPU for the same price by sacrificing
>a bit of graphics performance, while others simply want a cheaper system that isn't
>aimed at serious gaming. As the CPU performance continues to increase exponentially,
>and things like gather/scatter can make a drastic difference in graphics efficiency,
>software rendering can satisfy more and more people's expectations, even if those
>expectations themselves increase slowly.
In essence, what you are saying is that some people would be fine with lower performance graphics. That's something I agree with.
I just don't know what the relative performance of SW rendering is to dedicated hardware, and how that curve will change over time.
My sense is that hardware is probably getting relatively faster, given the attention Intel is paying.
>>>SwiftShader 1.0 was first used by a 2D casual game called Galapago. Despite the
>>>game's graphical simplicity, it was totally texture sampling limited and it was
>>>barely reaching the necessary 10 FPS for playability. That was five years ago.
>>>Today we have Crysis running at 20+ FPS.
>>
>>What resolution and quality settings?
>
>800x600 at low detail. It's twice as fast as Microsoft WARP: http://msdn.microsoft.com/en-us/library/dd285359.aspx
Yes, but that's not a realistic resolution. Most displays are probably going to be 1280x960 (or something of that ilk).
>>That's because it won't.
>
>It will. The only strengths the GPU has left are all components based on the ability
>to load/store lots of data in parallel. The CPU cores already achieve higher GFLOPS
>than the IGP
That's true today, but I suspect it won't be true in the future. Also remember that you have to share those FLOP/s with other tasks.
>so gather/scatter unlocks that power for graphics applications.
>You can either ditch the IGP to make things cheaper, or replace it with additional CPU
>cores so you get a really powerful processor for any workload.
I think to achieve IGP level of performance, using an IGP is the most efficient in terms of power and area.
>>>http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
>>
>>That means nothing. It means that GF is investigating the technology, not that it's production ready.
>
>Quoting the announcement: "...into a joint DEVELOPMENT agreement targeted toward
>the APPLICATION of T-RAM’s Thyristor-RAM embedded memory..."
>
>Emphasis mine. Why would a major foundry enter into a development agreement with
>a startup, unless the technology has already been proven on a smaller scale?
To *jointly investigate*. Look, AMD even licensed Z-RAM's first-generation crap:
http://www.eetimes.com/electronics-news/4057964/AMD-licenses-Innovative-Silicon-s-SOI-memory
When did it get commercialized? Never...
>Quoting http://www.t-ram.com/news/media/3B.1_IRPS2009_Salling.pdf:
>"Taken together, the results of this study show
>that T-RAM is a reliable and manufacturable memory
>technology."
>
>Quoting t-ram.com: "T-RAM Semiconductor has successfully developed the Thyristor-RAM
>technology from concept to production-readiness. Our Thyristor-RAM technology has
>been successfully implemented on both Bulk and SOI CMOS. "
>
>Sounds like production ready to me.
Not even close.
>>>Anyway, there are multiple high density cache technologies. There's Thyristor-RAM,
>>
>>That's T-RAM.
>
>I know. I was just summing up "high density cache technologies".
>
>>>1T-RAM,
>>
>>Not a replacement for SRAM.
>
>Why not? Looks useful as L3 cache to me.
>>>2nd gen Z-RAM
>>
>>Doesn't work at all.
>
>Maybe not as cache memory, but it's hopeful as a DRAM replacement: http://www.z-ram.com/en/pdf/Z-RAM_LV_and_bulk_PR_Final_for_press.pdf
Just a moment ago, you were suggesting it as a cache replacement. Now you suddenly are back-tracking? And nobody really wants a proprietary DRAM replacement.
My point is that all this stuff is totally unproven from a high volume perspective. It may be 'feasible', but that doesn't make it 'economical'. And in the case of Z-RAM, it wasn't even feasible in the first place. T-RAM seems better, but it's very unclear.
>>It's possible, but they will need to become more competitive from an energy perspective with fixed function stuff.
>
>There's not a lot of fixed-function stuff left. The majority of the GPU's die space
>consists of programmable or generic components.
>
>And I've shown before that the CPU's FLOPS/Watt is in the same league as GPUs:
>- Core i7-2820QM: 150 GFLOPS / 45 Watt (more with Turbo Boost)
>- GeForce GT 420: 134.4 GFLOPS / 50 Watt
The GT 420 is ancient. A better comparison would be the GT 440, which is 96 shaders at 1.6GHz and 65W. That's 96 x 2 FLOPs x 1.6 GHz, or ~307 GFLOP/s for 65W, roughly a 2X advantage.
>Obviously software rendering requires a bit more arithmetic power to implement
>the remaining fixed-function functionality, but programmable shaders take the bulk.
>
>So there's no lack of energy efficiency. The CPU simply can't utilize its computing power effectively
GPUs are definitely more power efficient than CPUs.
>During the early days of AC'97 there was some pretty serious debate about moving
>the audio processing workload to the CPU. It made a real difference in benchmark
>results. People who back then swore by the efficiency of dedicated sound cards now happily use HD Audio.
>>Perhaps when graphics
>>gets to that point, it will be fine to put it in SW.
>
>Exactly. There's no doubt it will happen, some day. My take is that gather/scatter
>support is sufficient to initiate the move to software rendering.
I bet HD Audio uses about 1-2% of the overall CPU cycles. Perhaps when graphics gets in that neighborhood - say 5-10% of the CPU cycles...it might be feasible. But I don't think that's really feasible today or in the near future.
>>And yes, power efficiency matters a lot. You may not think so, but it does.
>
>I do think it matters, a lot. But I think you're underestimating how power efficient
>CPUs already are. It just doesn't translate into high effective performance for
>3D graphics due to wasting a lot of cycles on moving data elements around.
CPUs are simply not as power efficient as GPUs, because they spend more power on reducing latency of operations. There's pretty hard data to prove it as well. Consider the efficiency for ATI GPUs vs. a CPU.
>An AVX FMA instruction can perform 16 operations every single cycle, but it would
>take a whopping 72 uops if every address and element was extracted/inserted sequentially.
>When it comes to load/store, we haven't evolved beyond x87 yet. Of course this is
>the worst case and typically not every vector load/store has to be a gather/scatter,
>but for situations where you do need them it makes a massive difference.
That's because the hardware for scatter/gather is expensive and power hungry.
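For reference, a minimal sketch of the emulation being discussed (illustrative C with AVX intrinsics; the function name is mine): each lane costs an address extract, a scalar load, and an insert, which is where instruction counts like 18 and 72 come from. A hardware gather collapses the whole loop into one instruction.

#include <immintrin.h>
#include <stdint.h>

/* Gather 8 floats from base[idx[0..7]] without hardware gather:
   spill the index vector, do 8 scalar loads, then repack. */
__m256 gather8_emulated(const float *base, __m256i idx)
{
    int32_t i[8];
    float   v[8];
    _mm256_storeu_si256((__m256i *)i, idx);   /* extract the 8 indices */
    for (int k = 0; k < 8; k++)
        v[k] = base[i[k]];                    /* 8 separate scalar loads */
    return _mm256_loadu_ps(v);                /* re-insert into a vector */
}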
[snip]
>>I don't think you design CPUs for high volume applications. Most don't need scatter/gather
>>and the hardware cost is high.
>
>All applications that contain loops can benefit from gather/scatter. That's all applications.
If that's true, then what % performance increase could we expect to see in SPECint?
>With gather/scatter support every scalar operation would have a parallel equivalent.
>So any loop with independent iterations can be parallelized and execute up to 8 times faster.
That's assuming there is no control flow divergence.
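To make that concrete, a sketch of my own: the loop below has fully independent iterations and would vectorize trivially, except that the indirect load table[idx[i]] is exactly the case that needs a gather (and, per my point above, any branch in the body adds divergence on top).

/* Independent iterations: an 8-wide AVX version could run up to
   8x faster, but only if table[idx[i]] can be gathered. */
void scale_indirect(float *out, const float *x,
                    const float *table, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = x[i] * table[idx[i]];
}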
>And I don't think the hardware cost is that high. All you need is a bit of logic
>to check which elements are located in the same cache line, and four byte shift
>units per 128-bit load unit instead of one, to collect the individual elements.
>Note that logic for sequentially accessing the cache lines is already largely in
>place to support load operations which straddle a cache line boundary.
You are saying that because you don't design hardware. What you are suggesting is in fact, quite complicated and large.
>>Really? Have you heard of Vertica? They do an awful lot of lossless compression of data in memory.
>
>No, I hadn't heard about them before. Could you point me to some document where
>they detail how they added hardware support for compressed memory transfers to reduce bandwidth?
They don't need hardware to do lossless compression. They have a clever column-oriented database. Check vertica.com. One of their big performance gains comes from reducing memory (and disk) bandwidth.
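A toy sketch of the idea (illustrative only, not Vertica's actual format): in a column store, a sorted column collapses into (value, run length) pairs in plain software, so a scan streams far fewer bytes from memory.

#include <stdint.h>
#include <stddef.h>

typedef struct { int32_t value; uint32_t count; } Run;

/* Run-length encode a column; out must hold up to n runs.
   A scan over the runs touches runs*8 bytes instead of n*4. */
size_t rle_encode(const int32_t *col, size_t n, Run *out)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;
        while (j < n && col[j] == col[i]) j++;
        out[runs++] = (Run){ col[i], (uint32_t)(j - i) };
        i = j;
    }
    return runs;
}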
>>Many applications use adjacent values.
>
>Yes, and many applications also use non-adjacent values.
>
>If a loop contains just one load or store at an address which isn't consecutive,
>it can't be vectorized (unless you want to resort to serially extracting/inserting
>addresses and values). So even if the majority of values are adjacent, it doesn't
>take a lot of non-adjacent data to cripple the performance.
You can still vectorize it, you just need to have a bunch of scalar loads/stores to deal with the non-adjacent addresses.
>>>Why? It only accesses the cache lines it needs. If all elements are from the same
>>>cache line, it's as fast as accessing a single element.
>>
>>And exactly as fast as using AVX! i.e. no improvement and more complexity/power.
>
>No. The addresses are unknown at compile time. So the only option with AVX1 is
>to sequentially extract each address from the address vector, and insert the read
>element into the result vector. This takes 18 instructions.
>With gather support it would be just one instruction. Assuming it gets split into
>two 128-bit gather uops, the maximum throughput is 1 every cycle and the minimum throughput is 1 every 4 cycles.
>>>But even in the worst case
>>>it can't generate more misses or consume more bandwidth.
>>
>>It sure can. Now instead of having 1-2 TLB accesses per cycle, you get 16. How
>>many TLB copies do you want? How many misses in flight do you want to support?
>
>You're still not getting it. It only accesses one cache line per cycle. It simply
>has to check which elements are within the same cache line, and perform a single
>TLB access for all of these elements. Checking whether the addresses land on the
>same cache line doesn't require full translation of each address.
That's quite complicated hardware, and you can't afford to have that on the critical path for any of your normal loads. So now you need a fairly separate load/store pipeline for scatter/gather.
>Nothing other than graphics runs better on the IGP. As I've mentioned before, GPGPU
>is only successful using high-end hardware.
Today...unclear what tomorrow holds.
>So the CPU is better than the IGP at absolutely everything else. That makes it
>really tempting to have a closer look at what it would take to make it adequately efficient at graphics as well.
>
>The answer: gather/scatter.
It would also need a 2X improvement in FLOP/W and FLOP/mm2, possibly more.
>Multi-core, 256-bit vectors, Hyper-Threading, software pipelining... the CPU is
>already a throughput device! It's just being held back by the lack of parallel load/store
>support. It's the one missing part to let all those GFLOPS come to full fruition.
You keep on repeating this as if it were true, but it's not. I agree that lack of scatter/gather is an issue. But a more fundamental issue is that throughput optimized cores (e.g. shader arrays) are simply more efficient for compute rich workloads. You can't really get around that.
>What specialized hardware would that be? I've already shown that texture compression
>hardly makes a difference,
No, you cited extremely old data from a simulator, where even the author of the simulator thinks the data is not useful.
>and sampling and filtering is becoming programmable anyway.
>Gather/scatter speeds up just about every other pipeline stage as well.
Except it doesn't benefit many workloads, and it costs a lot of area and power. So you want to disable it on the many workloads where it does not help.
>>Yes, but you can charge more for the system since it gets better battery life.
>
>No you can't, because the competition will sell it for less and take away your market share.
What if the price delta is only $5?
>>I totally agree that scatter/gather is a great capability to have. But what's
>>the cost in die area, power and complexity? Not just to the core, but also the memory controller, etc.
>
>Larrabee has wider vectors and smaller cores, but features gather/scatter support.
>So I don't think it takes a lot of die space either way. It doesn't require any
>changes to the memory controller, just the load/store units. I'm not entirely sure
>but collecting four elements from a cache line can probably largely make use of
>the existing network to extract one (unaligned) value. And checking which addresses
>land on the same cache line is a very simple equality test of the upper bits.
I think you have no or minimal experience designing hardware, so I'm not really inclined to take your word for it...especially compared against the expertise of the thousands of CPU designers at places like Intel, AMD and IBM.
Scatter/gather is expensive and that's why it isn't done. Even the LRB implementation was fairly limited compared to some of the older vector machines (which had both temporal and spatial scatter/gather).
David
Sandy Bridge CPU article online | savantu | 2010/09/29 11:28 PM |
Sandy Bridge CPU article online | ? | 2010/09/30 02:43 AM |
Sandy Bridge CPU article online | gallier2 | 2010/09/30 03:18 AM |
Sandy Bridge CPU article online | ? | 2010/09/30 07:38 AM |
Sandy Bridge CPU article online | David Hess | 2010/09/30 09:28 AM |
moderation (again) | hobold | 2010/10/01 04:08 AM |
Sandy Bridge CPU article online | Megol | 2010/09/30 01:13 AM |
Sandy Bridge CPU article online | ? | 2010/09/30 02:47 AM |
Sandy Bridge CPU article online | Ian Ameline | 2010/09/30 07:54 AM |
Sandy Bridge CPU article online | Linus Torvalds | 2010/09/30 09:18 AM |
Sandy Bridge CPU article online | Ian Ameline | 2010/09/30 11:04 AM |
Sandy Bridge CPU article online | Linus Torvalds | 2010/09/30 11:38 AM |
Sandy Bridge CPU article online | Michael S | 2010/09/30 12:02 PM |
Sandy Bridge CPU article online | NEON cortex | 2010/11/17 07:09 PM |
Sandy Bridge CPU article online | mpx | 2010/09/30 11:40 AM |
Sandy Bridge CPU article online | Linus Torvalds | 2010/09/30 12:00 PM |
Sandy Bridge CPU article online | NEON cortex | 2010/11/17 07:44 PM |
Sandy Bridge CPU article online | David Hess | 2010/09/30 09:36 AM |
Sandy Bridge CPU article online | someone | 2010/09/30 10:23 AM |
Sandy Bridge CPU article online | mpx | 2010/09/30 12:50 PM |
wii lesson | Michael S | 2010/09/30 01:12 PM |
wii lesson | Dan Downs | 2010/09/30 02:33 PM |
wii lesson | Kevin G | 2010/09/30 11:27 PM |
wii lesson | Rohit | 2010/10/01 06:53 AM |
wii lesson | Kevin G | 2010/10/02 02:30 AM |
wii lesson | mpx | 2010/10/01 08:02 AM |
wii lesson | IntelUser2000 | 2010/10/01 08:31 AM |
GPUs and games | David Kanter | 2010/09/30 07:17 PM |
GPUs and games | hobold | 2010/10/01 04:27 AM |
GPUs and games | anonymous | 2010/10/01 05:35 AM |
GPUs and games | Gabriele Svelto | 2010/10/01 08:07 AM |
GPUs and games | Linus Torvalds | 2010/10/01 09:41 AM |
GPUs and games | Anon | 2010/10/01 10:23 AM |
Can Intel do *this* ??? | Mark Roulo | 2010/10/03 02:17 PM |
Can Intel do *this* ??? | Anon | 2010/10/03 02:29 PM |
Can Intel do *this* ??? | Mark Roulo | 2010/10/03 02:55 PM |
Can Intel do *this* ??? | Anon | 2010/10/03 04:45 PM |
Can Intel do *this* ??? | Ian Ameline | 2010/10/03 09:35 PM |
Graphics, IGPs, and Cache | Joe | 2010/10/10 08:51 AM |
Graphics, IGPs, and Cache | Anon | 2010/10/10 09:18 PM |
Graphics, IGPs, and Cache | Rohit | 2010/10/11 05:14 AM |
Graphics, IGPs, and Cache | hobold | 2010/10/11 05:43 AM |
Maybe the IGPU doesn't load into the L3 | Mark Roulo | 2010/10/11 07:05 AM |
Graphics, IGPs, and Cache | David Kanter | 2010/10/11 08:01 AM |
Can Intel do *this* ??? | Gabriele Svelto | 2010/10/03 11:31 PM |
Kanter's Law. | Ian Ameline | 2010/10/01 01:05 PM |
Kanter's Law. | David Kanter | 2010/10/01 01:18 PM |
Kanter's Law. | Ian Ameline | 2010/10/01 01:33 PM |
Kanter's Law. | Kevin G | 2010/10/01 03:19 PM |
Kanter's Law. | IntelUser2000 | 2010/10/01 09:36 PM |
Kanter's Law. | Kevin G | 2010/10/02 02:15 AM |
Kanter's Law. | IntelUser2000 | 2010/10/02 01:35 PM |
Wii vs pc's | Rohit | 2010/10/01 06:34 PM |
Wii vs pc's | Gabriele Svelto | 2010/10/01 10:54 PM |
GPUs and games | mpx | 2010/10/02 10:30 AM |
GPUs and games | Foo_ | 2010/10/02 03:03 PM |
GPUs and games | mpx | 2010/10/03 10:29 AM |
GPUs and games | Foo_ | 2010/10/03 12:52 PM |
GPUs and games | mpx | 2010/10/03 02:29 PM |
GPUs and games | Anon | 2010/10/03 02:49 PM |
GPUs and games | mpx | 2010/10/04 10:42 AM |
GPUs and games | MS | 2010/10/04 01:51 PM |
GPUs and games | Anon | 2010/10/04 07:29 PM |
persistence of vision | hobold | 2010/10/04 10:47 PM |
GPUs and games | mpx | 2010/10/04 11:51 PM |
GPUs and games | MS | 2010/10/05 05:49 AM |
GPUs and games | Jack | 2010/10/05 10:17 AM |
GPUs and games | MS | 2010/10/05 04:19 PM |
GPUs and games | Jack | 2010/10/05 10:11 AM |
GPUs and games | mpx | 2010/10/05 11:51 AM |
GPUs and games | David Kanter | 2010/10/06 08:04 AM |
GPUs and games | jack | 2010/10/06 08:34 PM |
GPUs and games | Linus Torvalds | 2010/10/05 06:29 AM |
GPUs and games | Foo_ | 2010/10/04 03:49 AM |
GPUs and games | Jeremiah | 2010/10/08 09:58 AM |
GPUs and games | MS | 2010/10/08 12:37 PM |
GPUs and games | Salvatore De Dominicis | 2010/10/04 12:41 AM |
GPUs and games | Kevin G | 2010/10/05 01:13 PM |
GPUs and games | mpx | 2010/10/03 10:36 AM |
GPUs and games | David Kanter | 2010/10/04 06:08 AM |
GPUs and games | Kevin G | 2010/10/04 09:38 AM |
Sandy Bridge CPU article online | NEON cortex | 2010/11/17 08:19 PM |
Sandy Bridge CPU article online | Ian Ameline | 2010/09/30 11:06 AM |
Sandy Bridge CPU article online | rwessel | 2010/09/30 01:29 PM |
Sandy Bridge CPU article online | Michael S | 2010/09/30 02:06 PM |
Sandy Bridge CPU article online | rwessel | 2010/09/30 05:55 PM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 02:53 AM |
Sandy Bridge CPU article online | rwessel | 2010/10/01 07:30 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 08:31 AM |
Sandy Bridge CPU article online | rwessel | 2010/10/01 09:56 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 07:28 PM |
Sandy Bridge CPU article online | Ricardo B | 2010/10/02 04:38 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/02 05:59 PM |
which bus more wasteful | Michael S | 2010/10/02 09:38 AM |
which bus more wasteful | rwessel | 2010/10/02 06:15 PM |
Sandy Bridge CPU article online | Ricardo B | 2010/10/01 09:08 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 07:31 PM |
Sandy Bridge CPU article online | Andi Kleen | 2010/10/01 10:55 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 07:32 PM |
Sandy Bridge CPU article online | kdg | 2010/10/01 10:26 AM |
Sandy Bridge CPU article online | Anon | 2010/10/01 10:33 AM |
Analog display out? | David Kanter | 2010/10/01 12:05 PM |
Analog display out? | mpx | 2010/10/02 10:46 AM |
Analog display out? | Anon | 2010/10/03 02:26 PM |
Digital is expensive! | David Kanter | 2010/10/03 05:36 PM |
Digital is expensive! | Anon | 2010/10/03 07:07 PM |
Digital is expensive! | David Kanter | 2010/10/03 09:02 PM |
Digital is expensive! | Steve Underwood | 2010/10/04 02:52 AM |
Digital is expensive! | David Kanter | 2010/10/04 06:03 AM |
Digital is expensive! | anonymous | 2010/10/04 06:11 AM |
Digital is not very expensive! | Steve Underwood | 2010/10/04 05:08 PM |
Digital is not very expensive! | Anon | 2010/10/04 07:33 PM |
Digital is not very expensive! | Steve Underwood | 2010/10/04 10:03 PM |
Digital is not very expensive! | mpx | 2010/10/05 12:10 PM |
Digital is not very expensive! | Gabriele Svelto | 2010/10/04 11:24 PM |
Digital is expensive! | jal142 | 2010/10/04 10:46 AM |
Digital is expensive! | mpx | 2010/10/04 12:04 AM |
Digital is expensive! | Gabriele Svelto | 2010/10/04 02:28 AM |
Digital is expensive! | Mark Christiansen | 2010/10/04 02:12 PM |
Analog display out? | slacker | 2010/10/03 05:44 PM |
Analog display out? | Anon | 2010/10/03 07:05 PM |
Analog display out? | Steve Underwood | 2010/10/04 02:48 AM |
Sandy Bridge CPU article online | David Hess | 2010/10/01 07:37 PM |
Sandy Bridge CPU article online | slacker | 2010/10/02 01:53 PM |
Sandy Bridge CPU article online | David Hess | 2010/10/02 05:49 PM |
memory bandwith | Max | 2010/09/30 11:19 AM |
memory bandwith | Anon | 2010/10/01 10:28 AM |
memory bandwith | Jack | 2010/10/01 06:45 PM |
memory bandwith | Anon | 2010/10/03 02:19 PM |
Sandy Bridge CPU article online | PiedPiper | 2010/09/30 06:05 PM |
Sandy Bridge CPU article online | Matt Sayler | 2010/09/29 03:38 PM |
Sandy Bridge CPU article online | Jack | 2010/09/29 08:39 PM |
Sandy Bridge CPU article online | mpx | 2010/09/29 11:24 PM |
Sandy Bridge CPU article online | passer | 2010/09/30 02:15 AM |
Sandy Bridge CPU article online | mpx | 2010/09/30 02:47 AM |
Sandy Bridge CPU article online | passer | 2010/09/30 03:25 AM |
SB and web browsing | Rohit | 2010/09/30 05:47 AM |
SB and web browsing | David Hess | 2010/09/30 06:10 AM |
SB and web browsing | MS | 2010/09/30 09:21 AM |
SB and web browsing | passer | 2010/09/30 09:26 AM |
SB and web browsing | MS | 2010/10/02 05:41 PM |
SB and web browsing | Rohit | 2010/10/01 07:02 AM |
Sandy Bridge CPU article online | David Kanter | 2010/09/30 07:35 AM |
Sandy Bridge CPU article online | Jack | 2010/09/30 09:40 PM |
processor evolution | hobold | 2010/09/29 01:16 PM |
processor evolution | Foo_ | 2010/09/30 05:10 AM |
processor evolution | Jack | 2010/09/30 06:07 PM |
3D gaming as GPGPU app | hobold | 2010/10/01 03:59 AM |
3D gaming as GPGPU app | Jack | 2010/10/01 06:39 PM |
processor evolution | hobold | 2010/10/01 03:35 AM |
processor evolution | David Kanter | 2010/10/01 09:02 AM |
processor evolution | Anon | 2010/10/01 10:46 AM |
Display | David Kanter | 2010/10/01 12:26 PM |
Display | Rohit | 2010/10/02 01:56 AM |
Display | Linus Torvalds | 2010/10/02 06:40 AM |
Display | rwessel | 2010/10/02 07:58 AM |
Display | sJ | 2010/10/02 09:28 PM |
Display | rwessel | 2010/10/03 07:38 AM |
Display | Anon | 2010/10/03 02:06 PM |
Display tech and compute are different | David Kanter | 2010/10/03 05:33 PM |
Display tech and compute are different | Anon | 2010/10/03 07:16 PM |
Display tech and compute are different | David Kanter | 2010/10/03 09:00 PM |
Display tech and compute are different | hobold | 2010/10/04 12:40 AM |
Display | ? | 2010/10/03 02:02 AM |
Display | Linus Torvalds | 2010/10/03 09:18 AM |
Display | Richard Cownie | 2010/10/03 10:12 AM |
Display | Linus Torvalds | 2010/10/03 11:16 AM |
Display | slacker | 2010/10/03 06:35 PM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/04 06:06 AM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/04 10:44 AM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/04 01:59 PM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/04 02:13 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/04 07:58 PM |
current V12 engines with >6.0 displacement | slacker | 2010/10/05 12:39 AM |
current V12 engines with >6.0 displacement | MS | 2010/10/05 05:57 AM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/05 12:20 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/05 08:26 PM |
current V12 engines with >6.0 displacement | slacker | 2010/10/06 04:39 AM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/06 12:22 PM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/06 02:07 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/06 02:56 PM |
current V12 engines with >6.0 displacement | rwessel | 2010/10/06 02:30 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/06 02:53 PM |
current V12 engines with >6.0 displacement | Anonymous | 2010/10/07 12:32 PM |
current V12 engines with >6.0 displacement | rwessel | 2010/10/07 06:54 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/07 08:02 PM |
Top Gear is awful, and Jeremy Clarkson cannot drive. | slacker | 2010/10/06 06:20 PM |
Top Gear is awful, and Jeremy Clarkson cannot drive. | Ricardo B | 2010/10/07 12:32 AM |
Top Gear is awful, and Jeremy Clarkson cannot drive. | slacker | 2010/10/07 07:15 AM |
Top Gear is awful, and Jeremy Clarkson cannot drive. | Ricardo B | 2010/10/07 09:51 AM |
current V12 engines with >6.0 displacement | anon | 2010/10/06 04:03 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/06 05:26 PM |
current V12 engines with >6.0 displacement | anon | 2010/10/06 10:15 PM |
current V12 engines with >6.0 displacement | Howard Chu | 2010/10/07 01:16 PM |
current V12 engines with >6.0 displacement | Anon | 2010/10/05 09:31 PM |
current V12 engines with >6.0 displacement | slacker | 2010/10/06 04:55 AM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/06 05:15 AM |
current V12 engines with >6.0 displacement | slacker | 2010/10/06 05:34 AM |
I wonder is there any tech area that this forum doesn't have an opinion on (NT) | Rob Thorpe | 2010/10/06 09:11 AM |
Cunieform tablets | David Kanter | 2010/10/06 11:57 AM |
Cunieform tablets | Linus Torvalds | 2010/10/06 12:06 PM |
Ouch...maybe I should hire a new editor (NT) | David Kanter | 2010/10/06 03:38 PM |
Cunieform tablets | rwessel | 2010/10/06 02:41 PM |
Cunieform tablets | seni | 2010/10/07 09:56 AM |
Cunieform tablets | Howard Chu | 2010/10/07 12:44 PM |
current V12 engines with >6.0 displacement | Anonymous | 2010/10/06 05:10 PM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/06 09:44 PM |
current V12 engines with >6.0 displacement | slacker | 2010/10/07 06:55 AM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/07 07:51 AM |
current V12 engines with >6.0 displacement | slacker | 2010/10/07 06:38 PM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/07 07:33 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/07 08:04 PM |
Practical vehicles for commuting | Rob Thorpe | 2010/10/08 04:50 AM |
Practical vehicles for commuting | Gabriele Svelto | 2010/10/08 05:05 AM |
Practical vehicles for commuting | Rob Thorpe | 2010/10/08 05:21 AM |
Practical vehicles for commuting | j | 2010/10/08 01:20 PM |
Practical vehicles for commuting | Rob Thorpe | 2010/12/09 06:00 AM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/08 09:14 AM |
current V12 engines with >6.0 displacement | Anonymous | 2010/10/07 12:23 PM |
current V12 engines with >6.0 displacement | anon | 2010/10/07 03:08 PM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/07 04:41 PM |
current V12 engines with >6.0 displacement | slacker | 2010/10/07 07:05 PM |
current V12 engines with >6.0 displacement | anonymous | 2010/10/07 07:52 PM |
current V12 engines with >6.0 displacement | Anonymous | 2010/10/08 06:52 PM |
current V12 engines with >6.0 displacement | anon | 2010/10/06 10:28 PM |
current V12 engines with >6.0 displacement | Aaron Spink | 2010/10/06 11:37 PM |
current V12 engines with >6.0 displacement | Ricardo B | 2010/10/07 12:37 AM |
current V12 engines with >6.0 displacement | slacker | 2010/10/05 01:02 AM |
Display | Linus Torvalds | 2010/10/04 09:39 AM |
Display | Gabriele Svelto | 2010/10/04 11:34 PM |
Display | Richard Cownie | 2010/10/04 05:22 AM |
Display | anon | 2010/10/04 08:22 PM |
Display | Richard Cownie | 2010/10/05 05:42 AM |
Display | mpx | 2010/10/03 10:55 AM |
Display | rcf | 2010/10/03 12:12 PM |
Display | mpx | 2010/10/03 01:36 PM |
Display | rcf | 2010/10/03 04:36 PM |
Display | Ricardo B | 2010/10/04 01:50 PM |
Display | gallier2 | 2010/10/05 02:44 AM |
Display | David Hess | 2010/10/05 04:21 AM |
Display | gallier2 | 2010/10/05 07:21 AM |
Display | David Hess | 2010/10/03 10:21 PM |
Display | rcf | 2010/10/04 07:06 AM |
Display | David Kanter | 2010/10/03 12:54 PM |
Alternative integration | Paul A. Clayton | 2010/10/06 07:51 AM |
Display | slacker | 2010/10/03 06:26 PM |
Display & marketing & analogies | ? | 2010/10/04 01:33 AM |
Display & marketing & analogies | kdg | 2010/10/04 05:00 AM |
Display | Kevin G | 2010/10/02 08:49 AM |
Display | Anon | 2010/10/03 02:43 PM |
Sandy Bridge CPU article online | David Kanter | 2010/09/29 02:17 PM |
Sandy Bridge CPU article online | Jack | 2010/09/28 05:27 AM |
Sandy Bridge CPU article online | IntelUser2000 | 2010/09/28 02:07 AM |
Sandy Bridge CPU article online | mpx | 2010/09/28 11:34 AM |
Sandy Bridge CPU article online | Aaron Spink | 2010/09/28 12:28 PM |
Sandy Bridge CPU article online | JoshW | 2010/09/28 01:13 PM |
Sandy Bridge CPU article online | mpx | 2010/09/28 01:54 PM |
Sandy Bridge CPU article online | Foo_ | 2010/09/29 12:19 AM |
Sandy Bridge CPU article online | mpx | 2010/09/29 02:06 AM |
Sandy Bridge CPU article online | JS | 2010/09/29 02:42 AM |
Sandy Bridge CPU article online | mpx | 2010/09/29 03:03 AM |
Sandy Bridge CPU article online | Foo_ | 2010/09/29 04:55 AM |
Sandy Bridge CPU article online | ajensen | 2010/09/27 11:19 PM |
Sandy Bridge CPU article online | Ian Ollmann | 2010/09/28 03:52 PM |
Sandy Bridge CPU article online | a reader | 2010/09/28 04:05 PM |
Sandy Bridge CPU article online | ajensen | 2010/09/28 10:35 PM |
Updated: Sandy Bridge CPU article | David Kanter | 2010/10/01 04:11 AM |
Updated: Sandy Bridge CPU article | anon | 2011/01/07 08:55 PM |
Updated: Sandy Bridge CPU article | Eric Bron | 2011/01/08 02:29 AM |
Updated: Sandy Bridge CPU article | anon | 2011/01/11 10:24 PM |
Updated: Sandy Bridge CPU article | anon | 2011/01/15 10:21 AM |
David Kanter can you shed some light? Re Updated: Sandy Bridge CPU article | anon | 2011/01/16 10:22 PM |
David Kanter can you shed some light? Re Updated: Sandy Bridge CPU article | anonymous | 2011/01/17 01:04 AM |
David Kanter can you shed some light? Re Updated: Sandy Bridge CPU article | anon | 2011/01/17 06:12 AM |
I can try.... | David Kanter | 2011/01/18 02:54 PM |
I can try.... | anon | 2011/01/18 07:07 PM |
I can try.... | David Kanter | 2011/01/18 10:24 PM |
I can try.... | anon | 2011/01/19 06:51 AM |
Wider fetch than execute makes sense | Paul A. Clayton | 2011/01/19 07:53 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/04 06:29 AM |
Sandy Bridge CPU article online | Seni | 2011/01/04 08:07 PM |
Sandy Bridge CPU article online | hobold | 2011/01/04 10:26 PM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 01:01 AM |
software assist exceptions | hobold | 2011/01/05 03:36 PM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 12:58 AM |
Sandy Bridge CPU article online | anon | 2011/01/05 03:51 AM |
Sandy Bridge CPU article online | Seni | 2011/01/05 07:53 AM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 08:03 AM |
Sandy Bridge CPU article online | anon | 2011/01/05 03:14 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/05 03:50 AM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/01/05 04:00 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/05 06:26 AM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/01/05 06:50 AM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 07:39 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/05 02:50 PM |
permuting vector elements | hobold | 2011/01/05 04:03 PM |
permuting vector elements | Nicolas Capens | 2011/01/05 05:01 PM |
permuting vector elements | Nicolas Capens | 2011/01/06 07:27 AM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/01/11 10:33 AM |
Sandy Bridge CPU article online | EduardoS | 2011/01/11 12:51 PM |
Sandy Bridge CPU article online | hobold | 2011/01/11 01:11 PM |
Sandy Bridge CPU article online | David Kanter | 2011/01/11 05:07 PM |
Sandy Bridge CPU article online | Michael S | 2011/01/12 02:25 AM |
Sandy Bridge CPU article online | hobold | 2011/01/12 04:03 PM |
Sandy Bridge CPU article online | David Kanter | 2011/01/12 10:27 PM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/13 01:38 AM |
Sandy Bridge CPU article online | Michael S | 2011/01/13 02:32 AM |
Sandy Bridge CPU article online | hobold | 2011/01/13 12:53 PM |
What happened to VPERMIL2PS? | Michael S | 2011/01/13 02:46 AM |
What happened to VPERMIL2PS? | Eric Bron | 2011/01/13 05:46 AM |
Lower cost permute | Paul A. Clayton | 2011/01/13 11:11 AM |
Sandy Bridge CPU article online | anon | 2011/01/25 05:31 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/12 05:34 PM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/01/13 06:38 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/15 08:47 PM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/01/16 02:13 AM |
And just to make a further example | Gabriele Svelto | 2011/01/16 03:24 AM |
Sandy Bridge CPU article online | mpx | 2011/01/16 12:27 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/25 01:56 PM |
Sandy Bridge CPU article online | David Kanter | 2011/01/25 03:11 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/26 07:49 AM |
Sandy Bridge CPU article online | EduardoS | 2011/01/26 03:35 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/27 01:51 AM |
Sandy Bridge CPU article online | EduardoS | 2011/01/27 01:40 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/28 02:24 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/28 02:49 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/30 01:11 PM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/31 02:43 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/01 03:02 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/02/01 03:28 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/02/01 03:43 AM |
Sandy Bridge CPU article online | EduardoS | 2011/01/28 06:14 PM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/01 01:58 AM |
Sandy Bridge CPU article online | EduardoS | 2011/02/01 01:36 PM |
Sandy Bridge CPU article online | anon | 2011/02/01 03:56 PM |
Sandy Bridge CPU article online | EduardoS | 2011/02/01 08:17 PM |
Sandy Bridge CPU article online | anon | 2011/02/01 09:13 PM |
Sandy Bridge CPU article online | Eric Bron | 2011/02/02 03:08 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/02/02 03:26 AM |
Sandy Bridge CPU article online | kalmaegi | 2011/02/01 08:29 AM |
SW Rasterization | David Kanter | 2011/01/27 04:18 PM |
Lower pin count memory | iz | 2011/01/27 08:19 PM |
Lower pin count memory | David Kanter | 2011/01/27 08:25 PM |
Lower pin count memory | iz | 2011/01/27 10:31 PM |
Lower pin count memory | David Kanter | 2011/01/27 10:52 PM |
Lower pin count memory | iz | 2011/01/27 11:28 PM |
Lower pin count memory | David Kanter | 2011/01/28 12:05 AM |
Lower pin count memory | iz | 2011/01/28 02:55 AM |
Lower pin count memory | David Hess | 2011/01/28 12:15 PM |
Lower pin count memory | David Kanter | 2011/01/28 12:57 PM |
Lower pin count memory | iz | 2011/01/28 04:20 PM |
Two years later | ForgotPants | 2013/10/26 10:33 AM |
Two years later | anon | 2013/10/26 10:36 AM |
Two years later | Exophase | 2013/10/26 11:56 AM |
Two years later | David Hess | 2013/10/26 04:05 PM |
Herz is totally the thing you DON*T care. | Jouni Osmala | 2013/10/27 12:48 AM |
Herz is totally the thing you DON*T care. | EduardoS | 2013/10/27 06:00 AM |
Herz is totally the thing you DON*T care. | Michael S | 2013/10/27 06:45 AM |
Two years later | someone | 2013/10/28 06:21 AM |
Lower pin count memory | Martin Høyer Kristiansen | 2011/01/28 12:41 AM |
Lower pin count memory | iz | 2011/01/28 02:07 AM |
Lower pin count memory | Darrell Coker | 2011/01/27 09:39 PM |
Lower pin count memory | iz | 2011/01/27 11:20 PM |
Lower pin count memory | Darrell Coker | 2011/01/28 05:07 PM |
Lower pin count memory | iz | 2011/01/28 10:57 PM |
Lower pin count memory | Darrell Coker | 2011/01/29 01:21 AM |
Lower pin count memory | iz | 2011/01/31 09:28 PM |
SW Rasterization | Nicolas Capens | 2011/02/02 07:48 AM |
SW Rasterization | Eric Bron | 2011/02/02 08:37 AM |
SW Rasterization | Nicolas Capens | 2011/02/02 03:35 PM |
SW Rasterization | Eric Bron | 2011/02/02 04:11 PM |
SW Rasterization | Eric Bron | 2011/02/03 01:13 AM |
SW Rasterization | Nicolas Capens | 2011/02/04 06:57 AM |
SW Rasterization | Eric Bron | 2011/02/04 07:50 AM |
erratum | Eric Bron | 2011/02/04 07:58 AM |
SW Rasterization | Nicolas Capens | 2011/02/04 04:25 PM |
SW Rasterization | David Kanter | 2011/02/04 04:33 PM |
SW Rasterization | anon | 2011/02/04 05:04 PM |
SW Rasterization | Nicolas Capens | 2011/02/05 02:39 PM |
SW Rasterization | David Kanter | 2011/02/05 04:07 PM |
SW Rasterization | Nicolas Capens | 2011/02/05 10:39 PM |
SW Rasterization | Eric Bron | 2011/02/04 09:55 AM |
Comments pt 1 | David Kanter | 2011/02/02 12:08 PM |
Comments pt 1 | Eric Bron | 2011/02/02 02:16 PM |
Comments pt 1 | Gabriele Svelto | 2011/02/03 12:37 AM |
Comments pt 1 | Eric Bron | 2011/02/03 01:36 AM |
Comments pt 1 | Nicolas Capens | 2011/02/03 10:08 PM |
Comments pt 1 | Nicolas Capens | 2011/02/03 09:26 PM |
Comments pt 1 | Eric Bron | 2011/02/04 02:33 AM |
Comments pt 1 | Nicolas Capens | 2011/02/04 04:24 AM |
example code | Eric Bron | 2011/02/04 03:51 AM |
example code | Nicolas Capens | 2011/02/04 07:24 AM |
example code | Eric Bron | 2011/02/04 07:36 AM |
example code | Nicolas Capens | 2011/02/05 10:43 PM |
Comments pt 1 | Rohit | 2011/02/04 11:43 AM |
Comments pt 1 | Nicolas Capens | 2011/02/04 04:05 PM |
Comments pt 1 | David Kanter | 2011/02/04 04:36 PM |
Comments pt 1 | Nicolas Capens | 2011/02/05 01:45 PM |
Comments pt 1 | Eric Bron | 2011/02/05 03:13 PM |
Comments pt 1 | Nicolas Capens | 2011/02/05 10:52 PM |
Comments pt 1 | Eric Bron | 2011/02/06 12:31 AM |
Comments pt 1 | Nicolas Capens | 2011/02/06 03:06 PM |
Comments pt 1 | Eric Bron | 2011/02/07 02:12 AM |
The need for gather/scatter support | Nicolas Capens | 2011/02/10 09:07 AM |
The need for gather/scatter support | Eric Bron | 2011/02/11 02:11 AM |
Gather/scatter performance data | Nicolas Capens | 2011/02/13 02:39 AM |
Gather/scatter performance data | Eric Bron | 2011/02/13 06:46 AM |
Gather/scatter performance data | Nicolas Capens | 2011/02/14 06:48 AM |
Gather/scatter performance data | Eric Bron | 2011/02/14 08:32 AM |
Gather/scatter performance data | Eric Bron | 2011/02/14 09:07 AM |
Gather/scatter performance data | Eric Bron | 2011/02/13 08:00 AM |
Gather/scatter performance data | Nicolas Capens | 2011/02/14 06:49 AM |
Gather/scatter performance data | Eric Bron | 2011/02/15 01:23 AM |
Gather/scatter performance data | Eric Bron | 2011/02/13 04:06 PM |
Gather/scatter performance data | Nicolas Capens | 2011/02/14 06:52 AM |
Gather/scatter performance data | Eric Bron | 2011/02/14 08:43 AM |
SW Rasterization - a long way off | Rohit | 2011/02/02 12:17 PM |
SW Rasterization - a long way off | Nicolas Capens | 2011/02/04 02:59 AM |
CPU only rendering - a long way off | Rohit | 2011/02/04 10:52 AM |
CPU only rendering - a long way off | Nicolas Capens | 2011/02/04 06:15 PM |
CPU only rendering - a long way off | Rohit | 2011/02/05 01:00 AM |
CPU only rendering - a long way off | Nicolas Capens | 2011/02/05 08:45 PM |
CPU only rendering - a long way off | David Kanter | 2011/02/06 08:51 PM |
CPU only rendering - a long way off | Gian-Carlo Pascutto | 2011/02/06 11:22 PM |
Encryption | David Kanter | 2011/02/07 12:18 AM |
Encryption | Nicolas Capens | 2011/02/07 06:51 AM |
Encryption | David Kanter | 2011/02/07 10:50 AM |
Encryption | Nicolas Capens | 2011/02/08 09:26 AM |
CPUs are latency optimized | David Kanter | 2011/02/08 10:38 AM |
efficient compiler on an efficient GPU real today. | sJ | 2011/02/08 10:29 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/09 08:49 PM |
CPUs are latency optimized | Eric Bron | 2011/02/09 11:49 PM |
CPUs are latency optimized | Antti-Ville Tuunainen | 2011/02/10 05:16 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/10 06:04 AM |
CPUs are latency optimized | Eric Bron | 2011/02/10 06:48 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/10 12:31 PM |
CPUs are latency optimized | Eric Bron | 2011/02/11 01:43 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/11 06:31 AM |
CPUs are latency optimized | EduardoS | 2011/02/10 04:29 PM |
CPUs are latency optimized | Anon | 2011/02/10 05:40 PM |
CPUs are latency optimized | David Kanter | 2011/02/10 07:33 PM |
CPUs are latency optimized | EduardoS | 2011/02/11 01:18 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/11 04:56 AM |
CPUs are latency optimized | Rohit | 2011/02/11 06:33 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/14 01:19 AM |
CPUs are latency optimized | Eric Bron | 2011/02/14 02:23 AM |
CPUs are latency optimized | EduardoS | 2011/02/14 12:11 PM |
CPUs are latency optimized | David Kanter | 2011/02/11 01:45 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/15 04:22 AM |
CPUs are latency optimized | David Kanter | 2011/02/15 11:47 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/15 06:10 PM |
Have fun | David Kanter | 2011/02/15 09:04 PM |
Have fun | Nicolas Capens | 2011/02/17 02:59 AM |
Have fun | Brett | 2011/02/17 11:56 AM |
Have fun | Nicolas Capens | 2011/02/19 03:53 PM |
Have fun | Brett | 2011/02/20 05:08 PM |
Have fun | Brett | 2011/02/20 06:13 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/02/23 04:37 PM |
On-die storage to fight Amdahl | Brett | 2011/02/23 08:59 PM |
On-die storage to fight Amdahl | Brett | 2011/02/23 09:08 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/02/24 06:42 PM |
On-die storage to fight Amdahl | Rohit | 2011/02/25 10:02 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/03/09 05:53 PM |
On-die storage to fight Amdahl | Rohit | 2011/03/10 07:02 AM |
NVIDIA using tile based rendering? | Nathan Monson | 2011/03/11 06:58 PM |
NVIDIA using tile based rendering? | Rohit | 2011/03/12 03:29 AM |
NVIDIA using tile based rendering? | Nathan Monson | 2011/03/12 10:05 AM |
NVIDIA using tile based rendering? | Rohit | 2011/03/12 10:16 AM |
On-die storage to fight Amdahl | Brett | 2011/02/26 01:10 AM |
On-die storage to fight Amdahl | Nathan Monson | 2011/02/26 12:51 PM |
On-die storage to fight Amdahl | Brett | 2011/02/26 03:40 PM |
Convergence is inevitable | Nicolas Capens | 2011/03/09 07:22 PM |
Convergence is inevitable | Brett | 2011/03/09 09:59 PM |
Convergence is inevitable | Antti-Ville Tuunainen | 2011/03/10 02:34 PM |
Convergence is inevitable | Brett | 2011/03/10 08:39 PM |
Procedural texturing? | David Kanter | 2011/03/11 12:32 AM |
Procedural texturing? | hobold | 2011/03/11 02:59 AM |
Procedural texturing? | Dan Downs | 2011/03/11 08:28 AM |
Procedural texturing? | Mark Roulo | 2011/03/11 01:58 PM |
Procedural texturing? | Anon | 2011/03/11 05:11 PM |
Procedural texturing? | Nathan Monson | 2011/03/11 06:30 PM |
Procedural texturing? | Brett | 2011/03/15 06:45 AM |
Procedural texturing? | Seni | 2011/03/15 09:13 AM |
Procedural texturing? | Brett | 2011/03/15 10:45 AM |
Procedural texturing? | Seni | 2011/03/15 01:09 PM |
Procedural texturing? | Brett | 2011/03/11 09:02 PM |
Procedural texturing? | Brett | 2011/03/11 08:34 PM |
Procedural texturing? | Eric Bron | 2011/03/12 02:37 AM |
Convergence is inevitable | Jouni Osmala | 2011/03/09 10:28 PM |
Convergence is inevitable | Brett | 2011/04/05 04:08 PM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 04:23 AM |
Convergence is inevitable | none | 2011/04/07 06:03 AM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 09:34 AM |
Convergence is inevitable | anon | 2011/04/07 01:15 PM |
Convergence is inevitable | none | 2011/04/08 12:57 AM |
Convergence is inevitable | Brett | 2011/04/07 07:04 PM |
Convergence is inevitable | none | 2011/04/08 01:14 AM |
Gather implementation | David Kanter | 2011/04/08 11:01 AM |
RAM Latency | David Hess | 2011/04/07 07:22 AM |
RAM Latency | Brett | 2011/04/07 06:20 PM |
RAM Latency | Nicolas Capens | 2011/04/07 09:18 PM |
RAM Latency | Brett | 2011/04/08 04:33 AM |
RAM Latency | Nicolas Capens | 2011/04/10 01:23 PM |
RAM Latency | Rohit | 2011/04/08 05:57 AM |
RAM Latency | Nicolas Capens | 2011/04/10 12:23 PM |
RAM Latency | David Kanter | 2011/04/10 01:27 PM |
RAM Latency | Rohit | 2011/04/11 05:17 AM |
Convergence is inevitable | Eric Bron | 2011/04/07 08:46 AM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 08:50 PM |
Convergence is inevitable | Eric Bron | 2011/04/07 11:39 PM |
Flaws in PowerVR | Rohit | 2011/02/25 10:21 PM |
Flaws in PowerVR | Brett | 2011/02/25 11:37 PM |
Flaws in PowerVR | Paul | 2011/02/26 04:17 AM |
Have fun | David Kanter | 2011/02/18 11:52 AM |
Have fun | Michael S | 2011/02/19 11:12 AM |
Have fun | David Kanter | 2011/02/19 02:26 PM |
Have fun | Michael S | 2011/02/19 03:43 PM |
Have fun | anon | 2011/02/19 04:02 PM |
Have fun | Michael S | 2011/02/19 04:56 PM |
Have fun | anon | 2011/02/20 02:50 PM |
Have fun | EduardoS | 2011/02/20 01:44 PM |
Linear vs non-linear | EduardoS | 2011/02/20 01:55 PM |
Have fun | Michael S | 2011/02/20 03:19 PM |
Have fun | EduardoS | 2011/02/20 04:51 PM |
Have fun | Nicolas Capens | 2011/02/21 10:12 AM |
Have fun | Michael S | 2011/02/21 11:38 AM |
Have fun | Eric Bron | 2011/02/21 01:10 PM |
Have fun | Eric Bron | 2011/02/21 01:39 PM |
Have fun | Michael S | 2011/02/21 05:13 PM |
Have fun | Eric Bron | 2011/02/21 11:43 PM |
Have fun | Michael S | 2011/02/22 12:47 AM |
Have fun | Eric Bron | 2011/02/22 01:10 AM |
Have fun | Michael S | 2011/02/22 10:37 AM |
Have fun | anon | 2011/02/22 12:38 PM |
Have fun | EduardoS | 2011/02/22 02:49 PM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/23 05:37 PM |
Gather/scatter efficiency | anonymous | 2011/02/23 05:51 PM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/24 05:57 PM |
Gather/scatter efficiency | anonymous | 2011/02/24 06:16 PM |
Gather/scatter efficiency | Michael S | 2011/02/25 06:45 AM |
Gather implementation | David Kanter | 2011/02/25 04:34 PM |
Gather implementation | Michael S | 2011/02/26 09:40 AM |
Gather implementation | anon | 2011/02/26 10:52 AM |
Gather implementation | Michael S | 2011/02/26 11:16 AM |
Gather implementation | anon | 2011/02/26 10:22 PM |
Gather implementation | Michael S | 2011/02/27 06:23 AM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/28 02:14 PM |
Consider yourself ignored | David Kanter | 2011/02/22 12:05 AM |
one more anti-FMA flame. By me. | Michael S | 2011/02/16 06:40 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/16 07:30 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/16 08:15 AM |
one more anti-FMA flame. By me. | Nicolas Capens | 2011/02/17 05:27 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/17 06:42 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/17 04:46 PM |
Tarantula paper | Paul A. Clayton | 2011/02/17 11:38 PM |
Tarantula paper | Nicolas Capens | 2011/02/19 04:19 PM |
anti-FMA != anti-throughput or anti-SG | Eric Bron | 2011/02/18 12:48 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/20 02:46 PM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/20 04:00 PM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 03:05 AM |
Software pipelining on x86 | David Kanter | 2011/02/23 04:04 AM |
Software pipelining on x86 | JS | 2011/02/23 04:25 AM |
Software pipelining on x86 | Salvatore De Dominicis | 2011/02/23 07:37 AM |
Software pipelining on x86 | Jouni Osmala | 2011/02/23 08:10 AM |
Software pipelining on x86 | LeeMiller | 2011/02/23 09:07 PM |
Software pipelining on x86 | Nicolas Capens | 2011/02/24 02:17 PM |
Software pipelining on x86 | anonymous | 2011/02/24 06:04 PM |
Software pipelining on x86 | Nicolas Capens | 2011/02/28 08:27 AM |
Software pipelining on x86 | Antti-Ville Tuunainen | 2011/03/02 03:31 AM |
Software pipelining on x86 | Megol | 2011/03/02 11:55 AM |
Software pipelining on x86 | Geert Bosch | 2011/03/03 06:58 AM |
FMA benefits and latency predictions | David Kanter | 2011/02/25 04:14 PM |
FMA benefits and latency predictions | Antti-Ville Tuunainen | 2011/02/26 09:43 AM |
FMA benefits and latency predictions | Matt Waldhauer | 2011/02/27 05:42 AM |
FMA benefits and latency predictions | Nicolas Capens | 2011/03/09 05:11 PM |
FMA benefits and latency predictions | Rohit | 2011/03/10 07:11 AM |
FMA benefits and latency predictions | Eric Bron | 2011/03/10 08:30 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/23 04:19 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 06:50 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/23 09:37 AM |
FMA and beyond | Nicolas Capens | 2011/02/24 03:47 PM |
detour on terminology | hobold | 2011/02/24 06:08 PM |
detour on terminology | Nicolas Capens | 2011/02/28 01:24 PM |
detour on terminology | Eric Bron | 2011/03/01 01:38 AM |
detour on terminology | Michael S | 2011/03/01 04:03 AM |
detour on terminology | Eric Bron | 2011/03/01 04:39 AM |
detour on terminology | Michael S | 2011/03/01 07:33 AM |
detour on terminology | Eric Bron | 2011/03/01 08:34 AM |
erratum | Eric Bron | 2011/03/01 08:54 AM |
detour on terminology | Nicolas Capens | 2011/03/10 07:39 AM |
detour on terminology | Eric Bron | 2011/03/10 08:50 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 05:12 AM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 10:25 PM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/17 05:51 PM |
Tarantula vector unit well-integrated | Paul A. Clayton | 2011/02/17 11:38 PM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/19 01:17 PM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 01:09 AM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/20 08:55 AM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 12:39 PM |
anti-FMA != anti-throughput or anti-SG | EduardoS | 2011/02/20 01:35 PM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/21 07:12 AM |
anti-FMA != anti-throughput or anti-SG | anon | 2011/02/17 09:44 PM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/18 05:20 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/17 07:24 AM |
thanks | Michael S | 2011/02/17 03:56 PM |
CPUs are latency optimized | EduardoS | 2011/02/15 12:24 PM |
SwiftShader SNB test | Eric Bron | 2011/02/15 02:46 PM |
SwiftShader NHM test | Eric Bron | 2011/02/15 03:50 PM |
SwiftShader SNB test | Nicolas Capens | 2011/02/16 11:06 PM |
SwiftShader SNB test | Eric Bron | 2011/02/17 12:21 AM |
SwiftShader SNB test | Eric Bron | 2011/02/22 09:32 AM |
SwiftShader SNB test 2nd run | Eric Bron | 2011/02/22 09:51 AM |
SwiftShader SNB test 2nd run | Nicolas Capens | 2011/02/23 01:14 PM |
SwiftShader SNB test 2nd run | Eric Bron | 2011/02/23 01:42 PM |
Win7SP1 out but no AVX hype? | Michael S | 2011/02/24 02:14 AM |
Win7SP1 out but no AVX hype? | Eric Bron | 2011/02/24 02:39 AM |
CPUs are latency optimized | Eric Bron | 2011/02/15 07:02 AM |
CPUs are latency optimized | EduardoS | 2011/02/11 02:40 PM |
CPU only rendering - not a long way off | Nicolas Capens | 2011/02/07 05:45 AM |
CPU only rendering - not a long way off | David Kanter | 2011/02/07 11:09 AM |
CPU only rendering - not a long way off | anonymous | 2011/02/07 09:25 PM |
Sandy Bridge IGP EUs | David Kanter | 2011/02/07 10:22 PM |
Sandy Bridge IGP EUs | Hannes | 2011/02/08 04:59 AM |
SW Rasterization - Why? | Seni | 2011/02/02 01:53 PM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/10 02:12 PM |
Market reasons to ditch the IGP | Seni | 2011/02/11 04:42 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/16 03:29 AM |
Market reasons to ditch the IGP | Seni | 2011/02/16 12:39 PM |
An excellent post! | David Kanter | 2011/02/16 02:18 PM |
CPUs clock higher | Moritz | 2011/02/17 07:06 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/18 05:22 PM |
Market reasons to ditch the IGP | IntelUser2000 | 2011/02/18 06:20 PM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/21 01:42 PM |
Bad data (repeated) | David Kanter | 2011/02/21 11:21 PM |
Bad data (repeated) | none | 2011/02/22 02:04 AM |
13W or 8W? | Foo_ | 2011/02/22 05:00 AM |
13W or 8W? | Linus Torvalds | 2011/02/22 07:58 AM |
13W or 8W? | David Kanter | 2011/02/22 10:33 AM |
13W or 8W? | Mark Christiansen | 2011/02/22 01:47 PM |
Bigger picture | Nicolas Capens | 2011/02/24 05:33 PM |
Bigger picture | Nicolas Capens | 2011/02/24 07:06 PM |
20+ Watt | Nicolas Capens | 2011/02/24 07:18 PM |
<20W | David Kanter | 2011/02/25 12:13 PM |
>20W | Nicolas Capens | 2011/03/08 06:34 PM |
IGP is 3X more efficient | David Kanter | 2011/03/08 09:53 PM |
IGP is 3X more efficient | Eric Bron | 2011/03/09 01:44 AM |
>20W | Eric Bron | 2011/03/09 02:48 AM |
Specious data and claims are still specious | David Kanter | 2011/02/25 01:38 AM |
IGP power consumption, LRB samplers | Nicolas Capens | 2011/03/08 05:24 PM |
IGP power consumption, LRB samplers | EduardoS | 2011/03/08 05:52 PM |
IGP power consumption, LRB samplers | Rohit | 2011/03/09 06:42 AM |
Market reasons to ditch the IGP | none | 2011/02/22 01:58 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/24 05:43 PM |
Market reasons to ditch the IGP | slacker | 2011/02/22 01:32 PM |
Market reasons to ditch the IGP | Seni | 2011/02/18 08:51 PM |
Correction - 28 comparators, not 36. (NT) | Seni | 2011/02/18 09:03 PM |
Market reasons to ditch the IGP | Gabriele Svelto | 2011/02/19 12:49 AM |
Market reasons to ditch the IGP | Seni | 2011/02/19 10:59 AM |
Market reasons to ditch the IGP | Exophase | 2011/02/20 09:43 AM |
Market reasons to ditch the IGP | EduardoS | 2011/02/19 09:13 AM |
Market reasons to ditch the IGP | Seni | 2011/02/19 10:46 AM |
The next revolution | Nicolas Capens | 2011/02/22 02:33 AM |
The next revolution | Gabriele Svelto | 2011/02/22 08:15 AM |
The next revolution | Eric Bron | 2011/02/22 08:48 AM |
The next revolution | Nicolas Capens | 2011/02/23 06:39 PM |
The next revolution | Gabriele Svelto | 2011/02/23 11:43 PM |
GPGPU content creation (or lack of it) | Nicolas Capens | 2011/02/28 06:39 AM |
GPGPU content creation (or lack of it) | The market begs to differ | 2011/03/01 05:32 AM |
GPGPU content creation (or lack of it) | Nicolas Capens | 2011/03/09 08:14 PM |
GPGPU content creation (or lack of it) | Gabriele Svelto | 2011/03/10 12:01 AM |
The market begs to differ | Gabriele Svelto | 2011/03/01 05:33 AM |
The next revolution | Anon | 2011/02/24 01:15 AM |
The next revolution | Nicolas Capens | 2011/02/28 01:34 PM |
The next revolution | Seni | 2011/02/22 01:02 PM |
The next revolution | Gabriele Svelto | 2011/02/23 05:27 AM |
The next revolution | Seni | 2011/02/23 08:03 AM |
The next revolution | Nicolas Capens | 2011/02/24 05:11 AM |
The next revolution | Seni | 2011/02/24 07:45 PM |
IGP sampler count | Nicolas Capens | 2011/03/03 04:19 AM |
Latency and throughput optimized cores | Nicolas Capens | 2011/03/07 02:28 PM |
The real reason no IGP /CPU converge. | Jouni Osmala | 2011/03/07 10:34 PM |
Still converging | Nicolas Capens | 2011/03/13 02:08 PM |
Homogeneous CPU advantages | Nicolas Capens | 2011/03/07 11:12 PM |
Homogeneous CPU advantages | Seni | 2011/03/08 08:23 AM |
Homogeneous CPU advantages | David Kanter | 2011/03/08 10:16 AM |
Homogeneous CPU advantages | Brett | 2011/03/09 02:37 AM |
Homogeneous CPU advantages | Jouni Osmala | 2011/03/08 11:27 PM |
SW Rasterization | firsttimeposter | 2011/02/03 10:18 PM |
SW Rasterization | Nicolas Capens | 2011/02/04 03:48 AM |
SW Rasterization | Eric Bron | 2011/02/04 04:14 AM |
SW Rasterization | Nicolas Capens | 2011/02/04 07:36 AM |
SW Rasterization | Eric Bron | 2011/02/04 07:42 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/26 02:23 AM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/02/04 03:31 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/05 07:46 PM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/02/06 05:20 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/06 05:07 PM |
Sandy Bridge CPU article online | arch.comp | 2011/01/06 09:58 PM |
Sandy Bridge CPU article online | Seni | 2011/01/07 09:25 AM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 03:28 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/05 05:06 AM |
permuting vector elements (yet again) | hobold | 2011/01/05 04:15 PM |
permuting vector elements (yet again) | Nicolas Capens | 2011/01/06 05:11 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/05 11:46 AM |
wow ...! | hobold | 2011/01/05 04:19 PM |
wow ...! | Nicolas Capens | 2011/01/05 05:11 PM |
wow ...! | Eric Bron | 2011/01/05 09:46 PM |
compress LUT | Eric Bron | 2011/01/05 10:05 PM |
wow ...! | Michael S | 2011/01/06 01:25 AM |
wow ...! | Nicolas Capens | 2011/01/06 05:26 AM |
wow ...! | Eric Bron | 2011/01/06 08:08 AM |
wow ...! | Nicolas Capens | 2011/01/07 06:19 AM |
wow ...! | Steve Underwood | 2011/01/07 09:53 PM |
saturation | hobold | 2011/01/08 09:25 AM |
saturation | Steve Underwood | 2011/01/08 11:38 AM |
saturation | Michael S | 2011/01/08 12:05 PM |
128 bit floats | Brett | 2011/01/08 12:39 PM |
128 bit floats | Michael S | 2011/01/08 01:10 PM |
128 bit floats | Anil Maliyekkel | 2011/01/08 02:46 PM |
128 bit floats | Kevin G | 2011/02/27 10:15 AM |
128 bit floats | hobold | 2011/02/27 03:42 PM |
128 bit floats | Ian Ollmann | 2011/02/28 03:56 PM |
OpenCL FP accuracy | hobold | 2011/03/01 05:45 AM |
OpenCL FP accuracy | anon | 2011/03/01 07:03 PM |
OpenCL FP accuracy | hobold | 2011/03/02 02:53 AM |
OpenCL FP accuracy | Eric Bron | 2011/03/02 06:10 AM |
pet project | hobold | 2011/03/02 08:22 AM |
pet project | Anon | 2011/03/02 08:10 PM |
pet project | hobold | 2011/03/03 03:57 AM |
pet project | Eric Bron | 2011/03/03 01:29 AM |
pet project | hobold | 2011/03/03 04:14 AM |
pet project | Eric Bron | 2011/03/03 02:10 PM |
pet project | hobold | 2011/03/03 03:04 PM |
OpenCL and AMD | Vincent Diepeveen | 2011/03/07 12:44 PM |
OpenCL and AMD | Eric Bron | 2011/03/08 01:05 AM |
OpenCL and AMD | Vincent Diepeveen | 2011/03/08 07:27 AM |
128 bit floats | Michael S | 2011/02/27 03:46 PM |
128 bit floats | Anil Maliyekkel | 2011/02/27 05:14 PM |
saturation | Steve Underwood | 2011/01/17 03:42 AM |
wow ...! | hobold | 2011/01/06 04:05 PM |
Ring | Moritz | 2011/01/20 09:51 PM |
Ring | Antti-Ville Tuunainen | 2011/01/21 11:25 AM |
Ring | Moritz | 2011/01/23 12:38 AM |
Ring | Michael S | 2011/01/23 03:04 AM |
So fast | Moritz | 2011/01/23 06:57 AM |
So fast | David Kanter | 2011/01/23 09:05 AM |
Sandy Bridge CPU (L1D cache) | Gordon Ward | 2011/09/09 01:47 AM |
Sandy Bridge CPU (L1D cache) | David Kanter | 2011/09/09 03:19 PM |
Sandy Bridge CPU (L1D cache) | EduardoS | 2011/09/09 07:53 PM |
Sandy Bridge CPU (L1D cache) | Paul A. Clayton | 2011/09/10 04:12 AM |
Sandy Bridge CPU (L1D cache) | Michael S | 2011/09/10 08:41 AM |
Sandy Bridge CPU (L1D cache) | EduardoS | 2011/09/10 10:17 AM |
Address Ports on Sandy Bridge Scheduler | Victor | 2011/10/16 05:40 AM |
Address Ports on Sandy Bridge Scheduler | EduardoS | 2011/10/16 06:45 PM |
Address Ports on Sandy Bridge Scheduler | Megol | 2011/10/17 08:20 AM |
Address Ports on Sandy Bridge Scheduler | Victor | 2011/10/18 04:34 PM |
Benefits of early scheduling | Paul A. Clayton | 2011/10/18 05:53 PM |
Benefits of early scheduling | Victor | 2011/10/19 04:58 PM |
Consistency and invalidation ordering | Paul A. Clayton | 2011/10/20 03:43 AM |
Address Ports on Sandy Bridge Scheduler | John Upcroft | 2011/10/21 03:16 PM |
Address Ports on Sandy Bridge Scheduler | David Kanter | 2011/10/22 09:49 AM |
Address Ports on Sandy Bridge Scheduler | John Upcroft | 2011/10/26 12:24 PM |
Store TLB look-up at commit? | Paul A. Clayton | 2011/10/26 07:30 PM |
Store TLB look-up at commit? | Richard Scott | 2011/10/26 08:40 PM |
Just a guess | Paul A. Clayton | 2011/10/27 12:54 PM |