By: Nicolas Capens (nicolas.capens.delete@this.gmail.com), January 12, 2011 6:34 pm
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto@gmail.com) on 1/11/11 wrote:
---------------------------
>Nicolas Capens (nicolas.capens@gmail.com) on 1/5/11 wrote:
>---------------------------
>>Do you happen to have any sources which explain just how complex it is? I'm sure
>>it's not trivial, but given that LRBni already has many other features which I'd
>>consider relatively complex, I'd be surprised if gather/scatter took a disproportionately
>>large area. Intel was able to fit 32 of these feature rich cores with 512-bit vectors
>>onto a chip roughly double the transistor count of Sandy Bridge...
>
>Unfortunately Intel didn't disclose details on LRB die AFAIK so we don't know what
>was the weight of every single feature. Yet the fact that LRBni doesn't include
>a full-vector permute tells you that doing a crossbar of that size is prohibitive.
Yet somehow it does have full-vector gather/scatter support, so a crossbar of that size isn't prohibitive if the feature is valuable enough. Expensive, definitely, but not prohibitive. And as I pointed out earlier, with two 128-bit load units you'd only need two 4x32-bit permute crossbars, so it should be very feasible.
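To make concrete what such a unit has to do, here's a minimal sketch (my own illustration in C++ with SSE4.1 intrinsics, not LRBni or any shipping instruction) of a 4-wide gather emulated with four scalar loads; a hardware implementation with a 4x32 crossbar per 128-bit load unit would perform these element fetches in parallel:

    #include <emmintrin.h>  // SSE2
    #include <smmintrin.h>  // SSE4.1: _mm_extract_epi32 / _mm_insert_epi32

    // Hypothetical 4-wide gather, emulated with four scalar loads:
    // result[i] = base[indices[i]] for i = 0..3.
    __m128i gather4_epi32(const int* base, __m128i indices)
    {
        __m128i r = _mm_cvtsi32_si128(base[_mm_cvtsi128_si32(indices)]);
        r = _mm_insert_epi32(r, base[_mm_extract_epi32(indices, 1)], 1);
        r = _mm_insert_epi32(r, base[_mm_extract_epi32(indices, 2)], 2);
        r = _mm_insert_epi32(r, base[_mm_extract_epi32(indices, 3)], 3);
        return r;
    }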
>>What I meant is that the two 128-bit load units could each have their own 4x32-bit
>>crossbar. It means they sometimes load the same cache line, but that's ok. I assume
>>that's what already happens anyway with vmovaps, and it considerably simplifies
>>the crossbar. So while Larrabee requires one massive 16x32 crossbar an architecture
>>based on Sandy Bridge would have a near optimal gather/scatter implementation with two simpler 4x32 crossbars.
>
>Yeah, that sounds feasible to me.
See? ;-)
>>At the same or higher image quality level. On a Core i7 965 SwiftShader scores
>>620 3DMark06 points. That's more than a GMA X3100.
>
>It is nice to hear that 3DMark06 can be run in software but an i7 965 is a pretty
>high end processor and beating a an old and lowly GMA X3100 doesn't tell me much.
>At 1280x1024 decent IGPs are already scoring in the thousands and will have vastly
>better performance/W, performance/$ and performance/area than an i7 965 doing software rendering.
For the record, I have an i7 920 clocked at 965 speeds. As you certainly know, CPUs get ridiculously more expensive for a small increase in performance. But it also works the other way around: you don't lose a lot of performance when you opt for a cheaper CPU. Plus, any system needs a CPU anyway, so you have to take that into account when comparing CPUs against GPUs. A GPU still needs a CPU or it's worthless...
This weekend I had the opportunity to run SwiftShader on an i7 2600. It scored 820 3DMark06 points at stock speed. So that's 32% faster than my overclocked 920! What's more, the 2600 is considered a mainstream CPU and its price is expected to drop fast once AMD launches Bulldozer. And it comes with a ridiculously small heatsink. Note that the GMA X4500, still sold in massive numbers, only scores about 950 3DMark06 points.
So while I won't deny there's still a gap in performance and power consumption, it's getting smaller every generation!
But that's not all. Sandy Bridge wastes die space on a GPU; it could have had two extra cores instead, exceeding the X4500's performance. Furthermore, SwiftShader doesn't take advantage of AVX yet. And with gather/scatter support lots of graphics operations would become a whole lot more efficient.
So it's already well within reach to have a GPU-less system and still have ample 3D graphics performance for the same market that APUs target. Last but not least, with extra generic CPU cores developers can create other diverse and demanding applications.
>> (...)
>You could also say that AMD managed to create an architecture which matches NVIDIA's
>with vastly lower die area and transistor count. The differences between their hardware
>boil down to the fact that AMD is using VLIW shader processors which by design will
>have lower achievable throughput but being simpler much higher compute density.
Actually I believe AMD's GPUs have a higher transistor count on a smaller die. This is probably due to NVIDIA's architecture using a higher-clocked shader domain, requiring faster and thus larger transistors. Furthermore, depending on the exact chips you're comparing, NVIDIA has higher double-precision performance. Together with the new Fermi capabilities this puts NVIDIA way ahead of AMD in the HPC market. Obviously that comes at some extra die size cost.
But clearly that difference is insignificant compared to what I pointed out: NVIDIA achieves the same actual performance with half the GFLOPS! The point is that efficiency is critical, even for graphics, which was previously regarded as "embarrassingly parallel". And since there are lots of GPGPU applications which don't even achieve 10% of the GPU's theoretical performance, while they use the CPU to its fullest, it's clear that CPUs are really efficient at juggling tasks around and keeping the data flowing.
So how AMD and NVIDIA compare today is pretty irrelevant to the discussion. What matters in the long term is that applications, including graphics applications, are getting more complex and a CPU architecture is more suited for this.
Note that software rendering frees game engine developers from the graphics API restrictions, which in turn results in higher performance in practice: http://graphics.cs.williams.edu/archive/SweeneyHPG2009/TimHPG2009.pdf
>>GPUs need ever larger caches, reduced branch granularities, ALUs with more generic
>>operations, support for call stacks, etc. Mark my words, one day they'll need speculative
>>execution just to prevent the register file from growing out of proportion.
>
>Speculative execution will not cause the register files to shrink, it will cause
>them to grow. The larger the instruction window the more physical registers you
>need to store results of not yet retired instructions.
Retirement buffers are pretty small. But instead of speculative execution, GPUs opt for massive simultaneous multi-threading, which means they need register space for all these extra threads. Shaders which use more registers than the hardware was designed for can really decimate performance. Developers also want a true call stack, so you need massive caches to store all this context.
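Some back-of-the-envelope arithmetic shows why (the register file size and per-thread register counts below are round numbers of my own choosing, not any specific GPU): the number of threads a core can keep in flight to hide memory latency drops in direct proportion to the registers each shader uses.

    #include <cstdio>

    int main()
    {
        // Assumed register file: 16K 32-bit entries, shared by all the
        // threads a core keeps in flight to hide memory latency.
        const int registerFileEntries = 16 * 1024;
        const int regsPerThread[] = {16, 32, 64, 128};
        for (int i = 0; i < 4; i++)
        {
            printf("%3d registers/thread -> %4d threads in flight\n",
                   regsPerThread[i], registerFileEntries / regsPerThread[i]);
        }
        return 0;
    }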
This situation is not sustainable; some really drastic measures need to be taken to reduce the thread count. And you don't need to look any further than CPU architectures: speculative execution, branch prediction, forwarding, register renaming, etc. can come to the rescue. But once GPUs sacrifice computing density for higher efficiency at complex generic tasks, it obviously also becomes an interesting option to start from a multi-core CPU architecture instead and give it powerful graphics capabilities by adding gather/scatter.
>>Likely
>>before that we'll also have programmable texture filtering, which means the texture
>>units become more like generic gather/scatter units and the actual filtering is
>>done at full FP32 precision in the shader cores (AMD already does the texel address
>>calculations in the shader).
>
>Texture address calculation has always done partly in the shader core and partly
>in the TU since R6xx IIRC, and technically speaking you could consider the TU a
>coprocessor of the shader processor being completely tied to it. Add to this that
>both nVidia and AMD continuously improved the filtering speed of their TU as well
>as adding specialized functionality (like Gather4) to accelerate in hardware common
>parts of custom filters. Besides TUs also implement on-the-fly decompression of
>compressed textures, not just addressing and they often have specialized caches
>(very granular, often with fully associative addressing). That is not something
>you can easily merge with a more generic architecture. Doing so would certainly
>yield lower throughput, and higher power at the same performance level.
While texture sampling is a highly specialized operation (it's the only thing Larrabee really has dedicated hardware for), the general trend is still to generalize texture units into load/store (gather/scatter) units. There are several reasons for this:
Shaders need ever fewer texture samples per arithmetic operation (the TEX:ALU ratio keeps dropping), so the fraction of die area spent on texture samplers has steadily decreased. But this means TEX-heavy shaders become bottlenecked by the samplers, while ALU-heavy shaders leave them underutilized...
But while the TEX:ALU ratio decreases, the shaders do make more unfiltered memory accesses. So modern GPUs also have specialized access to local memory, shared memory, global memory, etc. Each of these can again be a bottleneck!
Also, filtering is useless for GPGPU, while graphics on the other hand wants full FP32 filtering (possibly even FP64). These diverging needs are hard to combine into texture units or other highly specialized load/store units.
So it becomes very tempting to just slap all these different forms of memory access together, have generic load/store units, and let the cache hierarchy take care of local and temporal coherence.
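To illustrate what filtering "in the shader" looks like, here's a toy scalar sketch of bilinear filtering at full FP32 precision (my own simplification: single channel, no wrap modes, no mipmapping, no SIMD). The four texel loads are exactly the access pattern a single gather would service:

    // Toy scalar bilinear filter over a single-channel FP32 texture.
    // u and v are in texel space; just the data flow, nothing more.
    float bilinear(const float* texels, int width, float u, float v)
    {
        int x0 = (int)u, y0 = (int)v;      // integer texel coordinates
        float fx = u - x0, fy = v - y0;    // fractional filter weights
        // The four loads below are the access pattern one gather covers.
        float c00 = texels[ y0      * width + x0    ];
        float c10 = texels[ y0      * width + x0 + 1];
        float c01 = texels[(y0 + 1) * width + x0    ];
        float c11 = texels[(y0 + 1) * width + x0 + 1];
        // Plain ALU work from here on: two horizontal lerps, one vertical.
        float top    = c00 + fx * (c10 - c00);
        float bottom = c01 + fx * (c11 - c01);
        return top + fy * (bottom - top);
    }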
If only CPUs had gather/scatter support, SwiftShader would have no real bottlenecks, no matter what you throw at it. The CPU might not be the most efficient piece of hardware for any particular task, but the fact that in practice GPUs are always bottlenecked by something means you can still get high average efficiency.
As for compressed textures, they're not as critical as they may seem. Due to licensing issues, the popular S3TC-based formats are still not part of core OpenGL, and loads of applications don't bother using them. Low-end and mid-range hardware have higher bandwidth-to-compute ratios, so compression isn't that helpful there. Ironically, people with high-end hardware tend to dislike the quality loss of texture compression, so they sometimes disable it; the general trend is toward higher quality. Where compression does help significantly is in fitting everything in VRAM, but for software rendering that's far less of an issue since you typically have about four times more system RAM. Note also that an i7 2600 has lower memory bandwidth than an i7 920 yet is significantly faster, so software rendering isn't bandwidth limited.
>>The only reason we haven't seen a sign of Larrabee as a GPU yet, is because it's
>>too soon, and because it approaches the market from the wrong end. It's too soon
>>because despite the enormous opportunities a generic architecture like that offers,
>>developers aren't just going to invest in developing applications for it if it has
>>too little market share. And it has too little market share because Intel tried
>>to sell it as a high-end GPU. It's an architecture with huge potential but it's
>>facing a massive chicken-and-egg problem. We first need the rest of the GPU landscape
>>to evolve in the same direction before developers will take the plunge. There's
>>simply no strong software ecosystem yet for things that go beyond the classic graphics
>>APIs. OpenCL and such are cautious first steps but it will take many more years
>>before you can write code in the language of your liking and have it run on either
>>a CPU or a GPU. It takes more steps toward full convergence for this to become a reality.
>
>I agree with this POV but I don't think that software rendering will ever be viable,
>even in the low end, but more on this later.
>
>>Software rendering for the low-end market on the other hand is viable today. There's
>>bound to be people who rather have a more powerful CPU instead of an on-die GPU.
>>And these powerful CPUs are attractive to developers because they can safely extend
>>the existing software ecosystem without big risky investments. This software further
>>increases the demand for powerful CPUs, and these will be capable of more than just low-end graphics...
>
>I don't think so. Basing ourselves on your 3DMark06 number means that most of the
>casual games available lately on services like Steam wouldn't be able to run at
>playable frame-rates even at the lowest setting and resolutions. And that would
>be on high-end, quad cores, what would this mean for people with CULV laptops?
Why not try it instead of "basing" yourself on the 3DMark06 score? I have yet to find a game in Steam's casual section which doesn't run fine with SwiftShader on a quad-core CPU. Some even run smoothly with anti-aliasing. 3DMark06 is a synthetic stress test designed to benchmark the hardware of its day and still offer a challenge to future graphics solutions. Even today's casual games are nowhere near as complex as 3DMark06, and that's not likely to change fast. Casual games are designed to run on the lowest common denominator, which includes older IGPs like the X3100.
For your information, SwiftShader will be used in future versions of Flash to enable 3D casual games on the web, when you don't have adequate graphics hardware: http://blogs.adobe.com/flashplatform/2011/01/digging-more-into-the-molehill-apis.html
But that's just one end of the application spectrum. You appear to be stuck thinking about what's possible with today's systems. Sandy Bridge's performance per Watt is already double that of the previous generation: http://www.pcper.com/article.php?aid=1057&type=expert&pid=12. Now imagine that in two more generations we'll have four times higher performance per Watt, plus gather/scatter support, and tell me again it will never be viable to have GPU-less systems even for the low-end market.
Even if it doesn't happen in four years, it's still bound to happen one day. Every single parameter is evolving in favor of software rendering.
>>If all this still seems unlikely to you, just look at what happened to sound processing.
>>In the early days, mixing and modulating multiple channels at high quality was a
>>heavy task for the CPU to perform. So your best option was to get a discrete sound
>>card. Then this sound processor migrated to the chipset. And nowadays it has been
>>reduced to just an I/O chip and the actual processing takes place in an audio codec
>>run by the CPU, thanks to its vastly increased performance. Not even power efficiency can reverse that trend.
>
>Well, in fact it already did and I am glad that you brought up the topic. Look
>at mobile phone SoCs, almost all of them have CPUs with excellent performance/W
>yet except for the cheapest ones they all sport a custom DSP for sound decoding
>and processing, usually coupled with a custom path to the DAC so that large portions
>of the chip can be powered down. The net result is vastly longer battery life when
>doing the common activity of playing back an MP3. Also SoCs are a pretty good example
>of custom hardware growing instead of shrinking, video hardware decoders have been
>growing in functionality and are now augmented by encoders. A peek at the schema
>of a BluRay-player SoC will also show you a disproportionate amount of hardware
>accelerators. The reason for this is very simple: as far as performance/W and performance/area
>go nothing beats custom hardware and that is becoming ever more important with the
>continuous growth of power-constrained devices.
Systems with fixed functionality ask for dedicated hardware. No argument there. But consumer systems take on many roles, and graphics is becoming less of a focus. Once you can run a wide range of casual games (and beyond) on the CPU, which is there anyway, there's very little demand for a piece of dedicated hardware.
Let me put it this way: professional DJs will always buy dedicated sound cards, but nowadays that's clearly a worthless argument against audio codecs for the consumer market. Likewise, mobile phones may have dedicated DSPs for sound today, but with an ever wider range of applications a powerful CPU is needed anyway, so in several years it will be a lot cheaper to use an audio codec on these power-constrained devices as well.
Dual-core CPUs for phones have only barely started to hit the market, but against some people's expectations they're more power efficient: two cores at 800 MHz consume less power during everyday use than one core at 1000 MHz, and can deliver higher peak performance when it matters. It's the MHz-race-to-multi-core transition all over again. Multi-core wins at performance per Watt.
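The back-of-the-envelope math behind that, using the usual dynamic power relation P ~ f * V^2 (the voltages below are illustrative assumptions, not measurements of any real SoC): at equal total throughput, two slower cores at a lower supply voltage burn roughly half the power, and for peak performance both can still clock up.

    #include <cstdio>

    int main()
    {
        // Same total throughput: one core at 1000 MHz vs. two at 500 MHz.
        // Assumed voltages: 1.10 V at 1000 MHz, 0.80 V at 500 MHz.
        double oneFast = 1000.0 * 1.10 * 1.10;       // ~1210 (arbitrary units)
        double twoSlow = 2.0 * 500.0 * 0.80 * 0.80;  // ~640 (arbitrary units)
        printf("1 x 1000 MHz: %.0f\n", oneFast);
        printf("2 x  500 MHz: %.0f\n", twoSlow);
        return 0;
    }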
>>The same thing will happen with graphics. Sandy Bridge already makes the GPU share
>>the CPU's caches and memory controllers. So it's not that big a step to perform
>>the actual processing on the CPU cores.
>
>Well, actually it *is* a big step. The only thing that changed is that now the
>IGP can access the L3 in a non-coherent manner, the memory controller was already
>shared as the IGP was on the northbridge. The IGP is still a processor of its own,
>with a specialized instruction set and hardware.
Clarkdale/Arrandale didn't have an on-die memory controller like Nehalem, but was connected to the GMCH over a QPI link, and Nehalem didn't have a GPU. So with Sandy Bridge, for the first time the CPU and GPU use the same memory hierarchy. Let me put it this way: the (logical) distance between two CPU cores is the same as the distance between a CPU core and a GPU core. You could say they do exactly the same thing: processing data. For a second, think of Sandy Bridge's GPU as a CPU core with a graphics-oriented instruction set.
That's a huge leap toward fully uniform CPUs doing software rendering. At the hardware level it may only seem like juggling things around, but it's a new milestone at a conceptual level. Previously the GPU was considered a device you send some data and some graphics commands, and it did its magic. Fire and forget. It might as well have been on a different planet. But with Sandy Bridge these heterogeneous cores are sitting side by side.
It opens up the possibility to really look at what sets them apart and combine their strengths into a homogeneous device. Previously this wasn't imaginable: even if it had somehow been reasonably efficient to replace the GPU with generic CPU cores, the communication distance between the two would have made it impossible to take advantage of the homogeneous instruction set.
By replacing Sandy Bridge's GPU with CPU cores and adding gather/scatter support, you get a more powerful CPU and a more versatile GPU in one. That's not a big step at the hardware level, but it revolutionizes software development.
>>Once the CPU can run Crysis at high quality,
>>which I expect to happen well within this decade, the number of people willing to
>>pay for a GPU will have reduced considerably.
>
>Really? Even if running Crysis on the CPU will consume 100-1000X as much power
>as running it on a GPU for a similar performance level?
It doesn't consume anywhere near 100 times as much power. Buy yourself an i7 2600 and measure the wall socket power consumption while running Crysis with SwiftShader. It should be about 150 Watt. Next, find a system with the same performance in Crysis, but using the GPU. The closest thing I can think of is an ASUS N10 netbook, with a GeForce 9300M GS (http://www.youtube.com/watch?v=GqzhtI_xhPY). It consumes up to 30 Watts, so that's only a 5x difference.
So while today there is still a gap, it's not nearly as big as you think it is. You really have to keep in mind that GPUs are worthless without a CPU and the rest of the system.
And once again, it's only going to get better. Gather/scatter and a complete set of integer AVX instructions will considerably improve performance. Also note that nowhere near as many man-hours have been spent on SwiftShader as on GPUs and their drivers, so there's still room for improvement. And finally, games are getting ever more complex, so future GPUs will need more registers, caches and the like, burning more Watts to keep their efficiency up.
So this 5x gap is going to close from both ends. It's still considerable, but note that software rendering is very cheap in comparison so even today there's already a market for it.
>>And this entices more developers to
>>make use of the powerful generic capabilities the CPU offers, meaning that this
>>software doensn't run very efficiently on a GPU unless it evolves even closer to
>>the CPU's architecture. It's an unstoppable spiral which will cause the GPU to go
>>the way of the dodo in the long end, just like sound cards are exhaling their last breath.
>
>See above for the sound card analogy, as for the GPU-going-to-the-CPU code I beg
>to differ. For a long time I have been skeptical of the possibility of offloading
>non-rasterized workloads on GPUs but with the impressive stride of ray-tracing,
>path-tracing and full radiosity implementations coming to the GPUs it seems to me
>that it's going the other way around. GPUs are simply better for graphics, no matter how you slice it.
Wrong: http://www.youtube.com/watch?v=4bITAdWvMXE
>>The DLP for graphics workloads is stagnating. The motto "batch, batch, batch" has
>>reached its limits and to efficiently run more diverse applications you need to
>>take advantage of more than just data level parallelism.
>
>The way graphics workloads have been evolving DLP is actually becoming less and
>less of a problem. With deferred rendering and virtual texturing many recent engines
>can actually draw an entire scene with only a minimal number of API calls.
Deferred rendering and virtual texturing are nice techniques, but they're not silver bullets by any means. The number of ALUs still grows, and despite these techniques today's games use more API calls per frame than games from five years ago. So I wouldn't say DLP is becoming less and less of a problem at all. Like I said, the situation is stagnating (at best), and to achieve higher performance it's useful to start looking at more than just data level parallelism.
The writing on the wall is that both NVIDIA and AMD have released architectures capable of task level parallelism. Clearly they don't trust DLP enough for the foreseeable future and are willing to invest transistors in logic to exploit other types of parallelism.
>>And like I said before we don't want dedicated hardware for filtering. Everything
>>is pointing towards more generic gather/scatter units and filtering in the shaders.
>>Neither do we want dedicated hardware for rasterization. Tesselation is an early
>step toward programmable rasterization.
>
>I suggest you dig a little deeper LRB's demise as a GPU, lack of hardware rasterization
>and poor filtering performance had both an important part in it.
The causes of Larrabee's demise are not an argument against the viability of software rendering on the CPU. Here's why: Larrabee was an expensive piece of additional hardware that Intel attempted to position against fierce competition in the high-end market. In the end what really killed Larrabee was performance per dollar. For Larrabee, single-digit percentages mattered a lot, so big investments in brilliant architectural changes which only start to pay off in five years, when developers take advantage of the new possibilities, are a big no-no.
But that's the situation you get when the hardware is given only one purpose. If Larrabee doesn't perform as a GPU, you can't sell it as a GPU.
Anyhow, this doesn't affect software rendering on the CPU nearly as much, because you're already paying for a CPU anyway. If it's not used for graphics, it's still serving an indispensable purpose for other applications. Larrabee would initially have been just for graphics, so you'd expect it to be worth every penny at its single purpose; a CPU, on the other hand, can afford to be a less-than-perfect architecture for graphics when it has many other tricks up its sleeve.
>>None of this is happening overnight and indeed it's not an easy task, but if you
>>look what GPUs were like several years ago, and extrapolate that several years into
>>the future, it's readily apparent that not a single graphics-specific feature will survive the test of time.
>
>Customized hardware will always beat software-based methods on generic hardware
>in every possible metric so no matter what happens, if a workloads is regular enough
>and gobbles enough processing time it will be a candidate for hardware offloading.
>Just look at IBM's wire-speed processor presentation, the thing has encryption,
>regexp and XML parsing accelerators on die (which on a side note tells me our whole
>industry is going down the drain, I never though I would see the XML acronym slapped on a piece of a die micrograph).
Again, systems with a single purpose ask for dedicated hardware. It's not an argument against audio codecs for the consumer market, and it's not an argument against software rendering.
>>Please name one co-processor on the same die, which after many years still remains a heterogeneous component.
>
>Well, I guess it depends on what you define as heterogeneous. FPUs still have separate
>instructions and register files, that sounds heterogeneous to me. A plethora of
>other hardware accelerators have been living on the same die as the CPU in SoCs
>without being absorbed (and some of them actually gained a customized CPU in the meantime).
A heterogeneous architecture is generally considered to use separate instruction streams with different ISAs, and you use an API to communicate with the slave components. So the FPU is not a heterogeneous component. Except maybe in the early days, when it was a separate chip: you could consider x86 and x87 to be different ISAs, with the instruction stream separated by the ESC prefix (which caused an interrupt you could consider a very low-level API call). But regardless of where you put the dividing line, it proves my point that it became fully homogeneous.
A customized CPU is not a heterogeneous architecture either.
And that's the beauty of it. Just because we call something a CPU doesn't mean we can't give it a customized instruction set to make it efficient at processing graphics!
You could even add complete texture sampling instructions. You may even find some use for them outside of graphics. But personally I don't think this is a good idea. As I detailed above things are evolving toward generic gather/scatter operations anyway, which are a lot more useful outside of graphics as well.
>>The fact of the matter is that no developer really likes heterogenous architectures.
>>Software development is complex enough as it is to avoid having to deal with multiple
>>programming models.
>
>I wholeheartedly agree with that. Yet it's not going away, in fact it's getting
>more complex. Die area matters, and power even more so even if we developers don't like it.
It does go away. We used to have vs_1_1 and ps_1_3 instruction sets which were physically implemented in separate vertex and pixel processing pipelines. So if die area and power matter so much, why did they eventually unify? Simple: they were already on the same die (which wasn't true prior to hardware T&L), with somewhat overlapping functionality and the potential to enable new applications. Unifying them resulted in an architecture which was faster, by removing the load-balancing bottleneck, and which let developers explore new horizons.
Before unified architectures launched, vertex processing and pixel processing were still very different. Vertices needed full FP32 precision and complex operations, while pixels first and foremost needed texture sampling and at most a bumpmap operation. Arguments based on die area and power consumption were used to ridicule anyone who even remotely considered the possibility of unifying them. After all, what would a vertex shader do with texture sampling, and what would a pixel shader do with FP32, right? And as for the balancing bottleneck, geometry LOD was the obvious silver bullet...
Sound familiar? I think we both know who ended up being dead wrong. Note that this happened less than ten years ago. So again, just how certain are you that die area and power consumption will prevent a complete unification of the CPU and GPU (i.e. a completed vector instruction set) and a return to software rendering? They're on the same die today, using the same memory subsystem; you'd be removing a balancing bottleneck, you'd enable the CPU to run other existing and new compute-intensive applications, and as a graphics processor you'd gain endless possibilities...
>>Also keep in mind that the GPU's programming models haven't
>>even settled down. Supporting multiple brands and even multiple generations from
>>the same manufacturer is a huge pain. So together with the 100 fold difference in
>>performance between the high-end and low-end, we're nowhere near using the GPU as
>>a reliable vector co-processor. Targetting SSE and AVX offers a lot more guarantees.
>
>That I also agree with, however we went from being completely unable to use GPUs
>for other workloads to being able of using them. As far as content creation goes
>I've been seeing more and more CPU-only applications offering GPU-assisted offloading
>which is something that (pleasantly) surprised me as I think it's the field where GPGPU makes more sense.
There's an article on this very site which shows that offloading anything other than graphics to modest GPUs is a bad idea and you need to cheat to make it look even remotely interesting: http://www.realworldtech.com/page.cfm?ArticleID=RWT070510142143
Also, what sort of content creation are you talking about? If it's graphics related, it's not GPGPU.
>>The only thing missing to make the CPU's SIMD support complete, is gather/scatter.
>>Once that's available, the GPU makes little chance as a co-processor.
>
>Honestly, you seem to be ignoring what happened with LRB and that's a piece of
>hardware on which some very fine hardware designers and programmers worked on and
>it tanked exactly because bolting gather/scatter to a CPU is not enough to replace a GPU, far from it.
I'm not ignoring what happened to Larrabee at all. It has become a prime example of what not to do: trying to compete with high-end dedicated hardware, where every dollar's worth of hardware not spent on achieving higher framerates in today's games halves your potential GPU market share.
Eventually GPUs will support fully generic code and have more independent cores and explicit SIMD instruction sets with gather/scatter and a coherent cache hierarchy, just like Larrabee, but you can't jump to that in one go since the software needs to evolve too. If NVIDIA released a GeForce GT 430 back in 2007 instead of the GeForce 8800 Ultra (same transistor count), it would have failed miserably, despite the DirectX 11 features, CUDA 2.1 capabilities, and large caches. It sacrifices raw computing power and bandwidth per transistor, and this trend will slowly but surely continue.
But Larrabee's mistakes don't apply to CPU software rendering. With only minor changes the CPU can become a more attractive processor for a wider range of applications, while also providing adequate support for graphics. The first iteration may only interest people who want a powerful CPU and don't care much about graphics as long as it supports simple games and 3D interfaces. But as things continue to converge and the advantages of generic graphics programming become apparent, it will make mid-range and eventually high-end GPUs redundant. People who want powerful graphics will simply buy CPUs with more cores.
---------------------------
>Nicolas Capens (nicolas.capens@gmail.com) on 1/5/11 wrote:
>---------------------------
>>Do you happen to have any sources which explain just how complex it is? I'm sure
>>it's not trivial, but given that LRBni already has many other features which I'd
>>consider relatively complex, I'd be surprised if gather/scatter took a disproportionately
>>large area. Intel was able to fit 32 of these feature rich cores with 512-bit vectors
>>onto a chip roughly double the transistor count of Sandy Bridge...
>
>Unfortunately Intel didn't disclose details on LRB die AFAIK so we don't know what
>was the weight of every single feature. Yet the fact that LRBni doesn't include
>a full-vector permute tells you that doing a crossbar of that size is prohibitive.
Somehow it does have full-vector gather/scatter support, so it's not prohibitive if it's valuable enough to add. Expensive, definitely, but not prohibitive. And as I pointed out earlier, with two 128-bit load units you'd only need two 128-bit permute crossbars so it should be very feasible.
>>What I meant is that the two 128-bit load units could each have their own 4x32-bit
>>crossbar. It means they sometimes load the same cache line, but that's ok. I assume
>>that's what already happens anyway with vmovaps, and it considerably simplifies
>>the crossbar. So while Larrabee requires one massive 16x32 crossbar an architecture
>>based on Sandy Bridge would have a near optimal gather/scatter implementation with two simpler 4x32 crossbars.
>
>Yeah, that sounds feasible to me.
See? ;-)
>>At the same or higher image quality level. On a Core i7 965 SwiftShader scores
>>620 3DMark06 points. That's more than a GMA X3100.
>
>It is nice to hear that 3DMark06 can be run in software but an i7 965 is a pretty
>high end processor and beating a an old and lowly GMA X3100 doesn't tell me much.
>At 1280x1024 decent IGPs are already scoring in the thousands and will have vastly
>better performance/W, performance/$ and performance/area than an i7 965 doing software rendering.
For the record, I have an i7 920 clocked at 965 speeds. As you certainly know, CPUs get ridiculously more expensive for a small increase in performance. But it also works the other way around! You don't have to lose a lot of performance when you opt for a cheaper CPU. Plus, any system needs a CPU anyway, so you got to take that into account when comparing CPUs against GPUs. The latter still needs a CPU or it's worthless...
This weekend I had the opportunity to run SwiftShader on an i7 2600. It scored 820 3DMark06 points at stock speed. So that's 32% faster than my overclocked 920! What's more, the 2600 is considered a mainstream CPU and it's price is expected to drop fast once AMD launches Bulldozer. And it comes with a ridiculously small heatsink. Note that the GMA X4500, still sold in massive numbers, only scores about 950 3DMark06 points.
So while I won't deny there's still a gap in performance and power consumption, it's getting smaller every generation!
But that's not all. Sandy Bridge wastes die space on a GPU. It could have had two extra cores instead, exceeding the X4500's performance. Furthermore, SwiftShader doesn't take advantage of AVX yet. And with gather/scatter support lots of graphics operations would become a whole lote more efficient.
So it's already well within reach to have a GPU-less system and still have ample 3D graphics performance for the same market that APUs target. Last but not least, with extra generic CPU cores developers can create other diverse and demanding applications.
>> (...)
>You could also say that AMD managed to create an architecture which matches NVIDIA's
>with vastly lower die area and transistor count. The differences between their hardware
>boil down to the fact that AMD is using VLIW shader processors which by design will
>have lower achievable throughput but being simpler much higher compute density.
Actually I believe AMD's GPUs have a higher transistor count, on a smaller die. This is probably due to NVIDIA's architecture using a higher clocked shader domain, requiring faster and thus larger transistors. Furthermore, depending on the exact chips you're comparing, NVIDIA has higher double-precision performance. Together with the new Fermi capabilities this means NVIDIA is way ahead of AMD in the HPC market. It's obvious this comes at a bit of an extra die size cost.
But clearly that difference is insignificant in comparison to what I pointed out: It achieves the same actual performance with half the GFLOPS! The importance of this is that efficiency is critical, even for graphics, which was previously regarded as "ridiculously parallel". And since there are lots of GPGPU applications which don't even achieve 10% of the theoretical performance of the GPU, while they use the CPU to it's fullest, it's clear that CPUs are really efficient at juggling tasks around and keeping the data flowing.
So how AMD and NVIDIA compare today is pretty irrelevant to the discussion. What matters in the long term is that applications, including graphics applications, are getting more complex and a CPU architecture is more suited for this.
Note that software rendering frees game engine developers from the graphics API restrictions, which in turn results in higher performance in practice: http://graphics.cs.williams.edu/archive/SweeneyHPG2009/TimHPG2009.pdf
>>GPUs need ever larger caches, reduced branch granularities, ALUs with more generic
>>operations, support for call stacks, etc. Mark my words, one day they'll need speculative
>>execution just to prevent the register file from growing out of proportion.
>
>Speculative execution will not cause the register files to shrink, it will cause
>them to grow. The larger the instruction window the more physical registers you
>need to store results of not yet retired instructions.
Retirement buffers are pretty small. But instead of speculative execution, GPUs opt for massive simultaneous multi-threading, which means they need register space for all these extra threads. Shaders which use more registers than what the hardware was designed for, can really decimate performance. Developers also want a true call stack, so you need massive caches to store all this context.
This situation is not sustainable. Some really drastic measures need to be taken to reduce the thread count. But you don't need to look any further than CPU architectures. Speculative execution, branch prediction, forwarding, register renaming, etc. can come to the rescue. But when GPUs sacrifice computing density for higher efficiency at complex generic tasks, it's obviously also a really interesting option to just start with a multi-core CPU architecture and give it powerful graphics capabilities by adding gather/scatter.
>>Likely
>>before that we'll also have programmable texture filtering, which means the texture
>>units become more like generic gather/scatter units and the actual filtering is
>>done at full FP32 precision in the shader cores (AMD already does the texel address
>>calculations in the shader).
>
>Texture address calculation has always done partly in the shader core and partly
>in the TU since R6xx IIRC, and technically speaking you could consider the TU a
>coprocessor of the shader processor being completely tied to it. Add to this that
>both nVidia and AMD continuously improved the filtering speed of their TU as well
>as adding specialized functionality (like Gather4) to accelerate in hardware common
>parts of custom filters. Besides TUs also implement on-the-fly decompression of
>compressed textures, not just addressing and they often have specialized caches
>(very granular, often with fully associative addressing). That is not something
>you can easily merge with a more generic architecture. Doing so would certainly
>yield lower throughput, and higher power at the same performance level.
While texture sampling is a highly specialized operation (it's the only thing Larrabee really has dedicated hardware for), the general trend is still to generalize them into load/store (gather/scatter) operations. There are several reasons for this:
Shaders need ever fewer texture samples (TEX:ALU ratio). So the amount of die area use on texture samplers has steadily decreased. But this means that TEX heavy shaders become bottlenecked, and for ALU heavy shaders they are underutilized...
But while the TEX:ALU ratio decreases, the shaders do make more unfiltered memory accesses. So modern GPUs also have specialized access to local memory, shared memory, global memory, etc. Each of these can again be a bottleneck!
Also, filtering is useless for GPGPU, while graphics on the other hand wants full FP32 filtering (possibly even FP64). These diverging needs are hard to combine into texture units or other highly specialized load/store units.
So it becomes very tempting to just slap all these different forms of memory access together, have generic load/store units, and let the cache hierarchy take care of local and temporal coherence.
If only CPUs had gather/scatter support, SwiftShader would have no real bottlenecks, no matter what you throw at it. The CPU might not be the be the most efficient piece of hardware for any particular task, but the fact that in practice GPUs are always bottlenecked by something means you can still get high average efficiency.
As for compressed textures, they're not as critical as they may seem. Due to licensing issues, popular S3TC based formats are still not part of core OpenGL. Loads of applications don't bother using them. And low-end and mid-end hardware have higher bandwith:computing ratios so it's not that helpful. Ironically people with high-end hardware tend not to like the quality loss of texture compression so they sometimes disable it. The general trent is toward higher quality. Where compression does help significantly is to fit everything in VRAM. But for software rendering that's far less of an issue since you typically have about four times more system RAM. Note also that an i7 2600 has lower memory bandwidth than an i7 920 yet it's significantly faster so software rendering isn't bandwidth limited.
>>The only reason we haven't seen a sign of Larrabee as a GPU yet, is because it's
>>too soon, and because it approaches the market from the wrong end. It's too soon
>>because despite the enormous opportunities a generic architecture like that offers,
>>developers aren't just going to invest in developing applications for it if it has
>>too little market share. And it has too little market share because Intel tried
>>to sell it as a high-end GPU. It's an architecture with huge potential but it's
>>facing a massive chicken-and-egg problem. We first need the rest of the GPU landscape
>>to evolve in the same direction before developers will take the plunge. There's
>>simply no strong software ecosystem yet for things that go beyond the classic graphics
>>APIs. OpenCL and such are cautious first steps but it will take many more years
>>before you can write code in the language of your liking and have it run on either
>>a CPU or a GPU. It takes more steps toward full convergence for this to become a reality.
>
>I agree with this POV but I don't think that software rendering will ever be viable,
>even in the low end, but more on this later.
>
>>Software rendering for the low-end market on the other hand is viable today. There's
>>bound to be people who rather have a more powerful CPU instead of an on-die GPU.
>>And these powerful CPUs are attractive to developers because they can safely extend
>>the existing software ecosystem without big risky investments. This software further
>>increases the demand for powerful CPUs, and these will be capable of more than just low-end graphics...
>
>I don't think so. Basing ourselves on your 3DMark06 number means that most of the
>casual games available lately on services like Steam wouldn't be able to run at
>playable frame-rates even at the lowest setting and resolutions. And that would
>be on high-end, quad cores, what would this mean for people with CULV laptops?
Why not try it instead of "basing" yourself on the 3DMark06 score? I have yet to find a game in Steam's casual section which doesn't run fine with SwiftShader on a quad-core CPU. Some even run smoothly with anti-aliasing. 3DMark06 is a synthetic stress test designed to benchmark the hardware at the time of release, and still offer a challenge to future graphics solutions. Even today's casual games are nowhere near as complex as 3DMark06. And that's not likely to change fast. Causal games are designed to run on the lowest common denominator, which includes older IGP's like the X3100.
For your information, SwiftShader will be used in future versions of Flash to enable 3D casual games on the web, when you don't have adequate graphics hardware: http://blogs.adobe.com/flashplatform/2011/01/digging-more-into-the-molehill-apis.html
But that's one outer end of the application area. You appear to be stuck thinking about what's possible with today's systems. Sandy Bridge's performance per Watt is already double that of the previous generation: http://www.pcper.com/article.php?aid=1057&type=expert&pid=12. Now imagine that in two more generations we will have four times higher performance per watt, and gather/scatter support, and tell me again it will never be viable to have GPU-less systems even for the low-end market.
Even if it doesn't happen in four years, it's still bound to happen one day. Every single parameter is evolving in favor of software rendering.
>>If all this still seems unlikely to you, just look at what happened to sound processing.
>>In the early days, mixing and modulating multiple channels at high quality was a
>>heavy task for the CPU to perform. So your best option was to get a discrete sound
>>card. Then this sound processor migrated to the chipset. And nowadays it has been
>>reduced to just an I/O chip and the actual processing takes place in an audio codec
>>run by the CPU, thanks to its vastly increased performance. Not even power efficiency can reverse that trend.
>
>Well, in fact it already did and I am glad that you brought up the topic. Look
>at mobile phone SoCs, almost all of them have CPUs with excellent performance/W
>yet except for the cheapest ones they all sport a custom DSP for sound decoding
>and processing, usually coupled with a custom path to the DAC so that large portions
>of the chip can be powered down. The net result is vastly longer battery life when
>doing the common activity of playing back an MP3. Also SoCs are a pretty good example
>of custom hardware growing instead of shrinking, video hardware decoders have been
>growing in functionality and are now augmented by encoders. A peek at the schema
>of a BluRay-player SoC will also show you a disproportionate amount of hardware
>accelerators. The reason for this is very simple: as far as performance/W and performance/area
>go nothing beats custom hardware and that is becoming ever more important with the
>continuous growth of power-constrained devices.
Systems with fixed functionality, ask for dedicated hardware. No argument there. But consumer systems can take on many roles. Graphics is increasingly becoming less of a focus. Once you can run a wide range of casual games (and beyond) on the CPU, which is there anyway, there's very little demand for a piece of dedicated hardware.
Let me put it this way: Professional DJ's will always buy dedicated sound cards. But nowadays it's clear that's a worthless argument against audio codecs for the consumer market. Likewise, mobile phones may have dedicated DSPs for sound today, but with an ever wider range of applications a powerful CPU is needed and so in several years it will be a lot cheaper to have an audio codec for these power-constrained devices as well.
Dual-core CPUs for phones have only barely started to hit the market, but much against some people's expectations they're more power efficient. Two cores at 800 MHz consume less power during everyday use than one core at 1000 MHz, and can deliver higher peak performance when it matters. It's the MHz race to multi-core revolution all over again. Multi-core wins at performance per Watt.
>>The same thing will happen with graphics. Sandy Bridge already makes the GPU share
>>the CPU's caches and memory controllers. So it's not that big a step to perform
>>the actual processing on the CPU cores.
>
>Well, actually it *is* a big step. The only thing that changed is that now the
>IGP can access the L3 in a non-coherent manner, the memory controller was already
>shared as the IGP was on the northbridge. The IGP is still a processor of its own,
>with a specialized instruction set and hardware.
Clarkdale/Arrandale didn't have an on-die memory controller like Nehalem, but was connected to the GMCH with a QPI link. Nehalem didn't have a GPU. So with Sandy Bridge for the first time the CPU and GPU use the same memory hierarchy. Let me put it this way; the (logical) distance between two CPU cores is the same as the distance between a CPU core and a GPU core. You could say they do exactly the same thing: processing data. For as second, think of Sandy Bridge's GPU as a CPU core with a graphics-oriented instruction set.
That's a huge leap toward fully uniform CPUs doing software rendering. At a hardware level it may only seem like juggling things around, but it's a new milestone at a conceptual level. Previously the GPU was considered a device you send some data and some graphics commands and it did its magic. Fire and forget. It may as well have been on a different planet. But with Sandy Bridge these heterogenous cores are sitting side by side.
It opens up the possibility to really look at what sets them apart, and combine their strengths into a homogenous device. Previously this wasn't imaginable. Even if somehow it was reasonably efficient to replace the GPU with generic CPU cores, the communication distance between the two would make it impossible to take advantage of the homogenous instruction set.
By replacing Sandy Bridge's GPU with CPU cores, and adding gather/scatter support, you get a more powerful CPU and a more versatile GPU into one. That is not a big step, at the hardware level, but revolutionizes software development.
>>Once the CPU can run Crysis at high quality,
>>which I expect to happen well within this decade, the number of people willing to
>>pay for a GPU will have reduced considerably.
>
>Really? Even if running Crysis on the CPU will consume 100-1000X as much power
>as running it on a GPU for a similar performance level?
It doesn't consume anywhere near 100 times as much power. Buy yourself an i7 2600 and measure the wall socket power consumption while running Crysis with SwiftShader. It should be about 150 Watt. Next, find a system with the same performance in Crysis, but using the GPU. The closest thing I can think of is an ASUS N10 netbook, with a GeForce 9300M GS (http://www.youtube.com/watch?v=GqzhtI_xhPY). It consumes up to 30 Watts, so that's only a 5x difference.
So while today there is still a gap, it's not nearly as big as you think it is. You really have to keep in mind that GPUs are worthless without a CPU and the rest of the system.
And once again, it's only going to get better. Gather/scatter and a complete set of integer AVX instructions will considerably improve the performance. Also note that not nearly as many man-hours have been spent on SwiftShader compared to the development of GPUs and their drivers, so there's still room for some improvement. And finally, games are getting ever more complex so future GPUs will need more registers and caches and such so they'll burn more Watts to keep the efficiency up.
So this 5x gap is going to close from both ends. It's still considerable, but note that software rendering is very cheap in comparison so even today there's already a market for it.
>>And this entices more developers to
>>make use of the powerful generic capabilities the CPU offers, meaning that this
>>software doensn't run very efficiently on a GPU unless it evolves even closer to
>>the CPU's architecture. It's an unstoppable spiral which will cause the GPU to go
>>the way of the dodo in the long end, just like sound cards are exhaling their last breath.
>
>See above for the sound card analogy, as for the GPU-going-to-the-CPU code I beg
>to differ. For a long time I have been skeptical of the possibility of offloading
>non-rasterized workloads on GPUs but with the impressive stride of ray-tracing,
>path-tracing and full radiosity implementations coming to the GPUs it seems to me
>that it's going the other way around. GPUs are simply better for graphics, no matter how you slice it.
Wrong: http://www.youtube.com/watch?v=4bITAdWvMXE
>>The DLP for graphics workloads is stagnating. The motto "batch, batch, batch" has
>>reached its limits and to efficiently run more diverse applications you need to
>>take advantage of more than just data level parallelism.
>
>The way graphics workloads have been evolving DLP is actually becoming less and
>less of a problem. With deferred rendering and virtual texturing many recent engines
>can actually draw an entire scene with only a minimal number of API calls.
Deferred rendering and virtual texturing are nice techniques but they're not silver bullets by any means. The number of ALUs still grows and despite these techniques today's games use more API calls per frame than those five years ago. So I wouldn't say that DLP is becoming less and less of a problem at all. Like I said, the situation is stagnating (at best), and to achieve higher performance it's useful to start looking at more than just data level parallelism.
The writing on the wall is that both NVIDIA and AMD have released architectures capable of task level parallelism. Clearly they don't trust DLP enough for the foreseeable future and are willing to invest transistors in logic to exploit other types of parallelism.
>>And like I said before we don't want dedicated hardware for filtering. Everything
>>is pointing towards more generic gather/scatter units and filtering in the shaders.
>>Neither do we want dedicated hardware for rasterization. Tesselation is an early
>step toward programmable rasterization.
>
>I suggest you dig a little deeper LRB's demise as a GPU, lack of hardware rasterization
>and poor filtering performance had both an important part in it.
The causes for Larrabee's demise are not an argument against the viability of software rendering on the CPU. Here's why: Larrabee is an expensive piece of additional hardware they attempted to position against fierce competition in the high-end market. In the end what really killed Larrabee is performance per dollar. For Larrabee single digit percentages mattered a lot. So big investments into brilliant architectural changes which only start to pay off in five years when developers take advantage of the new possibilities, are a big no-no.
But that's the situation you get when the hardware is given only one purpose. If Larrabee doesn't perform as a GPU, you can't sell it as a GPU.
Anyhow, this doesn't affect software rendering on the CPU nearly as much. That's because you're already paying for one anwyay. If it's not used for graphics, it's still performing an indispensable purpose for other applications. So unlike with Larrabee which initially would be just for graphics so you expect it to be worth every penny when it's performing its single purpose, it's ok to buy a less-than-perfect architecture for graphics if it has also has many other tricks up its sleeve.
>>None of this is happening overnight and indeed it's not an easy task, but if you
>>look what GPUs were like several years ago, and extrapolate that several years into
>>the future, it's readily apparent that not a single graphics-specific feature will survive the test of time.
>
>Customized hardware will always beat software-based methods on generic hardware
>in every possible metric so no matter what happens, if a workloads is regular enough
>and gobbles enough processing time it will be a candidate for hardware offloading.
>Just look at IBM's wire-speed processor presentation, the thing has encryption,
>regexp and XML parsing accelerators on die (which on a side note tells me our whole
>industry is going down the drain, I never though I would see the XML acronym slapped on a piece of a die micrograph).
Again, systems with a single purpose ask for dedicated hardware. It's not an argument against audio codecs for the consumer market, and it's not an argument against software rendering.
>>Please name one co-processor on the same die, which after many years still remains a heterogeneous component.
>
>Well, I guess it depends on what you define as heterogeneous. FPUs still have separate
>instructions and register files, that sounds heterogeneous to me. A plethora of
>other hardware accelerators have been living on the same die as the CPU in SoCs
>without being absorbed (and some of them actually gained a customized CPU in the meantime).
A heterogenous architecture is generally considered to use separate instructions streams with a different ISA. You use an API to communicate with the slave components. So the FPU is not a heterogenous component, except maybe in the early days when it was a separate chip you could consider x86 and x87 to be different ISAs and the instruction steam is separated by the ESC prefix (which caused an interrupt which you could consider a very low level API call). But regardless of where you put the dividing line it proves my point that it became fully homogenous.
A customized CPU is not a heterogeneous architecture either.
And that's the beauty of it. Just because we call something a CPU doesn't mean we can't give it a customized instruction set to make it efficient at processing graphics!
You could even add complete texture sampling instructions, and you might find some use for them outside of graphics too. But personally I don't think this is a good idea. As I detailed above, things are evolving toward generic gather/scatter operations anyway, which are far more useful outside of graphics as well.
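For instance, here's a minimal sketch of how a texel fetch for 8 pixels decomposes into a single gather, written against a hypothetical 256-bit integer AVX extension (the _mm256_i32gather_epi32 and _mm256_mullo_epi32 intrinsics are my assumptions, modeled on LRBni's vgatherd; today's AVX offers neither):

#include <immintrin.h>
#include <stdint.h>

/* Fetch one 32-bit BGRA texel for each of 8 pixels. Bilinear filtering
 * would simply issue four such gathers and blend the results in registers. */
static inline __m256i fetch_texels(const uint32_t *texture, int width,
                                   __m256i u, __m256i v)
{
    /* index = v * width + u, computed for all 8 lanes at once */
    __m256i idx = _mm256_add_epi32(
        _mm256_mullo_epi32(v, _mm256_set1_epi32(width)), u);
    /* one gather (scale = 4 bytes) replaces 8 scalar loads plus inserts */
    return _mm256_i32gather_epi32((const int *)texture, idx, 4);
}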
>>The fact of the matter is that no developer really likes heterogeneous architectures.
>>Software development is complex enough as it is without having to deal with multiple
>>programming models.
>
>I wholeheartedly agree with that. Yet it's not going away; in fact it's getting
>more complex. Die area matters, and power even more so, even if we developers don't like it.
It does go away. We used to have vs_1_1 and ps_1_3 instruction sets which were physically implemented in separate vertex and pixel processing pipelines. So if die area and power matter so much, why did they eventually unify? Simple: they were already on the same die (which wasn't true prior to hardware T&L), they had somewhat overlapping functionality, and unification opened the door to new applications. Unifying them resulted in an architecture which was faster, by removing the load-balancing bottleneck, and it enabled developers to explore new horizons.
Before unified architectures were launched, vertex processing and pixel processing were still very different. Vertices needed full FP32 precision and complex operations, while pixels first and foremost needed texture sampling and at most a bump-map operation. Arguments based on die area and power consumption were used to ridicule anyone who even remotely considered the possibility of unifying them. After all, what would a vertex shader do with texture sampling, and what would a pixel shader do with FP32, right? And as for the balancing bottleneck, geometry LOD was the obvious silver bullet...
Sound familiar? I think we both know who ended up being dead wrong. Note that this happened less than ten years ago. So again, just how certain are you that die area and power consumption will prevent a complete unification of the CPU and GPU (i.e. a complete vector instruction set), and a return to software rendering? They're on the same die today, they use the same memory subsystem, you'd be removing a balancing bottleneck, you'd enable the CPU to run existing and yet-unknown compute-intensive applications, and as a graphics processor you'd gain endless possibilities...
>>Also keep in mind that the GPU's programming models haven't
>>even settled down. Supporting multiple brands and even multiple generations from
>>the same manufacturer is a huge pain. So together with the 100 fold difference in
>>performance between the high-end and low-end, we're nowhere near using the GPU as
>>a reliable vector co-processor. Targeting SSE and AVX offers a lot more guarantees.
>
>That I also agree with; however, we went from being completely unable to use GPUs
>for other workloads to being able to use them. As far as content creation goes,
>I've been seeing more and more CPU-only applications offering GPU-assisted offloading
>which is something that (pleasantly) surprised me, as I think it's the field where GPGPU makes the most sense.
There's an article on this very site which shows that offloading anything other than graphics to modest GPUs is a bad idea, and that you need to cheat to make it look even remotely interesting: http://www.realworldtech.com/page.cfm?ArticleID=RWT070510142143
Also, what sort of content creation are you talking about? If it's graphics related, it's not GPGPU.
>>The only thing missing to make the CPU's SIMD support complete is gather/scatter.
>>Once that's available, the GPU stands little chance as a co-processor.
>
>Honestly, you seem to be ignoring what happened with LRB, and that's a piece of
>hardware on which some very fine hardware designers and programmers worked, and
>it tanked exactly because bolting gather/scatter onto a CPU is not enough to replace a GPU, far from it.
I'm not ignoring what happened to Larrabee at all. It has become a prime example of what not to do: trying to compete with high-end dedicated hardware in a market where every dollar's worth of hardware not spent on achieving higher framerates for today's games halves your potential GPU market share.
Eventually GPUs will support fully generic code and have more independent cores and explicit SIMD instruction sets with gather/scatter and a coherent cache hierarchy, just like Larrabee, but you can't jump to that in one go since the software needs to evolve too. If NVIDIA had released a GeForce GT 430 back in 2007 instead of the GeForce 8800 Ultra (same transistor count), it would have failed miserably despite its DirectX 11 features, CUDA 2.1 capabilities, and large caches: it sacrifices raw computing power and bandwidth per transistor. But this trend will slowly but surely continue.
But Larrabee's mistakes don't apply to CPU software rendering. With only minor changes the CPU can become a more attractive processor for a wider range of applications, while also providing adequate support for graphics. The first iteration may only interest people who want a powerful CPU and don't care much about graphics, as long as it can support simple games and 3D interfaces. But as things continue to converge and the advantages of generic graphics programming become apparent, it will make mid-range and eventually high-end GPUs redundant. People who want powerful graphics will simply buy CPUs with more cores.
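To put a number on "minor changes": this is what a software renderer has to do today to emulate just a 4-wide gather with SSE4.1 (a minimal sketch; the gather4_ps name is mine, the intrinsics are standard):

#include <smmintrin.h>

/* Emulated 4-wide gather on current SSE4.1 hardware: each lane becomes a
 * scalar load, serialized through the extract path. This is exactly the
 * overhead a native gather/scatter unit would remove. */
static inline __m128 gather4_ps(const float *base, __m128i idx)
{
    return _mm_setr_ps(base[_mm_extract_epi32(idx, 0)],
                       base[_mm_extract_epi32(idx, 1)],
                       base[_mm_extract_epi32(idx, 2)],
                       base[_mm_extract_epi32(idx, 3)]);
}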