By: Nicolas Capens (nicolas.capens.delete@this.gmail.com), February 2, 2011 7:48 am
Room: Moderated Discussions
Hi David,
David Kanter (dkanter@realworldtech.com) on 1/27/11 wrote:
---------------------------
>>Doom 3 on a GTX 460 at 2560x1600 4xAA runs at 53 FPS at >Ultra High Detail, and
>>at 56 FPS at High Detail. I was being generous when I said >10%.
>
>That shows nothing about compression, that merely tells about the change in performance
>due to larger textures. It's also largely about an older game that isn't designed for 2560x1600.
The change in performance is the whole point.
Doom 3's uncompressed textures are equal in dimensions to the compressed ones.
>What I'd want to see is for a number of MODERN games:
>
>1. Texture size (uncompressed vs. compressed)
>2. Bandwidth usage (uncompressed vs. compressed)
Texture size is irrelevant. Textures can be kept in compressed form while unused, and only a fraction of all the mipmap levels is needed during a frame.
And I've tested 3DMark06 with SwiftShader while forcing the mipmap LOD down one level (which is equivalent to reducing the texture bandwidth by a factor of 4, since each mip level down halves both texture dimensions), and the SM3.0 score went from 250 to 249. Yes, due to some statistical variance the score was actually lower. If texture bandwidth were of great significance, you'd expect a much higher score.
>You're claiming that for #2 the difference is 10% and I don't see any real evidence
>of that. Compression should be vastly more effective.
Texture compression rates of 1:2 and 1:4 are very common, but that doesn't translate into big performance improvements. Most of the time there's sufficient bandwidth headroom to allow uncompressed textures without an impact on performance. And even in bandwidth limited situations, there's already a large chunk of it used by color, z and geometry. So the performance won't drop by much.
>>To clarify this result somewhat, here's the bandwidth usage of Unreal Tournament 2004 at 1024x768, using compressed textures: http://personales.ac.upc.edu/vmoya/img/bw.png
>>
>>Although bandwidth usage differs a lot between games, we can observe a few things
>>which reduce the impact of texture compression. First of all, games aren't memory
>>limited all the time. The memory bus is overdimensioned to prevent that from happening
>>too frequently. It means that even if the bandwidth requirement would double, the
>>performance would not reduce in half. Also note how color and Z each take a very
>>significant amount of bandwidth. And the balance tips over >even further with deferred rendering.
>
>I happen to know the author of that study in question. The data is INCREDIBLY
>OLD. It's from a simulator that did not use any Z or color compression, so the results cannot be taken seriously.
Yes, it's from a simulator called ATTILA. And for the record it did use z compression to generate that graph.
And even though it's old, this data is still very relevant. UT2004 has a very high TEX:ALU ratio, while contemporary games have a much lower one, making them far less likely to become texture bandwidth limited. UT2004 also has simple geometry and no pre-z pass, again meaning that texture bandwidth has only become less relevant with newer games.
>>Note that in the case of Doom 3, the main reason for not disabling texture compression
>>was VRAM size. When textures had to be swapped in and out over the AGP or PCIe bus
>>the performance hit was huge, but it wasn't a VRAM >bandwidth issue.
>
>>For a GPU the texture decompression logic is worth it >mainly to keep VRAM size
>>smaller (read: cheaper). The bandwidth savings are a >welcome bonus but I doubt
>it would be a justification by >itself.
>
>I think you underestimate the cost of adding pins to your memory controller.
I'm not suggesting adding extra pins to make software rendering viable. It's already viable bandwidth-wise.
Multi-core is driving the bandwidth needs up for all applications, so I'm confident that it will be increased anyway in due time. But there's no need for additional dedicated hardware. The caches do an excellent job at reducing the overall bandwidth needs.
>>>>That said, it's not at all impossible to add texture >decompression as a CPU instruction...
>>>
>>>Won't happen.
>>
>>That was my conclusion as well. They would never add instructions which could become
>>of little or no use in several years.
>>
>>I merely mentioned it to note that there's nothing a GPU can do, which a CPU could
>>not possibly do. Also, there are plenty of other alternatives, should it be necessary.
>>And again, it doesn't have to match the efficiency of >dedicated hardware to be viable.
>
>Actually for the mobile space it does have to come close to dedicated hardware. Battery life matters, a lot.
For laptops we see the graphics solution range from IGPs to lower clocked high-end desktop GPUs. So while battery life is important, it doesn't mean the TDP has to be the lowest possible. It just has to be acceptable. A cheaper IGP which consumes more power is likely to sell better than a more expensive efficient one. Also note that today's GPUs have far more features than the average consumer will really use, meaning they are less energy efficient than they could have been. But the TDP is still acceptable for a good battery life.
Furthermore, nobody expects a long battery life during intense gaming. Even with dedicated graphics hardware the power consumption during gaming is relatively high. So instead of a multi-core CPU with an IGP you might as well have a CPU with a couple more cores. As long as the TDP is the same, it's acceptable.
>>Most likely the bandwidth will just steadily keep increasing, helping all (high
>>bandwidth) applications equally. DDR3 is standard now accross the entire market,
>>and it's evolving toward higher frequencies and lower voltage. Next up is DDR4,
>>and if necessary the number of memory lanes can be >increased.
>
>More memory lanes substantially increases cost, which is something that everyone wants to avoid.
They'll only avoid it until it becomes the cheapest solution. Just like dual-channel and DDR3 became standard after some time, things are still evolving toward higher bandwidth technologies. Five years ago the bandwidth achieved by today's budget CPUs was unthinkable. So frankly I don't care how they'll do it in the future, but CPUs reaching hundreds of GB/s of bandwidth will sooner or later be perfectly normal.
>>And again CPU technology is not at a standstill. With >T-RAM just around the corner
>>we're looking at 20+ MB of cache for mainstream CPUs in >the not too distant future.
>
>T-RAM is not just around the corner.
This news item suggests otherwise: http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
But even if it does take longer, it doesn't really matter to the long-term viability of software rendering. There will be a breakthrough at some point and it will advance the convergence by making dedicated decompression hardware totally unnecessary (if it even has any relevance left today).
>>And while the expectations for 'adequate' graphics go up as well, it's only a slow
>>moving target. First we saw the appearance of IGPs as an adequate solution for a
>>large portion of the market, and now things are evolving >in favor of software rendering.
>
>I think if you look at the improvement in IGPs, that's a very FAST improving target.
The hardware isn't the target. Consumer expectation is the target. Sandy Bridge leaves a hole in the market for consumers who want a powerful CPU but are content with minimal graphics support.
>>>>SwiftShader runs 30% faster on a Sandy Bridge chip with >only 55% of the bandwidth.
>>>>There is no clearer proof that software rendering isn't >bandwidth limited.
>>>
>>>That's only meaningful for swiftshader, not all SW rendering. How does it work
>>>with MSAA and anisotropic filtering and other more advanced techniques?
>>
>>I don't see why it would only be "meaningful" for SwiftShader. The public demo
>>does not take advantage of Sandy Bridge's new features. So >I expect other software
>>renderer's to benefit roughly the same.
>
>Swiftshader is swiftshader - other SW rendering systems work differently. They may (or may not) see similar benefits.
Again, why does that make SwiftShader's results only "meaningful" for SwiftShader?
All reviews only include a select number of benchmark applications. Does that mean the results are meaningless for other applications? Of course not.
Unless you can give me any sort of indication how another software renderer with Shader Model 3.0 support could be bandwidth limited, 30% higher performance with 55% of the bandwidth is extremely meaningful for any such software renderer.
You seem to be suggesting that SwiftShader is doing something wrong which makes it 30% faster with 55% of the bandwidth. If that's the case, great! It means that things can get much faster still.
>>I haven't had the chance yet to see how Sandy Bridge >performs with multi-sampling,
>>but I'll keep you posted when I do. Anisotropic filtering >doesn't seem relevant
>>to bandwidth, since all texels will be in cache due to >high temporal coherence.
>>In fact it would make it even more gather/scatter limited.
>
>>>Also, what is your comparison against? It sounds like you're saying Sandy Bridge
>>>has 55% more memory bandwidth than whatever your baseline is...which sounds like
>>>an awful lot. Sandy Bridge's memory controller isn't that much faster than Arrandale AFAIK.
>>
>>No, I said Sandy Bridge has only 55% of the bandwidth of my baseline, which is
>>an i7 920 @ 3.2 GHz, using 1600 MHz memory and three channels. So if I'm not mistaken
>>that's 1.85 times the memory bandwidth versus the i7 2600 >I tested. Despite that, the 2600 is 30% faster!
>
>I don't know what an 7 920 is, but if it only has 4 cores, it cannot really utilize the memory bandwidth.
http://ark.intel.com/Product.aspx?id=37147
>Also, note that there are big differences in the cache hierarchy.
Exactly. If there was any lack of bandwidth in the first place, it's clearly compensated by other CPU technology. I don't see the slightest indication that software rendering will become bandwidth limited any time soon.
The most likely explanation for SwiftShader being 30% faster on Sandy Bridge is the dual load units. Which means a gather instruction would increase performance further, especially with 256-bit AVX.
If you have a better explanation, please tell me because it probably means SwiftShader could be even faster.
>Perhaps a more interesting question to ask is to take Sandy Bridge and start downclocking
>the memory and seeing what the performance degradation is. Comparing two totally
>different systems tells you very little about memory bandwidth.
I don't have a Sandy Bridge system yet (and the chipset bug probably means it will take another month). But feel free to benchmark it yourself. The public demo is fully featured.
Anyway, to meet you in the middle I downclocked my i7-920's memory from 1600 MHz to 960 MHz, and the 3DMark06 SM3.0 score went from 250 to 247. So once again, reducing the bandwidth to 60% has no significant impact on performance.
>>So again, software rendering is nowhere near being RAM >bandwidth limited. Just like many other compute intensive >applications, it needs gather/scatter before any other >parameter starts to matter. Note that gather/scatter >wouldn't just allow a massive reduction in the number of >serial load/store instructions, it would also reduce the number of swizzle operations: http://software.intel.com/en-us/forums/showpost.php?p=139770
>
>Reducing the number of loads and stores isn't really relevant. It's the number
>of operations that matters. If you are gathering 16 different addresses, you are
>really doing 16 different load operations.
Not with Larrabee's implementation. It takes only as many uops, per load unit, as the number of cache lines needed to collect all the elements.
>>>>The trend is that the number of texture accesses is going >down, relative to uncompressed
>>>>unfiltered memory accesses.
>>>>So the bandwidth savings is getting smaller. The reason
>>>>for this is obvious: there's only so many textures you can >slap onto your scene's geometry.
>>>
>>>Then you just increase the geometry count, or use tessellation. It seems like
>>>geometry would scale at the same rate (or maybe a bit slower) than texturing.
>>
>>With all due respect, you're sounding a bit like the >people who said dedicated
>>vertex and pixel pipelines make more sense than a unified >architecture because
>"you just increase the geometry >count".
>
>No, that's totally different. With unified shaders you can easily use more or
>less geometry...until you run out of physical shaders to execute on. You should
>look at Kayvon's work on micro-polygon rendering.
Easily? I think you're seriously underestimating the complexity of adapting your software to the hardware. Checking whether you're vertex or pixel processing limited wasn't feasible in actual games ten years ago, and it still isn't.
Fatahalian's article doesn't address this issue either. It raises more questions than it answers, really. Micropolygon rendering is just one technique out of many. Some applications may use it; a lot won't. A flat wall is still a flat wall, and tessellating it is just a waste of computing power (not to mention electrical power). And for applications that don't use it, any additional dedicated hardware is a waste of silicon. To efficiently support micropolygons, ray tracing, Reyes, a multitude of GPGPU applications, etc., they need to make rasterization and texture sampling programmable. Software developers don't want a fixed architecture that can only be used in one way.
To get back to compressed textures, you continue to prove my point. If micropolygons are the dominant future technique, the vertex processing workload will increase drastically. Since it uses uncompressed vertex attribute data, the relative bandwidth savings from texture compression continues to drop.
>>It's clear that telling software developers what (not) to >do doesn't result in
>>a succesful next generation hardware architecture. With >non-unified architectures,
>>there were numerous applications which were vertex >processing limited, and numerous
>>ones which were pixel processing limited. And even those >in the middle have a fluctuating workload.
>
>Yes, except a unified shader architecture doesn't really preclude that many options.
That's what I'm saying.
>>So instead hardware architectures should accomodate to the >increasing variety in
>>workloads. This means generalizing the texture units into >gather/scatter units and
>>performing filtering in the shader units.
>>
>>Anyway, you were probably still referring to bandwidth? Then increasing the amount
>>of geometry in the scene won't tip the balance in favor of texture compression.
>>With deferred shading or pre-z passes additional geometry only increases the color
>>and/or z bandwidth needs (on top of the geometry bandwidth of course). Tesselation
>>can make use of compressed textures, but note that it's sampled at a low frequency.
>>Plus it massively increases the vertex processing workload, so in fact you're less
>>likely to get bandwidth limited at all.
>
>>>>At the same time, caches are still getting bigger and >bandwidth is still increasing.
>>>
>>>That's not the question. It's 'are caches increasing in >>size sufficiently faster than texturing data'.
>>
>>Absolutely. Ten years ago the L2 cache was 256 kB, nowadays we have 8 MB of L3
>>cache. That's 32 times more. Games did not evolve from ~2 >to ~64 texture samples
>>per pixel.
>>Even if you count in the increased screen resolution and >additional scene
>>complexity (which by the way also scale the computational >needs), the texture bandwidth
>>needs did not increase 32-fold. Even for the high-end the >memory bandwidth has not
>>increased by that much. And it increased far less in the >low-end graphics market.
>>IGPs are focussing on increasing their computing power >more than their texturing abilities.
>
>First, you do need to count the increases in resolution. Pixels have definitely
>increased over time.
Yes, the resolution has increased, but everything else scaled accordingly. More pixels doesn't mean a higher benefit from texture compression. In fact the TEX:ALU ratio is going down, meaning pixel shaders are more compute limited than bandwidth limited.
>And TBH, I don't have any numbers on how fast the texture
>sizes have increased. I suspect quite a bit.
Texel size or texture dimensions?
Texture dimensions are irrelevant. The bandwidth need for an unmagnified texture sample is on average one texel per pixel. The additional texels accessed for filtering are all perfectly cachable.
>>In ten more years caches could be around 256 MB, and >that's without taking revolutionary
>>new technologies like T-RAM into account. So it's really >hard to imagine that this
>>won't suffice to compensate for the texture bandwidth >needs of the low-end graphics market.
>
>Because you are imagining that the low-end market stays put. It won't.
I didn't say it stays put. I said it's a slow moving target. Evidence of this is the ever growing gap between high-end and low-end graphics hardware. IGPs were born out of the demand for really cheap but adequate 3D graphics support. They cover the majority of the market:
http://unity3d.com/webplayer/hwstats/pages/web-2011Q1-gfxvendor.html
This massive market must obviously have a further division in price and performance expectations. Some people want a more powerful CPU for the same price by sacrificing a bit of graphics performance, while others simply want a cheaper system that isn't aimed at serious gaming. As the CPU performance continues to increase exponentially, and things like gather/scatter can make a drastic difference in graphics efficiency, software rendering can satisfy more and more people's expectations, even if those expectations themselves increase slowly.
>>SwiftShader 1.0 was first used by a 2D casual game called Galapago. Despite the
>>game's graphical simplicity, it was totally texture sampling limited and it was
>>barely reaching the necessary 10 FPS for playability. That >was five years ago.
>Today we have Crysis running at 20+ >FPS.
>
>What resolution and quality settings?
800x600 at low detail. It's twice as fast as Microsoft WARP: http://msdn.microsoft.com/en-us/library/dd285359.aspx
>>I don't need any crystal ball to see that gather/scatter >will make IGPs redundant.
>
>That's because it won't.
It will. The only strengths the GPU has left are all components based on the ability to load/store lots of data in parallel. The CPU cores already achieve higher GFLOPS than the IGP, so gather/scatter unlocks that power for graphics applications. You can either ditch the IGP to make things cheaper, or replace it with additional CPU cores so you get a really powerful processor for any workload.
>>http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
>
>That means nothing. It means that GF is investigating the technology, not that it's production ready.
Quoting the announcement: "...into a joint DEVELOPMENT agreement targeted toward the APPLICATION of T-RAM’s Thyristor-RAM embedded memory..."
Emphasis mine. Why would a major foundry enter into a development agreement with a startup, unless the technology has already been proven on a smaller scale?
Quoting http://www.t-ram.com/news/media/3B.1_IRPS2009_Salling.pdf: "Taken together, the results of this study show that T-RAM is a reliable and manufacturable memory technology."
Quoting t-ram.com: "T-RAM Semiconductor has successfully developed the Thyristor-RAM technology from concept to production-readiness. Our Thyristor-RAM technology has been successfully implemented on both Bulk and SOI CMOS. "
Sounds like production ready to me.
>>Anyway, there are multiple high density cache technologies. There's Thyristor-RAM,
>
>That's T-RAM.
I know. I was just summing up "high density cache technologies".
>>1T-RAM,
>
>Not a replacement for SRAM.
Why not? Looks useful as L3 cache to me.
>>2nd gen Z-RAM
>
>Doesn't work at all.
Maybe not as cache memory, but it's hopeful as a DRAM replacement: http://www.z-ram.com/en/pdf/Z-RAM_LV_and_bulk_PR_Final_for_press.pdf
>>Together with gather/scatter this helps the CPU compensate
>>for its lack of dedicated texturing hardware, allowing it >to significantly gain
>>on GPUs. So sooner or later software rendering is going to >take over the low-end graphics market.
>
>It's possible, but they will need to become more competitive from an energy perspective with fixed function stuff.
There's not a lot of fixed-function stuff left. The majority of the GPU's die space consists of programmable or generic components.
And I've shown before that the CPU's FLOPS/Watt is in the same league as GPUs':
- Core i7-2820QM: 150 GFLOPS / 45 Watt (more with Turbo Boost)
- GeForce GT 420: 134.4 GFLOPS / 50 Watt
Obviously software rendering requires a bit more arithmetic power to implement the remaining fixed-function functionality, but programmable shaders take the bulk.
So there's no lack of energy efficiency. The CPU simply can't utilize its computing power effectively.
>Audio consumes an insignificant number of cycles. Miniscule.
During the early days of AC'97 there was some pretty serious debate about moving the audio processing workload to the CPU. It made a real difference in benchmark results. People who back then swore by the efficiency of dedicated sound cards, now happily use HD Audio.
>Perhaps when graphics
>gets to that point, it will be fine to put it in SW.
Exactly. There's no doubt it will happen, some day. My take is that gather/scatter support is sufficient to initiate the move to software rendering.
>And yes, power efficiency matters a lot. You may not think so, but it does.
I do think it matters, a lot. But I think you're underestimating how power efficient CPUs already are. It just doesn't translate into high effective performance for 3D graphics due to wasting a lot of cycles on moving data elements around.
An AVX FMA instruction can perform 16 operations every single cycle, but it would take a whopping 72 uops if every address and element was extracted/inserted sequentially. When it comes to load/store, we haven't evolved beyond x87 yet. Of course this is the worst case and typically not every vector load/store has to be a gather/scatter, but for situations where you do need them it makes a massive difference.
>>Sandy Bridge is designed to be able to use four cores plus an IGP, all within 95
>>Watt. During gaming or other intensive applications, that's perfectly acceptable.
>>As you know, Sandy Bridge is amazingly efficient. So it should be equally acceptable
>>to trade the IGP for two more CPU cores, add gather/scatter capabilities, and as
>>a result get a really powerful CPU which can also take >care of graphics, all within 95 Watt.
>
>I don't think you design CPUs for high volume applications. Most don't need scatter/gather
>and the hardware cost is high.
All applications that contain loops can benefit from gather/scatter. That's all applications.
With gather/scatter support every scalar operation would have a parallel equivalent. So any loop with independent iterations can be parallelized and execute up to 8 times faster.
And I don't think the hardware cost is that high. All you need is a bit of logic to check which elements are located in the same cache line, and four byte-shift units per 128-bit load unit instead of one, to collect the individual elements. Note that the logic for sequentially accessing cache lines is already largely in place to support load operations which straddle a cache line boundary.
>>>>But what's the next step? Obviously there's a quest for higher quality so it was
>>>>only a matter of time before compressed HDR formats became available so they actually
>>>>fit in VRAM. But we don't really want compression; it's just a temporary necessity
>>>>for dedicated gaphics.
>>>
>>>You always want compression for something you store in memory. It increases effective
>>>bandwidth and reduces capacity used. You might want higher fidelity or lossless
>>>compression, but you'll still want it. Less memory traffic is lower power and fewer pins.
>>>
>>>>Once the bandwidth and VRAM goes up the purpose of >compression
>>>>diminishes again. There's not much beyond this, and if >there is, it's not likely
>>>>of interest to the markets that would show an interest in >software rendering first.
>>>
>>>Again, only sort of. There's always a benefit to compression.
>>
>>You don't unconditionally want it. It's still a trade-off. >For this reason lossless
>>compression of system memory is hardly more than a >conceptual idea. Apparently it's
>>cheaper to increase the bandwidth and capacity the brute >force way.
>
>Really? Have you heard of Vertica? They do an awful lot of lossless compression of data in memory.
No, I hadn't heard about them before. Could you point me to some document where they detail how they added hardware support for compressed memory transfers to reduce bandwidth?
>>>>To me it's very clear that the lack of parallel load/store >is a huge bottleneck.
>>>>The ALU's can do 16 operations in parallel, but only 2 >load operations and 1 store
>>>>operation can be executed per clock. And it gets worse >with FMA.
>>>
>>>Um, Sandy Bridge can do 32B/cycle of loads, that's 8 >values.
>>
>>It can load 2 x 4 *consecutive* 32-bit values (per core >per cycle). That's not
>>the same as 8 values. As illustrated in the link I gave >above, that's worthless
>>for something like a parallel table lookup.
>
>Many applications use adjacent values.
Yes, and many applications also use non-adjacent values.
If a loop contains just one load or store at an address which isn't consecutive, it can't be vectorized (unless you want to resort to serially extracting/inserting addresses and values). So even if the majority of values are adjacent, it doesn't take a lot of non-adjacent data to cripple the performance.
>>>That's part of it...but scatter/gather generally means you MISS in the cache/TLBs and chew up more bandwidth.
>>
>>Why? It only accesses the cache lines it needs. If all >elements are from the same
>>cache line, it's as fast as accessing a single element.
>
>And exactly as fast as using AVX! i.e. no improvement and more complexity/power.
No. The addresses are unknown at compile time. So the only option with AVX1 is to sequentially extract each address from the address vector, and insert the read element into the result vector. This takes 18 instructions.
With gather support it would be just one instruction. Assuming it gets split into two 128-bit gather uops, the maximum throughput is 1 every cycle and the minimal throughput is 1 every 4 cycles.
>>But even in the worst case
>>it can't generate more misses or consume more bandwidth.
>
>It sure can. Now instead of having 1-2 TLB accesses per cycle, you get 16. How
>many TLB copies do you want? How many misses in flight do you want to support?
You're still not getting it. It only accesses one cache line per cycle. It simply has to check which elements are within the same cache line, and perform a single TLB access for all of these elements. Checking whether the addresses land on the same cache line doesn't require full translation of each address.
>>Once the IGP has gone the way of the dodo, more powerful >CPUs can be created for
>>people who want mid-end graphics performance. I bet Intel >would really love selling
>>quad-channel 12-core CPUs not just to the server market but to gamers as well. Sandy
>>Bridge already pretty much made the low-end discrete graphics cards obsolete, and
>>Intel is not likely going to lose its grip on that massive >market.
>
>I'm sure they'd love to, but they won't. Some things run better on dedicated hardware
>than not. It's possible some dedicated hardware will migrate into the core...but not guaranteed.
Nothing other than graphics runs better on the IGP. As I've mentioned before, GPGPU is only successful on high-end hardware.
So the CPU is better than the IGP at absolutely everything else. That makes it really tempting to have a closer look at what it would take to make it adequately efficient at graphics as well.
The answer: gather/scatter.
>I think one of the points people are raising is that CPUs are generally latency
>oriented devices, whereas scatter/gather tends to be more useful in the context
>of throughput devices. That suggests that to some extent scatter/gather in the
>GPU (i.e. throughput device) is the right approach...instead of trying to make the CPU into a throughput device.
Multi-core, 256-bit vectors, Hyper-Threading, software pipelining... the CPU is already a throughput device! It's just being held back by the lack of parallel load/store support. It's the one missing part to let all those GFLOPS come to full fruition.
The architecture of the future balances ILP, DLP, and TLP. GPUs have also started to embrace ILP (with true superscalar instruction scheduling) and TLP (with concurrent kernel execution). They still have a long way to go to become good at anything other than graphics though, and I don't think it will ever happen: if the IGP becomes capable of anything other than graphics, it will be very CPU-like, so it makes no sense to keep the architecture heterogeneous.
Long before IGPs can even be considered for GPGPU applications, the CPU will have support for the last few missing instructions to turn it into an efficient throughput device.
>>Not necessarily. Opting for an IGP means accepting a less >powerful CPU.
>
>It doesn't though - it merely means fewer CPU cores. Look at Sandy Bridge for
>consumers versus the EP version. The difference is in the core count. You cannot
>simply afford to create different CPU cores for folks who want IGPs and those who don't. THat's bad economics.
The CPU core is the same for everyone, and kicking out the IGP for some models is cheap. As I've mentioned above, gather/scatter is useful for practically any application.
>>I know
>>plenty of people who would buy a laptop with a more >powerful CPU instead, if it
>>costed the same and the graphics were adequate for their >everyday experience.
>
>I think you're missing the point. Right now, it appears like the trade-off is # of CPU cores vs. graphics.
There's HD Graphics 2000 and 3000. Surely there's room for HD Graphics 1000 if it means you still get a very powerful CPU.
>>Seriously, let's analyze the case of the GMA 950. It sucks donkey balls (pardon
>>my French). There's no excuse for buying it if you care >even a little about graphics.
>>Right? Despite having all odds against it, massive numbers >of this IGP were sold
>>in dual-core laptops, Mac Mini's, set-top boxes, etc. A >lite version (!) of this
>>architecture will even make it in lots of Smart TV's as >part of the Intel CE 2110
>>SoC. Lots of products based on this chip are still to >appear. Where does this
>commercial success come from? It's >cheap.
>
>It's also power efficient. At this point, it's not clear that SW rendering is
>comparably efficient and I'd argue it's unlikely to be without specialized hardware.
What specialized hardware would that be? I've already shown that texture compression hardly makes a difference, and sampling and filtering is becoming programmable anyway. Gather/scatter speeds up just about every other pipeline stage as well.
>>So people's expectation of adequate graphics can be very >low. But Sandy Bridge
>>actually offers more performance than these people need. >Software rendering can fill the void, as it's even cheaper.
>
>You're assuming that people's appetite for graphics stays constant, which isn't a good assumption.
I'm not assuming it stays constant. I'm assuming it's a slow moving target. It's impossible to prove what people want or don't want, but I personally believe there are strong enough indications that there's a market for software rendering if the efficiency is cranked up using gather/scatter. And because of the strong indications that CPUs and GPUs continue to converge that market is only getting bigger.
>>For a $500 system with a 10% margin, it would increase profit by 10% to not include that IGP. So $5 is a lot.
>
>Yes, but you can charge more for the system since it gets better battery life.
No, you can't, because the competition will sell it for less and take away your market share.
>>Furthermore, gather/scatter doesn't just help make software rendering viable, it
>>helps a whole range of multimedia applications, including some that have not yet
>>seen the light of day. So you'd be selling a very attractive system if you spent
>>that $5 on generic CPU technology instead.
>
>I totally agree that scatter/gather is a great capability to have. But what's
>the cost in die area, power and complexity? Not just to the core, but also the memory controller, etc.
Larrabee has wider vectors and smaller cores, yet it features gather/scatter support, so I don't think it takes a lot of die space either way. It doesn't require any changes to the memory controller, just the load/store units. I'm not entirely sure, but collecting four elements from a cache line can probably largely reuse the existing network for extracting one (unaligned) value. And checking which addresses land on the same cache line is a very simple equality test on the upper address bits.
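The cache-line check described above can be sketched as follows — a toy model assuming 64-byte lines; real hardware would compare the tags in parallel rather than in a loop:

```python
CACHE_LINE_BITS = 6  # 64-byte lines: the low 6 bits select the byte offset

def lines_needed(addrs):
    # Two addresses land on the same cache line exactly when their upper
    # bits (everything above the 6 offset bits) are equal, so the number
    # of distinct upper-bit tags is the number of line accesses a gather
    # needs to collect all its elements.
    return len({a >> CACHE_LINE_BITS for a in addrs})
```

Four elements from one line need a single line access; in the worst case, four scattered elements need four — hence the throughput of one uop per cache line touched, per load unit.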
Cheers,
Nicolas
David Kanter (dkanter@realworldtech.com) on 1/27/11 wrote:
---------------------------
>>Doom 3 on a GTX 460 at 2560x1600 4xAA runs at 53 FPS at >Ultra High Detail, and
>>at 56 FPS at High Detail. I was being generous when I said >10%.
>
>That shows nothing about compression, that merely tells about the change in performance
>due to larger textures. It's also largely about an older game that isn't designed for 2560x1600.
The change in performance is the whole point.
Doom 3's uncompressed textures are equal in dimensions to the compressed ones.
>What I'd want to see is for a number of MODERN games:
>
>1. Texture size (uncompressed vs. compressed)
>2. Bandwidth usage (uncompressed vs. compressed)
Texture size is irrelevant. They can be kept in compressed form when unused. Only a fraction of all the texture levels is needed during a frame.
And I've tested 3DMark06 with SwiftShader while forcing the mipmap LOD down one level (which is equivalent to reducing the texture bandwidth by a factor 4), and the SM3.0 score went from 250 to 249. Yes, due to some statistical variance the score was actually lower. If texture bandwidth was of great significance, you'd expect a much higher score.
>You're claiming that for #2 the difference is 10% and I don't see any real evidence
>of that. Compression should be vastly more effective.
Texture compression rates of 1:2 and 1:4 are very common, but that doesn't translate into big performance improvements. Most of the time there's sufficient bandwidth headroom to allow uncompressed textures without an impact on performance. And even in bandwidth limited situations, there's already a large chunk of it used by color, z and geometry. So the performance won't drop by much.
>>To clarify this result somewhat, here's the bandwidth usage of Unreal Tournament 2004 at 1024x768, using compressed textures: http://personales.ac.upc.edu/vmoya/img/bw.png
>>
>>Although bandwidth usage differs a lot between games, we can observe a few things
>>which reduce the impact of texture compression. First of all, games aren't memory
>>limited all the time. The memory bus is overdimensioned to prevent that from happening
>>too frequently. It means that even if the bandwidth requirement would double, the
>>performance would not reduce in half. Also note how color and Z each take a very
>>significant amount of bandwidth. And the balance tips over >even further with deferred rendering.
>
>I happen to know the author of that study in question. The data is INCREDIBLY
>OLD. It's from a simulator that did not use any Z or color compression, so the results cannot be taken seriously.
Yes, it's from a simulator called ATTILA. And for the record it did use z compression to generate that graph.
And even though it's old, this data is still very relevant. UT2004 has a very high TEX:ALU ratio, meaning that contemporary games are far less likely to become bandwidth limited. UT2004 also has simple geometry and no pre-z pass, again meaning that texture bandwidth has only become less relevant with newer games.
>>Note that in the case of Doom 3, the main reason for not disabling texture compression
>>was VRAM size. When textures had to be swapped in and out over the AGP or PCIe bus
>>the performance hit was huge, but it wasn't a VRAM >bandwidth issue.
>
>>For a GPU the texture decompression logic is worth it >mainly to keep VRAM size
>>smaller (read: cheaper). The bandwidth savings are a >welcome bonus but I doubt
>it would be a justification by >itself.
>
>I think you underestimate the cost of adding pins to your memory controller.
I'm not suggesting adding extra pins to make software rendering viable. It's already viable bandwidth-wise.
Multi-core is driving the bandwidth needs up for all applications, so I'm confident that it will be increased anyway in due time. But there's no need for additional dedicated hardware. The caches do an excellent job at reducing the overall bandwidth needs.
>>>>That said, it's not at all impossible to add texture >decompression as a CPU instruction...
>>>
>>>Won't happen.
>>
>>That was my conclusion as well. They would never add instructions which could become
>>of little or no use in several years.
>>
>>I merely mentioned it to note that there's nothing a GPU can do, which a CPU could
>>not possibly do. Also, there are plenty of other alternatives, should it be necessary.
>>And again, it doesn't have to match the efficiency of >dedicated hardware to be viable.
>
>Actually for the mobile space it does have to come close to dedicated hardware. Battery life matters, a lot.
For laptops we see the graphics solution range from IGPs to lower clocked high-end desktop GPUs. So while battery life is important, it doesn't mean the TDP has to be the lowest possible. It just has to be acceptable. A cheaper IGP which consumes more power is likely to sell better than a more expensive efficient one. Also note that today's GPUs have far more features than the average consumer will really use, meaning they are less energy efficient than they could have been. But the TDP is still acceptable for a good battery life.
Furthermore, nobody expects a long battery life during intense gaming. Even with dedicated graphics hardware the power consumption during gaming is relatively high. So instead of a multi-core CPU with an IGP you might as well have a CPU with a couple more cores. As long as the TDP is the same, it's acceptable.
>>Most likely the bandwidth will just steadily keep increasing, helping all (high
>>bandwidth) applications equally. DDR3 is standard now accross the entire market,
>>and it's evolving toward higher frequencies and lower voltage. Next up is DDR4,
>>and if necessary the number of memory lanes can be >increased.
>
>More memory lanes substantially increases cost, which is something that everyone wants to avoid.
They'll only avoid it till it's the cheapest solution. Just like dual-channel and DDR3 became standard after some time, things are still evolving toward higher bandwidth technologies. Five years ago the bandwidth achieved by today's budget CPUs was unthinkable. So frankly I don't care how they'll do it in the future, but CPUs reaching 100's of GB/s of bandwidth will sooner or later be perfectly normal.
>>And again CPU technology is not at a standstill. With >T-RAM just around the corner
>>we're looking at 20+ MB of cache for mainstream CPUs in >the not too distant future.
>
>T-RAM is not just around the corner.
This news item suggests otherwise: http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
But even if it does take longer, it doesn't really matter to the long-term viability of software rendering. There will be a breakthrough at some point and it will advance the convergence by making dedicated decompression hardware totally unnecessary (if it even has any relevance left today).
>>And while the expectations for 'adequate' graphics go up as well, it's only a slow
>>moving target. First we saw the appearance of IGPs as an adequate solution for a
>>large portion of the market, and now things are evolving >in favor of software rendering.
>
>I think if you look at the improvement in IGPs, that's a very FAST improving target.
The hardware isn't the target. Consumer expectation is the target. Sandy Bridge leaves a hole in the market for consumers who want a powerful CPU but are content with minimal graphics support.
>>>>SwiftShader runs 30% faster on a Sandy Bridge chip with >only 55% of the bandwidth.
>>>>There is no clearer proof that software rendering isn't >bandwidth limited.
>>>
>>>That's only meaningful for swiftshader, not all SW rendering. How does it work
>>>with MSAA and anisotropic filtering and other more advanced techniques?
>>
>>I don't see why it would only be "meaningful" for SwiftShader. The public demo
>>does not take advantage of Sandy Bridge's new features. So >I expect other software
>>renderer's to benefit roughly the same.
>
>Swiftshader is swiftshader - other SW rendering systems work differently. They may (or may not) see similar benefits.
Again, why does that make SwiftShader's results only "meaningful" for SwiftShader?
All reviews only include a select number of benchmark applications. Does that mean the results are meaningless for other applications? Of course not.
Unless you can give me any sort of indication how another software renderer with Shader Model 3.0 support could be bandwidth limited, 30% higher performance with 55% of the bandwidth is extremely meaningful for any such software renderer.
You seem to be suggesting that SwiftShader is doing something wrong which makes it 30% faster with 55% of the bandwidth. If that's the case, great! It means that things can get much faster still.
>>I haven't had the chance yet to see how Sandy Bridge >performs with multi-sampling,
>>but I'll keep you posted when I do. Anisotropic filtering >doesn't seem relevant
>>to bandwidth, since all texels will be in cache due to >high temporal coherence.
>>In fact it would make it even more gather/scatter limited.
>
>>>Also, what is your comparison against? It sounds like you're saying Sandy Bridge
>>>has 55% more memory bandwidth than whatever your baseline is...which sounds like
>>>an awful lot. Sandy Bridge's memory controller isn't that much faster than Arrandale AFAIK.
>>
>>No, I said Sandy Bridge has only 55% of the bandwidth of my baseline, which is
>>an i7 920 @ 3.2 GHz, using 1600 MHz memory and three channels. So if I'm not mistaken
>>that's 1.85 times the memory bandwidth versus the i7 2600 >I tested. Despite that, the 2600 is 30% faster!
>
>I don't know what an 7 920 is, but if it only has 4 cores, it cannot really utilize the memory bandwidth.
http://ark.intel.com/Product.aspx?id=37147
>Also, note that there are big differences in the cache hierarchy.
Exactly. If there was any lack of bandwidth in the first place, it's clearly compensated by other CPU technology. I don't see the slightest indication that software rendering will become bandwidth limited any time soon.
The most likely explanation for the fact that SwiftShader is 30% faster on Sandy Bridge is probably the dual load units. Which means a gather instruction would increase performance further, especially with 256-bit AVX.
If you have a better explanation, please tell me because it probably means SwiftShader could be even faster.
>Perhaps a more interesting question to ask is to take Sandy Bridge and start downclocking
>the memory and seeing what the performance degradation is. Comparing two totally
>different systems tells you very little about memory bandwidth.
I don't have a Sandy Bridge system yet (and the chipset bug probably means it will take another month). But feel free to benchmark it yourself. The public demo is fully featured.
Anyway, to meet you in the middle I downclocked my i7-920's memory from 1600 MHz to 960 MHz, and the 3DMark06 SM3.0 score went from 250 to 247. So once again, reducing the bandwidth to 60% has no significant impact on performance.
>>So again, software rendering is nowhere near being RAM >bandwidth limited. Just like many other compute intensive >applications, it needs gather/scatter before any other >parameter starts to matter. Note that gather/scatter >wouldn't just allow a massive reduction in the number of >serial load/store instructions, it would also reduce the number of swizzle operations: http://software.intel.com/en-us/forums/showpost.php?p=139770
>
>Reducing the number of loads and stores isn't really relevant. It's the number
>of operations that matters. If you are gathering 16 different addresses, you are
>really doing 16 different load operations.
Not with Larrabee's implementation. It only takes as many uops as the number of cache lines that are needed to collect all elements, per load unit.
>>>>The trend is that the number of texture accesses is going >down, relative to uncompressed
>>>>unfiltered memory accesses.
>>>>So the bandwidth savings is getting smaller. The reason
>>>>for this is obvious: there's only so many textures you can >slap onto your scene's geometry.
>>>
>>>Then you just increase the geometry count, or use tessellation. It seems like
>>>geometry would scale at the same rate (or maybe a bit slower) than texturing.
>>
>>With all due respect, you're sounding a bit like the >people who said dedicated
>>vertex and pixel pipelines make more sense than a unified >architecture because
>"you just increase the geometry >count".
>
>No, that's totally different. With unified shaders you can easily use more or
>less geometry...until you run out of physical shaders to execute on. You should
>look at Kayvon's work on micro-polygon rendering.
Easily? I think you're seriously underestimating the complexity of adapting your software to the hardware. Checking whether you're vertex or pixel processing limited wasn't feasible in actual games ten years ago, and it still isn't.
Fatahalian's article doesn't address this issue either. It raises more questions than it answers, really. Micropolygon rendering is just one out of many techniques. Some applications may use it, a lot won't. A flat wall is still a flat wall and tesselating it is just a waste of computing power (not to mention electrical power). And for applications that don't use it, any additional dedicated hardware is a waste of silicon. To efficiently support micropolygons, ray-tracing, Reyes, a multitude of GPGPU applications, etc. they need to make rasterization and texture sampling programmable. Software developers don't want a fixed architecture that can only be used in one way.
To get back to compressed textures, you continue to prove my point. If micropolygons are the dominant future technique, the vertex processing workload will increase drastically. Since it uses uncompressed vertex attribute data, the relative bandwidth savings from texture compression continues to drop.
>>It's clear that telling software developers what (not) to >do doesn't result in
>>a succesful next generation hardware architecture. With >non-unified architectures,
>>there were numerous applications which were vertex >processing limited, and numerous
>>ones which were pixel processing limited. And even those >in the middle have a fluctuating workload.
>
>Yes, except a unified shader architecture doesn't really preclude that many options.
That's what I'm saying.
>>So instead hardware architectures should accomodate to the >increasing variety in
>>workloads. This means generalizing the texture units into >gather/scatter units and
>>performing filtering in the shader units.
>>
>>Anyway, you were probably still referring to bandwidth? Then increasing the amount
>>of geometry in the scene won't tip the balance in favor of texture compression.
>>With deferred shading or pre-z passes additional geometry only increases the color
>>and/or z bandwidth needs (on top of the geometry bandwidth of course). Tesselation
>>can make use of compressed textures, but note that it's sampled at a low frequency.
>>Plus it massively increases the vertex processing workload, so in fact you're less
>>likely to get bandwidth limited at all.
>
>>>>At the same time, caches are still getting bigger and >bandwidth is still increasing.
>>>
>>>That's not the question. It's 'are caches increasing in >>size sufficiently faster than texturing data'.
>>
>>Absolutely. Ten years ago the L2 cache was 256 kB, nowadays we have 8 MB of L3
>>cache. That's 32 times more. Games did not evolve from ~2 >to ~64 texture samples
>>per pixel.
>>Even if you count in the increased screen resolution and >additional scene
>>complexity (which by the way also scale the computational >needs), the texture bandwidth
>>needs did not increase 32-fold. Even for the high-end the >memory bandwidth has not
>>increased by that much. And it increased far less in the >low-end graphics market.
>>IGPs are focussing on increasing their computing power >more than their texturing abilities.
>
>First, you do need to count the increases in resolution. Pixels have definitely
>increased over time.
Yes the resolution has increased but everything else scaled accordingly. More pixels doesn't mean higher benfit from texture compression. In fact TEX:ALU is going down, meaning pixel shaders are more compute limited than bandwidth limited.
>And TBH, I don't have any numbers on how fast the texture
>sizes have increased. I suspect quite a bit.
Texel size or texture dimensions?
Texture dimensions are irrelevant. The bandwidth need for an unmagnified texture sample is on average one texel per pixel. The additional texels accessed for filtering are all perfectly cachable.
>>In ten more years caches could be around 256 MB, and >that's without taking revolutionary
>>new technologies like T-RAM into account. So it's really >hard to imagine that this
>>won't suffice to compensate for the texture bandwidth >needs of the low-end graphics market.
>
>Because you are imagining that the low-end market stays put. It won't.
I didn't say it stays put. I said it's a slow moving target. Evidence of this is the ever growing gap between high-end and low-end graphics hardware. IGPs were born out of the demand for really cheap but adequate 3D graphics support. They cover the majority of the market:
http://unity3d.com/webplayer/hwstats/pages/web-2011Q1-gfxvendor.html
This massive market must obviously have a further division in price and performance expectations. Some people want a more powerful CPU for the same price by sacrificing a bit of graphics performance, while others simply want a cheaper system that isn't aimed at serious gaming. As the CPU performance continues to increase exponentially, and things like gather/scatter can make a drastic difference in graphics efficiency, software rendering can satisfy more and more people's expectations, even if those expectations themselves incease slowly.
>>SwiftShader 1.0 was first used by a 2D casual game called Galapago. Despite the
>>game's graphical simplicity, it was totally texture sampling limited and it was
>>barely reaching the necessary 10 FPS for playability. That >was five years ago.
>Today we have Crysis running at 20+ >FPS.
>
>What resolution and quality settings?
800x600 at low detail. It's twice as fast as Microsoft WARP: http://msdn.microsoft.com/en-us/library/dd285359.aspx
>>I don't need any crystal ball to see that gather/scatter >will make IGPs redundant.
>
>That's because it won't.
It will. The only strengths the GPU has left are all components based on the ability to load/store lots of data in parallel. The CPU cores already achieve higher GFLOPS than the IGP, so gather/scatter unlocks that power for graphics applications. You can either ditch the IGP to make things cheaper, or replace it with additional CPU cores so you get a really powerful processor for any workload.
>>http://www.businesswire.com/portal/site/home/permalink/?ndmViewId=news_view&newsId=20090518005181
>
>That means nothing. It means that GF is investigating the technology, not that it's production ready.
Quoting the announcement: "...into a joint DEVELOPMENT agreement targeted toward the APPLICATION of T-RAM’s Thyristor-RAM embedded memory..."
Emphasis mine. Why would a major foundry enter into a development agreement with a startup, unless the technology has already been proven on a smaller scale?
Quoting http://www.t-ram.com/news/media/3B.1_IRPS2009_Salling.pdf: "Taken together, the results of this study show
that T-RAM is a reliable and manufacturable memory
technology."
Quoting t-ram.com: "T-RAM Semiconductor has successfully developed the Thyristor-RAM technology from concept to production-readiness. Our Thyristor-RAM technology has been successfully implemented on both Bulk and SOI CMOS. "
Sounds like production ready to me.
>>Anyway, there are multiple high density cache technologies. There's Thyristor-RAM,
>
>That's T-RAM.
I know. I was just summing up "high density cache technologies".
>>1T-RAM,
>
>Not a replacement for SRAM.
Why not? Looks useful as L3 cache to me.
>>2nd gen Z-RAM
>
>Doesn't work at all.
Maybe not as cache memory, but it's hopeful as a DRAM replacement: http://www.z-ram.com/en/pdf/Z-RAM_LV_and_bulk_PR_Final_for_press.pdf
>>Together with gather/scatter this helps the CPU compensate
>>for its lack of dedicated texturing hardware, allowing it >to significantly gain
>>on GPUs. So sooner or later software rendering is going to >take over the low-end graphics market.
>
>It's possible, but they will need to become more competitive from an energy perspective with fixed function stuff.
There's not a lot of fixed-function stuff left. The majority of the GPU's die space consist of programmable or generic components.
And I've shown before that the CPUs FLOPS/Watt is in the same league as GPUs:
- Core i7-282QM: 150 GFLOPS / 45 Watt (more with Turbo Boost)
- GeForce GT 420: 134.4 GFLOPS / 50 Watt
Obviously software rendering requires a bit more arithmetic power to implement the remaining fixed-function functionality, but programmable shaders take the bulk.
So there's no lack of energy efficiency. The CPU simply can't utilize its computing power effectively
>Audio consumes an insignificant number of cycles. Miniscule.
During the early days of AC'97 there was some pretty serious debate about moving the audio processing workload to the CPU. It made a real difference in benchmark results. People who back then swore by the efficiency of dedicated sound cards, now happily use HD Audio.
>Perhaps when graphics
>gets to that point, it will be fine to put it in SW.
Exactly. There's no doubt it will happen, some day. My take is that gather/scatter support is sufficient to initiate the move to software rendering.
>And yes, power efficiency matters a lot. You may not think so, but it does.
I do think it matters, a lot. But I think you're underestimating how power efficient CPUs already are. It just doesn't translate into high effective performance for 3D graphics due to wasting a lot of cycles on moving data elements around.
An AVX FMA instruction can perform 16 operations every single cycle, but it would take a whopping 72 uops if every address and element was extracted/inserted sequentially. When it comes to load/store, we haven't evolved beyond x87 yet. Of course this is the worst case and typically not every vector load/store has to be a gather/scatter, but for situations where you do need them it makes a massive difference.
>>Sandy Bridge is designed to be able to use four cores plus an IGP, all within 95
>>Watt. During gaming or other intensive applications, that's perfectly acceptable.
>>As you know, Sandy Bridge is amazingly efficient. So it should be equally acceptable
>>to trade the IGP for two more CPU cores, add gather/scatter capabilities, and as
>>a result get a really powerful CPU which can also take >care of graphics, all within 95 Watt.
>
>I don't think you design CPUs for high volume applications. Most don't need scatter/gather
>and the hardware cost is high.
All applications that contain loops can benefit from gather/scatter. That's all applications.
With sather/scatter support every scalar operation would have a parallel equivalent. So any loop with independent iterations can be parallelized and execute up to 8 times faster.
And I don't think the hardware cost is that high. All you need is a bit of logic to check which elements are located in the same cache line, and four byte shift units per 128-bit load units instead of one, to collect the individual elements. Note that logic for sequentially accessing the cache lines is already largely in place to support load operations which straddle a cache line boundary.
>>>>But what's the next step? Obviously there's a quest for higher quality so it was
>>>>only a matter of time before compressed HDR formats became available so they actually
>>>>fit in VRAM. But we don't really want compression; it's just a temporary necessity
>>>>for dedicated gaphics.
>>>
>>>You always want compression for something you store in memory. It increases effective
>>>bandwidth and reduces capacity used. You might want higher fidelity or lossless
>>>compression, but you'll still want it. Less memory traffic is lower power and fewer pins.
>>>
>>>>Once the bandwidth and VRAM goes up the purpose of >compression
>>>>diminishes again. There's not much beyond this, and if >there is, it's not likely
>>>>of interest to the markets that would show an interest in >software rendering first.
>>>
>>>Again, only sort of. There's always a benefit to compression.
>>
>>You don't unconditionally want it. It's still a trade-off. >For this reason lossless
>>compression of system memory is hardly more than a >conceptual idea. Apparently it's
>>cheaper to increase the bandwidth and capacity the brute >force way.
>
>Really? Have you heard of Vertica? They do an awful lot of lossless compression of data in memory.
No, I hadn't heard about them before. Could you point me to some document where they detail how they added hardware support for compressed memory transfers to reduce bandwidth?
>>>>To me it's very clear that the lack of parallel load/store >is a huge bottleneck.
>>>>The ALU's can do 16 operations in parallel, but only 2 >load operations and 1 store
>>>>operation can be executed per clock. And it gets worse >with FMA.
>>>
>>>Um, Sandy Bridge can do 32B/cycle of loads, that's 8 >values.
>>
>>It can load 2 x 4 *consecutive* 32-bit values (per core >per cycle). That's not
>>the same as 8 values. As illustrated in the link I gave >above, that's worthless
>>for something like a parallel table lookup.
>
>Many applications use adjacent values.
Yes, and many applications also use non-adjacent values.
If a loop contains just one load or store at an address which isn't consecutive, it can't be vectorized (unless you want to resort to serially extracting/inserting addresses and values). So even if the majority of values are adjacent, it doesn't take a lot of non-adjacent data to cripple the performance.
>>>That's part of it...but scatter/gather generally means you MISS in the cache/TLBs and chew up more bandwidth.
>>
>>Why? It only accesses the cache lines it needs. If all >elements are from the same
>>cache line, it's as fast as accessing a single element.
>
>And exactly as fast as using AVX! i.e. no improvement and more complexity/power.
No. The addresses are unknown at compile time. So the only option with AVX1 is to sequentially extract each address from the address vector, and insert the read element into the result vector. This takes 18 instructions.
With gather support it would be just one instruction. Assuming it gets split into two 128-bit gather uops, the maximum throughput is 1 every cycle and the minimal throughput is 1 every 4 cycles.
>>But even in the worst case
>>it can't generate more misses or consume more bandwidth.
>
>It sure can. Now instead of having 1-2 TLB accesses per cycle, you get 16. How
>many TLB copies do you want? How many misses in flight do you want to support?
You're still not getting it. It only accesses one cache line per cycle. It simply has to check which elements are within the same cache line, and perform a single TLB access for all of these elements. Checking whether the addresses land on the same cache line doesn't require full translation of each address.
>>Once the IGP has gone the way of the dodo, more powerful >CPUs can be created for
>>people who want mid-end graphics performance. I bet Intel >would really love selling
>>quad-channel 12-core CPUs not just to the server market but to gamers as well. Sandy
>>Bridge already pretty much made the low-end discrete graphics cards obsolete, and
>>Intel is not likely going to lose its grip on that massive >market.
>
>I'm sure they'd love to, but they won't. Some things run better on dedicated hardware
>than not. It's possible some dedicated hardware will migrate into the core...but not guaranteed.
Nothing other than graphics runs better on the IGP. As I've mentioned before, GPGPU is only succesful using high-end hardware.
So the CPU is better than the IGP at absolutely everything else. That makes it really tempting to have a closer look at what it would take to make it adequately efficient at graphics as well.
The answer: gather/scatter.
>I think one of the points people are raising is that CPUs are generally latency
>oriented devices, whereas scatter/gather tends to be more useful in the context
>of throughput devices. That suggests that to some extent scatter/gather in the
>GPU (i.e. throughput device) is the right approach...instead of trying to make the CPU into a throughput device.
Multi-core, 256-bit vectors, Hyper-Threading, software pipelining... the CPU is already a throughput device! It's just being held back by the lack of parallel load/store support. It's the one missing part to let all those GFLOPS come to full fruition.
The architecture of the future balances ILP, DLP, and TLP. GPUs also started to embrace ILP (with true superscalar instruction scheduling), and TLP (with concurrent kernel execution). They still have a long way to go to become good at anything other than graphics though. It's just never going to happen. If the IGP becomes capable of anything other than graphics, it will be very CPU-like so it makes no sense to keep it a heterogenous architecture.
Long before IGPs can even be considered for GPGPU applications, the CPU will have support for the last few missing instructions to turn it into an efficient throughput device.
>>Not necessarily. Opting for an IGP means accepting a less >powerful CPU.
>
>It doesn't though - it merely means fewer CPU cores. Look at Sandy Bridge for
>consumers versus the EP version. The difference is in the core count. You cannot
>simply afford to create different CPU cores for folks who want IGPs and those who don't. THat's bad economics.
The CPU core is the same for everyone, and kicking out the IGP for some models is cheap. As I've mentioned above gather/scatter is useful for practially any application.
>>I know
>>plenty of people who would buy a laptop with a more >powerful CPU instead, if it
>>costed the same and the graphics were adequate for their >everyday experience.
>
>I think you're missing the point. Right now, it appears like the trade-off is # of CPU cores vs. graphics.
There's HD Graphics 2000 and 3000. Surely there's room for HD Graphics 1000 if it means you still get a very powerful CPU.
>>Seriously, let's analyze the case of the GMA 950. It sucks donkey balls (pardon
>>my French). There's no excuse for buying it if you care even a little about graphics.
>>Right? Despite having all odds against it, massive numbers of this IGP were sold
>>in dual-core laptops, Mac Mini's, set-top boxes, etc. A lite version (!) of this
>>architecture will even make it in lots of Smart TV's as part of the Intel CE 2110
>>SoC. Lots of products based on this chip are still to appear. Where does this
>>commercial success come from? It's cheap.
>
>It's also power efficient. At this point, it's not clear that SW rendering is
>comparably efficient and I'd argue it's unlikely to be without specialized hardware.
What specialized hardware would that be? I've already shown that texture compression hardly makes a difference, and sampling and filtering are becoming programmable anyway. Gather/scatter speeds up just about every other pipeline stage as well.
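To illustrate why programmable sampling and filtering map so directly onto gather: bilinear filtering of a batch of pixels boils down to four gathers plus a little arithmetic. A minimal NumPy sketch, where fancy indexing plays the role of the vector gather (the texture contents and names are made up for illustration):

```python
import numpy as np

# A tiny 4x4 single-channel "texture" (hypothetical data).
tex = np.arange(16, dtype=np.float32).reshape(4, 4)

def bilinear_sample(tex, u, v):
    """Sample the texture at arrays of (u, v) in [0, 1] with bilinear filtering.
    Each of the four texel fetches below is a gather: one load per SIMD lane,
    from computed, non-contiguous addresses."""
    h, w = tex.shape
    x = u * (w - 1)
    y = v * (h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Four gathers (fancy indexing = vector gather).
    t00, t10 = tex[y0, x0], tex[y0, x1]
    t01, t11 = tex[y1, x0], tex[y1, x1]
    top = t00 * (1 - fx) + t10 * fx
    bot = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bot * fy

u = np.array([0.0, 0.5, 1.0])
v = np.array([0.0, 0.5, 1.0])
print(bilinear_sample(tex, u, v))  # [ 0.   7.5 15. ]
```

Without gather, each of those four fetches serializes into per-lane extract/load/insert sequences, which is exactly the overhead being discussed.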
>>So people's expectation of adequate graphics can be very low. But Sandy Bridge
>>actually offers more performance than these people need. Software rendering can fill the void, as it's even cheaper.
>
>You're assuming that people's appetite for graphics stays constant, which isn't a good assumption.
I'm not assuming it stays constant. I'm assuming it's a slow-moving target. It's impossible to prove what people want or don't want, but I personally believe there are strong enough indications that there's a market for software rendering if the efficiency is cranked up using gather/scatter. And because of the strong indications that CPUs and GPUs continue to converge, that market is only getting bigger.
>>For a 500$ system with a 10% margin, it would increase profit by 10% to not include that IGP. So 5$ is a lot.
>
>Yes, but you can charge more for the system since it gets better battery life.
No you can't, because the competition will sell it for less and take away your market share.
>>Furthermore, gather/scatter doesn't just help make software rendering viable, it
>>helps a whole range of multimedia applications, including some that have not yet
>>seen the light of day. So you'd be selling a very attractive system if you spent
>>that 5$ on generic CPU technology instead.
>
>I totally agree that scatter/gather is a great capability to have. But what's
>the cost in die area, power and complexity? Not just to the core, but also the memory controller, etc.
Larrabee has wider vectors and smaller cores, yet it features gather/scatter support, so I don't think it takes a lot of die space either way. It doesn't require any changes to the memory controller, just the load/store units. I'm not entirely sure, but collecting four elements from a cache line can probably largely reuse the existing network for extracting one (unaligned) value. And checking which addresses land on the same cache line is a very simple equality test of the upper bits.
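That equality test can be sketched as follows: two addresses hit the same cache line exactly when their addresses agree in everything above the line-offset bits. The line size, element size, and helper names here are my own assumptions for illustration:

```python
LINE_BYTES = 64  # typical cache line size (assumed)
ELEM_BYTES = 4   # 32-bit elements

def lines_touched(base, indices):
    """Group a gather's lanes by the cache line they hit.
    addr // LINE_BYTES keeps only the upper address bits, so two lanes
    share a line exactly when these values are equal -- the simple
    equality test of the upper bits."""
    lines = {}
    for lane, idx in enumerate(indices):
        addr = base + idx * ELEM_BYTES
        lines.setdefault(addr // LINE_BYTES, []).append(lane)
    return lines

# Example: four lanes, three of which fall on the same 64-byte line,
# so the gather needs only two cache accesses instead of four.
groups = lines_touched(0x1000, [0, 3, 15, 100])
print(groups)  # {64: [0, 1, 2], 70: [3]}
```

A hardware implementation would do the same grouping with a handful of wide comparators in parallel, but the coalescing opportunity it exposes is the same.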
Cheers,
Nicolas
Gather/scatter performance data | Eric Bron | 2011/02/13 04:06 PM |
Gather/scatter performance data | Nicolas Capens | 2011/02/14 06:52 AM |
Gather/scatter performance data | Eric Bron | 2011/02/14 08:43 AM |
SW Rasterization - a long way off | Rohit | 2011/02/02 12:17 PM |
SW Rasterization - a long way off | Nicolas Capens | 2011/02/04 02:59 AM |
CPU only rendering - a long way off | Rohit | 2011/02/04 10:52 AM |
CPU only rendering - a long way off | Nicolas Capens | 2011/02/04 06:15 PM |
CPU only rendering - a long way off | Rohit | 2011/02/05 01:00 AM |
CPU only rendering - a long way off | Nicolas Capens | 2011/02/05 08:45 PM |
CPU only rendering - a long way off | David Kanter | 2011/02/06 08:51 PM |
CPU only rendering - a long way off | Gian-Carlo Pascutto | 2011/02/06 11:22 PM |
Encryption | David Kanter | 2011/02/07 12:18 AM |
Encryption | Nicolas Capens | 2011/02/07 06:51 AM |
Encryption | David Kanter | 2011/02/07 10:50 AM |
Encryption | Nicolas Capens | 2011/02/08 09:26 AM |
CPUs are latency optimized | David Kanter | 2011/02/08 10:38 AM |
efficient compiler on an efficient GPU real today. | sJ | 2011/02/08 10:29 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/09 08:49 PM |
CPUs are latency optimized | Eric Bron | 2011/02/09 11:49 PM |
CPUs are latency optimized | Antti-Ville Tuunainen | 2011/02/10 05:16 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/10 06:04 AM |
CPUs are latency optimized | Eric Bron | 2011/02/10 06:48 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/10 12:31 PM |
CPUs are latency optimized | Eric Bron | 2011/02/11 01:43 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/11 06:31 AM |
CPUs are latency optimized | EduardoS | 2011/02/10 04:29 PM |
CPUs are latency optimized | Anon | 2011/02/10 05:40 PM |
CPUs are latency optimized | David Kanter | 2011/02/10 07:33 PM |
CPUs are latency optimized | EduardoS | 2011/02/11 01:18 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/11 04:56 AM |
CPUs are latency optimized | Rohit | 2011/02/11 06:33 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/14 01:19 AM |
CPUs are latency optimized | Eric Bron | 2011/02/14 02:23 AM |
CPUs are latency optimized | EduardoS | 2011/02/14 12:11 PM |
CPUs are latency optimized | David Kanter | 2011/02/11 01:45 PM |
CPUs are latency optimized | Nicolas Capens | 2011/02/15 04:22 AM |
CPUs are latency optimized | David Kanter | 2011/02/15 11:47 AM |
CPUs are latency optimized | Nicolas Capens | 2011/02/15 06:10 PM |
Have fun | David Kanter | 2011/02/15 09:04 PM |
Have fun | Nicolas Capens | 2011/02/17 02:59 AM |
Have fun | Brett | 2011/02/17 11:56 AM |
Have fun | Nicolas Capens | 2011/02/19 03:53 PM |
Have fun | Brett | 2011/02/20 05:08 PM |
Have fun | Brett | 2011/02/20 06:13 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/02/23 04:37 PM |
On-die storage to fight Amdahl | Brett | 2011/02/23 08:59 PM |
On-die storage to fight Amdahl | Brett | 2011/02/23 09:08 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/02/24 06:42 PM |
On-die storage to fight Amdahl | Rohit | 2011/02/25 10:02 PM |
On-die storage to fight Amdahl | Nicolas Capens | 2011/03/09 05:53 PM |
On-die storage to fight Amdahl | Rohit | 2011/03/10 07:02 AM |
NVIDIA using tile based rendering? | Nathan Monson | 2011/03/11 06:58 PM |
NVIDIA using tile based rendering? | Rohit | 2011/03/12 03:29 AM |
NVIDIA using tile based rendering? | Nathan Monson | 2011/03/12 10:05 AM |
NVIDIA using tile based rendering? | Rohit | 2011/03/12 10:16 AM |
On-die storage to fight Amdahl | Brett | 2011/02/26 01:10 AM |
On-die storage to fight Amdahl | Nathan Monson | 2011/02/26 12:51 PM |
On-die storage to fight Amdahl | Brett | 2011/02/26 03:40 PM |
Convergence is inevitable | Nicolas Capens | 2011/03/09 07:22 PM |
Convergence is inevitable | Brett | 2011/03/09 09:59 PM |
Convergence is inevitable | Antti-Ville Tuunainen | 2011/03/10 02:34 PM |
Convergence is inevitable | Brett | 2011/03/10 08:39 PM |
Procedural texturing? | David Kanter | 2011/03/11 12:32 AM |
Procedural texturing? | hobold | 2011/03/11 02:59 AM |
Procedural texturing? | Dan Downs | 2011/03/11 08:28 AM |
Procedural texturing? | Mark Roulo | 2011/03/11 01:58 PM |
Procedural texturing? | Anon | 2011/03/11 05:11 PM |
Procedural texturing? | Nathan Monson | 2011/03/11 06:30 PM |
Procedural texturing? | Brett | 2011/03/15 06:45 AM |
Procedural texturing? | Seni | 2011/03/15 09:13 AM |
Procedural texturing? | Brett | 2011/03/15 10:45 AM |
Procedural texturing? | Seni | 2011/03/15 01:09 PM |
Procedural texturing? | Brett | 2011/03/11 09:02 PM |
Procedural texturing? | Brett | 2011/03/11 08:34 PM |
Procedural texturing? | Eric Bron | 2011/03/12 02:37 AM |
Convergence is inevitable | Jouni Osmala | 2011/03/09 10:28 PM |
Convergence is inevitable | Brett | 2011/04/05 04:08 PM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 04:23 AM |
Convergence is inevitable | none | 2011/04/07 06:03 AM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 09:34 AM |
Convergence is inevitable | anon | 2011/04/07 01:15 PM |
Convergence is inevitable | none | 2011/04/08 12:57 AM |
Convergence is inevitable | Brett | 2011/04/07 07:04 PM |
Convergence is inevitable | none | 2011/04/08 01:14 AM |
Gather implementation | David Kanter | 2011/04/08 11:01 AM |
RAM Latency | David Hess | 2011/04/07 07:22 AM |
RAM Latency | Brett | 2011/04/07 06:20 PM |
RAM Latency | Nicolas Capens | 2011/04/07 09:18 PM |
RAM Latency | Brett | 2011/04/08 04:33 AM |
RAM Latency | Nicolas Capens | 2011/04/10 01:23 PM |
RAM Latency | Rohit | 2011/04/08 05:57 AM |
RAM Latency | Nicolas Capens | 2011/04/10 12:23 PM |
RAM Latency | David Kanter | 2011/04/10 01:27 PM |
RAM Latency | Rohit | 2011/04/11 05:17 AM |
Convergence is inevitable | Eric Bron | 2011/04/07 08:46 AM |
Convergence is inevitable | Nicolas Capens | 2011/04/07 08:50 PM |
Convergence is inevitable | Eric Bron | 2011/04/07 11:39 PM |
Flaws in PowerVR | Rohit | 2011/02/25 10:21 PM |
Flaws in PowerVR | Brett | 2011/02/25 11:37 PM |
Flaws in PowerVR | Paul | 2011/02/26 04:17 AM |
Have fun | David Kanter | 2011/02/18 11:52 AM |
Have fun | Michael S | 2011/02/19 11:12 AM |
Have fun | David Kanter | 2011/02/19 02:26 PM |
Have fun | Michael S | 2011/02/19 03:43 PM |
Have fun | anon | 2011/02/19 04:02 PM |
Have fun | Michael S | 2011/02/19 04:56 PM |
Have fun | anon | 2011/02/20 02:50 PM |
Have fun | EduardoS | 2011/02/20 01:44 PM |
Linear vs non-linear | EduardoS | 2011/02/20 01:55 PM |
Have fun | Michael S | 2011/02/20 03:19 PM |
Have fun | EduardoS | 2011/02/20 04:51 PM |
Have fun | Nicolas Capens | 2011/02/21 10:12 AM |
Have fun | Michael S | 2011/02/21 11:38 AM |
Have fun | Eric Bron | 2011/02/21 01:10 PM |
Have fun | Eric Bron | 2011/02/21 01:39 PM |
Have fun | Michael S | 2011/02/21 05:13 PM |
Have fun | Eric Bron | 2011/02/21 11:43 PM |
Have fun | Michael S | 2011/02/22 12:47 AM |
Have fun | Eric Bron | 2011/02/22 01:10 AM |
Have fun | Michael S | 2011/02/22 10:37 AM |
Have fun | anon | 2011/02/22 12:38 PM |
Have fun | EduardoS | 2011/02/22 02:49 PM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/23 05:37 PM |
Gather/scatter efficiency | anonymous | 2011/02/23 05:51 PM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/24 05:57 PM |
Gather/scatter efficiency | anonymous | 2011/02/24 06:16 PM |
Gather/scatter efficiency | Michael S | 2011/02/25 06:45 AM |
Gather implementation | David Kanter | 2011/02/25 04:34 PM |
Gather implementation | Michael S | 2011/02/26 09:40 AM |
Gather implementation | anon | 2011/02/26 10:52 AM |
Gather implementation | Michael S | 2011/02/26 11:16 AM |
Gather implementation | anon | 2011/02/26 10:22 PM |
Gather implementation | Michael S | 2011/02/27 06:23 AM |
Gather/scatter efficiency | Nicolas Capens | 2011/02/28 02:14 PM |
Consider yourself ignored | David Kanter | 2011/02/22 12:05 AM |
one more anti-FMA flame. By me. | Michael S | 2011/02/16 06:40 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/16 07:30 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/16 08:15 AM |
one more anti-FMA flame. By me. | Nicolas Capens | 2011/02/17 05:27 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/17 06:42 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/17 04:46 PM |
Tarantula paper | Paul A. Clayton | 2011/02/17 11:38 PM |
Tarantula paper | Nicolas Capens | 2011/02/19 04:19 PM |
anti-FMA != anti-throughput or anti-SG | Eric Bron | 2011/02/18 12:48 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/20 02:46 PM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/20 04:00 PM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 03:05 AM |
Software pipelining on x86 | David Kanter | 2011/02/23 04:04 AM |
Software pipelining on x86 | JS | 2011/02/23 04:25 AM |
Software pipelining on x86 | Salvatore De Dominicis | 2011/02/23 07:37 AM |
Software pipelining on x86 | Jouni Osmala | 2011/02/23 08:10 AM |
Software pipelining on x86 | LeeMiller | 2011/02/23 09:07 PM |
Software pipelining on x86 | Nicolas Capens | 2011/02/24 02:17 PM |
Software pipelining on x86 | anonymous | 2011/02/24 06:04 PM |
Software pipelining on x86 | Nicolas Capens | 2011/02/28 08:27 AM |
Software pipelining on x86 | Antti-Ville Tuunainen | 2011/03/02 03:31 AM |
Software pipelining on x86 | Megol | 2011/03/02 11:55 AM |
Software pipelining on x86 | Geert Bosch | 2011/03/03 06:58 AM |
FMA benefits and latency predictions | David Kanter | 2011/02/25 04:14 PM |
FMA benefits and latency predictions | Antti-Ville Tuunainen | 2011/02/26 09:43 AM |
FMA benefits and latency predictions | Matt Waldhauer | 2011/02/27 05:42 AM |
FMA benefits and latency predictions | Nicolas Capens | 2011/03/09 05:11 PM |
FMA benefits and latency predictions | Rohit | 2011/03/10 07:11 AM |
FMA benefits and latency predictions | Eric Bron | 2011/03/10 08:30 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/23 04:19 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 06:50 AM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/23 09:37 AM |
FMA and beyond | Nicolas Capens | 2011/02/24 03:47 PM |
detour on terminology | hobold | 2011/02/24 06:08 PM |
detour on terminology | Nicolas Capens | 2011/02/28 01:24 PM |
detour on terminology | Eric Bron | 2011/03/01 01:38 AM |
detour on terminology | Michael S | 2011/03/01 04:03 AM |
detour on terminology | Eric Bron | 2011/03/01 04:39 AM |
detour on terminology | Michael S | 2011/03/01 07:33 AM |
detour on terminology | Eric Bron | 2011/03/01 08:34 AM |
erratum | Eric Bron | 2011/03/01 08:54 AM |
detour on terminology | Nicolas Capens | 2011/03/10 07:39 AM |
detour on terminology | Eric Bron | 2011/03/10 08:50 AM |
anti-FMA != anti-throughput or anti-SG | Nicolas Capens | 2011/02/23 05:12 AM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 10:25 PM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/17 05:51 PM |
Tarantula vector unit well-integrated | Paul A. Clayton | 2011/02/17 11:38 PM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/19 01:17 PM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 01:09 AM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/20 08:55 AM |
anti-FMA != anti-throughput or anti-SG | David Kanter | 2011/02/20 12:39 PM |
anti-FMA != anti-throughput or anti-SG | EduardoS | 2011/02/20 01:35 PM |
anti-FMA != anti-throughput or anti-SG | Megol | 2011/02/21 07:12 AM |
anti-FMA != anti-throughput or anti-SG | anon | 2011/02/17 09:44 PM |
anti-FMA != anti-throughput or anti-SG | Michael S | 2011/02/18 05:20 AM |
one more anti-FMA flame. By me. | Eric Bron | 2011/02/17 07:24 AM |
thanks | Michael S | 2011/02/17 03:56 PM |
CPUs are latency optimized | EduardoS | 2011/02/15 12:24 PM |
SwiftShader SNB test | Eric Bron | 2011/02/15 02:46 PM |
SwiftShader NHM test | Eric Bron | 2011/02/15 03:50 PM |
SwiftShader SNB test | Nicolas Capens | 2011/02/16 11:06 PM |
SwiftShader SNB test | Eric Bron | 2011/02/17 12:21 AM |
SwiftShader SNB test | Eric Bron | 2011/02/22 09:32 AM |
SwiftShader SNB test 2nd run | Eric Bron | 2011/02/22 09:51 AM |
SwiftShader SNB test 2nd run | Nicolas Capens | 2011/02/23 01:14 PM |
SwiftShader SNB test 2nd run | Eric Bron | 2011/02/23 01:42 PM |
Win7SP1 out but no AVX hype? | Michael S | 2011/02/24 02:14 AM |
Win7SP1 out but no AVX hype? | Eric Bron | 2011/02/24 02:39 AM |
CPUs are latency optimized | Eric Bron | 2011/02/15 07:02 AM |
CPUs are latency optimized | EduardoS | 2011/02/11 02:40 PM |
CPU only rendering - not a long way off | Nicolas Capens | 2011/02/07 05:45 AM |
CPU only rendering - not a long way off | David Kanter | 2011/02/07 11:09 AM |
CPU only rendering - not a long way off | anonymous | 2011/02/07 09:25 PM |
Sandy Bridge IGP EUs | David Kanter | 2011/02/07 10:22 PM |
Sandy Bridge IGP EUs | Hannes | 2011/02/08 04:59 AM |
SW Rasterization - Why? | Seni | 2011/02/02 01:53 PM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/10 02:12 PM |
Market reasons to ditch the IGP | Seni | 2011/02/11 04:42 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/16 03:29 AM |
Market reasons to ditch the IGP | Seni | 2011/02/16 12:39 PM |
An excellent post! | David Kanter | 2011/02/16 02:18 PM |
CPUs clock higher | Moritz | 2011/02/17 07:06 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/18 05:22 PM |
Market reasons to ditch the IGP | IntelUser2000 | 2011/02/18 06:20 PM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/21 01:42 PM |
Bad data (repeated) | David Kanter | 2011/02/21 11:21 PM |
Bad data (repeated) | none | 2011/02/22 02:04 AM |
13W or 8W? | Foo_ | 2011/02/22 05:00 AM |
13W or 8W? | Linus Torvalds | 2011/02/22 07:58 AM |
13W or 8W? | David Kanter | 2011/02/22 10:33 AM |
13W or 8W? | Mark Christiansen | 2011/02/22 01:47 PM |
Bigger picture | Nicolas Capens | 2011/02/24 05:33 PM |
Bigger picture | Nicolas Capens | 2011/02/24 07:06 PM |
20+ Watt | Nicolas Capens | 2011/02/24 07:18 PM |
<20W | David Kanter | 2011/02/25 12:13 PM |
>20W | Nicolas Capens | 2011/03/08 06:34 PM |
IGP is 3X more efficient | David Kanter | 2011/03/08 09:53 PM |
IGP is 3X more efficient | Eric Bron | 2011/03/09 01:44 AM |
>20W | Eric Bron | 2011/03/09 02:48 AM |
Specious data and claims are still specious | David Kanter | 2011/02/25 01:38 AM |
IGP power consumption, LRB samplers | Nicolas Capens | 2011/03/08 05:24 PM |
IGP power consumption, LRB samplers | EduardoS | 2011/03/08 05:52 PM |
IGP power consumption, LRB samplers | Rohit | 2011/03/09 06:42 AM |
Market reasons to ditch the IGP | none | 2011/02/22 01:58 AM |
Market reasons to ditch the IGP | Nicolas Capens | 2011/02/24 05:43 PM |
Market reasons to ditch the IGP | slacker | 2011/02/22 01:32 PM |
Market reasons to ditch the IGP | Seni | 2011/02/18 08:51 PM |
Correction - 28 comparators, not 36. (NT) | Seni | 2011/02/18 09:03 PM |
Market reasons to ditch the IGP | Gabriele Svelto | 2011/02/19 12:49 AM |
Market reasons to ditch the IGP | Seni | 2011/02/19 10:59 AM |
Market reasons to ditch the IGP | Exophase | 2011/02/20 09:43 AM |
Market reasons to ditch the IGP | EduardoS | 2011/02/19 09:13 AM |
Market reasons to ditch the IGP | Seni | 2011/02/19 10:46 AM |
The next revolution | Nicolas Capens | 2011/02/22 02:33 AM |
The next revolution | Gabriele Svelto | 2011/02/22 08:15 AM |
The next revolution | Eric Bron | 2011/02/22 08:48 AM |
The next revolution | Nicolas Capens | 2011/02/23 06:39 PM |
The next revolution | Gabriele Svelto | 2011/02/23 11:43 PM |
GPGPU content creation (or lack of it) | Nicolas Capens | 2011/02/28 06:39 AM |
GPGPU content creation (or lack of it) | The market begs to differ | 2011/03/01 05:32 AM |
GPGPU content creation (or lack of it) | Nicolas Capens | 2011/03/09 08:14 PM |
GPGPU content creation (or lack of it) | Gabriele Svelto | 2011/03/10 12:01 AM |
The market begs to differ | Gabriele Svelto | 2011/03/01 05:33 AM |
The next revolution | Anon | 2011/02/24 01:15 AM |
The next revolution | Nicolas Capens | 2011/02/28 01:34 PM |
The next revolution | Seni | 2011/02/22 01:02 PM |
The next revolution | Gabriele Svelto | 2011/02/23 05:27 AM |
The next revolution | Seni | 2011/02/23 08:03 AM |
The next revolution | Nicolas Capens | 2011/02/24 05:11 AM |
The next revolution | Seni | 2011/02/24 07:45 PM |
IGP sampler count | Nicolas Capens | 2011/03/03 04:19 AM |
Latency and throughput optimized cores | Nicolas Capens | 2011/03/07 02:28 PM |
The real reason no IGP /CPU converge. | Jouni Osmala | 2011/03/07 10:34 PM |
Still converging | Nicolas Capens | 2011/03/13 02:08 PM |
Homogeneous CPU advantages | Nicolas Capens | 2011/03/07 11:12 PM |
Homogeneous CPU advantages | Seni | 2011/03/08 08:23 AM |
Homogeneous CPU advantages | David Kanter | 2011/03/08 10:16 AM |
Homogeneous CPU advantages | Brett | 2011/03/09 02:37 AM |
Homogeneous CPU advantages | Jouni Osmala | 2011/03/08 11:27 PM |
SW Rasterization | firsttimeposter | 2011/02/03 10:18 PM |
SW Rasterization | Nicolas Capens | 2011/02/04 03:48 AM |
SW Rasterization | Eric Bron | 2011/02/04 04:14 AM |
SW Rasterization | Nicolas Capens | 2011/02/04 07:36 AM |
SW Rasterization | Eric Bron | 2011/02/04 07:42 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/26 02:23 AM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/02/04 03:31 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/05 07:46 PM |
Sandy Bridge CPU article online | Gabriele Svelto | 2011/02/06 05:20 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/02/06 05:07 PM |
Sandy Bridge CPU article online | arch.comp | 2011/01/06 09:58 PM |
Sandy Bridge CPU article online | Seni | 2011/01/07 09:25 AM |
Sandy Bridge CPU article online | Michael S | 2011/01/05 03:28 AM |
Sandy Bridge CPU article online | Nicolas Capens | 2011/01/05 05:06 AM |
permuting vector elements (yet again) | hobold | 2011/01/05 04:15 PM |
permuting vector elements (yet again) | Nicolas Capens | 2011/01/06 05:11 AM |
Sandy Bridge CPU article online | Eric Bron | 2011/01/05 11:46 AM |
wow ...! | hobold | 2011/01/05 04:19 PM |
wow ...! | Nicolas Capens | 2011/01/05 05:11 PM |
wow ...! | Eric Bron | 2011/01/05 09:46 PM |
compress LUT | Eric Bron | 2011/01/05 10:05 PM |
wow ...! | Michael S | 2011/01/06 01:25 AM |
wow ...! | Nicolas Capens | 2011/01/06 05:26 AM |
wow ...! | Eric Bron | 2011/01/06 08:08 AM |
wow ...! | Nicolas Capens | 2011/01/07 06:19 AM |
wow ...! | Steve Underwood | 2011/01/07 09:53 PM |
saturation | hobold | 2011/01/08 09:25 AM |
saturation | Steve Underwood | 2011/01/08 11:38 AM |
saturation | Michael S | 2011/01/08 12:05 PM |
128 bit floats | Brett | 2011/01/08 12:39 PM |
128 bit floats | Michael S | 2011/01/08 01:10 PM |
128 bit floats | Anil Maliyekkel | 2011/01/08 02:46 PM |
128 bit floats | Kevin G | 2011/02/27 10:15 AM |
128 bit floats | hobold | 2011/02/27 03:42 PM |
128 bit floats | Ian Ollmann | 2011/02/28 03:56 PM |
OpenCL FP accuracy | hobold | 2011/03/01 05:45 AM |
OpenCL FP accuracy | anon | 2011/03/01 07:03 PM |
OpenCL FP accuracy | hobold | 2011/03/02 02:53 AM |
OpenCL FP accuracy | Eric Bron | 2011/03/02 06:10 AM |
pet project | hobold | 2011/03/02 08:22 AM |
pet project | Anon | 2011/03/02 08:10 PM |
pet project | hobold | 2011/03/03 03:57 AM |
pet project | Eric Bron | 2011/03/03 01:29 AM |
pet project | hobold | 2011/03/03 04:14 AM |
pet project | Eric Bron | 2011/03/03 02:10 PM |
pet project | hobold | 2011/03/03 03:04 PM |
OpenCL and AMD | Vincent Diepeveen | 2011/03/07 12:44 PM |
OpenCL and AMD | Eric Bron | 2011/03/08 01:05 AM |
OpenCL and AMD | Vincent Diepeveen | 2011/03/08 07:27 AM |
128 bit floats | Michael S | 2011/02/27 03:46 PM |
128 bit floats | Anil Maliyekkel | 2011/02/27 05:14 PM |
saturation | Steve Underwood | 2011/01/17 03:42 AM |
wow ...! | hobold | 2011/01/06 04:05 PM |
Ring | Moritz | 2011/01/20 09:51 PM |
Ring | Antti-Ville Tuunainen | 2011/01/21 11:25 AM |
Ring | Moritz | 2011/01/23 12:38 AM |
Ring | Michael S | 2011/01/23 03:04 AM |
So fast | Moritz | 2011/01/23 06:57 AM |
So fast | David Kanter | 2011/01/23 09:05 AM |
Sandy Bridge CPU (L1D cache) | Gordon Ward | 2011/09/09 01:47 AM |
Sandy Bridge CPU (L1D cache) | David Kanter | 2011/09/09 03:19 PM |
Sandy Bridge CPU (L1D cache) | EduardoS | 2011/09/09 07:53 PM |
Sandy Bridge CPU (L1D cache) | Paul A. Clayton | 2011/09/10 04:12 AM |
Sandy Bridge CPU (L1D cache) | Michael S | 2011/09/10 08:41 AM |
Sandy Bridge CPU (L1D cache) | EduardoS | 2011/09/10 10:17 AM |
Address Ports on Sandy Bridge Scheduler | Victor | 2011/10/16 05:40 AM |
Address Ports on Sandy Bridge Scheduler | EduardoS | 2011/10/16 06:45 PM |
Address Ports on Sandy Bridge Scheduler | Megol | 2011/10/17 08:20 AM |
Address Ports on Sandy Bridge Scheduler | Victor | 2011/10/18 04:34 PM |
Benefits of early scheduling | Paul A. Clayton | 2011/10/18 05:53 PM |
Benefits of early scheduling | Victor | 2011/10/19 04:58 PM |
Consistency and invalidation ordering | Paul A. Clayton | 2011/10/20 03:43 AM |
Address Ports on Sandy Bridge Scheduler | John Upcroft | 2011/10/21 03:16 PM |
Address Ports on Sandy Bridge Scheduler | David Kanter | 2011/10/22 09:49 AM |
Address Ports on Sandy Bridge Scheduler | John Upcroft | 2011/10/26 12:24 PM |
Store TLB look-up at commit? | Paul A. Clayton | 2011/10/26 07:30 PM |
Store TLB look-up at commit? | Richard Scott | 2011/10/26 08:40 PM |
Just a guess | Paul A. Clayton | 2011/10/27 12:54 PM |