By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), September 26, 2009 1:46 am
Room: Moderated Discussions
Skimming through the information available on the Cortex A9, I have noticed that there are quite a few peculiarities in the design. Here's a quick run-down to fuel discussion:
- Out-of-order completion is mentioned, though there is very little information about it. The only other mention I found was that an instruction 'releases the resources it is consuming early'. Maybe it means that an instruction can write back its result and free the renamed register before completion if it is safe to do so?
- It seems that the LSU is skewed as they mention a single-cycle load-to-use penalty (same as on the Cortex A8). A hardware prefetcher is also mentioned but there is no information about it.
- Some sort of load-store forwarding mechanism is mentioned in the white paper though it is not described how it works or what it does exactly. Maybe back-to-back load/store couples have a lower latency because the data is allowed to bypass some stages and go directly to the store queue?
- The L1s in the Cortex A8 were PIPT, and I believe this still holds true for the A9. This means that the TLB must be fairly small, a potentially significant disadvantage in more desktop-oriented workloads. I wonder if there is a second-level TLB in there.
- When configured with an L2 cache, the L2 is exclusive (yay for K7!), certainly a good thing for the smaller incarnations of the A9.
- The A9 has a fast loop mode for lower-power operation, but in the diagrams available it is depicted as sitting before the decode stage. This is similar to the Conroe/Penryn cores, which is strange because there it comes after the pre-decode stage, something that the A9 shouldn't have. I would have expected it after the decode stage (like on Nehalem), so I was wondering if its sole purpose is to shut off the L1 I-cache for power savings.
- Finally, it seems that they put a lot of effort into making I/O operations as well as thread-related operations very fast. The ACP, for example, is absolutely brilliant: on current consoles it is normal practice to lock part of the L2 and write the GPU command buffer into it, then send it over using DMA to save an unnecessary read-write-read copy. Having this done transparently is simply excellent. It seems to me that they've done quite some work to make operations which usually disrupt an OoOE core actually run very fast (peripheral I/O, TLS access and cache-to-cache transfers). On this topic I'd like to know more about the GIQ too, because interrupt handling is an area which is often underestimated from a performance POV.
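On the forwarding point: since the white paper doesn't say what its mechanism actually does, here is only a toy model of the conventional store-to-load forwarding found in many OoO cores, where a load checks the pending stores before going to the cache. The `StoreQueue` class, its method names and the dict standing in for the cache are all made up for illustration; nothing here is taken from ARM documentation, and the A9 may well do something different.

```python
from collections import deque


class StoreQueue:
    """Toy model (hypothetical): stores wait here before draining to the cache."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> value, stands in for the cache
        self.pending = deque()    # (address, value) pairs, oldest first

    def store(self, address, value):
        # A store only enters the queue; memory is not updated yet.
        self.pending.append((address, value))

    def load(self, address):
        # Search youngest-to-oldest so the most recent matching store wins.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value, True            # forwarded from the store queue
        return self.memory.get(address, 0), False  # served by the "cache"

    def drain_one(self):
        # Retire the oldest pending store into memory.
        if self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value


sq = StoreQueue({0x100: 1})
sq.store(0x100, 42)
print(sq.load(0x100))   # value comes from the queue, before memory is updated
sq.drain_one()
print(sq.load(0x100))   # same value, now read from memory
```

The second element of the returned tuple just says whether the load was satisfied from the queue; in hardware that case is where the latency saving would come from, since the load doesn't have to wait for the store to reach the cache.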