By: dmcq (dmcq.delete@this.fano.co.uk), March 5, 2015 10:01 am
Room: Moderated Discussions
Ronald Maas (rmaas.delete@this.wiwo.nl) on March 5, 2015 8:01 am wrote:
> ARM published the Cortex-A72 Technical Reference Manual. Somewhat surprisingly both cores use the exact same
> high-level architecture: 3-wide instruction decoder, 8 pipelines and same L1+ L2 cache configuration.
>
> Only differences I could find between A72 and A57:
> 1) Supports 4 MB L2 cache size (configurable)
> 2) Automatic hardware prefetcher that generates prefetches targeting the L1D cache and the L2 cache
>
> Ronald
Hmm, that looks about it to me too. The prefetch to the L1 cache will help, though I'd have thought making the L1 data cache three-way like the instruction cache would have helped more. In their announcement ARM gave special thanks to the tools vendors for their part in producing the core, so I'd guess that in Intel terms it is a tick of the A57 microarchitecture, and their work might be better described as optimising the A57 design.
The A57 optimisation guide said nothing about cache organization and only a tiny bit about branch misses, which is a bit of an omission I think. You can't tell from that guide or the technical documentation how many stages there are in the instruction fetch and decode pipeline, or even how many cycles are needed to access the L2 cache, both of which would be prime targets for hardware optimisation.