Predicated stores might not be that bad

Article: ARM Goes 64-bit
By: Michael S (already5chosen.delete@this.yahoo.com), August 15, 2012 11:41 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 15, 2012 1:14 am wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on August 14, 2012 7:27 am wrote:
> > none (none.delete@this.none.com) on August 14, 2012 6:56 am wrote:
> > [snip]
> > > In theory yes, predication looks great.
> >
> > Maybe not great, but certainly useful.
> >
> > > In practice, as I said in another post, it might constrain too many other things in your micro-architecture, which actually might reduce achievable performance. I really think spending the budget transistor and design specification in doing a smarter branch predictor is better.
> >
> > I do not mean to imply that the ARM architects were ignorant or stupid. However, I think providing conditional stores is more attractive than forcing software to implement them by conditionally selecting a store address.
>
> I am skeptical. Stores are very complicated, because of the impact on coherency and ordering. They are also rather long latency

Strictly speaking, stores have no latency (unless you count the latency of store-to-load forwarding as store latency) *because* they can be buffered.

> and tend to get heavily buffered. That's exactly the kind of thing I wouldn't want to predicate. Who knows what odd side-effects might occur?
>
> Predicating a register to register move is tolerable. The scope is very limited, and you aren't going to run into too many issues.
>
> More to the point, stores are damn power hungry. You need to:
>
> 1. Probe TLB
> 2. Write to store buffer (for forwarding)
> 3. Probe tags
> 4. Write to cache
>
> That's wasting a huge amount of energy if your predicate is false. Consequently, you'd want to evaluate the predicate before the store address is available, to avoid wasting power.

But assuming you really want to avoid a branch, because you know that you can't predict it, the replacement technique (selecting between a real and a dummy store address) is no better, power-wise, than the conditional store itself. If anything, it is probably worse.
Still, keeping predication out of the store path logic, and so avoiding that complication, is probably A Good Thing regardless of the power cost.
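
To make the replacement technique concrete, here is a minimal C sketch of it, under my own assumptions: the name store_if and the stack scratch slot are hypothetical, not something from the article or the posts above. The store always executes; only its target address is selected. A compiler will often turn the ternary into a conditional select (csel/cmov), but it is free to emit a branch instead, so the generated code has to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for a predicated store (hypothetical helper).
 * The store is always performed; when cond is zero it lands in a
 * scratch slot instead of *dst, so *dst is left untouched. */
static inline void store_if(int cond, uint32_t *dst, uint32_t value)
{
    uint32_t dummy;                        /* scratch slot, assumed cheap (stack) */
    uint32_t *target;

    assert(dst != NULL);
    target = cond ? dst : &dummy;          /* select the address, not the control flow */
    *target = value;                       /* unconditional store */
}
```

Note that the dummy store still goes through the whole store path (TLB probe, store buffer write, tag probe, cache write), which is why I don't expect it to save any power compared with a real conditional store.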

>
> I also don't see what particular use cases predicated stores will really target. For vectors, they might make sense...but not normal scalars.
>

Conditional stores are potentially very useful for the quicksort family of algorithms. One could argue that in today's computing environments quicksort is no longer the sorting engine of choice, since mergesort tends to be faster and, with today's abundance of main memory, the in-place property of quicksort is less beneficial. However, the quicksort core loop is not just about sorting. For example, I am not aware of a better algorithm for nth_element.
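
As an illustration, here is a rough C sketch of an nth_element-style selection built on the quicksort core loop. The names partition_branchless and select_kth are hypothetical, and taking the last element as pivot is a simplification (worst case is quadratic). The comparison in the inner loop is data-dependent and hard to predict; lacking conditional stores, this variant stores unconditionally and lets the comparison result only decide whether the boundary index advances.

```c
#include <assert.h>
#include <stddef.h>

/* Lomuto-style partition of a[lo..hi] (inclusive) around a[hi].
 * No branch on the comparison: both stores always execute, and the
 * boundary i advances by 0 or 1. Returns the pivot's final index. */
static size_t partition_branchless(int *a, size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;

    for (size_t j = lo; j < hi; j++) {
        int x = a[j];
        a[j] = a[i];                  /* unconditional "swap" of a[i] and a[j] */
        a[i] = x;
        i += (size_t)(x < pivot);     /* keep x on the left side only if smaller */
    }
    a[hi] = a[i];                     /* move the pivot into its final place */
    a[i] = pivot;
    return i;
}

/* Rearranges a[0..n-1] so that a[k] holds the value it would have
 * after a full sort, and returns that value. */
int select_kth(int *a, size_t n, size_t k)
{
    size_t lo = 0, hi;

    assert(a != NULL && k < n);
    hi = n - 1;
    while (lo < hi) {
        size_t p = partition_branchless(a, lo, hi);
        if (p == k)
            break;
        if (p < k)
            lo = p + 1;               /* k-th element is right of the pivot */
        else
            hi = p - 1;               /* k-th element is left of the pivot */
    }
    return a[k];
}
```

With real conditional stores the two unconditional stores in the loop could instead be predicated on the comparison, which is the use case I have in mind.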

> > An implementation of predicated stores would only have to be equal to the software workaround (and such could probably be implemented much like the software version by cracking the store into an address generation µop and a data µop where the address generation could produce a null address--which would not match with any loads and would commit the store data to no-place--No Big Deal [maybe]).
>
> Now you have another dependency feeding into the store. Doesn't make life any easier.
>
> > Wish branches (that inform hardware of a hammock that is likely to benefit from predication) might be better (moving such into the microarchitecture--which might treat such as an ordinary branch, predicate if "sufficiently short", or make a dynamic choice on handling based on prediction confidence information--while still allowing software to communicate information that could avoid expensive mispredictions), but predicated stores (in my ignorance) do not seem especially limiting.
>
> They don't seem like a good idea.
>
> DK