Early software zeroing !=== early hardware zeroing

By: Doug S (foo.delete@this.bar.bar), October 5, 2021 12:21 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on October 5, 2021 11:19 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
> > Doug S (foo.delete@this.bar.bar) on October 3, 2021 10:09 am wrote:
> >>
> >> Zeroing has room for optimization, both since you will often zero more than one page at a time and
> >> because zeroes are rarely read before they are overwritten - so you want that activity to occur outside
> >> of the cache.
> >
> > No you don't, actually.
>
> Yes, one does want to avoid evicting a cache line to fill it
> with zeroes that will typically be overwritten before read.
>
> One does want to avoid write misses, but this does not require writing the data. A straightforward method
> to defer such actual zeroing would be to use cache compression that supports cache line granular deduplication.
> Tracking zero pages and having hardware fill on demand seems better (less storage overhead) — compressing
> at page granularity — presumably with decomposition to cache line granularity.
>
> This seems to be merely having hardware perform the same COW optimization
> that OSes typically use at virtual memory page granularity.


I'm not sure CPU architects would appreciate how casually you tossed out "merely" in that context, given the importance of latency to overall cache performance. How few cycles could you add for COWing cache lines before you cancel out the performance increase from not having a full set of zeroed lines? Anywhere from "less than one" to "not very many", I imagine.

The problem isn't so much writing twice to cache in quick succession; it is writing twice to DRAM in quick succession, and that is what you must avoid. You want the rewrite of those zeroed lines to quickly follow filling them with zeroes. Creating a page's worth of zeroes in cache requires two things: allocating the lines and writing the zeroes. If writing zeroes dominated that time, it could easily be optimized away with per-line/subline "is zero" bits.
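To make the "is zero" bit idea concrete, here is a toy software model, not a description of any real cache. The class name, the 64-byte line size, and the `data_writes` counter are all assumptions made up for illustration. Zeroing only sets a flag, and the data array is touched when the line is actually written.

```python
LINE_BYTES = 64  # assumed line size, purely illustrative


class ZeroBitLine:
    """Toy cache line with a hypothetical per-line 'is zero' flag."""

    def __init__(self):
        self.is_zero = False
        self.data = bytearray(LINE_BYTES)
        self.data_writes = 0  # counts stores that touch the data array

    def zero(self):
        # Zeroing flips the flag only; stale bytes stay in the data array.
        self.is_zero = True

    def write(self, offset, payload):
        # No bounds checking: callers are assumed to stay inside the line.
        if self.is_zero:
            # Materialize the zeroes lazily, on the first real store.
            self.data[:] = bytes(LINE_BYTES)
            self.is_zero = False
        self.data[offset:offset + len(payload)] = payload
        self.data_writes += 1

    def read(self, offset, length):
        # A flagged line reads as zeroes regardless of the stale contents.
        if self.is_zero:
            return bytes(length)
        return bytes(self.data[offset:offset + length])
```

In this model the zero fill costs no data-array write at all, which is the sense in which the zero-writing half of the work could be optimized away. It says nothing about the cost of allocating the line in the first place.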

Writing zeroes only dominates if all the lines you allocate are clean. Once you start allocating dirty lines that need to be flushed before they can be used, you are forced to wait on main memory before the page is made available. When you need one new page you often need more, so you will be doing dirty allocation a lot of the time.
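A back-of-the-envelope model of that split, with every number hypothetical: assume a 4 KiB page of 64-byte lines (64 lines), a per-line zeroing cost, and a per-line writeback cost for dirty victims, all charged serially with no overlap. The function name and parameters are invented for this sketch.

```python
def zero_page_cost(victim_dirty, zero_cycles, writeback_cycles):
    """Return (zeroing, writeback) cycle totals for one page's lines.

    victim_dirty: one bool per line being allocated, True if the line
    being evicted is dirty and must be flushed first.
    Costs are hypothetical and summed serially (no overlap modeled).
    """
    victims = list(victim_dirty)
    zeroing = len(victims) * zero_cycles
    writeback = sum(victims) * writeback_cycles
    return zeroing, writeback


LINES_PER_PAGE = 64  # assumed: 4 KiB page / 64-byte lines

# All victims clean: zeroing is the whole cost.
all_clean = zero_page_cost([False] * LINES_PER_PAGE, 1, 100)

# Half the victims dirty: writeback swamps zeroing.
half_dirty = zero_page_cost([True, False] * (LINES_PER_PAGE // 2), 1, 100)

# Same mix with free zeroing ("is zero" bits): the writeback term is unchanged.
free_zeroing = zero_page_cost([True, False] * (LINES_PER_PAGE // 2), 0, 100)
```

With these made-up costs, making zeroing free removes only the first term. The wait on main memory for dirty victims stays exactly where it was.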