Optimizing blocksize of data based on memory architecture

By: rocky (rocky.rwt.delete.delete@this.this.gmail.com), March 3, 2022 1:09 pm
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on March 3, 2022 2:44 am wrote:
> rocky (rocky.rwt.delete@this.gmail.com) on March 2, 2022 11:28 pm wrote:
>
> If you're doing these filtering operations sequentially, then prefetching of that data into cache happens
> on all modern large processors. It's almost never worth it for a straight-sequential access pattern, but
> explicit prefetching can sometimes be useful as well. Automatic sequential prefetching is almost always
> very, very good, and sequential passes through memory tend to result in the highest possible bandwidth (NUMA-ish
> systems present some exceptions to that, but let's assume you're on more common hardware)

Everything will be very sequential, so I'll take the point and try to preserve that sequential access pattern as much as possible.

> If you're doing this filtering on multiple cores the situation becomes more complex, although if individual
> items are small they're likely going to be quick to filter, and so coordination across CPUs is going
> to have considerable overhead.

Since it's time-series data without anything involving state-based calculations (e.g., convolution), the work is embarrassingly parallel and divides nicely among multiple cores. If all the data is in memory, I can split it into working sets and assign one to each core in a machine.

> If these cores are all on the same die, you may still get considerable
> prefetching. If you can group the items in some way you can set individual cores off processing clusters
> of these items, and leave what each core does basically sequential. IOW, keep an auxiliary table
> of pointers to where clusters of large (perhaps megabyte-ish) items start, and then spin off those
> clusters to the different cores, or keep the circular queue as a collection of large (again megabyte-ish)
> blocks in a list, each of which is processed sequentially.

Tracking offsets into a circular queue, right?

> So the slightly oversimplified answer is that if you want to blast
> through as much memory as possible, do it sequentially.
>
> Also, if you're not measuring, you're almost certainly doing it wrong.

ack, ack