By: Mark Roulo (nothanks.delete@this.xxx.com), March 3, 2022 12:23 pm
Room: Moderated Discussions
rocky (rocky.rwt.delete@this.gmail.com) on March 2, 2022 11:28 pm wrote:
> I have a stream of data that I wish to capture as a log and persist in memory. Messages (records)
> will be small. There will be filter/aggregation operations performed on the log, similar to
> a time-series database. Old messages should be wiped as new messages come in (FIFO).
>
> I could allocate a large block of memory and treat it as a circular queue. As new messages
> come in, they simply overwrite old messages. Conceptually, this is very simple.
>
> One alternative is to maintain a circular queue of pointers, each pointing to a distinct
> block of memory (N bytes). My thinking is that, to take advantage of important bus widths
> in memory hierarchies, the blocked approach could be significantly better. Such widths
> include: DRAM bus width, memory prefetch width, and cache line size.
>
> I'm sure this is a well-explored problem space, so I thought I'd check with the experts here. In my
> mind, the queue of pointers could allow for overlap between (a) fetching blocks from main memory into
> cache lines and (b) computation on the data now-loaded into the cache, but I also wouldn't be surprised
> to learn these types of problems are often just bottlenecked by main memory bandwidth.
What throughput do you need? It's easy to spend effort optimizing something that isn't the bottleneck.
Also: how many threads are inserting, and how many threads are processing the data?