By: dmcq (dmcq.delete@this.fano.co.uk), April 23, 2015 9:17 am
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on April 23, 2015 7:00 am wrote:
> Jouni Osmala (yeah.delete@this.right.com) on April 23, 2015 3:19 am wrote:
> [snip]
> > But almost every TLB works in parallel with the load instead of in front of it, and that is the primary reason
> > why the minimum page size affects the cache design so much, and why I always claim it would be really nice to
> > have a bigger linear translation between physical and virtual addresses than the 4k minimum page size.
>
> The Mill uses virtually tagged caches (including for L2). Homonyms (same virtual address, different physical
> address) are easily handled by an address space ID mechanism (as dcmq indicated); the Mill uses this (a
> "turf ID" is used — a "turf" is a permission domain and need not use a separate pseudo-address-space —
> the handling of UNIX fork is briefly described at the Mill Computing site). Writable synonyms (different
> virtual address, same physical address) are more problematic. I don't know how the Mill is intended to handle
> such. (It should also be noted that the Mill does not treat pointers as simply integers.)
>
> As dmcq also noted, there are a number of mechanisms for
> handling aliasing issues in a virtually tagged cache.
>
> > And a little reply to this point.
> > http://millcomputing.com/wiki/Protection#Region_Table
> >
> > And they need to be faster than L1 accesses for any modern design.
> > Virtually tagged L1 caches don't work with SMT and are bad for the modern requirement of running
> > lots of active processes and threads. I would really like it if it were an option for a CPU
> > to have a virtually tagged L1 cache, with a TLB needed only for the L2 cache, but it isn't
> > an option when you need to handle lots of context switches and coherency traffic.
>
> First, the Mill as currently conceived is poorly suited to SMT. (SoEMT is more friendly
> to the current conception. On the other hand, modifying the design to be more friendly
> to multithreading does not seem likely to be prohibitively difficult or expensive.)
>
> Second, virtually tagged L1 caches are not a problem with SMT. ASIDs work fairly well. (Aside from the writable
> synonym problem, there would be wasted cache capacity for commonly used read-only shared memory.)
>
> The overhead of using 64-bit addresses should be compensated for by the use of huge segments (well
> known regions) for permission checking in the common cases and slightly relaxed timing for permission
> checks. (The Mill also uses per-byte valid bits, further increasing cache metadata.)
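Just to put numbers on Jouni's point up top about the minimum page size constraining the L1 (a back-of-the-envelope sketch; the 32 KiB L1 is just a hypothetical, not any particular machine): if translation runs in parallel with the lookup, the index has to come from the untranslated page-offset bits, so each way can be at most one page and capacity only grows with associativity.

#include <stdio.h>

int main(void)
{
    const unsigned page     = 4 * 1024;   /* minimum page size           */
    const unsigned l1       = 32 * 1024;  /* desired L1 capacity         */
    const unsigned min_ways = l1 / page;  /* ways needed to index VIPT   */

    printf("way size <= %u B, so a %u B L1 needs >= %u ways;\n",
           page, l1, min_ways);
    printf("with a 16 KiB minimum page it would need only %u.\n",
           l1 / (16 * 1024));
    return 0;
}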
Ah yes, fork and shared libraries. I'd forgotten about those. I think mostly from an embedded perspective, where the shared code can go in the system space. That does mean there is a great deal of physical address sharing irrespective of any locks or anything like that. I'd better have a good look at how the Mill deals with fork.
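Meanwhile, to convince myself about the homonym/synonym distinction, here is a toy write-back model of a virtually tagged L1 whose tags carry an ASID/turf-style ID. All the sizes, IDs and the little page table are invented for illustration and it assumes nothing about how the Mill actually does it. The ASID makes the homonym case safe without flushes, while a writable synonym ends up as two unrelated lines for one physical location, so a store through one mapping is invisible through the other until something finds and reconciles the alias.

#include <stdint.h>
#include <stdio.h>

#define NUM_SETS   128                 /* 8 KiB direct-mapped, 4 KiB pages */
#define LINE_BYTES 64
#define PAGE_BYTES 4096

struct line {
    int      valid, dirty;
    unsigned asid;                     /* address-space / turf-style ID    */
    uint64_t va;                       /* stands in for the virtual tag    */
    int      data;                     /* one word stands in for the line  */
};

static struct line cache[NUM_SETS];
static int memory[32];                 /* one word per physical page       */

/* Made-up translation: virtual pages 2 and 9 are a writable synonym pair,
 * both mapped to physical page 3; everything else is private per ASID.   */
static unsigned translate(unsigned asid, uint64_t va)
{
    uint64_t vpage = va / PAGE_BYTES;
    if (vpage == 2 || vpage == 9)
        return 3;
    return (unsigned)vpage + (asid == 2 ? 16 : 0);
}

static struct line *fill(unsigned asid, uint64_t va)
{
    struct line *l = &cache[(va / LINE_BYTES) % NUM_SETS];
    int hit = l->valid && l->asid == asid &&
              l->va / LINE_BYTES == va / LINE_BYTES;
    if (!hit) {
        if (l->valid && l->dirty)      /* write the victim back by PA      */
            memory[translate(l->asid, l->va)] = l->data;
        l->valid = 1;
        l->dirty = 0;
        l->asid  = asid;
        l->va    = va;
        l->data  = memory[translate(asid, va)];
    }
    return l;
}

static void store(unsigned asid, uint64_t va, int v)
{
    struct line *l = fill(asid, va);
    l->data  = v;
    l->dirty = 1;
}

static int load(unsigned asid, uint64_t va)
{
    return fill(asid, va)->data;
}

int main(void)
{
    /* Homonym: the same VA in two address spaces.  The ASID in the tag
     * stops the second access from falsely hitting the first one's line. */
    store(1, 0x1000, 11);
    printf("other ASID, same VA: %d (its own data, not the 11)\n",
           load(2, 0x1000));

    /* Writable synonym: two VAs for one physical page.  The cache treats
     * them as unrelated lines, so the store through VA 0x2000 stays dirty
     * in its line and the load through VA 0x9000 sees stale memory.       */
    store(1, 0x2000, 42);
    printf("via the synonym:     %d (stale; the 42 is in the other line)\n",
           load(1, 0x9000));
    return 0;
}

As Paul says there are several known ways out of that: forbid writable aliases, have the miss path hunt down the other copy, or restrict which address bits aliases may differ in. It will be interesting to see which of those the Mill picks.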