By: Paul (pavel.delete@this.noa-labs.com), December 23, 2020 3:54 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on December 23, 2020 2:55 pm wrote:
> Paul (pavel.delete@this.noa-labs.com) on December 22, 2020 6:12 pm wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on December 22, 2020 2:28 pm wrote:
> > > Paul (pavel.delete@this.noa-labs.com) on December 22, 2020 12:42 pm wrote:
> > > > Michael S (already5chosen.delete@this.yahoo.com) on December 21, 2020 1:00 am wrote:
> > > > > Paul (pavel.delete@this.noa-labs.com) on December 20, 2020 11:29 pm wrote:
> > > > > > Konrad Schwarz (no.spam.delete@this.no.spam) on December 20, 2020 9:34 am wrote:
> > > > > > > David Kanter (dkanter.delete@this.realworldtech.com) on December 19, 2020 11:51 am wrote:
> > > > > > > > As I've said before, the ECC that is being done in LPDDR is entirely different from ECC for server memory.
> > > > > > >
> > > > > > > Why can't servers profit from ECC integrated into DRAM?
> > > > > > >
> > > > > > > ECC detection and correction done autonomously by the memory
> > > > > > > device during row refresh cycles would
> > > > > > > seem to be a big win, except that you need x64 devices. The
> > > > > > > fact that correctable errors will no longer be reported
> > > > > > > precisely seems like a secondary concern; simple parity could be used to ensure correct transfers.
> > > > > > >
> > > > > >
> > > > > > This is where the embedded world seems to be going:
> > > > > >
> > > > > > http://www.xingmem.com/en/product_xm8a.php
> > > > > >
> > > > > > If you can make devices use ECC to hide the refresh cycle, and all the DRAM peculiarities, you
> > > > > > get a dumb SRAM-like interface facing the system, greatly simplifying the host SoC/MCU design.
> > > > >
> > > > >
> > > > > You can't hide the variable access latency except by making the best-case latency huge.
> > > > >
> > > >
> > > > You can!
> > > >
> > > > You just don't use the data written/read from the row being refreshed, and use
> > > > multi-bit error correction codes to "hide" that. This is how that XRAM works.
> > > >
> > > > You just need to make sure that the smallest accessible piece
> > > > of RAM covers more real rows than the error correction
> > > > can fix, and do error correction on read to cover the case when a read comes immediately after a write.
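The spread-across-banks idea can be sketched in a few lines. This is a toy single-erasure scheme with 8 data banks plus one XOR-parity bank; the real parts presumably use stronger multi-bit codes, so the bank counts and the code here are illustrative assumptions, not how XRAM actually works:

```python
# Toy sketch: hide one busy (refreshing) bank behind a simple erasure code.
# 8 data banks plus 1 XOR-parity bank; the bank under refresh is treated as
# an erasure and reconstructed from parity, so read latency stays fixed.

DATA_BANKS = 8

def write_word(banks, symbols):
    """Spread 8 data symbols across banks 0..7, XOR parity into bank 8."""
    assert len(symbols) == DATA_BANKS
    parity = 0
    for i, s in enumerate(symbols):
        banks[i] = s
        parity ^= s
    banks[DATA_BANKS] = parity

def read_word(banks, refreshing_bank):
    """Read all banks; reconstruct the unavailable one from the others."""
    out = []
    for i in range(DATA_BANKS):
        if i == refreshing_bank:
            # XOR of parity and all other data banks recovers this symbol.
            s = banks[DATA_BANKS]
            for j in range(DATA_BANKS):
                if j != i:
                    s ^= banks[j]
            out.append(s)
        else:
            out.append(banks[i])
    return out

banks = [0] * (DATA_BANKS + 1)
write_word(banks, [3, 1, 4, 1, 5, 9, 2, 6])
assert read_word(banks, refreshing_bank=2) == [3, 1, 4, 1, 5, 9, 2, 6]
```

The same structure generalizes: with a code correcting K erasures, up to K banks can be refreshing at once without the host ever seeing a stall.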
> > >
> > > Even ignoring refresh, in many of today's workloads more than half of accesses are Row Hits, which are 2 times
> > > faster* than Row Misses (i.e. accesses to a fully precharged bank) and 3 times faster than Row Conflicts
> > > (i.e. accesses to a bank that is currently active with a different row). In a fixed-latency setup you will have
> > > all accesses as slow as Row Conflicts, which would be more than twice as slow as today's average.
> > > And that's before we consider limitations imposed by limited supply current, in particular tFAW. Or
> > > maybe the maximum sustained bandwidth of these devices is so low that tFAW never becomes an issue?
> > >
> > > ---------------
> > > * - All latencies, as measured on memory device balls
> > >
> >
> > 166 MHz with 4 ns tCO. Nothing to sneeze at.
> >
>
>
> Almost 40 times lower bandwidth than state-of-the-art DDR5 is also not something to be particularly proud of.
> However, it *is* quick enough for the current limiters of off-the-shelf DRAM devices, like tFAW and, even more
> importantly, tRRD, to become a problem. Of course, if they design the array themselves, they are free to use much
> smaller pages (rows) than off-the-shelf parts. That would allow a shorter tRRD and could completely eliminate tFAW.
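The "almost 40 times" figure works out as a per-pin data-rate comparison, assuming DDR5-6400 as the state-of-the-art point and an SDR interface on the XRAM part:

```python
# Rough per-pin data-rate comparison behind the "almost 40x" figure.
# DDR5-6400 as "state of the art" is an assumption; the XRAM part has
# an SDR interface, so 166 MHz means 166 MT/s per data pin.
ddr5_mt_s = 6400  # DDR5 transfers twice per clock: 6400 MT/s per pin
xram_mt_s = 166   # 166 MHz SDR: 166 MT/s per pin
print(ddr5_mt_s / xram_mt_s)  # ~38.6 -> "almost 40 times lower bandwidth"
```

Wider data buses on the XRAM side would narrow the absolute gap, but the per-pin ratio is what makes the comparison fair.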
>
> BTW, which series are you talking about? Presumably the XM7A? But I can't find a datasheet for that series.
> Without a datasheet, I can only guess what tCO actually means and whether it applies at CL=1 or CL=2. Also,
> in order to calculate the practically achievable latency of a synchronous device, we need to know a few other
> numbers apart from tCO and CL, most importantly the tSU/tH of the address/control lines.
>
> The XM8A at your link appears to run at 100 MHz in theory, but, accounting for various board
> and buffering delays, a real-world design would have trouble beating 65-70 MT/s.
>
> > I believe you can avoid hitting a conflict if you always spread bits across more banks than your
> > data width. And you can tweak your error correction so that it can always correct
> > more errors than the maximum possible number of rows being precharged at a given frequency.
>
> Overall, call me skeptical. IMHO, they are going nowhere.
> Ancient PC133 SDR SDRAM sounds like a better proposition.
And that is already better than what most MCUs and computing ASICs around can work with. Then add a whopping 72 Mb size for the biggest XM7.