By: dmcq (dmcq.delete@this.fano.co.uk), January 11, 2021 3:32 am
Room: Moderated Discussions
Terry Gray (cuyahogan.delete@this.aol.com) on January 7, 2021 9:47 am wrote:
> Jörn Engel (joern.delete@this.purestorage.com) on January 7, 2021 9:05 am wrote:
> > Emanuel Rylke (ema.delete@this.mailbox.org) on January 7, 2021 12:49 am wrote:
> > >
> > > What about doing it not as a workaround for broken hardware
> > > but to make it more easy to show that the hardware
> > > is broken? In theory I know that I'm probably getting bit
> > > errors and that's bad(TM) but if a cat /proc/bit_errors
> > > showed me that I got at least 5 since boot I would be much more motivated to do something about it.
> >
> > Unrealistic for writable pages. Doable for read-only. You need a bit of shadow memory to store
> > the checksums and some fast hash function. Assuming you cannot use vector instructions, performance
> > would be 16 bytes per cycle or 256 cycles per page. You should calculate hashes when pages turn
> > read-only, again before they become writable and maybe periodically in between.
> >
> > Do you care enough to write a patch?
>
> Back in the 1960s Oregon State University had a CDC 3300 (24 bit computer).
>
> Some of the other students I shared an office with wrote an operating system for it called OS3
> (Oregon State Open Shop Operating System).
>
> It had parity memory, and to recover from errors in program code each sector had an exclusive OR
> of the contents as the last word in a sector. When an error occurred the word in error was known
> so they could calculate what that word should have been. So this idea is not new. But interesting
> that I have never heard of it being used anywhere else (although it may have been).
>
> Terry
>
I did exactly that in the '90s for a system where we could just restart a task if a parity error occurred in the writable data. I calculated the thing would fall over once a week if we didn't. The memory was cleansed by a timer task, and code could also be corrected if the error happened while executing. There were two copies of the correction code to cut down the danger area. I still would have very much preferred ECC!
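
For anyone who hasn't seen the trick, here is a minimal sketch of the XOR check word scheme Terry describes. It is not the OS3 code or mine: the function names are made up for illustration, and the 24-bit word width is only borrowed from the CDC 3300 mention. It assumes the parity hardware has already told you which word is bad, so the only job left is to rebuild it from the rest of the sector.

```python
from functools import reduce
from operator import xor

WORD_MASK = (1 << 24) - 1  # assumed 24-bit words, as on the CDC 3300


def seal_sector(words):
    """Return a copy of the sector with an XOR check word appended."""
    check = reduce(xor, words, 0) & WORD_MASK
    return list(words) + [check]


def recover_word(sector, bad_index):
    """Rebuild the word at bad_index by XORing every other word.

    The XOR of a sealed sector (data plus check word) is zero, so any
    single word equals the XOR of all the others.
    """
    value = 0
    for i, word in enumerate(sector):
        if i != bad_index:
            value ^= word
    return value & WORD_MASK


def repair(sector, bad_index):
    """Return a copy of the sector with the flagged word rebuilt."""
    fixed = list(sector)
    fixed[bad_index] = recover_word(sector, bad_index)
    return fixed


sector = seal_sector([0x123456, 0xABCDEF, 0x000001])
damaged = list(sector)
damaged[1] ^= 0x000400  # simulate a single flipped bit in word 1
restored = repair(damaged, 1)
```

This only fixes one bad word per sector, and only when something else (parity, here) says which word it is. Two bad words in the same sector cannot be recovered this way, which is why the periodic cleansing pass matters.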