By: Terry Gray (cuyahogan.delete@this.aol.com), January 7, 2021 11:58 am
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on January 7, 2021 11:00 am wrote:
> Terry Gray (cuyahogan.delete@this.aol.com) on January 7, 2021 9:47 am wrote:
> > Jörn Engel (joern.delete@this.purestorage.com) on January 7, 2021 9:05 am wrote:
> > > Emanuel Rylke (ema.delete@this.mailbox.org) on January 7, 2021 12:49 am wrote:
> > > >
> > > > What about doing it not as a workaround for broken hardware
> > > > but to make it more easy to show that the hardware
> > > > is broken? In theory I know that I'm probably getting bit
> > > > errors and that's bad(TM) but if a cat /proc/bit_errors
> > > > showed me that I got at least 5 since boot I would be much more motivated to do something about it.
> > >
> > > Unrealistic for writable pages. Doable for read-only. You need a bit of shadow memory to store
> > > the checksums and some fast hash function. Assuming you cannot use vector instructions, performance
> > > would be 16 bytes per cycle or 256 cycles per page. You should calculate hashes when pages turn
> > > read-only, again before they become writable and maybe periodically in between.
> > >
> > > Do you care enough to write a patch?
> >
> > Back in the 1960s Oregon State University had a CDC 3300 (24 bit computer).
> >
> > Some of the other students I shared an office with wrote an operating system for it called OS3
> > (Oregon State Open Shop Operating System).
> >
> > It had parity memory, and to recover from errors in program code, each sector had an exclusive OR
> > of the contents as the last word in a sector. When an error occurred the word in error was known
> > so they could calculate what that word should have been. So this idea is not new. But interesting
> > that I have never heard of it being used anywhere else (although it may have been).
>
>
> That's just RAID 4, if applied to disks.
>
> And that's the basic idea behind RAIM or chipkill style systems.
>
Thanks. I was mostly a software guy and I have been retired for many years.
Still nice to learn things everybody else seems to know.
Terry
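
For anyone following along, the XOR scheme quoted above can be sketched roughly as below. This is only an illustration of the idea, not the actual OS3 code: the function names, the tiny three-word sector, and the sample values are all made up for the example. The only details taken from the thread are the 24-bit word size and the XOR of the contents stored as the last word of the sector. It assumes, as in the post, that the parity hardware has already identified which word is bad.

```python
from functools import reduce
from operator import xor

WORD_MASK = (1 << 24) - 1  # 24-bit words, as on the CDC 3300


def add_parity_word(data_words):
    """Return the sector with the XOR of all data words appended as the last word."""
    checksum = reduce(xor, data_words, 0) & WORD_MASK
    return list(data_words) + [checksum]


def recover_word(sector, bad_index):
    """Rebuild the word at bad_index by XOR-ing every other word in the sector.

    Works because the XOR of all words (data plus checksum) is zero, so the
    XOR of all the others equals the missing one. The caller must already
    know which word is bad (here, reported by the memory parity check).
    """
    others = (w for i, w in enumerate(sector) if i != bad_index)
    return reduce(xor, others, 0) & WORD_MASK


# Hypothetical sector of three data words (octal, 24 bits each).
sector = add_parity_word([0o12345670, 0o00000017, 0o77777777])

# Simulate a single flipped bit in word 1 that parity has flagged.
damaged = list(sector)
damaged[1] ^= 0o4000

restored = recover_word(damaged, 1)
```

The same arithmetic is what RAID 4 does across disks: one known-bad unit can be rebuilt from the survivors, but two simultaneous failures in the same group cannot.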