Sunday, 23 February 2020

INTEL MCE INJECTOR DRIVER

See Chapter 15 in this reference, where it says: "The following two SRAO errors are architecturally defined." It seems to me you want to map a known bad page there instead. This is in addition to the mcelog test suite included with the source (make test). Memory errors are classified as either soft (transient) or hard (permanent).
Uploader: Shaktijar
Date Added: 8 August 2011





Posted Aug 27: Er, maybe I'm missing the thrust of your question, but I thought it was sort of straightforward: an instruction to load some data from memory didn't get the data because it's been destroyed. Background scrubbing gives a machine check. Once the hardware delivers the message to the OS via a machine check, the OS is then free to deal with the machine check however it pleases. Automatic page offlining is a good idea. Introduction to memory errors on modern systems and a description of how the mcelog daemon handles and avoids them.
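To make the page-offlining idea more concrete, here is a minimal sketch of one way a daemon could decide that a page has seen too many corrected errors and should be taken out of use. It is a toy model only: the class name, the threshold, and the time window are hypothetical values chosen for illustration, not mcelog's actual code or defaults.

```python
from collections import defaultdict, deque


class PageErrorTracker:
    """Toy model: count corrected errors per page inside a sliding time window."""

    def __init__(self, threshold=3, window_seconds=3600.0):
        # threshold and window_seconds are illustrative, not mcelog defaults.
        self.threshold = threshold
        self.window = window_seconds
        self.events = defaultdict(deque)  # page address -> recent error timestamps
        self.offlined = set()

    def record_corrected_error(self, page, now):
        """Record one corrected error; return True if the page should be offlined."""
        if page in self.offlined:
            return False
        recent = self.events[page]
        recent.append(now)
        # Forget errors that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.offlined.add(page)
            return True
        return False
```

A page that collects errors slowly never reaches the threshold (which is what one would expect from transient errors), while a page that keeps failing inside the window is flagged once and then left alone.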

HWPOISON

Thus, when HWPOISON is coupled with the appropriate fault-tolerant processors, Linux users can enjoy systems that are more tolerant to memory errors in spite of increased memory densities. That's the stuff Andi Kleen and co.

Reserved kernel pages and zero count pages are ignored, at the peril of a system panic. Posted Sep 29: Later, when erroneous data is read by software, a machine check is initiated.

The MCA can occur on any "word", where "word" is defined by the width of the ECC code applied at the corresponding level of memory. Potentially corrupted processes can then be located by finding all processes that have the corrupted memory mapped.
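The lookup described above can be sketched as a search over per-process mappings. This is only an illustration of the idea: the `page_tables` dictionary and the function name are hypothetical stand-ins, and a real kernel uses its own reverse-mapping structures rather than scanning every process.

```python
def processes_mapping_page(page_tables, bad_frame):
    """Return the sorted PIDs whose mappings include the bad physical frame.

    page_tables is a hypothetical dict: {pid: {virtual_page: physical_frame}}.
    """
    return sorted(
        pid
        for pid, mappings in page_tables.items()
        if bad_frame in mappings.values()
    )


page_tables = {
    101: {0x1000: 7, 0x2000: 8},
    102: {0x5000: 8},  # shares physical frame 8 with PID 101
    103: {0x9000: 9},
}
affected = processes_mapping_page(page_tables, bad_frame=8)
```

With the sample data, `affected` holds the two processes that share frame 8; those are the candidates for being signalled or killed.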

Unified error handling -- A worthy goal?

For users:

The Intel Software Developer's Manual describes the low-level register interface of the x86 machine check architecture. Machine checks are described in Volume 3A.

Posted Dec 4: Now that Intel is supporting MCA Recovery on x86 machines, some desktop users may also enjoy its benefits in the near future. Take a look here:

Posted Sep 8: It refers to the specific bad subset being used as "data error consumption" and the instruction that uses it as the "offending instruction", and says you can't simply locate the offending instruction (and thereby the memory location and the process that are affected by the bad memory) because of the delay.

ECC is able to recover from multibyte errors. Background scrubbing works by reading memory locations, checking the ECC, and correcting correctable errors proactively before they become uncorrectable.
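A small simulation may help show why scrubbing matters. The sketch below uses a Hamming(7,4) code, which corrects one flipped bit per word. That code is an assumption made for brevity: real ECC memory uses wider and stronger codes, and all function names here are illustrative.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(word):
    """Return 0 for a clean word, else the 1-based position of a single flipped bit."""
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s4 = word[3] ^ word[4] ^ word[5] ^ word[6]
    return s1 + 2 * s2 + 4 * s4


def data_bits(word):
    """Extract the 4 data bits from a codeword."""
    return [word[2], word[4], word[5], word[6]]


def scrub(memory):
    """Read every word, fix single-bit errors in place, return the number fixed."""
    fixed = 0
    for word in memory:
        pos = syndrome(word)
        if pos:
            word[pos - 1] ^= 1
            fixed += 1
    return fixed


original = [1, 0, 1, 1]

# With scrubbing between the two bit flips, each error is corrected on its own.
scrubbed = [encode(original)]
scrubbed[0][4] ^= 1
scrub(scrubbed)
scrubbed[0][1] ^= 1
scrub(scrubbed)

# Without scrubbing, both flips accumulate in the same word before it is checked.
neglected = [encode(original)]
neglected[0][4] ^= 1
neglected[0][1] ^= 1
scrub(neglected)
```

In the first case the data survives both flips. In the second, two errors sit in one word when it is finally checked, which is beyond what this toy code can repair, so the stored data no longer matches the original.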

The handler must allow for multiple poisoning events occurring in a short time window. Its exact behavior depends upon the type of corrupted page and various kernel configuration parameters.
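As a rough illustration of a handler that queues several poisoning events and picks an action by page type, consider the following sketch. The page-type strings and action labels are hypothetical, and the clean page cache case is an assumption added for the example; this is not the kernel's actual code.

```python
from collections import deque

# Per the text, reserved kernel pages and zero count pages are left alone.
IGNORED = {"reserved_kernel", "zero_count"}


def handle_poison(page_type):
    """Pick an action for one poisoned page. Action names are illustrative labels."""
    if page_type in IGNORED:
        return "ignore"
    if page_type == "clean_page_cache":
        # Assumption: a clean copy exists on disk, so the page can be dropped.
        return "drop_and_reload_later"
    return "unmap_and_kill_on_fault"


def drain(events):
    """Process queued (frame, page_type) events in arrival order."""
    pending = deque(events)
    seen = set()
    results = []
    while pending:
        frame, page_type = pending.popleft()
        if frame in seen:  # repeated report for an already poisoned frame
            continue
        seen.add(frame)
        results.append((frame, handle_poison(page_type)))
    return results
```

Queuing the events and skipping frames that were already handled is one simple way to cope with several reports arriving close together.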

On a later page fault the associated application will be killed. For these uncorrectable errors, the hardware typically generates a trap which, in turn, causes a kernel panic. See this LWN article for further details about this issue.

mcelog -- further reading

Can it be any clearer? MCE is the mechanism by which the hardware reports the bad page to the operating system.

But that's not the case the article describes. Studies about memory errors: a good study on memory errors from the University of Rochester. Newer Intel CPUs support a new class of machine checks called software recoverable action optional (SRAO).

The handler ignores the following types of pages:
