ECC error correction/scrubbing on x86-64

ECC error correction/scrubbing on x86-64

Postby Aragorn » Wed, 27 Jan 2010 01:00:23 GMT

Greetings, fellow penguinistas,

Okay, this is one of those rare occasions where I'm actually starting a
thread myself, hoping to find some answers. ;-)

I have been doing some reading lately on the Systems Management Mode
(SMM) introduced to the x86 architecture through the release of the
Intel 80386SL processor.

From what I've gleaned via Google searches, Systems Management Mode
began its life as an undocumented feature - or bug? - of the i386
architecture commonly called "unreal mode", "big real mode" or "flat
real mode". Unlike what its name says, it's not really a separate mode
of operation like real mode, protected mode or virtual 8086 mode.

"Unreal mode" is invoked by entering a fully set up protected mode and
then clearing the protection bit on the processor, so that it would
drop back to real mode but with the pagetables and the global and local
descriptor tables all still in place, without clearing/overwriting
them. This allows the processor to access the full 32-bit address
range of 4 GB from within (what is essentially) real mode. Hence the
description "unreal mode", "big real mode" or "flat real mode".

Originally, this undocumented "unreal mode" was put to use by a number
of DOS games that required more than the 1 MB address space offered by
normal real mode, but eventually this method for accessing more RAM was
largely supplanted by DOS extenders such as DPMI. That's when Intel
decided to officially introduce the Systems Management Mode on the
i386SL, by making use of "unreal mode" to run high-privileged BIOS code
in ring 0, among other things for power management, e.g. turning on
fans.

Another use of SMM, as I understand it, is to emulate a PS/2 keyboard
controller for USB keyboards (and mice) when an operating system is
used that does not support USB - i.e. the so-called "enable legacy USB
devices" setting in the BIOS set-up program - which is useful when the
machine only has a USB keyboard (or mouse) attached, so that one can
access the BIOS set-up program or make selections in a bootloader. (As
I understand it, Linux emulates a PS/2 keyboard controller for USB
keyboards and mice via its "HID" ("Human Input Devices") drivers, so
SMM would not be needed there.)

Now, according to one of the sources I've found via Google, x86 machines
with ECC would be making use of Systems Management Mode (and
thus "unreal mode") to correct errors in memory, but other than simply
mentioning this, the webpage did not elaborate on this subject, and I
couldn't find any other sources corroborating this.

In a way, it would appear logical to me if ECC were indeed to use SMM,
given that SMM is considered the highest privileged mode of operation
on x86 - so highly privileged that even execution of the operating
system is temporarily halted without the operating system even
knowing (which has already been exploited by "proof of concept"
rootkits) - but on the other hand it doesn't really make sense, and
here's why:

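(1) There are many RISC-based machines with ECC, and RISC
    architectures do not have any "real mode" (and thus also
    no "unreal mode").

(2) SMM requires "unreal mode", and "unreal mode" can only be
    initiated by entering 32-bit protected mode first and
    then clearing the protection bit without clearing the
    pagetables and descriptor tables, but because long mode
    on x86-64 differs in operation from 32-bit protected mode
    there is no 64-bit equivalent for "unreal mode".  In other
    words, SMM would never be able to access more than 4 GB
    of RAM.  I'm not sure whether PAE pagetables could be
    used, but even if that were the case, the address range
    would then still be limited to 64 GB only, and so this
    would make ECC - if it does indeed use SMM and thus "unreal
    mode" - inadequate on x86-64 systems with more than 64 GB
    installed.

(3) ECC checks are performed at fairly high frequency, and
    entering SMM (and thus "unreal mode") with the prospect of
    being able to return to normal protected mode requires that
    the processor saves all of its registers first before
    clearing the protection bit, which is a very expensive
    operation in terms of clock cycles.  As such, systems with
    ECC operating at its highest frequency would suffer an
    incredible performance loss during normal execution of the
    operating system and userspace processes.  That doesn't
    seem quite logical on high performance/high throughput
    servers and workstations, and indeed, such systems don't
    appear to be slowing down too much with ECC enabled.

So, can anyone here enlighten me on how the high-privileged ECC routines
are executed on x86-64?  Is this something that would be executed by
the operating system itself while in long mode, or how does this work?

Thanks in advance,
-- 
*Aragorn*
(registered GNU/Linux user #223157)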

Re: ECC error correction/scrubbing on x86-64

Postby Pascal Hambourg » Wed, 27 Jan 2010 01:19:06 GMT

Hello,

Aragorn wrote:

Like on any other platform, I guess. ECC is a hardware operation
performed by the memory controller. Only exceptions caused by
uncorrectable errors, which are supposed to be rare, may be handled by
software.

Re: ECC error correction/scrubbing on x86-64

Postby Clemens Ladisch » Wed, 27 Jan 2010 21:05:33 GMT

Aragorn wrote:

Not really. "Unreal mode" was undocumented, but SMM was designed and
documented.


It behaves differently from all these modes, so I'd call it a mode.


Entering SMM works differently. Certain interrupts or hardware ports can
be configured to switch to SMM, and the memory mappings are not
left over from some previous mode but are explicitly set up as 'flat'.


Rings are only used in protected mode; SMM is arguably one ring above
that. (Actually, on modern processors with virtualization features,
the hypervisor is above ring 0, so the HV would be ring -1 and the
SMM ring -2. :-)


Yes. In this case, the temperature monitoring chip is configured to
generate an SMI (SM interrupt) when some temperature level is reached.


Yes; other motherboard devices are sometimes emulated with this
mechanism too.


Emulating the PS/2 controller is necessary only for software that expects
to access that controller. The Linux input system just has a generic
interface "give me a key", and the input drivers (PS/2, USB, and others)
just implement this interface.


SMM can be used to handle certain ECC errors; however, this is not
necessarily so.

There are two kinds of ECC errors, correctable and uncorrectable ones.
Uncorrectable errors must be reported to the OS so that it can show a
pretty blue screen; correctable errors should be corrected. The latter
can either be done in hardware by the memory controller, or by some
SMM code, or by the operating system.
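
To make the correctable/uncorrectable distinction concrete, here is a
minimal sketch of a SECDED (single error correct, double error detect)
code. It uses a toy extended Hamming code over 4 data bits; real ECC
DIMMs typically protect 64 data bits with 8 check bits, and nothing
here is taken from an actual memory controller. The function names
`encode` and `decode` are made up for the illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # Hamming(7,4) data bit positions
PARITY_POSITIONS = (1, 2, 4)    # Hamming(7,4) parity bit positions


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indexes 1..7 hold Hamming(7,4).
    """
    bits = [0] * 8
    for shift, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> shift) & 1
    for p in PARITY_POSITIONS:
        # Even parity over every position whose index has bit p set.
        bits[p] = sum(bits[i] for i in range(1, 8) if i & p) % 2
    bits[0] = sum(bits) % 2     # even parity over the whole codeword
    return bits


def decode(bits):
    """Return (status, value); value is None if the error is uncorrectable."""
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(bits[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall_ok = sum(bits) % 2 == 0

    fixed = list(bits)
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flipped bits: treat it as a single-bit error.
        # The syndrome is the index of the bad bit (0 = overall parity bit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but even overall parity: two bits flipped.
        return "uncorrectable", None

    value = sum(fixed[pos] << shift for shift, pos in enumerate(DATA_POSITIONS))
    return status, value


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                # one flipped bit: correctable
    print(decode(word))
    word[2] ^= 1                # a second flipped bit: detectable only
    print(decode(word))
```

The "corrected" branch is the case that hardware, SMM code, or the OS
can deal with quietly; the "uncorrectable" branch is the one that has to
be reported upwards.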


RISC machines usually don't run MS-DOS but some customized OS that
knows how to handle ECC errors; SMM code is only required for OSes
that don't.


SMM is a separate mode. BTW, 64-bit CPUs have (only) a 64-bit SMM.


ECC checking (and scrubbing) is done in hardware by the memory
controller; software is required only to handle errors.
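
On Linux, the software side of this is usually visible through the EDAC
subsystem, which exposes per-memory-controller error counters in sysfs.
The following is a minimal sketch that reads them, assuming an EDAC
driver for the memory controller is loaded; without one, the directory
is simply absent and the function returns an empty result. The helper
name `read_edac_counts` is made up for this example.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (correctable, uncorrectable)} from EDAC sysfs."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts           # no EDAC driver loaded, or not Linux
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue            # skip controllers with unreadable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} correctable, {ue} uncorrectable")
```

A rising correctable count with a zero uncorrectable count is the
typical sign that the hardware is silently fixing errors and only
logging them for software to look at.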


HTH
Clemens

Re: ECC error correction/scrubbing on x86-64

Postby Aragorn » Fri, 29 Jan 2010 03:14:09 GMT

On Tuesday 26 January 2010 13:05 in comp.os.linux.hardware, somebody
identifying as Clemens Ladisch wrote...


That is why I wrote "... began its life as an undocumented feature". ;-)
I know that it was documented when SMM was officially introduced. :-)


Okay... :-)


Well, I knew that SMM is triggered by certain interrupts. I didn't know
that it set up its own pagetables.


Well, ring -1 is not really a ring. Some people call it that - hence
the "-1" - but with hardware virtualization extensions enabled, there is
a "root mode" and a "non-root mode", both of which have rings 0, 1, 2
and 3.

And by the way, Xen operates in (root mode) ring 0 and runs
paravirtualized guest kernels in ring 1, although I don't know whether
this is root mode or non-root mode. I presume root mode for the
privileged guest and non-root mode for unprivileged guests. It would
definitely have to be non-root mode for hardware-virtualized guests.


Yes, that is how I understand it.


As I understand it, the old APM standard uses SMM as well.


Oh okay, thanks for clarifying.


Only if you're running something that was concocted in Redmond. ;-) It
would probably be a nice black screen here at the penguin farm. :-)


Well, I wasn't aware that the memory controller itself could do this
(until the other replying poster pointed that out), and I was also not
aware that the operating system could do it, since the operating system
would by definition be subjected to those errors (since they are
hardware errors).

Yet as I understand it, entering SMM and resuming normal operation would
be expensive in terms of CPU time.


Of course not. DOS is not even compatible with such machines. ;-)


Okay, thanks for clearing that up.


Hmm... According to the information I've found on the 'net, it would be
impossible for 16-bit code to use long mode on an x86-64. Even during
normal execution, 16-bit protected mode code can be executed in 32-bit
compatibility mode, but not in long mode, and 16-bit real mode code can
be executed with the processor in legacy mode, via switching to "unreal
mode".

So what you're saying is that SMM on x86-64 is actually 64-bit code
instead of 16-bit real mode code executed in "unreal mode"?


Okay, thanks for clarifying. Like I wrote higher up, I didn't even know
that the memory controller could do this of its own accord without
intervention from a CPU running some kind of BIOS code.

--
*Aragorn*
(registered GNU/Linux user #223157)

Re: ECC error correction/scrubbing on x86-64

Postby Clemens Ladisch » Fri, 29 Jan 2010 17:16:09 GMT



Sorry, "memory mappings" was misleading; SMM sets up an identity
mapping (virtual=physical) without any page table.  My point was that
this 'flat' mode is explicitly set up, regardless of what mapping was in
use before.


Well, there are two definitions of "corrected" that could apply here.

When the CPU wants to read some memory location, and if the ECC
algorithm detects an error and can detect which of the bits has a wrong
value, the value that is sent to the CPU is the correct one. (1)
That value can also be written back into the RAM so that the error won't
occur again the next time that location is read. (2)

(1) _must_ be done by the memory controller; (2) can be done either by
the memory controller or by the CPU.
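
The difference between (1) and (2) can be sketched with a toy model.
Note that this does not use a real ECC code: for brevity it stores three
copies of every byte and takes a bitwise majority vote, which is enough
to show why writing the corrected value back (scrubbing) matters. The
class `ToyEccMemory` and its methods are invented for the illustration.

```python
class ToyEccMemory:
    """Toy memory with triple redundancy standing in for real ECC."""

    def __init__(self, size):
        self.cells = [[0, 0, 0] for _ in range(size)]
        self.corrected = 0      # number of reads that needed correction

    def write(self, addr, value):
        self.cells[addr] = [value & 0xFF] * 3

    def flip_bit(self, addr, copy, bit):
        """Simulate a soft error in one stored copy."""
        self.cells[addr][copy] ^= 1 << bit

    def read(self, addr, write_back=False):
        a, b, c = self.cells[addr]
        value = (a & b) | (a & c) | (b & c)      # bitwise majority vote
        if (a, b, c) != (value, value, value):
            self.corrected += 1                  # (1): fix the value delivered
            if write_back:
                self.cells[addr] = [value] * 3   # (2): fix the stored value
        return value

    def scrub(self):
        """Walk all of memory and write corrected values back."""
        for addr in range(len(self.cells)):
            self.read(addr, write_back=True)


if __name__ == "__main__":
    mem = ToyEccMemory(1)
    mem.write(0, 0x5A)
    mem.flip_bit(0, copy=0, bit=3)
    print(hex(mem.read(0)))     # corrected on the fly, storage still bad
    mem.scrub()                 # storage repaired
    mem.flip_bit(0, copy=1, bit=3)
    print(hex(mem.read(0)))     # the second error is still correctable
```

Without the `scrub()` call, the second flip would hit a cell that still
carries the first error, and the majority vote would then silently
return the wrong byte. That accumulation of errors is what periodic
scrubbing is meant to prevent.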


Yes.  Code running in SMM must be able to access all registers, all
memory, and all hardware; and SMM code is always hardware dependent, so
no backwards compatibility is needed.


Best regards,
Clemens

Re: ECC error correction/scrubbing on x86-64

Postby Aragorn » Fri, 29 Jan 2010 21:34:56 GMT

On Thursday 28 January 2010 09:16 in comp.os.linux.hardware, somebody
identifying as Clemens Ladisch wrote...




And what about the scrubbing of the CPU cache?  The cache sits in
between the CPU and the memory controller, so this cannot be "cleaned"
by the memory controller, can it?  Would this then be "cleaned" by the
CPU, and if so, by the operating system or by SMM code?

-- 
*Aragorn*
(registered GNU/Linux user #223157)

Re: ECC error correction/scrubbing on x86-64

Postby Pascal Hambourg » Fri, 29 Jan 2010 23:14:23 GMT

Aragorn wrote:

The CPU or any other bus master, such as any DMA-capable adapter:
graphics, mass storage, network...


Correct.


By the CPU's integrated cache controller, at least for (1).

Re: ECC error correction/scrubbing on x86-64

Postby Aragorn » Sat, 30 Jan 2010 00:25:52 GMT

On Thursday 28 January 2010 15:14 in comp.os.linux.hardware, somebody
identifying as Pascal Hambourg wrote...


Okay, thanks for clearing that up. ;-)

-- 
*Aragorn*
(registered GNU/Linux user #223157)
