ECC error correction/scrubbing on x86-64

Post by Aragorn » Wed, 27 Jan 2010 01:00:23 GMT

Greetings, fellow penguinistas,

Okay, this is one of those rare occasions where I'm actually starting a
thread myself, hoping to find some answers. ;-)

I have been doing some reading lately on the Systems Management Mode
(SMM) introduced to the x86 architecture through the release of the
Intel 80386SL processor.

From what I've gleaned via Google searches, Systems Management Mode
began its life as an undocumented feature - or bug? - of the i386
architecture commonly called "unreal mode", "big real mode" or "flat
real mode". Unlike what its name says, it's not really a separate mode
of operation like real mode, protected mode or virtual 8086 mode.

"Unreal mode" is invoked by entering a fully set up protected mode and
then clearing the protection bit on the processor, so that it would
drop back to real mode but with the pagetables and the global and local
descriptor tables all still in place, without clearing/overwriting
them. This allows the processor to access the full 32-bit address
range of 4 GB from within (what is essentially) real mode. Hence the
description "unreal mode", "big real mode" or "flat real mode".
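
One common way to picture the trick described above is in terms of segment limits: real mode normally checks each offset against a 64 KiB limit, and "unreal mode" leaves a 4 GiB limit behind from protected mode. The Python sketch below is only a toy model under that assumption. It ignores paging and the A20 line, and the names `linear_address`, `REAL_MODE_LIMIT` and `UNREAL_MODE_LIMIT` are made up for illustration.

```python
# Toy model of real-mode address formation (illustrative names, not a CPU API).
REAL_MODE_LIMIT = 0xFFFF        # 64 KiB segment limit in ordinary real mode
UNREAL_MODE_LIMIT = 0xFFFFFFFF  # 4 GiB limit left over from protected mode


def linear_address(segment: int, offset: int, limit: int) -> int:
    """Return the linear address for segment:offset.

    Raises ValueError when the offset exceeds the segment limit, which is
    roughly where real hardware would raise a fault instead.
    """
    if offset > limit:
        raise ValueError(f"offset {offset:#x} exceeds segment limit {limit:#x}")
    # Real-mode style base: the segment value shifted left by four bits.
    return (segment << 4) + offset
```

With the 64 KiB limit, the highest reachable address is `linear_address(0xFFFF, 0xFFFF, REAL_MODE_LIMIT)`, just above 1 MB. With the 4 GiB limit, an offset such as `0x00100000` is accepted, which is what lets code reach memory beyond 1 MB.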

Originally, this undocumented "unreal mode" was put to use by a number
of DOS games that required more than the 1 MB address space offered by
normal real mode, but eventually this method for accessing more RAM was
largely supplanted by DOS extenders such as DPMI. That's when Intel
decided to officially introduce the Systems Management Mode on the
i386SL, by making use of "unreal mode" to run high-privileged BIOS code
in ring 0, among other things for power management, e.g. turning on
fans.

Another use of SMM, as I understand it, is to emulate a PS/2 keyboard
controller for USB keyboards (and mice) when an operating system is
used that does not support USB - i.e. the so-called "enable legacy USB
devices" setting in the BIOS set-up program - which is useful when the
machine only has a USB keyboard (or mouse) attached, so that one can
access the BIOS set-up program or make selections in a bootloader. (As
I understand it, Linux emulates a PS/2 keyboard controller for USB
keyboards and mice via its "HID" ("Human Interface Devices") drivers, so
SMM would not be needed there.)

Now, according to one of the sources I've found via Google, x86 machines
with ECC would be making use of Systems Management Mode (and
thus "unreal mode") to correct errors in memory, but other than simply
mentioning this, the webpage did not elaborate on this subject, and I
couldn't find any other sources corroborating this.

In a way, it would appear logical to me if ECC were indeed to use SMM,
given that SMM is considered the highest privileged mode of operation
on x86 - so highly privileged that even execution of the operating
system is temporarily halted without the operating system even
knowing (which has already been exploited by "proof of concept"
rootkits) - but on the other hand it doesn't really make sense, and
here's why:

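(1) There are many RISC-based machines with ECC, and RISC
    architectures do not have any "real mode" (and thus also
    no "unreal mode").

(2) SMM requires "unreal mode", and "unreal mode" can only be
    initiated by entering 32-bit protected mode first and
    then clearing the protection bit without clearing the
    pagetables and descriptor tables, but because long mode
    on x86-64 differs in operation from 32-bit protected mode
    there is no 64-bit equivalent for "unreal mode".  In other
    words, SMM would never be able to access more than 4 GB
    of RAM.  I'm not sure on whether PAE pagetables could be
    used, but even if that were the case, the address range
    would then still be limited to 64 GB only, and so this
    would make ECC - if it does indeed use SMM and thus "unreal
    mode" - inadequate on x86-64 systems with more than 64 GB
    installed.

(3) ECC checks are performed at fairly high frequency, and
    entering SMM (and thus "unreal mode") with the prospect of
    being able to return to normal protected mode requires that
    the processor saves all of its registers first before
    clearing the protection bit, which is a very expensive
    operation in terms of clock cycles.  As such, systems with
    ECC operating at its highest frequency would suffer an
    incredible performance loss during normal execution of the
    operating system and userspace processes.  That doesn't
    seem quite logical on high performance/high throughput
    servers and workstations, and indeed, such systems don't
    appear to be slowing down too much with ECC enabled.

So, can anyone here enlighten me on how the high-privileged ECC routines
are executed on x86-64?  Is this something that would be executed by
the operating system itself while in long mode, or how does this work?

Thanks in advance,
-- 
*Aragorn*
(registered GNU/Linux user #223157)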

Re: ECC error correction/scrubbing on x86-64

Post by Pascal Hambourg » Wed, 27 Jan 2010 01:19:06 GMT

Hello,

Aragorn wrote:

Like on any other platform, I guess. ECC is a hardware operation
performed by the memory controller. Only exceptions caused by
uncorrectable errors, which are supposed to be rare, may be handled by
software.

Re: ECC error correction/scrubbing on x86-64

Post by Clemens Ladisch » Wed, 27 Jan 2010 21:05:33 GMT

Aragorn wrote:

Not really. "Unreal mode" was undocumented, but SMM was designed and
documented.


It behaves differently from all these modes, so I'd call it a mode.


Entering SMM works differently. Certain interrupts or hardware ports can
be configured to switch to SMM, and the memory mappings are not
left from some previous mode but are explicitly set up as 'flat'.


Rings are only used in protected mode; SMM is arguably one ring above
that. (Actually, on modern processors with virtualization features,
the hypervisor is above ring 0, so the HV would be ring -1 and the
SMM ring -2. :-)


Yes. In this case, the temperature monitoring chip is configured to
generate an SMI (SM interrupt) when some temperature level is reached.


Yes; other motherboard devices are sometimes emulated with this
mechanism too.


Emulating the PS/2 controller is necessary only for software that expects
to access that controller. The Linux input system just has a generic
interface "give me a key", and the input drivers (PS/2, USB, and others)
just implement this interface.


SMM can be used to handle certain ECC errors; however, this is not
necessarily so.

There are two kinds of ECC errors, correctable and uncorrectable ones.
Uncorrectable errors must be reported to the OS so that it can show a
pretty blue screen; correctable errors should be corrected. The latter
can either be done in hardware by the memory controller, or by some
SMM code, or by the operating system.


RISC machines usually don't run MS-DOS but some customized OS that
knows how to handle ECC errors; SMM code is only required for OSes
that don't.


SMM is a separate mode. BTW, 64-bit CPUs have (only) a 64-bit SMM.


ECC checking (and scrubbing) is done in hardware by the memory
controller; software is required only to handle errors.
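
On Linux, the errors the memory controller reports are usually visible through the kernel's EDAC subsystem. The Python sketch below reads the per-controller `ce_count` (corrected) and `ue_count` (uncorrected) counters from sysfs. It assumes an EDAC driver for the memory controller is loaded. The function name `read_ecc_counts` and the `root` parameter are illustrative choices, not part of any library.

```python
from pathlib import Path

# Default location of the EDAC memory-controller tree in sysfs.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root: Path = EDAC_MC_ROOT) -> dict:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected)."""
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or no ECC-capable controller
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (corrected, uncorrected)
    return counts
```

Calling `read_ecc_counts()` on a machine without ECC or without an EDAC driver simply returns an empty dictionary. A slowly rising corrected count with an uncorrected count of zero matches the picture above: the hardware corrects, and software only has to notice.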


HTH
Clemens

Re: ECC error correction/scrubbing on x86-64

Post by Aragorn » Fri, 29 Jan 2010 03:14:09 GMT

On Tuesday 26 January 2010 13:05 in comp.os.linux.hardware, somebody
identifying as Clemens Ladisch wrote...


That is why I wrote "... began its life as an undocumented feature". ;-)
I know that it was documented when SMM was officially introduced. :-)


Okay... :-)


Well, I knew that SMM is triggered by certain interrupts. I didn't know
that it set up its own pagetables.


Well, ring -1 is not really a ring. Some people call it that - hence
the "-1" - but with hardware virtualization extensions enabled, there is
a "root mode" and a "non-root" mode, both of which have rings 0, 1, 2
and 3.

And by the way, Xen operates in (root mode) ring 0 and runs
paravirtualized guest kernels in ring 1, although I don't know whether
this is root mode or non-root mode. I presume root mode for the
privileged guest and non-root mode for unprivileged guests. It would
definitely have to be non-root mode for hardware-virtualized guests.


Yes, that is how I understand it.


As I understand it, the old APM standard uses SMM as well.


Oh okay, thanks for clarifying.


Only if you're running something that was concocted in Redmond. ;-) It
would probably be a nice black screen here at the penguin farm. :-)


Well, I wasn't aware that the memory controller itself could do this
(until the other replying poster pointed that out), and I was also not
aware that the operating system could do it, since the operating system
would by definition be subjected to those errors (since they are
hardware errors).

Yet as I understand it, entering SMM and resuming normal operation would
be expensive in terms of CPU time.


Of course not. DOS is not even compatible with such machines. ;-)


Okay, thanks for clearing that up.


Hmm... According to the information I've found on the 'net, it would be
impossible for 16-bit code to use long mode on an x86-64. Even during
normal execution, 16-bit protected mode code can be executed in 32-bit
compatibility mode, but not in long mode, and 16-bit real mode code can
be executed with the processor in legacy mode, via switching to "unreal
mode".

So what you're saying is that SMM on x86-64 is actually 64-bit code
instead of 16-bit real mode code executed in "unreal mode"?


Okay, thanks for clarifying. Like I wrote higher up, I didn't even know
that the memory controller could do this of its own accord without
intervention from a CPU running some kind of BIOS code.

--
*Aragorn*
(registered GNU/Linux user #223157)

Re: ECC error correction/scrubbing on x86-64

Post by Clemens Ladisch » Fri, 29 Jan 2010 17:16:09 GMT



Sorry, "memory mappings" was misleading; SMM sets up an identity
mapping (virtual=physical) without any page table.  My point was that
this 'flat' mode is explicitly set up, regardless of what mapping was in
use before.


Well, there are two definitions of "corrected" that could apply here.

When the CPU wants to read some memory location, and if the ECC
algorithm detects an error and can detect which of the bits has a wrong
value, the value that is sent to the CPU is the correct one. (1)
That value can also be written back into the RAM so that the error won't
occur again the next time that location is read. (2)

(1) _must_ be done by the memory controller; (2) can be done either by
the memory controller or by the CPU.
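
The two steps can be illustrated with a toy single-error-correcting, double-error-detecting Hamming code. This is a hedged Python sketch and not what a real memory controller runs: real ECC memory typically protects 64 data bits with 8 check bits, while this example protects 4 data bits with 4 check bits, and every name in it is made up for illustration.

```python
class UncorrectableError(Exception):
    """Raised when two bits of a stored word have flipped."""


def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit word; index 0 holds overall parity."""
    word = [0] * 8
    # Data bits live at positions 3, 5, 6, 7; check bits at 1, 2, 4.
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    word[0] = sum(word[1:]) % 2
    return word


def decode(word: list):
    """Return (nibble, was_corrected, repaired_word)."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i          # XOR of the positions of all set bits
    overall = sum(word) % 2        # 0 for an intact word
    fixed = list(word)
    corrected = False
    if overall == 1:
        fixed[syndrome] ^= 1       # single flip: syndrome is its position
        corrected = True
    elif syndrome != 0:
        raise UncorrectableError("two bits flipped; cannot repair")
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, corrected, fixed


def read(memory: dict, addr: int, scrub: bool = True) -> int:
    """Read a nibble: (1) always return the corrected value,
    (2) optionally write the repaired word back to memory."""
    nibble, corrected, fixed = decode(memory[addr])
    if corrected and scrub:
        memory[addr] = fixed
    return nibble
```

With `scrub=False` the reader still gets the right value, but the flipped bit stays in memory and has to be corrected again on every read. With `scrub=True` the stored word is repaired, so a second flip in the same word later on is still a single, correctable error rather than an uncorrectable double one.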


Yes.  Code running in SMM must be able to access all registers, all
memory, and all hardware; and SMM code is always hardware dependent, so
no backwards compatibility is needed.


Best regards,
Clemens

Re: ECC error correction/scrubbing on x86-64

Post by Aragorn » Fri, 29 Jan 2010 21:34:56 GMT

On Thursday 28 January 2010 09:16 in comp.os.linux.hardware, somebody
identifying as Clemens Ladisch wrote...




And what about the scrubbing of the CPU cache?  The cache sits in
between the CPU and the memory controller, so this cannot be "cleaned"
by the memory controller, can it?  Would this then be "cleaned" by the
CPU, and if so, by the operating system or by SMM code?

-- 
*Aragorn*
(registered GNU/Linux user #223157)

Re: ECC error correction/scrubbing on x86-64

Post by Pascal Hambourg » Fri, 29 Jan 2010 23:14:23 GMT

Aragorn wrote:

The CPU or any other bus master, such as any DMA-capable adapter:
graphics, mass storage, network...


Correct.


By the CPU's integrated cache controller, at least for (1).

Re: ECC error correction/scrubbing on x86-64

Post by Aragorn » Sat, 30 Jan 2010 00:25:52 GMT

On Thursday 28 January 2010 15:14 in comp.os.linux.hardware, somebody
identifying as Pascal Hambourg wrote...


Okay, thanks for clearing that up. ;-)

-- 
*Aragorn*
(registered GNU/Linux user #223157)
