[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform



On Sun, Apr 22, 2012 at 10:18:27PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > This patch gets error information from xen hypervisor and convert
> > > > xen format error into linux format mcelog. This logic is basically
> > > > self-contained, not much touching other kernel components.
> > 
> > I have a problem with that statement above. This thing uses struct mce
> > and mce_log(), which are internal to x86, and whenever we want to change
> > anything there, we won't be able to because xen uses it too.
> 
> Huh? You mean this:
> 
>        m.misc = mc_bank->mc_misc;
>        m.status = mc_bank->mc_status;
>        m.addr = mc_bank->mc_addr;
>        m.tsc = mc_bank->mc_tsc;
>        m.bank = mc_bank->mc_bank;
>        m.finished = 1;
>        /*log this record*/
>        mce_log(&m);
> 
> ?
> 
> This driver is not that much different from the APEI bridge to MCE code -
> it just that instead of reading APEI blob data it reads it from an hypercall.

Let me ask you this: is APEI a virtualization solution of some sort?

No, it is the old windoze RAS crap but I guess Linux has to support it
now too through BIOS. And x86 vendors will have to support it too.

So it is piece of the firmware we'd have to deal with too.

Now xen is a whole another deal - it is purely a piece of software.

> The fix seems quite easy - you change the 'struct mce' and 'mce_log()'
> along with the drivers that use it.

This is exactly what I have a problem with: having to take care of xen
too. "No, Boris, nope, we cannot take your new feature because it breaks
xen." and also "Have you tested this on xen too?" where the only thing I
do is _hardware_ enablement and improving software support for it. And
xen is not hardware...

[..]

> If you are worried about breaking something, then you can just send
> the change to me or Liu to test it out before committing API changes
> in the MCE code.

This probably sounds good now but I don't think code changes like
that ever run as smoothly. Whenever there's breakage, there'll always
be people screaming against it - I just don't want code that enables
hardware to be crippled and unable to change because it breaks
completely unrelated pieces - it is bad as it is now.

> > And this has happened already with the whole microcode loading debacle.
> 
> My recollection is that the existing microcode API had major issues that
> could not fixed. The only fix was to make it be very early in the bootup
> processes and that is what hpa would like developers to focus on.

That was one side of the problem. The other was, AFAICR, creating a xen
microcode driver which was "on the same level" as the hardware microcode
drivers, which was completely bull*.

The problem is xen growing stuff everywhere in arch/x86/ and this way,
maybe even unwillingly, crippling development of hardware-related
features. I know you're willing to help and I know you mean it well, but
there's always some other problem in practice.

Now I keep wondering, why don't you guys simply create your own mcelog
ring buffer and interface on the userspace tool side instead of hooking
into lowlevel kernel stuff? I mean, the code is there, you simply have
to copy it into arch/xen/ or whatever you have there. Why do you have to
hook into arch/x86/ instead of doing your own stuff?

> > So, my suggestion is to copy the pieces you need and create your own xen
> > version of /dev/mcelog and add it to dom0 so that there's no hooking
> > into baremetal code and whenever a dom0 kernel is running, you can
> > reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
> > and not hook into the x86 versions.
> > 
> > Because, if you'd hooked into it, just imagine one fine day, when we
> > remove mcelog support, what screaming the xen people will be doing when
> > mcelog doesn't work anymore.
> 
> You would have more screaming from the distro camp about removing
> /dev/mcelog.

How do you know that? Don't you think that we probably would've talked
to them already and made preparations for conversion first?

> But if that is your choice, I would send you an email asking how I
> need to retool this driver to work with the new MCE gen2 code that you
> had in mind.

As I said above, I'm very sceptical this will ever work, I guess I'd
have to live and see.

Now, with your own buffer solution, nothing breaks and all is happy,
a win-win, if you wish. I think this is much simpler and easier a
solution.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.