[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] 2/3: MCA/MCE correctable error handling



On Wednesday 22 August 2007 18:13:30 Keir Fraser wrote:
> I'm not sure you should use shared_info at all. This interface should be
> low-rate enough that you can add a sysctl (assuming this info is applicable
> for dom0 only, which it looks to be).

The polling service is dom0 only. Uncorrectable errors may also be reported
to a DomU.

> Actually, probably a platform_op  rather than a sysctl...

What you propose is to copy the MCA info from Xen to guest?
The MCA info structure will be used for both correctable and uncorrectable 
error notification (as I said in my first patch 0/3 mail).

I assumed you agreed on all this as is in the patches, since you did not 
object/comment when this was discussed on this list in the "MCA/MCE concept" 
topic (and also your questions in your patch 3/3 mail were discussed there).
The purpose of this discussion was to find an agreement on how to proceed.

Therefore, all what I expected from you were some questions/comments about 
implementation details and no design/concept questions.


Christoph


>  -- Keir
>
> On 22/8/07 16:47, "Christoph Egger" <Christoph.Egger@xxxxxxx> wrote:
> > I fixed all this in my code and will re-submit the new patch.
> >
> > Christoph
> >
> > On Wednesday 22 August 2007 12:01:26 Jan Beulich wrote:
> >>>>> "Christoph Egger" <Christoph.Egger@xxxxxxx> 22.08.07 10:47 >>>
> >>>
> >>> On Tuesday 21 August 2007 17:53:45 Jan Beulich wrote:
> >>>>> +} __attribute__((packed));
> >>>>
> >>>> I think it was a general agreement that it is not a good idea
> >>>> (non-portable to non-GNU compilers) to put things like this in public
> >>>> headers. I think by properly ordering things you can get away without
> >>>> this (and almost without padding).
> >>>
> >>> Oh, you're right. I should use #pragma pack(1)  instead.
> >>> Is this fine with you?
> >>
> >> I'm afraid that wouldn't work with the compat mode header generation, as
> >> the original use of #pragma pack(push, ...) was banned for (Sun)
> >> compatibility reasons. Consequently, I believe the only way of doing
> >> this properly is to avoid depending on compiler behavior by arranging
> >> things appropriately (including padding members if needed) and avoiding
> >> #pragma pack() altogether.
> >>
> >>>>> +struct mcinfo_global {
> >>>>> ...
> >>>>> +    uint16_t mc_socketid;
> >>>>> +    uint16_t mc_coreid;
> >>>>> +    uint16_t mc_vcpu_id;
> >>>>
> >>>> Unless mc_vcpu_id is intended for that purpose, this neglects
> >>>> hyperthreading (I know, AMD doesn't use this, but the interface should
> >>>> be vendor neutral).
> >>>>
> >>>> If mc_vcpu_id is meant for that purpose, its naming is ambiguous.
> >>>>
> >>>> If mc_vcpu_id is meant as a vcpuid, then ordering things os that
> >>>> mc_domid and mc_vcpu_id are contiguous would seem more natural (making
> >>>> eventual examination in raw memory less confusing).
> >>>
> >>> mc_coreid is the physical core that reported the machine check
> >>> information. mc_socketid is the physical socket the physical core is
> >>> in. This is useful to find all other affected physical cores, when an
> >>> error in the L3-Cache is reported that is shared over all cores in the
> >>> chip.
> >>>
> >>> mc_vcpu_id is the id of the active vcpu for the domain, that reported
> >>> the machine check information. Inside Xen, this is current->vcpu_id.
> >>> mc_vcpu_id is needed to deal properly with the upcoming NUMA support my
> >>> collegue is working on.
> >>
> >> Okay, but then you're really lacking a thread id here, for being filled
> >> by respective Intel code (AMD code would set this to zero).
> >>
> >>>>> +/* sizeof(struct mcinfo_global) + 6 * sizeof(struct mcinfo_bank) ==
> >>>>> 200. + * This is enough space to store mc information of up to six
> >>>>> banks. + */
> >>>>> +#define MCINFO_MAXSIZE (204 - sizeof(uint32_t))
> >>>>
> >>>> Why don't you use the sizeof()-s from the description? If this is
> >>>> really needed for some reason, then having 200 in the description and
> >>>> 204 in the macro is at least confusing...
> >>>
> >>> MCINFO_MAXSIZE is the size for the content of the mi_data array.
> >>> MCINFO_MAXSIZE + sizeof(mi_nentries) == 204. That is where is number
> >>> comes from.
> >>
> >> I concluded that, but pointed out that seeing two different numbers made
> >> me think of why this is so, whereas identical numbers would have avoided
> >> that (and will likely keep others from asking later, too).
> >>
> >> Also, you don't say what was the reason for you to use constants instead
> >> of sizeof() here.
> >>
> >>>>>     /* Frame containing list of mfns containing list of mfns
> >>>>> containing p2m. */ xen_pfn_t     pfn_to_mfn_frame_list_list;
> >>>>>     unsigned long nmi_reason;
> >>>>> +    struct arch_mc_info mc_info; /* machine check information */
> >>>>>     uint64_t pad[32];
> >>>>> };
> >>>>
> >>>> Are you sure it is appropriate to add a member here without reducing
> >>>> the padding member?
> >>>
> >>> You want me to replace "uint64_t pad[32];" with "uint32_t pad[13];" ?
> >>
> >> It would be my understanding that that's the right thing to do (assuming
> >> you calculated the numbers correctly), but I'd rely on Keir to give a
> >> final word on this.
> >>
> >> Jan



-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.