
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN



MC_ACT_CACHE_SHIRNK  <-- typo. should be MC_ACT_CACHE_SHRINK 

The L3 cache index disable feature works like this:

You read bits 17:6 from MSR 0xC0000408 (MC4_MISC1)
and write them into the index field. This MSR does not belong to the standard
MC bank data and is therefore provided by mcinfo_extended.
The index field is bits 11:0 of the PCI function 3 register
"L3 Cache Index Disable".
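A minimal sketch of the bit manipulation described above, assuming the raw 64-bit MC4_MISC1 value has already been read and that the PCI register is handled as a 32-bit word. The actual MSR read and PCI config write are not shown; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 17:6 of MC4_MISC1 (MSR 0xC0000408) hold the L3 cache index. */
#define MC4_MISC1_INDEX_SHIFT 6
#define MC4_MISC1_INDEX_MASK  0xFFFu /* 12 bits: 17 - 6 + 1 */

/* Bits 11:0 of the "L3 Cache Index Disable" register hold the index. */
#define L3_DISABLE_INDEX_MASK 0xFFFu

/* Both fields are 12 bits wide, so the value is copied over unchanged. */
static_assert(MC4_MISC1_INDEX_MASK == L3_DISABLE_INDEX_MASK,
              "index fields must have the same width");

/* Hypothetical helper: extract bits 17:6 of a raw MC4_MISC1 value. */
static uint32_t l3_index_from_misc1(uint64_t misc1)
{
    return (uint32_t)((misc1 >> MC4_MISC1_INDEX_SHIFT) & MC4_MISC1_INDEX_MASK);
}

/* Hypothetical helper: place the index into bits 11:0 of the current
 * register value, leaving all other bits as they were. */
static uint32_t l3_disable_reg_value(uint32_t reg, uint64_t misc1)
{
    return (reg & ~L3_DISABLE_INDEX_MASK) | l3_index_from_misc1(misc1);
}
```

For example, an MC4_MISC1 value carrying 0xA97 in bits 17:6 yields index 0xA97, and a register value of 0x80000123 becomes 0x80000A97.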

Why is the recovery action bound to the bank?
I would like to see a struct mcinfo_recover rather than extending
struct mcinfo_bank. That gives us more flexibility.
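One way to read this suggestion is as a separate telemetry record that names the bank it refers to, instead of a field embedded in struct mcinfo_bank, so a bank can have zero, one, or several recovery records. The sketch assumes a simplified two-field record header (type, size); the record type number, the struct layout, and count_recover_for_bank() are hypothetical and only illustrate walking such records independently of the bank records.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a telemetry record header (layout assumed). */
struct mcinfo_common {
    uint16_t type; /* record type */
    uint16_t size; /* record size in bytes, header included */
};

/* Hypothetical record type number, for this sketch only. */
#define MC_TYPE_RECOVER_SKETCH 4

/* Hypothetical standalone recovery record: it refers to a bank by
 * number rather than being embedded in struct mcinfo_bank. */
struct mcinfo_recover {
    struct mcinfo_common common;
    uint16_t mc_bank;     /* bank this action relates to */
    uint8_t  flags;       /* REC_ACT_* style flags */
    uint8_t  action_type; /* MC_ACT_* style type */
    uint64_t action_data; /* e.g. an MFN for page offlining */
};

/* Count recovery records for `bank` in a packed buffer of records.
 * Records of other types are skipped. Returns -1 on a malformed
 * record (size too small or running past the buffer). */
static int count_recover_for_bank(const unsigned char *buf, size_t len,
                                  uint16_t bank)
{
    size_t off = 0;
    int n = 0;

    while (off + sizeof(struct mcinfo_common) <= len) {
        struct mcinfo_common c;
        memcpy(&c, buf + off, sizeof c);
        if (c.size < sizeof c || c.size > len - off)
            return -1;
        if (c.type == MC_TYPE_RECOVER_SKETCH &&
            c.size >= sizeof(struct mcinfo_recover)) {
            struct mcinfo_recover r;
            memcpy(&r, buf + off, sizeof r);
            if (r.mc_bank == bank)
                n++;
        }
        off += c.size;
    }
    return n;
}
```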

Christoph


On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:
> Christoph/Frank, the interface definition follows; please have a look.
>
> Thanks
> Yunhong Jiang
>
> 1) Interface between Xen and dom0 for passing Xen's recovery action
> information to dom0. Usage model: after offlining a broken page, Xen may pass
> the page-offline recovery result to dom0. Dom0 saves the information in
> non-volatile memory for further proactive actions, such as offlining
> error-prone pages early on the next reboot.
>
>
> struct page_offline_action
> {
>     /* Params for passing the offlined page number to DOM0 */
>     uint64_t mfn;
>     uint64_t status; /* Similar to page offline hypercall */
> };
>
> struct cpu_offline_action
> {
>     /* Params for passing the identity of the offlined CPU to DOM0 */
>     uint32_t mc_socketid;
>     uint16_t mc_coreid;
>     uint16_t mc_core_threadid;
> };
>
> struct cache_shrink_action
> {
>     /* TBD; Christoph, please fill it in */
> };
>
> /* Recovery action flags, giving recovery result information to the guest */
> /* Recovered successfully after taking one of the recovery actions below */
> #define REC_ACT_RECOVERED      (0x1 << 0)
> /* For Solaris: dom0 takes ownership when the system crashes */
> #define REC_ACT_RESET          (0x1 << 2)
> /* No action is performed by Xen */
> #define REC_ACT_INFO           (0x1 << 3)
>
> /* Recovery action type definitions, valid only when
>  * flags & REC_ACT_RECOVERED */
> #define MC_ACT_PAGE_OFFLINE 1
> #define MC_ACT_CPU_OFFLINE   2
> #define MC_ACT_CACHE_SHIRNK 3
>
> struct recovery_action
> {
>     uint8_t flags;
>     uint8_t action_type;
>     union
>     {
>         struct page_offline_action page_retire;
>         struct cpu_offline_action cpu_offline;
>         struct cache_shrink_action cache_shrink;
>         uint8_t pad[MAX_ACTION_SIZE];
>     } action_info;
> };
>
> struct mcinfo_bank {
>     struct mcinfo_common common;
>
>     uint16_t mc_bank; /* bank nr */
>     uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0
>                         * and if mc_addr is valid. Never valid on DomU. */
>     uint64_t mc_status; /* bank status */
>     uint64_t mc_addr;   /* bank address, only valid
>                          * if addr bit is set in mc_status */
>     uint64_t mc_misc;
>     uint64_t mc_ctrl2;
>     uint64_t mc_tsc;
>     /* Recovery action is performed per bank */
>     struct recovery_action action;
> };
>
> 2) The two interfaces below are for internal use in MCA processing.
>     a. pre_handler is called early, in MCA ISR context, mainly to detect
> whether a reset is needed (flag MCA_RESET), so that no error log is lost.
> pre_handler may also be able to identify the impacted domain.
>     b. mca_error_handler is an (error_action_index, recovery_handler
> pointer) pair. The recovery_handler function performs the actual recovery
> operations in softIRQ context once a per-bank MCA error matches the
> corresponding mca_code index. If pre_handler cannot determine the impacted
> domain, recovery_handler must figure it out.
>
> /* Error has been recovered successfully */
> #define MCA_RECOVERED   (0x1 << 0)
> /* Error impacts one guest, identified by the owner field */
> #define MCA_OWNER       (0x1 << 1)
> /* Error can't be recovered and the system needs a reboot */
> #define MCA_RESET       (0x1 << 2)
> /* Error should be handled in softIRQ context */
> #define MCA_MORE_ACTION (0x1 << 3)
>
> struct mca_handle_result
> {
>     uint32_t flags;
>     /* Valid only when flags & MCA_OWNER */
>     domid_t owner;
>     /* Valid only when flags & MCA_RECOVERED */
>     struct recovery_action *action;
> };
>
> struct mca_error_handler
> {
>     /*
>      * Assume we will need only architecture-defined codes. If the index
>      * can't be set up by mca_code, we will add a function to do the
>      * (index, recovery_handler) mapping check. This mca_code is the index
>      * of the recovery handler, identifying this particular error's
>      * corresponding recovery action.
>      */
>     uint16_t mca_code;
>
>     /* Handler to be called in softIRQ context */
>     int (*recovery_handler)(struct mcinfo_bank *bank,
>                             struct mcinfo_global *global,
>                             struct mcinfo_extended *extension,
>                             struct mca_handle_result *result);
>
> };
>
> struct mca_error_handler intel_mca_handler[] =
> {
>     ....
> };
>
> struct mca_error_handler amd_mca_handler[] =
> {
>     ....
> };
>
>
> /* Handler to be called in the MCA ISR, in MCA context */
> int intel_mca_pre_handler(struct cpu_user_regs *regs,
>                                 struct mca_handle_result *result);
>
> int amd_mca_pre_handler(struct cpu_user_regs *regs,
>                             struct mca_handle_result *result);
>
> Frank.Vanderlinden@xxxxxxx wrote:
> > Jiang, Yunhong wrote:
> >> Frank/Christoph, can you please give more comments on it, or are you
> >> OK with this? For the action reporting mechanism, we will send out a
> >> proposal for review soon.
> >
> > I'm ok with this. We need a little more information on the AMD
> > mechanism, but it seems to me that we can fit this in.
> >
> > Sometime this week, I'll also send out the last of our changes that
> > haven't been sent upstream to xen-unstable yet. Maybe we can combine
> > some things into one patch, like the telemetry handling changes that
> > Gavin did. The other changes are error injection (for debugging) and
> > panic crash dump support for our FMA tools, but those are probably only
> > interesting to us.
> >
> > - Frank
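A minimal sketch of how the proposed struct recovery_action might be filled in by Xen and read back by dom0 for a page-offline event. It copies the proposal's flag and type values, assumes MAX_ACTION_SIZE is 16 (the proposal leaves it undefined), omits the still-empty cache_shrink_action, and uses the corrected name MC_ACT_CACHE_SHRINK; the two helper functions are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define REC_ACT_RECOVERED (0x1 << 0)
#define REC_ACT_RESET     (0x1 << 2)
#define REC_ACT_INFO      (0x1 << 3)

#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE  2
#define MC_ACT_CACHE_SHRINK 3

/* Assumed value: the proposal does not define MAX_ACTION_SIZE. */
#define MAX_ACTION_SIZE 16

struct page_offline_action {
    uint64_t mfn;
    uint64_t status;
};

struct cpu_offline_action {
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

struct recovery_action {
    uint8_t flags;
    uint8_t action_type;
    union {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

/* Xen side (hypothetical helper): report that page `mfn` was offlined. */
static void report_page_offline(struct recovery_action *a,
                                uint64_t mfn, uint64_t status)
{
    memset(a, 0, sizeof *a);
    a->flags = REC_ACT_RECOVERED;
    a->action_type = MC_ACT_PAGE_OFFLINE;
    a->action_info.page_retire.mfn = mfn;
    a->action_info.page_retire.status = status;
}

/* Dom0 side (hypothetical helper): action_type is only meaningful when
 * REC_ACT_RECOVERED is set. Returns 1 and stores the MFN for a recovered
 * page offline, otherwise returns 0 and leaves *mfn untouched. */
static int offlined_mfn(const struct recovery_action *a, uint64_t *mfn)
{
    if (!(a->flags & REC_ACT_RECOVERED) ||
        a->action_type != MC_ACT_PAGE_OFFLINE)
        return 0;
    *mfn = a->action_info.page_retire.mfn;
    return 1;
}
```

Dom0 could then record the returned MFN in non-volatile storage, as described in the usage model of item 1.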

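The (mca_code, recovery_handler) table from item 2b can be sketched as a linear lookup run in softIRQ context. This assumes mca_code is compared against the architectural MCA error code in bits 15:0 of mc_status; the structs are trimmed stand-ins for the proposal's types, and the table entries, the SKETCH_HANDLED flag and dispatch_bank() are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-ins for the proposal's types. */
struct mcinfo_bank {
    uint16_t mc_bank;
    uint64_t mc_status;
    uint64_t mc_addr;
};

struct mca_handle_result {
    uint32_t flags;
};

/* Hypothetical flag, for this sketch only. */
#define SKETCH_HANDLED 0x1u

struct mca_error_handler {
    uint16_t mca_code;
    /* Function pointer, so a table can hold one handler per code. */
    int (*recovery_handler)(const struct mcinfo_bank *bank,
                            struct mca_handle_result *result);
};

/* Assumption: MCA error code = bits 15:0 of the bank status. */
static uint16_t mca_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xFFFF);
}

/* Placeholder handler: marks the result as handled. */
static int dummy_handler(const struct mcinfo_bank *bank,
                         struct mca_handle_result *result)
{
    (void)bank;
    result->flags |= SKETCH_HANDLED;
    return 0;
}

/* Placeholder codes, not meant to denote specific errors. */
static const struct mca_error_handler sketch_handlers[] = {
    { 0x0001, dummy_handler },
    { 0x0002, dummy_handler },
};

/* Find the entry whose mca_code matches this bank's error code and run
 * its handler. Returns the handler's return value, or -1 if no entry
 * matches. */
static int dispatch_bank(const struct mca_error_handler *tbl, size_t n,
                         const struct mcinfo_bank *bank,
                         struct mca_handle_result *result)
{
    uint16_t code = mca_error_code(bank->mc_status);

    for (size_t i = 0; i < n; i++)
        if (tbl[i].mca_code == code)
            return tbl[i].recovery_handler(bank, result);
    return -1;
}
```

A linear scan keeps per-vendor tables such as intel_mca_handler[] and amd_mca_handler[] easy to extend; a sorted table or direct index would be an option if they grew large.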


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

