|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph/Frank, Followed is the interface definition, please have a look.
Thanks
Yunhong Jiang
1) Interface between Xen/dom0 for passing xen's recovery action information to
dom0.
Usage model: After offlining broken page, Xen might pass its page-offline
recovery action
result information to dom0. Dom0 will save the information in non-volatile
memory for further
proactive actions, such as offlining the easy-broken page early when doing
next reboot.
struct page_offline_action
{
/* Params for passing the offlined page number to DOM0 */
uint64_t mfn;
uint64_t status; /* Similar to page offline hypercall */
};
struct cpu_offline_action
{
/* Params for passing the identity of the offlined CPU to DOM0 */
uint32_t mc_socketid;
uint16_t mc_coreid;
uint16_t mc_core_threadid;
};
struct cache_shrink_action
{
/* TBD, Christoph, please fill it */
};
/* Recover action flags, giving recovery result information to guest */
/* Recovery successfully after taking certain recovery actions below */
#define REC_ACT_RECOVERED (0x1 << 0)
/* For solaris's usage that dom0 will take ownership when crash */
#define REC_ACT_RESET (0x1 << 2)
/* No action is performed by XEN */
#define REC_ACT_INFO (0x1 << 3)
/* Recover action type definition, valid only when flags & REC_ACT_RECOVERED */
#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE 2
#define MC_ACT_CACHE_SHIRNK 3
struct recovery_action
{
uint8_t flags;
uint8_t action_type;
union
{
struct page_offline_action page_retire;
struct cpu_offline_action cpu_offline;
struct cache_shrink_action cache_shrink;
uint8_t pad[MAX_ACTION_SIZE];
} action_info;
}
struct mcinfo_bank {
struct mcinfo_common common;
uint16_t mc_bank; /* bank nr */
uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0
* and if mc_addr is valid. Never valid on DomU. */
uint64_t mc_status; /* bank status */
uint64_t mc_addr; /* bank address, only valid
* if addr bit is set in mc_status */
uint64_t mc_misc;
uint64_t mc_ctrl2;
uint64_t mc_tsc;
/* Recovery action is performed per bank */
struct recovery_action action;
};
2) Below two interfaces are for MCA processing internal use.
a. pre_handler will be called earlier in MCA ISR context, mainly for early
need_reset
detection for avoiding log missing (flag MCA_RESET). Also, pre_handler
might
be able to find the impacted domain if possible.
b. mca_error_handler is actually a (error_action_index, recovery_handler
pointer) pair.
The defined recovery_handler function performs the actual recovery
operations in
softIrq context after the per_bank MCA error matching the corresponding
mca_code index.
If pre_handler can't judge the impacted domain, recovery_handler must
figure it out.
/* Error has been recovered successfully */
#define MCA_RECOVERD 0
/* Error impact one guest as stated in owner field */
#define MCA_OWNER 1
/* Error can't be recovered and need reboot system */
#define MCA_RESET 2
/* Error should be handled in softIRQ context */
#define MCA_MORE_ACTION 3
struct mca_handle_result
{
uint32_t flags;
/* Valid only when flags & MCA_OWNER */
domid_d owner;
/* valid only when flags & MCA_RECOVERD */
struct recovery_action *action;
};
struct mca_error_handler
{
/*
* Assume we will need only architecture defined code. If the index can't
be setup by
* mca_code, we will add a function to do the (index, recovery_handler)
mapping check.
* This mca_code represents the recovery handler pointer index for
identifying this
* particular error's corresponding recover action
*/
uint16_t mca_code;
/* Handler to be called in softIRQ handler context */
int recovery_handler(struct mcinfo_bank *bank,
struct mcinfo_global *global,
struct mcinfo_extended *extention,
struct mca_handle_result *result);
};
struct mca_error_handler intel_mca_handler[] =
{
....
};
struct mca_error_handler amd_mca_handler[] =
{
....
};
/* HandlVer to be called in MCA ISR in MCA context */
int intel_mca_pre_handler(struct cpu_user_regs *regs,
struct mca_handle_result *result);
int amd_mca_pre_handler(struct cpu_user_regs *regs,
struct mca_handle_result *result);
Frank.Vanderlinden@xxxxxxx <mailto:Frank.Vanderlinden@xxxxxxx> wrote:
> Jiang, Yunhong wrote:
>> Frank/Christopher, can you please give more comments for it, or you are OK
>> with this? For the action reporting mechanism, we will send out a proposal
>> for review soon.
>
> I'm ok with this. We need a little more information on the AMD
> mechanism, but it seems to me that we can fit this in.
>
> Sometime this week, I'll also send out the last of our changes that
> haven't been sent upstream to xen-unstable yet. Maybe we can combine
> some things in to one patch, like the telemetry handling changes that
> Gavin did. The other changes are error injection (for debugging) and
> panic crash dump support for our FMA tools, but those are probably only
> interesting to us.
>
> - Frank
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |