Re: [Xen-devel] [RFC PATCH v2 00/17] RFC: SGX Virtualization design and draft patches

On Mon, Dec 04, 2017 at 08:15:11AM +0800, Boqun Feng wrote:
> Hi all,
>
> This is the v2 of RFC SGX Virtualization design and draft patches, you

Ping ;-)  Any comments?

Regards,
Boqun

> can find v1 at:
>
>     https://lists.gt.net/xen/devel/483404
>
> In the new version, I fixed a few things according to the feedback on the previous version (mostly cleanups and code movement).
>
> Besides, Kai and I redesigned the SGX MSRs setup part and introduced the new XL parameters 'lehash' and 'lewr'.
>
> Another big change is that I modified the EPC management to fit EPC pages into 'struct page_info', and in patches #6 and #7, unscrubbable pages, 'PGC_epc', 'MEMF_epc' and 'XENZONE_EPC' are introduced, so that EPC management is fully integrated into the existing memory management of Xen. This might be the controversial bit, so patches 6~8 are simply meant to show the idea and drive deeper discussion.
>
> Detailed changes since v1: (modifications with the tag "[New]" are totally new in this series; reviews and comments are highly welcome for those parts)
>
> * Make SGX related code mostly common for x86 by: 1) moving sgx.[ch] to arch/x86/ and include/asm-x86/ and 2) renaming EPC related functions with a domain_* prefix.
>
> * Rename ioremap_cache() to ioremap_wb() and make it x86-specific, as suggested by Jan Beulich.
>
> * Remove the percpu sgx_cpudata; during bootup, secondary CPUs now check whether they read a different value than the boot CPU, and if so SGX is disabled.
>
> * Remove domain_has_sgx_{,launch_control}, and make sure we can rely on the domain's arch.cpuid->feat.sgx{_lc} for setting checks.
>
> * Clean up the code for CPUID handling, as suggested by Andrew Cooper.
>
> * Adjust to the msr_policy framework for SGX MSRs handling, and remove unnecessary fields like 'readable' and 'writable'.
>
> * Use 'page_info' to maintain EPC pages, and [New] add a draft implementation that employs the xenheap for EPC page management. Please see patches 6~8.
>
> * [New] Modify the XL parameter for SGX; please see section 2.1.1 in the updated design doc.
>
> * [New] Use the _set_vcpu_msrs hypercall in the toolstack to set the SGX related MSRs. Please see patch #17.
>
> * ACPI related tool changes are temporarily dropped in this patchset, as I need more time to resolve the comments and do related tests.
>
> And the updated design doc is as follows. As in the previous version, there are some particular points in the design for which we don't know which implementation is better. For those, a question mark (?) is added at the right of the menu entry. And for SGX live migration, thanks to Wei Liu for commenting in the previous review that it would be nice to support it if we can, but we'd like to hear more from you, so we still put a question mark for this item. Your comments on those "question mark (?)" parts (and other comments as well, of course) are highly appreciated.
>
> ===================================================================
> 1. SGX Introduction
>   1.1 Overview
>     1.1.1 Enclave
>     1.1.2 EPC (Enclave Page Cache)
>     1.1.3 ENCLS and ENCLU
>   1.2 Discovering SGX Capability
>     1.2.1 Enumerate SGX via CPUID
>     1.2.2 Intel SGX Opt-in Configuration
>   1.3 Enclave Life Cycle
>     1.3.1 Constructing & Destroying Enclave
>     1.3.2 Enclave Entry and Exit
>       1.3.2.1 Synchronous Entry and Exit
>       1.3.2.2 Asynchronous Enclave Exit
>     1.3.3 EPC Eviction and Reload
>   1.4 SGX Launch Control
>   1.5 SGX Interaction with IA32 and Intel 64 Architecture
> 2. SGX Virtualization Design
>   2.1 High Level Toolstack Changes
>     2.1.1 New 'sgx' XL configure file parameter
>     2.1.2 New XL commands (?)
>     2.1.3 Notify domain's virtual EPC base and size to Xen
>   2.2 High Level Hypervisor Changes
>     2.2.1 EPC Management
>     2.2.2 EPC Virtualization
>     2.2.3 Populate EPC for Guest
>     2.2.4 Launch Control Support
>     2.2.5 CPUID Emulation
>     2.2.6 EPT Violation & ENCLS Trapping Handling
>     2.2.7 Guest Suspend & Resume
>     2.2.8 Destroying Domain
>   2.3 Additional Point: Live Migration, Snapshot Support (?)
> 3. Reference
>
> 1. SGX Introduction
>
> 1.1 Overview
>
> 1.1.1 Enclave
>
> Intel Software Guard Extensions (SGX) is a set of instructions and mechanisms for memory accesses, in order to provide secure access for sensitive applications and data. SGX allows an application to use a particular region of its address space as an *enclave*, which is a protected area that provides confidentiality and integrity even in the presence of privileged malware. Accesses to the enclave memory area from any software not resident in the enclave are prevented, including those from privileged software. The diagram below illustrates the presence of an enclave in an application.
>
>  |-----------------------|
>  |                       |
>  | |---------------|     |
>  | |   OS kernel   |     |      |-----------------------|
>  | |---------------|     |      |                       |
>  | |               |     |      | |---------------|     |
>  | |---------------|     |      | |  Entry table  |     |
>  | |    Enclave    |-----|----> | |---------------|     |
>  | |---------------|     |      | | Enclave stack |     |
>  | |   App code    |     |      | |---------------|     |
>  | |---------------|     |      | | Enclave heap  |     |
>  | |    Enclave    |     |      | |---------------|     |
>  | |---------------|     |      | | Enclave code  |     |
>  | |   App code    |     |      | |---------------|     |
>  | |---------------|     |      |                       |
>  | |               |     |      |-----------------------|
>  |-----------------------|
>
> SGX comes in the SGX1 and SGX2 extensions. SGX1 provides basic enclave support, and SGX2 allows additional flexibility in the runtime management of enclave resources and thread execution within an enclave.
>
> 1.1.2 EPC (Enclave Page Cache)
>
> Just like normal application memory management, enclave memory management can be divided into two parts: address space allocation and memory commitment. Address space allocation is allocating a particular range of linear address space for the enclave. Memory commitment is assigning actual resources to the enclave.
>
> The Enclave Page Cache (EPC) is the physical resource used to commit to enclaves. EPC is divided into 4K pages. An EPC page is 4K in size and always aligned to a 4K boundary. Hardware performs additional access control checks to restrict access to EPC pages. The Enclave Page Cache Map (EPCM) is a secure structure which holds one entry for each EPC page, and is used by hardware to track the status of each EPC page (it is invisible to software). Typically EPC and EPCM are reserved by the BIOS as Processor Reserved Memory, but the actual amount, size, and layout of EPC are model-specific and dependent on BIOS settings. EPC is enumerated via the new SGX CPUID leaf, and is reported as reserved memory.
>
> EPC pages can either be invalid or valid. There are 4 valid EPC page types in SGX1: regular EPC page, SGX Enclave Control Structure (SECS) page, Thread Control Structure (TCS) page, and Version Array (VA) page. SGX2 adds the Trimmed EPC page. Each enclave is associated with one SECS page. Each thread in an enclave is associated with one TCS page. VA pages are used in EPC page eviction and reload. The Trimmed EPC page is introduced in SGX2 for when a particular 4K page in an enclave is going to be freed (trimmed) at runtime after the enclave is initialized.
>
> 1.1.3 ENCLS and ENCLU
>
> Two new instructions, ENCLS and ENCLU, are introduced to manage enclaves and EPC. ENCLS can only run in ring 0, while ENCLU can only run in ring 3. Both ENCLS and ENCLU have multiple leaf functions, with EAX indicating the specific leaf function.
>
> SGX1 supports the below ENCLS and ENCLU leaves:
>
> ENCLS:
>     - ECREATE, EADD, EEXTEND, EINIT, EREMOVE (Enclave build and destroy)
>     - EPA, EBLOCK, ETRACK, EWB, ELDU/ELDB (EPC eviction & reload)
>
> ENCLU:
>     - EENTER, EEXIT, ERESUME (Enclave entry, exit, re-enter)
>     - EGETKEY, EREPORT (SGX key derivation, attestation)
>
> Additionally, SGX2 supports the below ENCLS and ENCLU leaves for runtime add/remove of EPC pages to an enclave after the enclave is initialized, along with permission changes.
>
> ENCLS:
>     - EAUG, EMODT, EMODPR
>
> ENCLU:
>     - EACCEPT, EACCEPTCOPY, EMODPE
>
> A VMM is able to interfere with ENCLS running in a guest (see 1.5.1 VMX Changes for Supporting SGX Virtualization) but is unable to interfere with ENCLU.
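>
> For illustration, the ring-0 programming model boils down to a small wrapper like the sketch below (not code from these patches; the leaf numbers and the 0F 01 CF opcode are from the SDM, and ENCLU works the same way from ring 3 with opcode 0F 01 D7):
>
>     #include <stdint.h>
>
>     /* A few ENCLS leaf numbers (SDM Vol. 3D). */
>     #define ECREATE 0x00
>     #define EADD    0x01
>     #define EINIT   0x02
>     #define EREMOVE 0x03
>
>     /* EAX selects the leaf; RBX/RCX/RDX carry the leaf-specific
>      * effective-address parameters.  Some leaves return an error
>      * code in EAX, which we pass back to the caller. */
>     static inline long encls(uint32_t leaf, uint64_t rbx,
>                              uint64_t rcx, uint64_t rdx)
>     {
>         long ret;
>         asm volatile(".byte 0x0f, 0x01, 0xcf"
>                      : "=a" (ret)
>                      : "a" (leaf), "b" (rbx), "c" (rcx), "d" (rdx)
>                      : "memory");
>         return ret;
>     }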
>
> 1.2 Discovering SGX Capability
>
> 1.2.1 Enumerate SGX via CPUID
>
> If CPUID.0x7.0x0:EBX.SGX (bit 2) is 1, then the processor supports SGX, and the SGX capabilities and resources can be enumerated via the new SGX CPUID leaf (0x12). CPUID.0x12.0x0 reports SGX capability, such as the presence of SGX1 and SGX2 and the enclave's maximum size for both 32-bit and 64-bit applications. CPUID.0x12.0x1 reports the availability of bits that can be set in SECS.ATTRIBUTES. CPUID.0x12.0x2 reports the EPC resource's base and size. A platform may support multiple EPC sections, and CPUID.0x12.0x3 and further sub-leaves can be used to detect the existence of multiple EPC sections (until CPUID reports an invalid EPC section).
>
> Refer to SDM 37.7.2 Intel SGX Resource Enumeration Leaves for a full description of SGX CPUID leaf 0x12.
>
> 1.2.2 Intel SGX Opt-in Configuration
>
> On processors that support Intel SGX, IA32_FEATURE_CONTROL also provides the SGX_ENABLE bit (bit 18) to turn SGX on/off. Before system software can enable and use SGX, the BIOS is required to set IA32_FEATURE_CONTROL.SGX_ENABLE = 1 to opt in to SGX.
>
> Setting SGX_ENABLE follows the rules of IA32_FEATURE_CONTROL.LOCK (bit 0). Software is considered to have opted into Intel SGX if and only if IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK are both set to 1.
>
> The setting of IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18) is not reflected by SGX CPUID. Enclave instructions will behave differently according to the value of CPUID.0x7.0x0:EBX.SGX and whether the BIOS has opted in to SGX.
>
> Refer to SDM 37.7.1 Intel SGX Opt-in Configuration for more information.
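>
> Putting 1.2.1 and 1.2.2 together, a minimal detection/enumeration sketch could look as follows (cpuid_count() and rdmsr() are illustrative ring-0 helpers, not this patchset's API; the bit layouts are from the SDM):
>
>     #include <stdbool.h>
>     #include <stdint.h>
>
>     #define MSR_IA32_FEATURE_CONTROL  0x0000003a
>     #define FC_LOCKED                 (1ull << 0)
>     #define FC_SGX_ENABLED            (1ull << 18)
>
>     extern void cpuid_count(uint32_t leaf, uint32_t subleaf, uint32_t *a,
>                             uint32_t *b, uint32_t *c, uint32_t *d);
>     extern uint64_t rdmsr(uint32_t msr);
>
>     static bool sgx_usable(void)
>     {
>         uint32_t a, b, c, d;
>
>         cpuid_count(0x7, 0x0, &a, &b, &c, &d);
>         if ( !(b & (1u << 2)) )        /* CPUID.0x7.0x0:EBX.SGX */
>             return false;
>
>         /* BIOS must have opted in: LOCK and SGX_ENABLE both set. */
>         uint64_t fc = rdmsr(MSR_IA32_FEATURE_CONTROL);
>         return (fc & (FC_LOCKED | FC_SGX_ENABLED)) ==
>                (FC_LOCKED | FC_SGX_ENABLED);
>     }
>
>     /* Walk EPC sections: sub-leaves 0x2, 0x3, ... of leaf 0x12 report
>      * type 1 ("EPC section") in EAX[3:0] until an invalid entry. */
>     static void enumerate_epc(void)
>     {
>         for ( uint32_t sub = 2; ; sub++ )
>         {
>             uint32_t a, b, c, d;
>
>             cpuid_count(0x12, sub, &a, &b, &c, &d);
>             if ( (a & 0xf) != 1 )
>                 break;
>             uint64_t base = (a & 0xfffff000u) | ((uint64_t)(b & 0xfffffu) << 32);
>             uint64_t size = (c & 0xfffff000u) | ((uint64_t)(d & 0xfffffu) << 32);
>             (void)base; (void)size;    /* record the section here */
>         }
>     }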
>
> 1.3 Enclave Life Cycle
>
> 1.3.1 Constructing & Destroying Enclave
>
> An enclave is created via the ENCLS[ECREATE] leaf by privileged software. Basically, ECREATE converts an invalid EPC page into an SECS page, according to a source SECS structure residing in normal memory. The source SECS contains the enclave's info, such as base (linear) address, size, enclave attributes, the enclave's measurement, etc.
>
> After ECREATE, for each 4K page of the enclave's linear address space, privileged software uses EADD and EEXTEND to add one EPC page to it. Enclave code/data (residing in normal memory) is loaded into the enclave during EADD for each of the enclave's 4K pages. After all EPC pages are added to the enclave, privileged software calls EINIT to initialize the enclave, and then the enclave is ready to run.
>
> While the enclave is constructed, the enclave measurement, which is a SHA256 hash value, is also built according to the enclave's size, the code/data itself and its location in the enclave, etc. The measurement can be used to uniquely identify the enclave. The SIGSTRUCT passed to the EINIT leaf also contains the measurement specified by untrusted software, via MRENCLAVE. EINIT will check the two measurements and will only succeed when the two match.
>
> An enclave is destroyed by running EREMOVE on all the enclave's EPC pages, and then on the enclave's SECS. EREMOVE will report an SGX_CHILD_PRESENT error if it is called on the SECS while there are still regular EPC pages that haven't been removed from the enclave.
>
> Please refer to SDM chapter 39.1 Constructing an Enclave for more information.
>
> 1.3.2 Enclave Entry and Exit
>
> 1.3.2.1 Synchronous Entry and Exit
>
> After the enclave is constructed, non-privileged software uses ENCLU[EENTER] to enter the enclave to run. While a process runs in the enclave, non-privileged software can use ENCLU[EEXIT] to exit from the enclave and return to normal mode.
>
> 1.3.2.2 Asynchronous Enclave Exit
>
> Asynchronous and synchronous events, such as exceptions, interrupts, traps, SMIs, and VM exits, may occur while executing inside an enclave. These events are referred to as Enclave Exiting Events (EEE). Upon an EEE, the processor state is securely saved inside the enclave and then replaced by a synthetic state to prevent leakage of secrets. The process of securely saving state and establishing the synthetic state is called an Asynchronous Enclave Exit (AEX).
>
> After AEX, non-privileged software uses ENCLU[ERESUME] to re-enter the enclave. The SGX userspace software maintains a small piece of code (residing in normal memory) which basically calls ERESUME to re-enter the enclave. The address of this piece of code is called the Asynchronous Exit Pointer (AEP). The AEP is specified as a parameter to EENTER and is kept internally in the enclave. Upon AEX, the AEP is pushed onto the stack, and upon returning from the EEE handling, e.g. via IRET, the AEP is loaded into RIP and ERESUME is called subsequently to re-enter the enclave.
>
> During AEX the processor does the context saving and restoring automatically, therefore no change to the interrupt handling of the OS kernel or the VMM is required. It is the SGX userspace software's responsibility to set up the AEP correctly.
>
> Please refer to SDM chapter 39.2 Enclave Entry and Exit for more information.
>
> 1.3.3 EPC Eviction and Reload
>
> SGX also allows privileged software to evict any EPC pages that are used by an enclave. The idea is the same as normal memory swapping. Below is the detailed info on how to evict EPC pages.
>
> Below is the sequence to evict a regular EPC page:
>
>     1) Select one or multiple regular EPC pages from one enclave
>     2) Remove EPT/PT mapping for selected EPC pages
>     3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
>     4) EBLOCK on selected EPC pages
>     5) ETRACK on enclave's SECS page
>     6) allocate one available slot (8-byte) in VA page
>     7) EWB on selected EPC pages
>
> With EWB taking:
>
>     - a VA slot, to store eviction version info.
>     - one normal 4K page in memory, to store the encrypted content of the EPC page.
>     - one struct PCMD in memory, to store metadata.
>
> (A VA slot is an 8-byte slot in a VA page, which is a particular kind of EPC page.)
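>
> As a rough sketch of the above sequence on the hypervisor side (all types and helpers here are illustrative placeholders, not this patchset's API; encls() is the wrapper shown in 1.1.3):
>
>     #include <stdint.h>
>     #include <errno.h>
>
>     #define EBLOCK 0x09
>     #define ETRACK 0x0c
>
>     struct enclave;
>     struct epc_page;
>     struct pcmd;
>
>     extern long encls(uint32_t leaf, uint64_t rbx, uint64_t rcx, uint64_t rdx);
>     extern void epc_unmap(struct epc_page *pg);
>     extern void flush_tlb_for(struct enclave *encl);
>     extern uint64_t page_to_va(struct epc_page *pg);
>     extern uint64_t secs_va(struct enclave *encl);
>     extern uint64_t *alloc_va_slot(void);
>     extern long encls_ewb(struct epc_page *pg, void *backing_4k,
>                           struct pcmd *pcmd, uint64_t *va_slot);
>
>     static int evict_epc_page(struct enclave *encl, struct epc_page *pg,
>                               void *backing_4k, struct pcmd *pcmd)
>     {
>         uint64_t *va_slot;
>
>         epc_unmap(pg);                              /* 2) drop EPT/PT mapping */
>         flush_tlb_for(encl);                        /* 3) IPI remote CPUs     */
>
>         if ( encls(EBLOCK, 0, page_to_va(pg), 0) )  /* 4) block the page      */
>             return -EBUSY;
>         if ( encls(ETRACK, 0, secs_va(encl), 0) )   /* 5) track on the SECS   */
>             return -EBUSY;
>
>         va_slot = alloc_va_slot();                  /* 6) 8-byte VA slot      */
>
>         /* 7) EWB: encrypt the page out to backing_4k, store the version
>          *    in *va_slot and the metadata in *pcmd (via a PAGEINFO). */
>         return encls_ewb(pg, backing_4k, pcmd, va_slot);
>     }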
>
> And below is the sequence to evict an SECS page or VA page:
>
>     1) locate the SECS (or VA) page
>     2) remove EPT/PT mapping for the SECS (or VA) page
>     3) Send IPIs to remote CPUs
>     4) allocate one available slot (8-byte) in a VA page
>     5) EWB on the SECS (or VA) page
>
> And for evicting an SECS page, all regular EPC pages that belong to that SECS must be evicted out prior, otherwise EWB returns the SGX_CHILD_PRESENT error.
>
> And to reload an EPC page:
>
>     1) ELDU/ELDB on the EPC page
>     2) setup EPT/PT mapping
>
> With ELDU/ELDB taking:
>
>     - the location of the SECS page
>     - the linear address of the enclave's 4K page (that we are going to reload to)
>     - the VA slot (used in EWB)
>     - the 4K page in memory (used in EWB)
>     - the struct PCMD in memory (used in EWB)
>
> Please refer to SDM chapter 39.5 EPC and Management of EPC pages for more information.
>
> 1.4 SGX Launch Control
>
> SGX requires running a "Launch Enclave" (LE) before running any other enclave. This is because the LE is the only enclave that does not require an EINITTOKEN in EINIT. Running any other enclave requires a valid EINITTOKEN, which contains a MAC of the (first 192 bytes of the) EINITTOKEN calculated with the EINITTOKEN key. EINIT will verify the MAC by internally deriving the EINITTOKEN key, and only an EINITTOKEN with a matching MAC will be accepted by EINIT. The EINITTOKEN key derivation depends on some info from the LE. The typical process is that the LE generates the EINITTOKEN for another enclave according to the LE itself and the target enclave, and calculates the MAC by using ENCLU[EGETKEY] to get the EINITTOKEN key. Only the LE is able to get the EINITTOKEN key.
>
> Running the LE requires the SHA256 hash of the LE signer's RSA public key (SHA256 of sigstruct->modulus) to equal the IA32_SGXLEPUBKEYHASH[0-3] MSRs (the 4 MSRs together make up the 256-bit SHA256 hash value).
>
> If CPUID.0x7.0x0:EBX.SGX and CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL[bit 30] are set, then the IA32_SGXLEPUBKEYHASHn MSRs are available, and the IA32_FEATURE_CONTROL MSR has the SGX_LAUNCH_CONTROL_ENABLE bit (bit 17) available. A 1-setting of the SGX_LAUNCH_CONTROL_ENABLE bit enables runtime change of IA32_SGXLEPUBKEYHASHn after IA32_FEATURE_CONTROL is locked. Otherwise, IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL is locked. After reset, IA32_SGXLEPUBKEYHASHn are set to the hash of Intel's default key. On a system that has only CPUID.0x7.0x0:EBX.SGX set, IA32_SGXLEPUBKEYHASHn are not available. On such a system EINIT will always treat IA32_SGXLEPUBKEYHASHn as Intel's default value, thus only Intel's LE is able to run.
>
> On systems with IA32_SGXLEPUBKEYHASHn available, it is up to the BIOS implementation to decide whether to provide configuration options for the user to set IA32_SGXLEPUBKEYHASHn in *locked* mode (IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL is locked) or *unlocked* mode (IA32_SGXLEPUBKEYHASHn are writable by the kernel at runtime). The BIOS also may or may not provide configuration options to allow the user to set a custom value of IA32_SGXLEPUBKEYHASHn.
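>
> For illustration, updating the hash MSRs in *unlocked* mode could look like the below sketch (rdmsr()/wrmsr() are illustrative helpers; the MSR indices and bit positions are from the SDM):
>
>     #include <stdint.h>
>     #include <string.h>
>     #include <errno.h>
>
>     #define MSR_IA32_FEATURE_CONTROL   0x0000003a
>     #define FC_LOCKED                  (1ull << 0)
>     #define FC_SGX_LC_ENABLED          (1ull << 17)
>     #define MSR_IA32_SGXLEPUBKEYHASH0  0x0000008c   /* ..HASH3 = 0x8f */
>
>     extern uint64_t rdmsr(uint32_t msr);
>     extern void wrmsr(uint32_t msr, uint64_t val);
>
>     /* Install the SHA256 of the LE signer's public key (32 bytes, i.e.
>      * four 64-bit chunks) into IA32_SGXLEPUBKEYHASH0..3.  Only legal
>      * when LOCK and SGX_LAUNCH_CONTROL_ENABLE are both set. */
>     static int set_le_pubkeyhash(const uint8_t hash[32])
>     {
>         uint64_t fc = rdmsr(MSR_IA32_FEATURE_CONTROL);
>
>         if ( (fc & (FC_LOCKED | FC_SGX_LC_ENABLED)) !=
>              (FC_LOCKED | FC_SGX_LC_ENABLED) )
>             return -EPERM;           /* hash MSRs are read-only here */
>
>         for ( int i = 0; i < 4; i++ )
>         {
>             uint64_t chunk;
>
>             memcpy(&chunk, hash + i * 8, 8);
>             wrmsr(MSR_IA32_SGXLEPUBKEYHASH0 + i, chunk);
>         }
>         return 0;
>     }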
>
> 1.5 SGX Interaction with IA32 and Intel 64 Architecture
>
> SDM Chapter 42 describes the SGX interactions with various features of the IA32 and Intel 64 architecture. Below outlines the major ones. Refer to Chapter 42 for the full description of the SGX interactions with the various IA32 and Intel 64 features.
>
> 1.5.1 VMX Changes for Supporting SGX Virtualization
>
> A new 64-bit ENCLS-exiting bitmap control field is added to the VMCS (encoding 0202EH) to control VMEXIT on ENCLS leaf functions. And a new "Enable ENCLS exiting" control bit (bit 15) is defined in the secondary processor based VM execution controls. A 1-setting of "Enable ENCLS exiting" enables the ENCLS-exiting bitmap control. The ENCLS-exiting bitmap controls which ENCLS leaves will trigger VMEXIT.
>
> Additionally, two new bits are added to indicate whether a VMEXIT (of any kind) is from an enclave. The below two bits will be set if the VMEXIT is from an enclave:
>     - Bit 27 in the Exit reason field of the Basic VM-exit information.
>     - Bit 4 in the Interruptibility State of the Guest Non-Register State of the VMCS.
>
> Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and 27.3.4 Saving Non-Register State.
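>
> A minimal sketch of turning on ENCLS trapping with these controls (vmread()/vmwrite() are illustrative helpers, not Xen's actual accessors; the field encodings are from the SDM):
>
>     #include <stdint.h>
>
>     #define VMCS_SECONDARY_EXEC_CONTROLS  0x0000401e
>     #define VMCS_ENCLS_EXITING_BITMAP     0x0000202e
>     #define SECONDARY_EXEC_ENCLS_EXITING  (1u << 15)
>
>     extern uint64_t vmread(uint32_t field);
>     extern void vmwrite(uint32_t field, uint64_t val);
>
>     /* Trap every ENCLS leaf the guest may execute: set bit 15 of the
>      * secondary execution controls and an all-ones exiting bitmap, so
>      * each leaf (bit n <=> EAX = n) causes a VMEXIT.  Writing 0 to the
>      * bitmap instead traps nothing. */
>     static void enable_encls_trapping(void)
>     {
>         uint64_t ctls = vmread(VMCS_SECONDARY_EXEC_CONTROLS);
>
>         vmwrite(VMCS_SECONDARY_EXEC_CONTROLS,
>                 ctls | SECONDARY_EXEC_ENCLS_EXITING);
>         vmwrite(VMCS_ENCLS_EXITING_BITMAP, ~0ull);
>     }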
>
> 1.5.2 Interaction with XSAVE
>
> SGX defines a sub-field called X-Feature Request Mask (XFRM) in the attributes field of the SECS. On enclave entry, SGX hardware verifies that the XFRM bits in SECS.ATTRIBUTES are already enabled in XCR0.
>
> Upon AEX, SGX saves the processor extended state and miscellaneous state into the enclave's state-save area (SSA), and clears the secrets used by the enclave from the processor extended state (to prevent leaking secrets).
>
> Refer to 42.7 Interaction with Processor Extended State and Miscellaneous State.
>
> 1.5.3 Interaction with S states
>
> When the processor goes into the S3-S5 states, EPC is destroyed, and thus all enclaves are consequently destroyed as well.
>
> Refer to 42.14 Interaction with S States.
>
> 2. SGX Virtualization Design
>
> 2.1 High Level Toolstack Changes:
>
> 2.1.1 New 'sgx' XL configure file parameter
>
> EPC is a limited resource. In order to use EPC efficiently among all domains, when creating a guest, the administrator should be able to specify the domain's virtual EPC size. And the admin should also be able to get all domains' virtual EPC sizes.
>
> For SGX Launch Control virtualization, we should allow the admin to create a VM with the VM's virtual IA32_SGXLEPUBKEYHASHn either locked or unlocked, and we should also allow the admin to create a VM with a custom IA32_SGXLEPUBKEYHASHn value.
>
> For the above purposes, the below new 'sgx' XL configure file parameter is added:
>
>     sgx = 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>'
>
> In which 'epc' specifies the VM's EPC size in MB, and it is mandatory.
>
> When the physical machine is in *locked* mode, neither 'lehash' nor 'lewr' can be specified, as the physical machine is unable to change IA32_SGXLEPUBKEYHASHn at runtime. Adding either 'lehash' or 'lewr' will cause VM creation to fail in that case. And the VM's initial IA32_SGXLEPUBKEYHASHn value will be set to the value of the physical MSRs.
>
> When the physical machine is in *unlocked* mode, the VM's initial IA32_SGXLEPUBKEYHASHn value will be set to 'lehash' if specified, or to Intel's default value otherwise. The VM's SGX_LAUNCH_CONTROL_ENABLE bit in IA32_FEATURE_CONTROL will be set or cleared, depending on whether 'lewr' is specified (or set to true or false explicitly).
>
> Please also refer to 2.2.4 Launch Control Support.
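>
> For example, domain config snippets might look like the below (the 'lehash' value is just a placeholder):
>
>     # 64MB of virtual EPC; the guest may rewrite its virtual
>     # IA32_SGXLEPUBKEYHASHn at runtime (unlocked physical mode only)
>     sgx = 'epc=64,lewr=1'
>
>     # 32MB of virtual EPC with a fixed custom LE signer hash;
>     # guest-side hash writes stay disabled
>     sgx = 'epc=32,lehash=<sha256-hash-of-le-signer-key>'
>
> The first form gives the guest runtime control of its virtual hash MSRs; the second pins them to a fixed custom value.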
>
> 2.1.2 New XL commands (?)
>
> The administrator should be able to get the physical EPC size, and all domains' virtual EPC sizes. For this purpose, we can introduce 2 additional commands:
>
>     # xl sgxinfo
>
> Which will print out the physical EPC size, and other SGX info (such as SGX1, SGX2, etc) if necessary.
>
>     # xl sgxlist <did>
>
> Which will print out a particular domain's virtual EPC size, or list the virtual EPC sizes of all supported domains.
>
> Alternatively, we can also extend existing XL commands by adding new options:
>
>     # xl info -sgx
>
> Which will print out the physical EPC size along with other physinfo. And
>
>     # xl list <did> -sgx
>
> Which will print out the domain's virtual EPC size.
>
> Comments?
>
> In this RFC the two new commands are not implemented yet.
>
> 2.1.3 Notify domain's virtual EPC base and size to Xen
>
> Xen needs to know the guest's EPC base and size in order to populate EPC pages for it. The toolstack notifies the EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
>
> 2.2 High Level Xen Hypervisor Changes:
>
> 2.2.1 EPC Management
>
> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before supporting SGX for guests. EPC is detected via SGX CPUID 0x12.0x2. It is possible that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so on, until an invalid EPC section is reported), but this is typically on multi-socket servers, on which each package would have its own EPC.
>
> EPC is reported as reserved memory (so it is not reported as normal memory). EPC must be managed in 4K pages. The CPU hardware uses the EPCM to track the status of each EPC page. Xen needs to manage EPC and provide functions to, e.g., allocate and free EPC pages for guests.
>
> Although typically on physical machines (at least existing machines) EPC is ~100MB in size at maximum, we cannot assume the EPC size, thus in terms of EPC management, it is better to integrate EPC management into Xen's memory management framework to take advantage of Xen's existing memory management algorithms.
>
> Specifically, one 'struct page_info' will be created for each EPC page, just like for normal memory, and a new flag will be defined to identify whether a 'struct page_info' is EPC or normal memory. The existing memory allocation API alloc_domheap_pages will be reused to allocate EPC pages, by adding a new memflag 'MEMF_epc' to indicate EPC allocation rather than normal memory allocation. The new 'MEMF_epc' can also be used for EPC ballooning (if required in the future), as with the new flag, the existing XENMEM_increase{decrease}_reservation and XENMEM_populate_physmap can be reused for EPC as well.
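>
> A simplified sketch of the resulting allocation path (this shows the idea only, not the exact code of patches 6~8, which define MEMF_epc and PGC_epc):
>
>     #include <xen/mm.h>
>     #include <xen/sched.h>
>
>     static struct page_info *domain_alloc_epc_page(struct domain *d)
>     {
>         struct page_info *pg;
>
>         /* MEMF_epc steers the allocator to the EPC zone (XENZONE_EPC)
>          * instead of normal RAM; order 0 == one 4K page. */
>         pg = alloc_domheap_pages(d, 0, MEMF_epc);
>         if ( !pg )
>             return NULL;
>
>         /* The page carries PGC_epc in its count_info, so generic code
>          * can tell EPC pages from normal memory, e.g. to skip scrubbing. */
>         return pg;
>     }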
>
> 2.2.2 EPC Virtualization
>
> This part is about how to populate EPC for guests. We have 3 choices:
>     - Static Partitioning
>     - Oversubscription
>     - Ballooning
>
> Static Partitioning means all EPC pages will be allocated and mapped to the guest when it is created, and there is no runtime change of page table mappings for EPC pages. Oversubscription means the Xen hypervisor supports EPC page swapping between domains, meaning Xen is able to evict an EPC page from another domain and assign it to the domain that needs the EPC. With oversubscription, EPC can be assigned to a domain on demand, when an EPT violation happens. Ballooning is similar to memory ballooning. It is basically "Static Partitioning" + "Balloon driver" in the guest.
>
> Static Partitioning is the easiest way in terms of implementation, and there will be no hypervisor overhead (except EPT overhead of course), because in "Static Partitioning" there are no EPT violations for EPC, and Xen doesn't need to turn on ENCLS VMEXIT for the guest, as ENCLS runs perfectly in non-root mode.
>
> Ballooning is "Static Partitioning" + "Balloon driver" in the guest. Like "Static Partitioning", ballooning doesn't need to turn on ENCLS VMEXIT, and doesn't have EPT violations for EPC either. To support ballooning, we need a balloon driver in the guest to issue hypercalls to give up or reclaim EPC pages. In terms of the hypercall, we have two choices: 1) add a new hypercall for EPC ballooning; 2) use the existing XENMEM_{increase/decrease}_reservation with a new memory flag, i.e. XENMEMF_epc. I'll discuss adding a dedicated hypercall or not in more detail later.
>
> Oversubscription looks nice but it requires a more complicated implementation. Firstly, as explained in 1.3.3 EPC Eviction and Reload, we need to follow specific steps to evict EPC pages, and in order to do that, Xen basically needs to trap ENCLS from the guest and keep track of the EPC page status and enclave info from all guests. This is because:
>     - To evict a regular EPC page, Xen needs to know the SECS location.
>     - Xen needs to know the EPC page type: evicting a regular EPC page and evicting an SECS or VA page have different steps.
>     - Xen needs to know the EPC page status: whether the page is blocked or not.
>
> That info can only be obtained by trapping ENCLS from the guest and parsing its parameters (to identify the SECS page, etc). Parsing ENCLS parameters means we need to know which ENCLS leaf is being trapped, and we need to translate the guest's virtual addresses to get the physical addresses in order to locate the EPC pages. And once ENCLS is trapped, we have to emulate ENCLS in Xen, which means we need to reconstruct the ENCLS parameters by remapping all of the guest's virtual addresses to Xen's virtual addresses (gva->gpa->pa->xen_va), as ENCLS always uses *effective addresses*, which are translated by the processor when running ENCLS.
>
>         --------------------------------------------------------------
>         |                           ENCLS                            |
>         --------------------------------------------------------------
>                     |                       /|\
>         ENCLS VMEXIT|                        | VMENTRY
>                     |                        |
>                    \|/                       |
>
>         1) parse ENCLS parameters
>         2) reconstruct (remap) guest's ENCLS parameters
>         3) run ENCLS on behalf of guest (and skip ENCLS)
>         4) on success, update EPC/enclave info, or inject error
>
> And Xen needs to maintain each EPC page's status (type, blocked or not, in an enclave or not, etc). Xen also needs to maintain all enclaves' info from all guests, in order to find the correct SECS for a regular EPC page, and the enclave's linear address as well.
>
> So in general, "Static Partitioning" has the simplest implementation, but is obviously not the best way to use EPC efficiently; "Ballooning" has all the pros of Static Partitioning but requires a guest balloon driver; "Oversubscription" is the best in terms of flexibility but requires a complicated hypervisor implementation.
>
> We will start with "Static Partitioning". If "Ballooning" is required in the future, we will support it. "Oversubscription" should not be needed in the foreseeable future.
>
> 2.2.3 Populate EPC for Guest
>
> The toolstack notifies Xen about the domain's EPC base and size via XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the guest in XEN_DOMCTL_set_cpuid, particularly in the handling of XEN_DOMCTL_set_cpuid for CPUID.0x12.0x2. Once Xen checks that the values passed from the toolstack are valid, Xen will allocate all EPC pages and set up the EPT mappings for the guest.
>
> 2.2.4 Launch Control Support
>
> To support running multiple domains, each running its own LE signed by a different owner, the physical machine's BIOS must leave IA32_SGXLEPUBKEYHASHn *unlocked* before handing over to Xen. Xen will trap the domain's writes to IA32_SGXLEPUBKEYHASHn and keep the values in the vcpu internally, and update the values to the physical MSRs when the vcpu is scheduled in. This guarantees that when EINIT runs in the guest, the guest's virtual IA32_SGXLEPUBKEYHASHn have been written to the physical MSRs.
>
> The SGX_LAUNCH_CONTROL_ENABLE bit in the guest's IA32_FEATURE_CONTROL is controlled by the newly added 'lewr' XL parameter (see 2.1.1 New 'sgx' XL configure file parameter).
>
> If the physical IA32_SGXLEPUBKEYHASHn are *locked* by the machine's BIOS, then only MSR reads are allowed from the guest, and Xen will inject an error for the guest's MSR writes.
>
> In addition, if the physical IA32_SGXLEPUBKEYHASHn are *locked*, then creating a guest with the 'lehash' or 'lewr' parameter will fail, as in such a case Xen is not able to update the guest's virtual IA32_SGXLEPUBKEYHASHn to the physical MSRs.
>
> If the physical IA32_SGXLEPUBKEYHASHn are not available (CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL is not present), then creating a VM with 'lehash' or 'lewr' will also fail. In addition, any MSR read/write of IA32_SGXLEPUBKEYHASHn from the guest is invalid and Xen will inject an error in such a case.
>
> 2.2.5 CPUID Emulation
>
> Most of the native SGX CPUID info can be exposed to the guest, except the below two parts:
>     - Sub-leaf 0x2 needs to report the domain's virtual EPC base and size, instead of the physical EPC info.
>     - Sub-leaf 0x1 needs to be consistent with the guest's XCR0. For the reason behind this, please refer to 1.5.2 Interaction with XSAVE.
>
> 2.2.6 EPT Violation & ENCLS Trapping Handling
>
> Only needed when Xen supports EPC Oversubscription, as explained above.
>
> 2.2.7 Guest Suspend & Resume
>
> On hardware, EPC is destroyed when power goes to S3-S5. So Xen will destroy the guest's EPC when the guest's power goes into S3-S5. Currently Xen is notified by Qemu of S state changes via HVM_PARAM_ACPI_S_STATE, on which Xen will destroy the EPC if the S state is S3-S5.
>
> Specifically, Xen will run EREMOVE on each of the guest's EPC pages, as the guest may not handle EPC suspend & resume correctly, in which case the guest's EPC pages may physically still be valid, so Xen needs to run EREMOVE to make sure all EPC pages become invalid. Otherwise, further operations on EPC in the guest may fault, as the guest assumes all EPC pages are invalid after it is resumed.
>
> For SECS pages, EREMOVE may fault with SGX_CHILD_PRESENT, in which case Xen will keep the SECS page on a list, and call EREMOVE on those pages again after all other EPC pages have had EREMOVE called on them, as sketched below. This time the EREMOVE on the SECS will succeed, as all children (regular EPC pages) have already been removed.
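>
> A two-pass loop illustrating this (illustrative types and helpers, not the patch code; encls() is the wrapper from 1.1.3, and 13 is the SGX_CHILD_PRESENT error code per the SDM):
>
>     #include <stdint.h>
>
>     #define EREMOVE            0x03
>     #define SGX_CHILD_PRESENT  13
>
>     struct epc_page { struct epc_page *next; };
>
>     extern long encls(uint32_t leaf, uint64_t rbx, uint64_t rcx, uint64_t rdx);
>     extern uint64_t page_to_va(struct epc_page *pg);
>
>     static void domain_reset_epc(struct epc_page *pages)
>     {
>         struct epc_page *retry = NULL, *next;
>
>         for ( struct epc_page *pg = pages; pg; pg = next )
>         {
>             next = pg->next;
>             if ( encls(EREMOVE, 0, page_to_va(pg), 0) == SGX_CHILD_PRESENT )
>             {
>                 /* A SECS page with live children: park it for pass two. */
>                 pg->next = retry;
>                 retry = pg;
>             }
>         }
>
>         /* All children are gone now, so EREMOVE on a parked SECS succeeds. */
>         for ( struct epc_page *pg = retry; pg; pg = pg->next )
>             encls(EREMOVE, 0, page_to_va(pg), 0);
>     }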
>
> 2.2.8 Destroying Domain
>
> Normally Xen just frees all EPC pages for the domain when it is destroyed. But Xen will also do EREMOVE on all of the guest's EPC pages (as described in 2.2.7 above) before freeing them, as the guest may shut down unexpectedly (e.g., the user kills the guest), and in that case the guest's EPC may still be valid.
>
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>
> Actually, from the hardware's point of view, SGX is not migratable. There are two reasons:
>
> - The SGX key architecture cannot be virtualized.
>
>   For example, some keys are bound to the CPU, such as the Sealing key, the EREPORT key, etc. If a VM is migrated to another machine, the same enclave will derive different keys. Taking the Sealing key as an example, the Sealing key is typically used by an enclave (an enclave can get the sealing key with EGETKEY) to *seal* its secrets to the outside (e.g., persistent storage) for further use. If the Sealing key changes after VM migration, then the enclave can never get the sealed secrets back using the sealing key, as it has changed, and the old sealing key cannot be recovered.
>
> - There is no ENCLS leaf to evict an EPC page to normal memory while, at the same time, still keeping the content in EPC. Currently, once an EPC page is evicted, the EPC page becomes invalid. So technically, we are unable to implement live migration (or checkpointing, or snapshot) for enclaves.
>
> But, with some workarounds, and given some facts about existing SGX drivers, technically we are able to support live migration (or even checkpointing or snapshot). This is because:
>
> - Changing a key (which is bound to the CPU) is not a problem in reality.
>
>   Take the Sealing key as an example. Losing sealed data is not a problem, because the sealing key is only supposed to encrypt secrets that can be provisioned again. The typical work model is: the enclave gets secrets provisioned from remote (the service provider), and uses the sealing key to store them for further use. When the enclave tries to *unseal* using the sealing key, if the sealing key has changed, the enclave will find the data is corrupted in some way (integrity check failure), so it will ask for the secrets to be provisioned again from remote. Another reason is that, in a data center, VMs typically share lots of data, and as the sealing key is bound to the CPU, the data encrypted by one enclave on one machine cannot be shared by another enclave on another machine. So from the SGX app writer's point of view, the developer should treat the Sealing key as a changeable key, and should handle loss of sealed data anyway. The Sealing key should only be used to seal secrets that can be easily provisioned again.
>
>   For other keys such as the EREPORT key and the provisioning key, which are used for local attestation and remote attestation, due to the second reason below, losing them is not a problem either.
>
> - Sudden loss of EPC is not a problem.
>
>   On hardware, EPC will be lost if the system goes to S3-S5, or resets, or shuts down, and the SGX driver needs to handle the loss of EPC due to power transitions. This is done by cooperation between the SGX driver and the userspace SGX SDK/apps. However, during live migration, there may not be a power transition in the guest, so there may not be an EPC loss during live migration. And technically we cannot *really* live migrate an enclave (explained above), so it looks infeasible. But the fact is that both the Linux SGX driver and the Windows SGX driver already support *sudden* loss of EPC (not just EPC loss during a power transition), which means both drivers are able to recover in case EPC is lost at any point at runtime. With this, technically we are able to support live migration by simply ignoring EPC. After the VM is migrated, the destination VM will only suffer a *sudden* loss of EPC, which both the Windows SGX driver and the Linux SGX driver are already able to handle.
>
>   But we must point out that such *sudden* loss of EPC is not hardware behavior, and SGX drivers for other OSes (such as FreeBSD) may not implement this, so for those guests, the destination VM will behave in an unexpected manner. But I am not sure we need to care about other OSes.
>
> For the same reason, we are able to support checkpointing for SGX guests (only Linux and Windows).
>
> For snapshot, we can support snapshotting an SGX guest by either:
>
> - Suspending the guest before the snapshot (S3-S5). This works for all guests but requires the user to manually suspend the guest.
> - Issuing a hypercall to destroy the guest's EPC in save_vm. This only works for Linux and Windows but doesn't require user intervention.
>
> What are your comments?
>
> 3. Reference
>
> - Intel SGX Homepage
>   https://software.intel.com/en-us/sgx
>
> - Linux SGX SDK
>   https://01.org/intel-software-guard-extensions
>
> - Linux SGX driver for upstreaming
>   https://github.com/01org/linux-sgx
>
> - Intel SGX Specification (SDM Vol 3D)
>   https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf
>
> - Paper: Intel SGX Explained
>   https://eprint.iacr.org/2016/086.pdf
>
> - ISCA 2015 tutorial slides for Intel® SGX - Intel® Software
>   https://software.intel.com/sites/default/files/332680-002.pdf
>
> Boqun Feng (5):
>   xen: mm: introduce non-scrubbable pages
>   xen: mm: manage EPC pages in Xen heaps
>   xen: x86/mm: add SGX EPC management
>   xen: x86: add functions to populate and destroy EPC for domain
>   xen: tools: add SGX to applying MSR policy
>
> Kai Huang (12):
>   xen: x86: expose SGX to HVM domain in CPU featureset
>   xen: x86: add early stage SGX feature detection
>   xen: vmx: detect ENCLS VMEXIT
>   xen: x86/mm: introduce ioremap_wb()
>   xen: p2m: new 'p2m_epc' type for EPC mapping
>   xen: x86: add SGX cpuid handling support.
>   xen: vmx: handle SGX related MSRs
>   xen: vmx: handle ENCLS VMEXIT
>   xen: vmx: handle VMEXIT from SGX enclave
>   xen: x86: reset EPC when guest got suspended.
>   xen: tools: add new 'sgx' parameter support
>   xen: tools: add SGX to applying CPUID policy
>
>  docs/misc/xen-command-line.markdown         |   8 +
>  tools/libxc/Makefile                        |   1 +
>  tools/libxc/include/xc_dom.h                |   4 +
>  tools/libxc/include/xenctrl.h               |  16 +
>  tools/libxc/xc_cpuid_x86.c                  |  68 ++-
>  tools/libxc/xc_msr_x86.h                    |  10 +
>  tools/libxc/xc_sgx.c                        |  82 +++
>  tools/libxl/libxl.h                         |   3 +-
>  tools/libxl/libxl_cpuid.c                   |  15 +-
>  tools/libxl/libxl_create.c                  |  10 +
>  tools/libxl/libxl_dom.c                     |  65 ++-
>  tools/libxl/libxl_internal.h                |   2 +
>  tools/libxl/libxl_nocpuid.c                 |   4 +-
>  tools/libxl/libxl_types.idl                 |  11 +
>  tools/libxl/libxl_x86.c                     |  12 +
>  tools/ocaml/libs/xc/xenctrl_stubs.c         |  11 +-
>  tools/python/xen/lowlevel/xc/xc.c           |  11 +-
>  tools/xl/xl_parse.c                         |  86 +++
>  tools/xl/xl_parse.h                         |   1 +
>  xen/arch/x86/Makefile                       |   1 +
>  xen/arch/x86/cpu/common.c                   |  15 +
>  xen/arch/x86/cpuid.c                        |  62 ++-
>  xen/arch/x86/domctl.c                       |  87 ++-
>  xen/arch/x86/hvm/hvm.c                      |   3 +
>  xen/arch/x86/hvm/vmx/vmcs.c                 |  16 +-
>  xen/arch/x86/hvm/vmx/vmx.c                  |  68 +++
>  xen/arch/x86/hvm/vmx/vvmx.c                 |  11 +
>  xen/arch/x86/mm.c                           |   9 +-
>  xen/arch/x86/mm/p2m-ept.c                   |   3 +
>  xen/arch/x86/mm/p2m.c                       |  41 ++
>  xen/arch/x86/msr.c                          |   6 +-
>  xen/arch/x86/sgx.c                          | 815 ++++++++++++++++++++++++++++
>  xen/common/page_alloc.c                     |  39 +-
>  xen/include/asm-arm/mm.h                    |   9 +
>  xen/include/asm-x86/cpufeature.h            |   4 +
>  xen/include/asm-x86/cpuid.h                 |  29 +-
>  xen/include/asm-x86/hvm/hvm.h               |   3 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h          |   8 +
>  xen/include/asm-x86/hvm/vmx/vmx.h           |   3 +
>  xen/include/asm-x86/mm.h                    |  19 +-
>  xen/include/asm-x86/msr-index.h             |   6 +
>  xen/include/asm-x86/msr.h                   |   5 +
>  xen/include/asm-x86/p2m.h                   |  12 +-
>  xen/include/asm-x86/sgx.h                   |  86 +++
>  xen/include/public/arch-x86/cpufeatureset.h |   3 +-
>  xen/include/xen/mm.h                        |   2 +
>  xen/tools/gen-cpuid.py                      |   3 +
>  47 files changed, 1757 insertions(+), 31 deletions(-)
>  create mode 100644 tools/libxc/xc_sgx.c
>  create mode 100644 xen/arch/x86/sgx.c
>  create mode 100644 xen/include/asm-x86/sgx.h
>
> --
> 2.15.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel