Re: [Xen-devel] [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
Hi Wei,

Thank you very much for the comments. Please see my reply below.

On 7/17/2017 9:16 PM, Wei Liu wrote:
> Hi Kai
>
> Thanks for this nice write-up. Some comments and questions below.
>
> On Sun, Jul 09, 2017 at 08:03:10PM +1200, Kai Huang wrote:
>> Hi all,
>>
>> [...]
>>
>> 2. SGX Virtualization Design
>>
>> 2.1 High Level Toolstack Changes:
>>
>> 2.1.1 New 'epc' parameter
>>
>> EPC is a limited resource. In order to use EPC efficiently among all
>> domains, the administrator should be able to specify a domain's
>> virtual EPC size when creating the guest, and should also be able to
>> get all domains' virtual EPC sizes. For this purpose, a new
>> 'epc = <size>' parameter is added to the XL configuration file. This
>> parameter specifies the guest's virtual EPC size. The EPC base
>> address will be calculated by the toolstack internally, according to
>> the guest's memory size, MMIO size, etc. 'epc' is in MB, and any
>> 1MB-aligned value will be accepted.
>>
>> 2.1.2 New XL commands (?)
>>
>> The administrator should be able to get the physical EPC size, and
>> all domains' virtual EPC sizes. For this purpose, we can introduce 2
>> additional commands:
>>
>> # xl sgxinfo
>>
>> which will print out the physical EPC size, and other SGX info (such
>> as SGX1, SGX2, etc) if necessary.
>>
>> # xl sgxlist <did>
>>
>> which will print out a particular domain's virtual EPC size, or list
>> the virtual EPC sizes of all supported domains.
>>
>> Alternatively, we can also extend existing XL commands with new
>> options:
>>
>> # xl info -sgx
>>
>> which will print out the physical EPC size along with the other
>> physinfo, and
>>
>> # xl list <did> -sgx
>>
>> which will print out the domain's virtual EPC size.
>>
>> Comments?
>
> Can a guest have multiple EPC? If so, the proposed parameter is not
> good enough.

According to the SDM a machine may have multiple EPC sections, but "may
have" doesn't mean "must have". EPC is typically reserved by the BIOS as
Processor Reserved Memory (PRM), and in my understanding a client
machine doesn't need to have multiple EPC sections. Currently I don't
see why we would need to expose multiple EPC sections to a guest; even
if the physical machine reports multiple EPC sections, exposing one EPC
section to the guest is enough. Currently SGX should not be supported
simultaneously with virtual NUMA for a single domain.

> Can a guest with EPC enabled be migrated? The answer to this question
> can lead to multiple other questions.

See the last section of my design. I saw you've already seen it. :)

> Another question, is EPC going to be backed by normal memory? This is
> related to memory accounting of the guest.

Although the SDM says EPC is typically allocated by the BIOS as PRM, I
think we can just treat EPC as PRM, so yes, I believe EPC is physically
backed by normal memory. But EPC is reported as reserved memory in the
e820 table.

> Is EPC going to be modeled as a device or another type of memory? This
> is related to how we manage it in the toolstack.

I think we'd better treat EPC as another type of memory. I am not sure
whether it should be modeled as a device: on a real machine EPC is also
exposed in the ACPI table via the "INT0E0C" device under \_SB (however
it is not modeled as a PCIe device for sure).

> Finally why do you not allow the users to specify the base address?

I don't see any reason why the user needs to specify the base address.
If we did, what address would they specify? On a real machine the BIOS
sets the base address, and for a VM I think the toolstack/Xen should do
this.
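To make 2.1.1 concrete, here is an illustrative guest config fragment
(not taken from the patches; the name and values are made up, and only
the 'epc=' line is the proposed addition). Note there is deliberately no
EPC base address field, since the toolstack picks the base:

    name   = "sgx-guest"   # hypothetical guest
    memory = 2048
    vcpus  = 4
    epc    = 64            # virtual EPC size in MB (1MB-aligned);
                           # the EPC base address is chosen by the toolstack

Reading the value back would then be done with 'xl sgxlist <did>' or
'xl list <did> -sgx', whichever of the two options in 2.1.2 we settle on.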
In my RFC patches I didn't implement the commands, as I don't know which
option is better. In the github repo I mentioned at the beginning,
there's an old branch in which I implemented 'xl sgxinfo' and
'xl sgxlist', but they are implemented via a dedicated hypercall for
SGX, which I am not sure is a good option, so I didn't include it in my
RFC patches.

>> 2.1.3 Notify domain's virtual EPC base and size to Xen
>>
>> Xen needs to know the guest's EPC base and size in order to populate
>> EPC pages for it. The toolstack notifies the EPC base and size to Xen
>> via XEN_DOMCTL_set_cpuid.
>>
>> 2.1.4 Launch Control Support (?)
>>
>> [...]
>>
>> But maybe integrating EPC into the MM framework is more reasonable.
>> Comments?
>>
>> 2.2.2 EPC Virtualization (?)
>>
>> This part is about how to populate EPC for guests. We have 3 choices:
>>
>> - Static Partitioning
>> - Oversubscription
>> - Ballooning
>
> IMHO static partitioning is good enough as a starting point.
>
> Ballooning is nice to have but please don't make it mandatory. Not all
> guests have a balloon driver -- imagine a unikernel style secure domain
> running with EPC.

That's a good point. Thanks.
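One more note on 2.1.3 while we are on the toolstack/hypervisor
plumbing: below is a rough sketch (mine, not code from the RFC patches)
of how a virtual EPC base/size pair could be encoded into CPUID leaf
0x12, sub-leaf 2, following my reading of the SDM layout. The struct and
function names are made up, and the resulting register values would be
handed to Xen via XEN_DOMCTL_set_cpuid, which is not shown here.

    #include <stdint.h>

    struct cpuid_leaf {
        uint32_t eax, ebx, ecx, edx;
    };

    /* base and size are assumed to be at least 4KB-aligned
     * (1MB-aligned in practice, per 2.1.1). */
    static void encode_epc_cpuid_subleaf2(uint64_t base, uint64_t size,
                                          struct cpuid_leaf *l)
    {
        /* EAX[3:0] = 1: this sub-leaf describes a valid EPC section;
         * EAX[31:12] = base[31:12], EBX[19:0] = base[51:32]. */
        l->eax = 0x1 | (uint32_t)(base & 0xfffff000ULL);
        l->ebx = (uint32_t)((base >> 32) & 0xfffff);

        /* ECX[3:0] = 1: EPC section with confidentiality/integrity
         * protection; ECX[31:12] = size[31:12], EDX[19:0] = size[51:32]. */
        l->ecx = 0x1 | (uint32_t)(size & 0xfffff000ULL);
        l->edx = (uint32_t)((size >> 32) & 0xfffff);
    }

    /* e.g. encode_epc_cpuid_subleaf2(0x80000000ULL, 64ULL << 20, &leaf); */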
>> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>
> Oh, here it is. Nice.
>
>> Actually, from the hardware's point of view, SGX is not migratable.
>> There are two reasons:
>>
>> - The SGX key architecture cannot be virtualized. Some keys are bound
>>   to the CPU, for example the Sealing key and the EREPORT key. If a VM
>>   is migrated to another machine, the same enclave will derive
>>   different keys. Taking the Sealing key as an example: the Sealing
>>   key is typically used by an enclave (which can obtain it via
>>   EGETKEY) to *seal* its secrets to the outside (e.g. persistent
>>   storage) for later use. If the Sealing key changes after VM
>>   migration, the enclave can never get the sealed secrets back, as the
>>   key has changed and the old Sealing key cannot be recovered.
>>
>> - There's no ENCLS leaf to evict an EPC page to normal memory while at
>>   the same time keeping its content in EPC. Currently, once an EPC
>>   page is evicted, it becomes invalid. So technically we are unable to
>>   implement live migration (or checkpointing, or snapshot) for
>>   enclaves.
>>
>> But with some workarounds, and given some facts about the existing SGX
>> drivers, technically we are able to support live migration (or even
>> checkpointing and snapshot). This is because:
>>
>> - Changing a key (which is bound to the CPU) is not a problem in
>>   reality.
>>
>>   Take the Sealing key as an example. Losing sealed data is not a
>>   problem, because the Sealing key is only supposed to encrypt secrets
>>   that can be provisioned again. The typical model is that the enclave
>>   gets secrets provisioned from a remote service provider and uses the
>>   Sealing key to store them for later use. When the enclave tries to
>>   *unseal* using the Sealing key, if the key has changed, the enclave
>>   will find the data is corrupted (integrity check failure), so it
>>   will ask for the secrets to be provisioned again from the remote
>>   side. Another reason is that in a data center VMs typically share
>>   lots of data, and as the Sealing key is bound to the CPU, data
>>   encrypted by one enclave on one machine cannot be shared by another
>>   enclave on another machine. So from the SGX application writer's
>>   point of view, the developer should treat the Sealing key as a
>>   changeable key, should handle the loss of sealed data anyway, and
>>   should only use the Sealing key to seal secrets that can easily be
>>   provisioned again. For other keys such as the EREPORT key and the
>>   Provisioning key, which are used for local and remote attestation,
>>   losing them is not a problem either, due to the second reason below.
>>
>> - Sudden loss of EPC is not a problem.
>>
>>   On hardware, EPC will be lost if the system goes to S3-S5, or is
>>   reset, or shut down, and the SGX driver needs to handle loss of EPC
>>   due to power transitions. This is done by cooperation between the
>>   SGX driver and the userspace SGX SDK/apps. However, during live
>>   migration there may be no power transition in the guest, so there
>>   may be no EPC loss either, and technically we cannot *really* live
>>   migrate an enclave (explained above), so it looks infeasible. But
>>   the fact is that both the Linux SGX driver and the Windows SGX
>>   driver already support *sudden* loss of EPC (not just EPC loss
>>   across power transitions), which means both drivers are able to
>>   recover in case EPC is lost at any point at runtime. With this,
>>   technically we are able to support live migration by simply ignoring
>>   EPC. After the VM is migrated, the destination VM will only suffer a
>>   *sudden* loss of EPC, which both the Windows SGX driver and the
>>   Linux SGX driver are already able to handle. But we must point out
>>   that such *sudden* loss of EPC is not hardware behavior, and SGX
>>   drivers for other OSes (such as FreeBSD) may not implement it, so
>>   for those guests the destination VM will behave in an unexpected
>>   manner. But I am not sure we need to care about other OSes.
>
> Presumably it wouldn't be too hard for FreeBSD to replicate the
> behaviour of Linux and Windows.

The problem is that this is not hardware behavior. If the FreeBSD guys
just look at the SDM then they may not expect such a sudden loss of EPC.
But I guess maybe they will just port the existing driver. :)

>> For the same reason, we are able to support checkpointing for SGX
>> guests (only Linux and Windows). For snapshot, we can support
>> snapshotting an SGX guest by either:
>>
>> - Suspending the guest before the snapshot (S3-S5). This works for all
>>   guests but requires the user to manually suspend the guest.
>>
>> - Issuing a hypercall to destroy the guest's EPC in save_vm. This only
>>   works for Linux and Windows but doesn't require user intervention.
>>
>> What are your comments?
>
> IMHO it is of course good to have migration and snapshot support for
> such guests.

Thanks. I have no problem supporting migration and snapshot if no one
opposes.

Thanks,
-Kai

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel