Re: [Xen-devel] [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
On 2015/5/19 4:00, Wei Liu wrote:
> On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote:
>> On 2015/5/11 19:32, Wei Liu wrote:
>>> On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
>>>> On 2015/5/8 22:43, Wei Liu wrote:
>>>>> Sorry for the late review. This series fell through the cracks.
>>>>
>>>> Thanks for your review.
>>>>
>>>>> On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
>>>>>> While building a VM, the HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields used by hvmloader to construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we should check them to fix any conflict with RAM. RMRR can theoretically reside in address space beyond 4G, but we have never seen this in the real world. So, in order to avoid breaking the highmem layout,
>>>>>
>>>>> How does this break the highmem layout?
>>>>
>>>> In most cases highmem is contiguous, like [4G, ...], but an RMRR can theoretically be beyond 4G, although that is very rare and we have never seen it happen in the real world. So we don't want such a rare case to break highmem.
>>>
>>> The problem is that taking this approach just because it rarely happens *now* is not future proof. It needs to be clearly documented somewhere in the manual (or any other Intel docs) and be referenced in the code. Otherwise we might end up in a situation where no one knows how it is supposed to work and no one can fix it if it breaks in the future, that is, if every single device on earth requires RMRR > 4G overnight (I'm exaggerating).
>>>
>>> Or you can just make it work with highmem. How much more work do you envisage?
>>>
>>> (If my above comment makes no sense do let me know. I only have a very shallow understanding of RMRR.)
>>
>> Maybe I'm misleading you :) I don't mean RMRR > 4G is not allowed to work in our implementation. What I'm saying is that our *policy* is just kept simple for this kind of rare highmem case...
>
> Yes, but this policy is hardcoded in the code (as in, you bail when detecting a conflict in the highmem region). I don't think we have the expertise to un-hardcode it in the future (it might require x86-specific insider information and specific hardware to test). So I would like to make it as future proof as possible. I know you're limited by hvm_info. I would accept this hardcoded policy as a short-term solution, but I would like a commitment from Intel to maintain this piece of code and properly work out a flexible solution to deal with the >4G case.

Looks like Kevin already replied to this.

>>>>>> we don't solve the highmem conflict. Note this means a highmem RMRR could still be supported if there is no conflict.

Like these two sentences above. (There is a rough sketch of this policy at the end of this mail.)

>>>>> Aren't you actively trying to avoid conflicts in libxl?
>>>>
>>>> RMRR is fixed by the BIOS so we can't avoid the conflict. Here we want to adopt some good policies to address RMRR. In the case of highmem, that simple policy should be enough?
>>>
>>> Whatever policy you and the HV maintainers agree on. Just clearly document it.
>>
>> Do you mean I should write this patch description up as a separate document?
>
> Either in a code comment or in a separate document.

Okay.

[...]

>>>> The caller of xc_device_get_rdm always frees this.
>>>
>>> I think I misunderstood how this function works. I thought xrdm was passed in by the caller, which is clearly not the case. Sorry! In that case, the `if ( rc < 0 )' is not needed because the call should always return rc < 0. An assert is good enough.
>>
>> assert(rc < 0)? But we can't presume the caller always passes '0' first, and additionally we may not have any RMRR at all. So I guess what you want is:
>>
>>     assert( rc <= 0 );
>>     if ( !rc )
>>         goto xrdm;
>
> Yes, I was thinking about something like this. How in this case can rc be equal to 0?
> Is it because you don't actually have any regions?

Yes.

> If so, please write a comment.

Sure. Something like:

    if ( errno == ENOBUFS )
        ...

Right? (See the second sketch at the end of this mail.)

[...]

>>>>>> +    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
>>>>>> +
>>>>>> +    /* Query all RDM entries in this platform */
>>>>>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>>>>>> +        d_config->num_rdms = nr_all_rdms;
>>>>>> +        for (i = 0; i < d_config->num_rdms; i++) {
>>>>>> +            d_config->rdms[i].start =
>>>>>> +                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].size =
>>>>>> +                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>>>>>> +        }
>>>>>> +    } else {
>>>>>> +        d_config->num_rdms = 0;
>>>>>> +    }
>>>>>
>>>>> And you should move d_config->rdms = libxl__calloc inside that `if'. That is, don't allocate memory if you don't need it.
>>>>
>>>> We can't, since in all cases we need to preallocate this, and then we will handle it according to our policy.
>>>
>>> How would it ever be used again if you set d_config->num_rdms to 0? How do you know the exact size of your array again?
>>
>> If we don't consider multiple devices sharing one RDM entry, our workflow can be shown as follows:
>>
>> #1. We always preallocate all of rdms[], but zeroed with memset().
>
> Because the number of RDM entries used in a domain never exceeds the number of global entries? I.e. we never risk overrunning the array?

Yes. "global" means the number counts all RDM entries. Without the "global" flag, we only count the RDM entries associated with the PCI devices assigned to this domain. So we actually have this relation:

    the number without "global" <= the number with "global"

>> #2. Then we have two cases for the global rule:
>>
>> #2.1 If we really need all entries according to our global rule, we fill all of rdms[] with the real RDM info and set d_config->num_rdms.
>>
>> #2.2 If we have no such global rule to obtain them all, we just clear d_config->num_rdms.
>
> No, don't do this. In any failure path, if the free / dispose function depends on num_rdms to iterate through the whole list to dispose of memory (if your rdm structure later contains pointers to allocated memory), this method leaks memory. The num_ field should always reflect the actual size of the array.

>> #3. Then for the per-device rule:
>>
>> #3.1 From #2.1, we just need to check whether we should change a given RDM entry's policy if that entry is associated with this device.
>>
>> #3.2 From #2.2, obviously we just need to fill rdms[] one by one. Of course, it's quite possible that we don't fill all of rdms[], since the passed-through devices might have no RDM at all or only occupy some entries. But anyway, we finally sync d_config->num_rdms.
>
> Sorry, I don't follow. But if your problem is that you don't know how many entries you actually need, just use libxl__realloc?

>>>>>> +    free(xrdm);
>>>>>> +
>>>>>> +    /* Query RDM entries per-device */
>>>>>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>>>>>> +        unsigned int nr_entries = 0;

Maybe I should restate this as:

    unsigned int nr_entries;

[...]

>>>> Like I replied above, we always preallocate this at the beginning.
>>>
>>> Ah, OK. But please don't do this. It's hard to see that you don't overrun the buffer. Please allocate memory only when you need it.
>>
>> Sorry, I don't understand this. As I mention above, we don't know how many RDM entries we really need to allocate. So this is why we'd like to preallocate all of rdms[] at the beginning. Then this can cover all cases: global policy, (global policy & per-device), and only per-device. Even if multiple devices share one RDM entry, we also need to avoid duplicating a new...
>
> Can you use libxl__realloc?

Looks like this can expand the size automatically each time, right? If so I think we can try this.
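To make a few of the points above more concrete before closing, here are three rough sketches. First, the highmem policy: below is a small, self-contained illustration of how one reserved range could be checked against the guest RAM layout implied by hvm_info's low_mem_pgend/high_mem_pgend, with a conflict below 4G following the per-range strict/relaxed policy and a conflict with highmem RAM simply rejected. Apart from those two hvm_info field names, everything here is made up for illustration; it is not the actual patch.

/*
 * Rough sketch only, not the actual patch: check one reserved device
 * memory range against the guest RAM layout implied by hvm_info's
 * low_mem_pgend/high_mem_pgend.  A conflict with RAM below 4G follows
 * the per-range policy; a conflict with highmem RAM is rejected, since
 * we don't solve highmem conflicts.  All names other than the two
 * hvm_info fields are illustrative.
 */
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT   12
#define GUEST_HIGHMEM_BASE (1ULL << 32)

struct rdm_range {
    uint64_t start;   /* start address of the reserved range */
    uint64_t size;    /* size of the range in bytes */
    bool strict;      /* true: fail the build on conflict; false: warn only */
};

/* Return 0 if the range is acceptable, -1 if the domain build must fail. */
static int rdm_check_one(const struct rdm_range *r,
                         uint64_t low_mem_pgend, uint64_t high_mem_pgend)
{
    uint64_t lowmem_end  = low_mem_pgend  << GUEST_PAGE_SHIFT;
    uint64_t highmem_end = high_mem_pgend << GUEST_PAGE_SHIFT;
    uint64_t rdm_end     = r->start + r->size;

    /* Overlap with lowmem RAM [0, lowmem_end): apply the per-range policy.
     * (The real code would also adjust the guest e820 accordingly.) */
    if (r->start < lowmem_end)
        return r->strict ? -1 : 0;

    /* Overlap with highmem RAM [4G, highmem_end): not handled, so reject. */
    if (rdm_end > GUEST_HIGHMEM_BASE && r->start < highmem_end)
        return -1;

    /* No overlap with guest RAM at all, e.g. the range sits in the MMIO hole. */
    return 0;
}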
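Second, the assert(rc <= 0) / ENOBUFS point: the sketch below shows the calling convention I have in mind. The first call passes a zero-sized buffer just to learn the entry count; rc == 0 means there are simply no regions, while rc < 0 with errno == ENOBUFS means the buffer is too small and the required count has been written back, so we allocate and call again, and the caller frees the buffer. The mock function only stands in for the real hypercall wrapper; its name and signature are invented for this example.

#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct rdm_region {          /* stand-in for the real RDM entry type */
    uint64_t start_pfn;
    uint64_t nr_pages;
};

/* Mock of the hypercall wrapper: pretend the platform has two RMRR regions. */
static int mock_device_get_rdm(unsigned int *nr_entries, struct rdm_region *buf)
{
    static const struct rdm_region host[] = {
        { 0xbdf00, 0x10 },   /* made-up example regions */
        { 0xbe000, 0x20 },
    };
    unsigned int nr = sizeof(host) / sizeof(host[0]);

    if (*nr_entries < nr) {
        *nr_entries = nr;    /* tell the caller how much space is needed */
        errno = ENOBUFS;
        return -1;
    }
    for (unsigned int i = 0; i < nr; i++)
        buf[i] = host[i];
    *nr_entries = nr;
    return 0;
}

int main(void)
{
    unsigned int nr_entries = 0;
    struct rdm_region *xrdm;
    int rc;

    /* First call with a zero-sized buffer, just to learn the entry count. */
    rc = mock_device_get_rdm(&nr_entries, NULL);

    /* rc > 0 is never expected; rc == 0 means there are no regions at all. */
    assert(rc <= 0);
    if (!rc) {
        printf("no reserved device memory regions\n");
        return 0;
    }
    if (errno != ENOBUFS)
        return 1;            /* a real failure, not "buffer too small" */

    /* Second call with a buffer large enough for nr_entries. */
    xrdm = calloc(nr_entries, sizeof(*xrdm));
    if (!xrdm)
        return 1;
    rc = mock_device_get_rdm(&nr_entries, xrdm);
    assert(!rc);

    for (unsigned int i = 0; i < nr_entries; i++)
        printf("RDM %u: start pfn 0x%" PRIx64 ", %" PRIu64 " pages\n",
               i, xrdm[i].start_pfn, xrdm[i].nr_pages);

    free(xrdm);              /* the caller always frees this */
    return 0;
}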
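Finally, the libxl__realloc suggestion: this is what I understand it to mean. Grow d_config->rdms by one entry at a time so num_rdms always matches the number of valid entries, which also keeps the dispose path safe on any failure. The helper name below is made up; the field names follow the hunk quoted above, and I am assuming the usual libxl__realloc(NOGC, ...) idiom from libxl_internal.h.

/*
 * Rough sketch, not the actual patch: append one RDM entry, growing the
 * array with libxl__realloc() so d_config->num_rdms always reflects the
 * number of valid entries.  The helper name is illustrative only.
 */
static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                          uint64_t start, uint64_t size, uint32_t flag)
{
    /* Grow by exactly one entry; NOGC because d_config owns this memory. */
    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                         (d_config->num_rdms + 1) * sizeof(libxl_device_rdm));

    d_config->rdms[d_config->num_rdms].start = start;
    d_config->rdms[d_config->num_rdms].size  = size;
    d_config->rdms[d_config->num_rdms].flag  = flag;

    /* Only bump the count once the entry is fully initialised. */
    d_config->num_rdms++;
}

A per-device range that duplicates an entry already present in the first num_rdms slots would be skipped, or just have its policy updated, before calling this, so devices sharing one RDM entry don't add duplicates.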
Thanks
Tiejun

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel