Xen project Mailing List

Re: [Xen-devel] [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM

On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote: > On 2015/5/11 19:32, Wei Liu wrote: > >On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote: > >>On 2015/5/8 22:43, Wei Liu wrote: > >>>Sorry for the late review. This series fell through the crack. > >>> > >> > >>Thanks for your review. > >> > >>>On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote: > >>>>While building a VM, HVM domain builder provides struct hvm_info_table{} > >>>>to help hvmloader. Currently it includes two fields to construct guest > >>>>e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should > >>>>check them to fix any conflict with RAM. > >>>> > >>>>RMRR can reside in address space beyond 4G theoretically, but we never > >>>>see this in real world. So in order to avoid breaking highmem layout > >>> > >>>How does this break highmem layout? > >> > >>In most cases highmen is always continuous like [4G, ...] but RMRR is > >>theoretically beyond 4G but very rarely. Especially we don't see this > >>happened in real world. So we don't want to such a case breaking the > >>highmem. > >> > > > >The problem is we take this approach just because this rarely happens > >*now* is not future proof. It needs to be clearly documented somewhere > >in the manual (or any other Intel docs) and be referenced in the code. > >Otherwise we might end up in a situation that no-one knows how it is > >supposed to work and no-one can fix it if it breaks in the future, that > >is, every single device on earth requires RMRR > 4G overnight (I'm > >exaggerating). > > > >Or you can just make it works with highmem. How much more work do you > >envisage? > > > >(If my above comment makes no sense do let me know. I only have very > >shallow understanding of RMRR) > > Maybe I'm misleading you :) > > I don't mean RMRR > 4G is not allowed to work in our implementation. What > I'm saying is that our *policy* is just simple for this kind of rare highmem > case... > Yes, but this policy is hardcoded in code (as in, you bail when detecting conflict in highmem region). I don't think we have the expertise to un-hardcode it in the future (it might require x86 specific insider information and specific hardware to test). So I would like to make it as future proof as possible. I know you're limited by hvm_info. I would accept this hardcoded policy as short term solution, but I would like commitment from Intel to maintain this piece of code and properly work out a flexible solution to deal with >4G case. > > > >>> > >>>>we don't solve highmem conflict. Note this means highmem rmrr could still > >>>>be supported if no conflict. > >>>> > > Like these two sentences above. > > >>> > >>>Aren't you actively trying to avoid conflict in libxl? > >> > >>RMRR is fixed by BIOS so we can't aovid conflict. Here we want to adopt some > >>good policies to address RMRR. In the case of highmemt, that simple policy > >>should be enough? > >> > > > >Whatever policy you and HV maintainers agree on. Just clearly document > >it. > > Do you mean I should brief this patch description into one separate > document? > Either in code comment or in a separate document. > > > >>> [...] > >>The caller to xc_device_get_rdm always frees this. > >> > > > >I think I misunderstood how this function works. I thought xrdm was > >passed in by caller, which is clearly not the case. Sorry! > > > > > >In that case, the `if ( rc < 0 )' is not needed because the call should > >always return rc < 0. An assert is good enough. > > assert(rc < 0)? But we can't presume the user always pass '0' firstly, and > additionally, we may have no any RMRR indeed. > > So I guess what you want is, > > assert( rc <=0 ); > if ( !rc ) > goto xrdm; > Yes I was thinking about something like this. How in this case can rc equal to 0? Is it because you don't actually have any regions? If so, please write a comment. > if ( errno == ENOBUFS ) > ... > > Right? > > > [...] > >>> > >>>>+ memset(d_config->rdms, 0, sizeof(libxl_device_rdm)); > >>>>+ > >>>>+ /* Query all RDM entries in this platform */ > >>>>+ if (type == LIBXL_RDM_RESERVE_TYPE_HOST) { > >>>>+ d_config->num_rdms = nr_all_rdms; > >>>>+ for (i = 0; i < d_config->num_rdms; i++) { > >>>>+ d_config->rdms[i].start = > >>>>+ (uint64_t)xrdm[i].start_pfn << > >>>>XC_PAGE_SHIFT; > >>>>+ d_config->rdms[i].size = > >>>>+ (uint64_t)xrdm[i].nr_pages << > >>>>XC_PAGE_SHIFT; > >>>>+ d_config->rdms[i].flag = d_config->b_info.rdm.reserve; > >>>>+ } > >>>>+ } else { > >>>>+ d_config->num_rdms = 0; > >>>>+ } > >>> > >>>And you should move d_config->rdms = libxl__calloc inside that `if'. > >>>That is, don't allocate memory if you don't need it. > >> > >>We can't since in all cases we need to preallocate this, and then we will > >>handle this according to our policy. > >> > > > >How would it ever be used again if you set d_config->num_rdms to 0? How > >do you know the exact size of your array again? > > If we don't consider multiple devices shares one rdm entry, our workflow can > be showed as follows: > > #1. We always preallocate all rdms[] but with memset(). Because the number of RDM entries used in a domain never exceeds the number of global entries? I.e. we never risk overrunning the array? > #2. Then we have two cases for that global rule, > > #2.1 If we really need all according to our global rule, we would set all > rdms[] with all real rdm info and set d_config->num_rdms. > #2.2 If we have no such a global rule to obtain all, we just clear > d_config->num_rdms. > No, don't do this. In any failure path, if the free / dispose function depends on num_rdms to iterate through the whole list to dispose memory (if your rdm structure later contains pointers to allocated memory), this method leaks memory. The num_ field should always reflect the actual size of the array. > #3. Then for per device rule > > #3.1 From #2.1, we just need to check if we should change one given rdm > entry's policy if this given entry is just associated to this device. > #3.2 From 2.2, obviously we just need to fill rdms one by one. Of course, > its very possible that we don't fill all rdms since all passthroued devices > might not have no rdm at all or they just occupy some. But anyway, finally > we sync d_config->num_rdms. > Sorry I don't follow. But if your problem is you don't know how many entries you actually need, just use libxl__realloc? > > > >>> > >>>>+ free(xrdm); > >>>>+ > >>>>+ /* Query RDM entries per-device */ > >>>>+ for (i = 0; i < d_config->num_pcidevs; i++) { > >>>>+ unsigned int nr_entries = 0; > >> > >>Maybe I should restate this, > >> unsigned int nr_entries; > >> [...] > >> > >>Like I replied above we always preallocate this at the beginning. > >> > > > >Ah, OK. > > > >But please don't do this. It's hard to see you don't overrun the > >buffer. Please allocate memory only when you need it. > > Sorry I don't understand this. As I mention above, we don't know how many > rdm entries we really need to allocate. So this is why we'd like to > preallocate all rdms at the beginning. Then this can cover all cases, global > policy, (global policy & per device) and only per device. Even if multiple > devices shares one rdm we also need to avoid duplicating a new... > Can you use libxl__realloc? Wei. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.