
Re: [Xen-devel] [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM



On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote:
> On 2015/5/11 19:32, Wei Liu wrote:
> >On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
> >>On 2015/5/8 22:43, Wei Liu wrote:
> >>>Sorry for the late review. This series fell through the cracks.
> >>>
> >>
> >>Thanks for your review.
> >>
> >>>On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
> >>>>While building a VM, the HVM domain builder provides struct
> >>>>hvm_info_table{} to help hvmloader. Currently it includes two fields
> >>>>that hvmloader uses to construct the guest e820 table, low_mem_pgend
> >>>>and high_mem_pgend. So we should check them to fix any conflict with
> >>>>RAM.
> >>>>
> >>>>RMRR can theoretically reside in the address space beyond 4G, but we
> >>>>never see this in the real world. So in order to avoid breaking the
> >>>>highmem layout
> >>>
> >>>How does this break highmem layout?
> >>
> >>In most cases highmem is contiguous, like [4G, ...], while an RMRR can
> >>theoretically sit beyond 4G, although that is very rare; in particular,
> >>we have never seen it happen in the real world. So we don't want such a
> >>rare case to break the highmem layout.
> >>
> >
> >The problem is that taking this approach just because it rarely happens
> >*now* is not future proof. It needs to be clearly documented somewhere
> >in the manual (or any other Intel docs) and be referenced in the code.
> >Otherwise we might end up in a situation where no-one knows how it is
> >supposed to work and no-one can fix it if it breaks in the future, that
> >is, if every single device on earth suddenly requires RMRR > 4G (I'm
> >exaggerating).
> >
> >Or you can just make it work with highmem. How much more work do you
> >envisage?
> >
> >(If my above comment makes no sense, do let me know. I only have a very
> >shallow understanding of RMRR.)
> 
> Maybe I'm misleading you :)
> 
> I don't mean that RMRR > 4G is not allowed to work in our implementation.
> What I'm saying is that our *policy* is deliberately simple for this kind
> of rare highmem case...
> 

Yes, but this policy is hardcoded (as in, you bail when detecting a
conflict in the highmem region). I don't think we have the expertise to
un-hardcode it in the future (it might require x86-specific insider
information and specific hardware to test). So I would like to make it
as future proof as possible.

I know you're limited by hvm_info. I would accept this hardcoded policy
as a short-term solution, but I would like a commitment from Intel to
maintain this piece of code and to properly work out a flexible solution
for the >4G case.
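
For the record, my rough understanding of the proposed policy, as a
sketch (the variable names and surrounding context are illustrative,
not the actual code in this series):

/* Hypothetical sketch of the policy under discussion: conflicts
 * below 4GiB get a hole punched in the guest e820, while an RDM
 * overlapping guest RAM above 4GiB simply fails the operation. */
uint64_t highmem_end = (uint64_t)hvm_info->high_mem_pgend << XC_PAGE_SHIFT;

if (rdm_start >= (1ULL << 32) && rdm_start < highmem_end) {
    /* No relocation strategy for highmem yet: bail out rather than
     * adjust the layout. */
    return ERROR_FAIL;
}
/* A highmem RDM that does not overlap RAM can still be reported. */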

> >
> >>>
> >>>>we don't solve highmem conflicts. Note this means a highmem RMRR
> >>>>could still be supported if there is no conflict.
> >>>>
> 
> Like these two sentences above.
> 
> >>>
> >>>Aren't you actively trying to avoid conflict in libxl?
> >>
> >>The RMRR is fixed by the BIOS so we can't avoid the conflict. Here we
> >>want to adopt some good policies to address RMRR. In the highmem case,
> >>shouldn't that simple policy be enough?
> >>
> >
> >Whatever policy you and the hypervisor maintainers agree on. Just
> >clearly document it.
> 
> Do you mean I should distill this patch description into a separate
> document?
> 

Either in code comment or in a separate document.

> >
> >>>

[...]

> >>The caller to xc_device_get_rdm always frees this.
> >>
> >
> >I think I misunderstood how this function works. I thought xrdm was
> >passed in by caller, which is clearly not the case. Sorry!
> >
> >
> >In that case, the `if ( rc < 0 )' is not needed because the call should
> >always return rc < 0. An assert is good enough.
> 
> assert(rc < 0)? But we can't presume the user always passes '0' first,
> and in addition we may genuinely have no RMRR at all.
> 
> So I guess what you want is,
> 
> assert( rc <= 0 );
> if ( !rc )
>     goto xrdm;
> 

Yes I was thinking about something like this.

How can rc equal 0 in this case? Is it because you don't actually have
any regions? If so, please write a comment.

> if ( errno == ENOBUFS )
> ...
> 
> Right?
> 
> >

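For reference, the complete pattern we seem to be converging on might
look something like this (a sketch only; get_rdm_entries() is a
hypothetical stand-in for the real hypercall wrapper, and the signature
is illustrative):

/* Needs <assert.h>, <errno.h>, <stdlib.h>. */
static int query_all_rdm(xc_interface *xch,
                         struct xen_reserved_device_memory **out,
                         unsigned int *nr_out)
{
    unsigned int nr_entries = 0;
    struct xen_reserved_device_memory *xrdm;
    int rc;

    /* First call with a zero-sized buffer just asks for the count. */
    rc = get_rdm_entries(xch, NULL, &nr_entries);
    assert(rc <= 0);        /* rc == 0 only when there is no RDM at all */
    if (rc == 0) {
        *nr_out = 0;        /* no reserved regions on this platform */
        return 0;
    }
    if (errno != ENOBUFS)   /* any other failure is a real error */
        return rc;

    /* Second call with a buffer of the size we were just told. */
    xrdm = calloc(nr_entries, sizeof(*xrdm));
    if (!xrdm)
        return -1;
    rc = get_rdm_entries(xch, xrdm, &nr_entries);
    if (rc == 0) {
        *out = xrdm;        /* the caller frees this */
        *nr_out = nr_entries;
    } else {
        free(xrdm);
    }
    return rc;
}
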
[...]

> >>>
> >>>>+    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
> >>>>+
> >>>>+    /* Query all RDM entries in this platform */
> >>>>+    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
> >>>>+        d_config->num_rdms = nr_all_rdms;
> >>>>+        for (i = 0; i < d_config->num_rdms; i++) {
> >>>>+            d_config->rdms[i].start =
> >>>>+                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
> >>>>+            d_config->rdms[i].size =
> >>>>+                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
> >>>>+            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
> >>>>+        }
> >>>>+    } else {
> >>>>+        d_config->num_rdms = 0;
> >>>>+    }
> >>>
> >>>And you should move d_config->rdms = libxl__calloc inside that `if'.
> >>>That is, don't allocate memory if you don't need it.
> >>
> >>We can't, since in all cases we need to preallocate this; we then
> >>handle it according to our policy.
> >>
> >
> >How would it ever be used again if you set d_config->num_rdms to 0? How
> >do you know the exact size of your array again?
> 
> If we don't consider the case of multiple devices sharing one rdm entry,
> our workflow can be shown as follows:
> 
> #1. We always preallocate all rdms[] and clear them with memset().

Because the number of RDM entries used in a domain never exceeds the
number of global entries? I.e. we never risk overrunning the array?

> #2. Then we have two cases for the global rule:
> 
> #2.1 If we really need them all according to our global rule, we fill
> rdms[] with all the real rdm info and set d_config->num_rdms.
> #2.2 If there is no such global rule to obtain them all, we just clear
> d_config->num_rdms.
> 

No, don't do this. If the free / dispose function relies on num_rdms to
iterate through the whole list and dispose of memory (your rdm structure
may later contain pointers to allocated memory), then on any failure
path this method leaks memory.

The num_ field should always reflect the actual size of the array.
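
To illustrate (a generic sketch, assuming the IDL-generated dispose
function this series would produce for libxl_device_rdm):

/* A typical dispose path walks exactly num_rdms entries. If num_rdms
 * understates the number of live entries, whatever they point to is
 * leaked; if it overstates it, this walks off the end of the array. */
int i;
for (i = 0; i < d_config->num_rdms; i++)
    libxl_device_rdm_dispose(&d_config->rdms[i]);
free(d_config->rdms);
d_config->rdms = NULL;
d_config->num_rdms = 0;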

> #3. Then for the per-device rule:
> 
> #3.1 Following #2.1, we just need to check whether we should change a
> given rdm entry's policy, if that entry is associated with this device.
> #3.2 Following #2.2, we obviously just fill in the rdms one by one. Of
> course, it's quite possible that we don't fill them all, since the
> passed-through devices may have no rdm at all, or only occupy some
> entries. But in any case we finally sync d_config->num_rdms.
> 

Sorry, I don't follow. But if your problem is that you don't know how
many entries you actually need, just use libxl__realloc?
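
Something along these lines (a sketch; it follows the existing libxl
idiom of growing a NOGC allocation one element at a time, with num_rdms
always equal to the number of valid entries):

/* Append one entry, keeping d_config->num_rdms in sync throughout. */
d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                                sizeof(*d_config->rdms) *
                                (d_config->num_rdms + 1));
d_config->rdms[d_config->num_rdms].start =
    (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
d_config->rdms[d_config->num_rdms].size =
    (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
d_config->rdms[d_config->num_rdms].flag = d_config->b_info.rdm.reserve;
d_config->num_rdms++;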

> >
> >>>
> >>>>+    free(xrdm);
> >>>>+
> >>>>+    /* Query RDM entries per-device */
> >>>>+    for (i = 0; i < d_config->num_pcidevs; i++) {
> >>>>+        unsigned int nr_entries = 0;
> >>
> >>Maybe I should restate this,
> >>    unsigned int nr_entries;
> >>

[...]

> >>
> >>As I replied above, we always preallocate this at the beginning.
> >>
> >
> >Ah, OK.
> >
> >But please don't do this. It's hard to see that you don't overrun the
> >buffer. Please allocate memory only when you need it.
> 
> Sorry, I don't understand this. As I mentioned above, we don't know how
> many rdm entries we really need to allocate, which is why we'd like to
> preallocate all rdms at the beginning. That covers all the cases: global
> policy, (global policy & per-device), and per-device only. And even if
> multiple devices share one rdm we need to avoid duplicating a new...
> 

Can you use libxl__realloc?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
