Xen project Mailing List

Re: [PATCH 6/9] vpci/header: Handle p2m range sets per BAR

To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>

Date: Thu, 9 Sep 2021 10:24:16 +0200

Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 09 Sep 2021 08:24:36 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.09.2021 07:22, Oleksandr Andrushchenko wrote:
>
> On 08.09.21 18:00, Jan Beulich wrote:
>> On 08.09.2021 16:31, Oleksandr Andrushchenko wrote:
>>> On 06.09.21 17:47, Jan Beulich wrote:
>>>> On 03.09.2021 12:08, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>
>>>>> Instead of handling a single range set, that contains all the memory
>>>>> regions of all the BARs and ROM, have them per BAR.
>>>> Without looking at how you carry out this change - this looks wrong (as
>>>> in: wasteful) to me. Despite ...
>>>>
>>>>> This is in preparation of making non-identity mappings in p2m for the
>>>>> MMIOs/ROM.
>>>> ... the need for this, every individual BAR is still contiguous in both
>>>> host and guest address spaces, so can be represented as a single
>>>> (start,end) tuple (or a pair thereof, to account for both host and guest
>>>> values). No need to use a rangeset for this.
>>> First of all this change is in preparation for non-identity mappings,
>> I'm afraid I continue to not see how this matters in the discussion at
>> hand. I'm fully aware that this is the goal.
>>
>>> e.g. currently we collect all the memory ranges which require mappings
>>> into a single range set, then we cut off MSI-X regions and then use range
>>> set functionality to call a callback for every memory range left after
>>> MSI-X. This works perfectly fine for 1:1 mappings, e.g. what we have as
>>> the range set's starting address is what we want to be mapped/unmapped.
>>> Why range sets? Because they allow partial mappings, e.g. you can map
>>> part of the range and return back and continue from where you stopped.
>>> And if I understand that correctly that was the initial intention of
>>> introducing range sets here.
>>>
>>> For non-identity mappings this becomes not that easy.
>>> Each individual BAR may be mapped differently according to what guest
>>> OS has programmed as bar->guest_addr (guest view of the BAR start).
>> I don't see how the rangeset helps here. You have a guest and a host pair
>> of values for every BAR. Pages with e.g. the MSI-X table may not be mapped
>> to their host counterpart address, yes, but you need to special-case
>> these anyway: Accesses to them need to be handled. Hence I'm having a hard
>> time seeing how a per-BAR rangeset (which will cover at most three distinct
>> ranges afaict, which is way too little for this kind of data organization
>> imo) can gain you all this much.
>>
>> Overall the 6 BARs of a device will cover up to 8 non-adjacent ranges. IOW
>> the majority (4 or more) of the rangesets will indeed merely represent a
>> plain (start,end) pair (or be entirely empty).
> First of all, let me explain why I decided to move to per-BAR
> range sets.
> Before this change all the MMIO regions and MSI-X holes were
> accounted by a single range set, e.g. we go over all BARs and
> add MMIOs and then subtract MSI-X from there. When it comes to
> mapping/unmapping we have an assumption that the starting address of
> each element in the range set is equal to map/unmap address, e.g.
> we have identity mapping. Please note, that the range set accepts
> a single private data parameter which is enough to hold all
> required data about the pdev in common, but there is no way to provide
> any per-BAR data.
>
> Now, that we want non-identity mappings, we can no longer assume
> that starting address == mapping address and we need to provide
> additional information on how to map and which is now per-BAR.
> This is why I decided to use per-BAR range sets.
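A minimal C sketch of what the per-BAR information buys, assuming a hypothetical descriptor `struct bar_map` that holds only the BAR's first host frame (MFN) and first guest frame (GFN), i.e. the host/guest (start,end) pair with the end implied by the range passed in. `mock_map()` is a hypothetical stand-in for `map_mmio_regions()` that, like the behaviour described in this thread, may handle only part of a request and report how many pages it did; its budget of three pages per call is arbitrary. The callback derives each GFN from the MFN's offset into the BAR instead of assuming GFN == MFN, and resumes when a call stops early. This is an illustration, not the vPCI code.

```c
#include <assert.h>

/* Hypothetical per-BAR descriptor (page frame numbers). */
struct bar_map {
    unsigned long host_start;   /* first MFN of the BAR */
    unsigned long guest_start;  /* first GFN: guest view of the BAR start */
};

/* Mock p2m call (hypothetical): maps at most MOCK_BUDGET pages per
 * invocation and records what it did.  Returns 0 once the whole request
 * is done, otherwise the (positive) number of pages processed. */
#define MOCK_BUDGET  3UL
#define MAX_RECORDED 64
static unsigned long mapped_gfn[MAX_RECORDED], mapped_mfn[MAX_RECORDED];
static unsigned int nr_mapped;

static long mock_map(unsigned long gfn, unsigned long nr, unsigned long mfn)
{
    unsigned long done = nr < MOCK_BUDGET ? nr : MOCK_BUDGET;

    for ( unsigned long i = 0; i < done; i++ )
    {
        assert(nr_mapped < MAX_RECORDED);
        mapped_gfn[nr_mapped] = gfn + i;
        mapped_mfn[nr_mapped] = mfn + i;
        nr_mapped++;
    }
    return done == nr ? 0 : (long)done;
}

/* Rangeset-style callback for the inclusive MFN range [s, e].  With
 * per-BAR private data the GFN follows from the offset into the BAR,
 * instead of being assumed equal to the MFN (identity mapping). */
static int map_range(unsigned long s, unsigned long e, void *data)
{
    const struct bar_map *bar = data;

    assert(s >= bar->host_start && s <= e);
    for ( ; ; )
    {
        unsigned long size = e - s + 1;
        unsigned long gfn = bar->guest_start + (s - bar->host_start);
        long rc = mock_map(gfn, size, s);

        if ( rc <= 0 )
            return (int)rc;     /* 0: range complete; < 0: error */
        s += rc;                /* partial progress: resume after it */
    }
}
```

For example, with `host_start = 0x100`, `guest_start = 0x800` and an MSI-X table page at MFN 0x102 left out, calling `map_range(0x100, 0x101, &bar)` and `map_range(0x103, 0x107, &bar)` maps seven pages, each at GFN = MFN + 0x700. The translation itself needs only the two start frames per BAR; where the MSI-X holes lie still has to be tracked separately, by a rangeset or otherwise.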
>
> One of the solutions may be that we form an additional list of
> structures in a form (I omit some of the fields):
> struct non_identity {
>     unsigned long start_mfn;
>     unsigned long start_gfn;
>     unsigned long size;
> };
> So this way when the range set gets processed we go over the list
> and find out the corresponding list's element which describes the
> range set entry being processed (s, e, data):
>
> static int map_range(unsigned long s, unsigned long e, void *data,
>                      unsigned long *c)
> {
> [snip]
>     go over the list elements
>     if ( list->start_mfn == s )
>         found, can use list->start_gfn for mapping
> [snip]
> }
> This has some complications as map_range may be called multiple times
> for the same range: if {unmap|map}_mmio_regions was not able to complete
> the operation it returns the number of pages it was able to process:
>         rc = map->map ? map_mmio_regions(map->d, start_gfn,
>                                          size, _mfn(s))
>                       : unmap_mmio_regions(map->d, start_gfn,
>                                            size, _mfn(s));
> In this case we need to update the list item:
>     list->start_mfn += rc;
>     list->start_gfn += rc;
>     list->size -= rc;
> and if all the pages of the range were processed delete the list entry.
>
> With respect of creating the list everything also not so complicated:
> while processing each BAR create a list entry and fill it with mfn, gfn
> and size. Then, if MSI-X region is present within this BAR, break the
> list item into multiple ones with respect to the holes, for example:
>
> MMIO 0 list item
> MSI-X hole 0
> MMIO 1 list item
> MSI-X hole 1
>
> Here instead of a single BAR description we now have 2 list elements
> describing the BAR without MSI-X regions.
>
> All the above still relies on a single range set per pdev as it is in the
> original code. We can go this route if we agree this is more acceptable
> than the range sets per BAR

I guess I am now even more confused: I can't spot any "rangeset per pdev"
either.
The rangeset I see being used doesn't get associated with
anything that's device-related; it gets accumulated as a transient
data structure, but _all_ devices owned by a domain influence its
final content.

If you associate rangesets with either a device or a BAR, I'm failing
to see how you'd deal with multiple BARs living in the same page (see
also below).

Considering that a rangeset really is a compressed representation of
a bitmap, I wonder whether this data structure is suitable at all for
what you want to express. You have two pieces of information to
carry / manage, after all: Which ranges need mapping, and what their
GFN <-> MFN relationship is. Maybe the latter needs expressing
differently in the first place? And then in a way whose organization
ensures that no conflicting GFN <-> MFN mappings are possible? Isn't
this precisely what is already getting recorded in the P2M?

I'm also curious what your plan is to deal with BARs overlapping in
MFN space: In such a case, the guest cannot independently change the
GFNs of any of the involved BARs. (The same applies the other way
around: overlaps in GFN space are only permitted when the same overlap
exists in MFN space.) Are you excluding (forbidding) this case? If so,
did I miss you saying so somewhere? Yet if no overlaps are allowed in
the first place, what modify_bars() does would be far more complicated
than necessary in the DomU case, so it may be worth considering
deviating more from how Dom0 gets taken care of. In the end a guest
writing a BAR is merely a request to change its P2M. That's very
different from Dom0 writing a BAR, which means the physical BAR also
changes, and hence the P2M changes in quite a different way.

Jan
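The overlap rule Jan states can be written as a small check. A minimal C sketch, assuming a hypothetical `struct bar_place` giving each BAR's host start frame, guest start frame and size in pages: two BARs that overlap in MFN space or in GFN space are accepted only if they are shifted by the same GFN - MFN delta, so one cannot be moved without the other and no GFN ends up backed by two different MFNs. Whether such overlaps should be allowed at all, rather than forbidden, is the question Jan leaves open; the sketch only encodes the constraint as stated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical BAR placement, in pages: nr frames starting at host
 * frame mfn, exposed to the guest starting at frame gfn. */
struct bar_place {
    unsigned long mfn;
    unsigned long gfn;
    unsigned long nr;
};

/* Do the half-open ranges [a, a + an) and [b, b + bn) intersect?
 * Frame numbers are assumed far below ULONG_MAX, so no overflow. */
static bool ranges_overlap(unsigned long a, unsigned long an,
                           unsigned long b, unsigned long bn)
{
    return a < b + bn && b < a + an;
}

/* Overlap in MFN space or in GFN space is accepted only when both BARs
 * use the same GFN - MFN delta; then any overlap is identical in both
 * spaces.  With frame numbers far below ULONG_MAX, comparing the
 * (possibly wrapped) unsigned deltas equals comparing the signed ones. */
static bool placement_consistent(const struct bar_place *x,
                                 const struct bar_place *y)
{
    bool mfn_ovl = ranges_overlap(x->mfn, x->nr, y->mfn, y->nr);
    bool gfn_ovl = ranges_overlap(x->gfn, x->nr, y->gfn, y->nr);

    if ( !mfn_ovl && !gfn_ovl )
        return true;
    return x->gfn - x->mfn == y->gfn - y->mfn;
}

/* Pairwise check over a set of BARs (e.g. all BARs owned by a domain). */
static bool all_placements_consistent(const struct bar_place *bars,
                                      unsigned int n)
{
    assert(bars != NULL || n == 0);
    for ( unsigned int i = 0; i < n; i++ )
        for ( unsigned int j = i + 1; j < n; j++ )
            if ( !placement_consistent(&bars[i], &bars[j]) )
                return false;
    return true;
}
```

For instance, two one-page BARs sharing MFN 0x100 pass when both are placed at GFN 0x800 and fail when one of them is moved to GFN 0x900, and a BAR at GFNs 0x800-0x801 clashes with an unrelated BAR placed at GFN 0x801.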

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.