[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 6/9] vpci/header: Handle p2m range sets per BAR

  • To: Jan Beulich <jbeulich@xxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Thu, 9 Sep 2021 05:22:21 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=+CwpRiGSJLLP1t+op9L5h6wdjH0oQcj9ympwW45/GcM=; b=Fu/zpACr55ERa+RnrV9oq206mNQHrRv1IXYgaIW861ni5QKFUq10UeOkGoP1m2pnPMfmC2TyqOlKtSZiICSa3iFLri+/QU8/MFn4R86SdbXoh8CiC5FRhsIJBtPCgwcMVDDtO608A7y8Ing2y/JYYBAA4kn/HaQsHepknh14Q2zd67QrVpwTc+kE3LhUy9aiNcT8rzN8DMdXrSWV9S9jB5gis+E6ZE4CmSdxOkaeS5gkTwWglhpSQBMt+7Bj+tCW5IPGIYQt2A/pGIsmHUIwBIq+Lst6FM/GpfX5c3SHFdsbfubYv06BR/P7L3DAJFWVCEkOfe8R2w0yA1PvmpVWZg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=G3b0cn+IQ58ZtPw9WRhT9ASaEC4BGzJbhCWHERtHLjH6oPDFaoRg3vNsD5leiFcHVjBTA3hBfVSqK2T8uDja04YSXqsyNrakAuYOa6w5mY8szEs4aAweFLhJVej7Bo1kFXqnpnz79PdL4DQBWuTbkDwRLbNlRz3QzkNg6cTXYbXnt16uypgVYzZGkK7YDx+nS7Wnx9dFVBb6eFFssECs/W0TBZkJzv9h+xgJTOTICN6hLS8lGXkF3FigKk+yB20x0P70gnOoRlfG9J0bpResa2g+QK0Tv3tJb+dvMSHRgTrPmAOV7yKlI+ZUfjF8DUECl5i/vyETFON7vu7i8CGm4Q==
  • Authentication-results: suse.com; dkim=none (message not signed) header.d=none;suse.com; dmarc=none action=none header.from=epam.com;
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 09 Sep 2021 05:22:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXoKxjXfke7B20QEeo6YjlfUMhlquXGvYAgAMgDgCAAAg4gIAA8L+A
  • Thread-topic: [PATCH 6/9] vpci/header: Handle p2m range sets per BAR

On 08.09.21 18:00, Jan Beulich wrote:
> On 08.09.2021 16:31, Oleksandr Andrushchenko wrote:
>> On 06.09.21 17:47, Jan Beulich wrote:
>>> On 03.09.2021 12:08, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>> Instead of handling a single range set, that contains all the memory
>>>> regions of all the BARs and ROM, have them per BAR.
>>> Without looking at how you carry out this change - this look wrong (as
>>> in: wasteful) to me. Despite ...
>>>> This is in preparation of making non-identity mappings in p2m for the
>>>> MMIOs/ROM.
>>> ... the need for this, every individual BAR is still contiguous in both
>>> host and guest address spaces, so can be represented as a single
>>> (start,end) tuple (or a pair thereof, to account for both host and guest
>>> values). No need to use a rangeset for this.
>> First of all this change is in preparation for non-identity mappings,
> I'm afraid I continue to not see how this matters in the discussion at
> hand. I'm fully aware that this is the goal.
>> e.g. currently we collect all the memory ranges which require mappings
>> into a single range set, then we cut off MSI-X regions and then use range set
>> functionality to call a callback for every memory range left after MSI-X.
>> This works perfectly fine for 1:1 mappings, e.g. what we have as the range
>> set's starting address is what we want to be mapped/unmapped.
>> Why range sets? Because they allow partial mappings, e.g. you can map part of
>> the range and return back and continue from where you stopped. And if I
>> understand that correctly that was the initial intention of introducing 
>> range sets here.
>> For non-identity mappings this becomes not that easy. Each individual BAR 
>> may be
>> mapped differently according to what guest OS has programmed as 
>> bar->guest_addr
>> (guest view of the BAR start).
> I don't see how the rangeset helps here. You have a guest and a host pair
> of values for every BAR. Pages with e.g. the MSI-X table may not be mapped
> to their host counterpart address, yes, but you need to special cases
> these anyway: Accesses to them need to be handled. Hence I'm having a hard
> time seeing how a per-BAR rangeset (which will cover at most three distinct
> ranges afaict, which is way too little for this kind of data organization
> imo) can gain you all this much.
> Overall the 6 BARs of a device will cover up to 8 non-adjacent ranges. IOW
> the majority (4 or more) of the rangesets will indeed merely represent a
> plain (start,end) pair (or be entirely empty).
First of all, let me explain why I decided to move to per-BAR
range sets.
Before this change all the MMIO regions and MSI-X holes were
accounted by a single range set, e.g. we go over all BARs and
add MMIOs and then subtract MSI-X from there. When it comes to
mapping/unmapping we have an assumtion that the starting address of
each element in the range set is equal to map/unmap address, e.g.
we have identity mapping. Please note, that the range set accepts
a single private data parameter which is enough to hold all
required data about the pdev in common, but there is no way to provide
any per-BAR data.

Now, that we want non-identity mappings, we can no longer assume
that starting address == mapping address and we need to provide
additional information on how to map and which is now per-BAR.
This is why I decided to use per-BAR range sets.

One of the solutions may be that we form an additional list of
structures in a form (I ommit some of the fields):
struct non_identity {
     unsigned long start_mfn;
     unsigned long start_gfn;
     unsigned long size;
So this way when the range set gets processed we go over the list
and find out the corresponding list's element which describes the
range set entry being processed (s, e, data):

static int map_range(unsigned long s, unsigned long e, void *data,
                      unsigned long *c)
     go over the list elements
         if ( list->start_mfn == s )
             found, can use list->start_gfn for mapping
This has some complications as map_range may be called multiple times
for the same range: if {unmap|map}_mmio_regions was not able to complete
the operation it returns the number of pages it was able to process:
         rc = map->map ? map_mmio_regions(map->d, start_gfn,
                                          size, _mfn(s))
                       : unmap_mmio_regions(map->d, start_gfn,
                                            size, _mfn(s));
In this case we need to update the list item:
     list->start_mfn += rc;
     list->start_gfn += rc;
     list->size -= rc;
and if all the pages of the range were processed delete the list entry.

With respect of creating the list everything also not so complicated:
while processing each BAR create a list entry and fill it with mfn, gfn
and size. Then, if MSI-X region is present within this BAR, break the
list item into multiple ones with respect to the holes, for example:

MMIO 0 list item
MSI-X hole 0
MMIO 1 list item
MSI-X hole 1

Here instead of a single BAR description we now have 2 list elements
describing the BAR without MSI-X regions.

All the above still relies on a single range set per pdev as it is in the
original code. We can go this route if we agree this is more acceptable
than the range sets per BAR
>> Thus we need to collect all those non-identity mappings
>> per BAR now (so we have a mapping "guest view" : "physical BAR" and again 
>> cut off
>> MSI-X regions as before.  So, yes, it may be a bit wasteful to use many 
>> range sets,
>> but makes vPCI life much-much easier.
> Which I'm yet to be convinced of. Then again I'm not the maintainer of
> this code, so if you can convince Roger you'll be all good.

Per-BAR range sets look more clear to me and add relatively less code which

seems to be good.

>> Thus, I think that even per-BAR range sets are
>> good to go as they have more pros than cons. IMO
>> Even if we go with "can be represented as a single (start,end) tuple" it 
>> doesn't answer
>> the question what needs to be done if a range gets partially mapped/unmapped.
> This question also isn't answered when you use rangesets.
bool vpci_process_pending(struct vcpu *v)


             rc = rangeset_consume_ranges(bar->mem, map_range, &data);

             if ( rc == -ERESTART )
                 return true;

> Jan
Thank you,




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.