|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 18:37, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:15, Jan Beulich wrote:
>>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>>>> On 07.02.22 17:26, Jan Beulich wrote:
>>>>>> 1b. Make vpci_write use write lock for writes to command register and
>>>>>> BARs
>>>>>> only; keep using the read lock for all other writes.
>>>>> I am not quite sure how to do that. Do you mean something like:
>>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>> uint32_t data)
>>>>> [snip]
>>>>> list_for_each_entry ( r, &pdev->vpci->handlers, node )
>>>>> {
>>>>> [snip]
>>>>> if ( r->needs_write_lock)
>>>>> write_lock(d->vpci_lock)
>>>>> else
>>>>> read_lock(d->vpci_lock)
>>>>> ....
>>>>>
>>>>> And provide rw as an argument to:
>>>>>
>>>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>>> vpci_write_t *write_handler, unsigned int
>>>>> offset,
>>>>> unsigned int size, void *data, --->>> bool
>>>>> write_path <<<-----)
>>>>>
>>>>> Is this what you mean?
>>>> This sounds overly complicated. You can derive locally in vpci_write(),
>>>> from just its "reg" and "size" parameters, whether the lock needs taking
>>>> in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>> read PCI_COMMAND and see if memory or IO decoding are enabled.
>>> if ( enabled )
>>> write_lock(d->vpci_lock)
>>> else
>>> read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
> But, OS may want reading a single byte of ROM BAR, so I think
> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> ranges
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
> Sounds good for the start.
> Another concern is that if we go with a read_lock and then in the
> underlying code we disable memory decoding and try doing
> something and calling cmd_write handler for any reason then....
>
> I mean that the check in the vpci_write is somewhat we can tolerate,
> but then it is must be considered that no code in the read path
> is allowed to perform write path functions. Which brings a pretty
> valid use-case: say in read mode we detect an unrecoverable error
> and need to remove the device:
> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>
> What do we do then? It is all going to be fragile...
I have tried to summarize the options we have wrt locking
and would love to hear from @Roger and @Jan.
In every variant there is a task of dealing with the overlap
detection in modify_bars, so this is the only place as of now
which needs special treatment.
Existing limitations: there is no way to upgrade a read lock to a write
lock, so paths which may require write lock protection need to use
write lock from the very beginning. Workarounds can be applied.
1. Per-domain rw lock, aka d->vpci_lock
==============================================================
Note: with per-domain rw lock it is possible to do without introducing
per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
should be required.
This is only going to work in case if vpci_write always takes the write lock
and vpci_read takes a read lock and no path in vpci_read is allowed to
perform write path operations.
vpci_process_pending uses write lock as it have vpci_remove_device in its
error path.
Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars
Cons:
- all writes are serialized
- need to carefully select read paths, so they are guaranteed not to lead
to lock upgrade use-cases
1.1. Semi read lock upgrade in modify bars
--------------------------------------------------------------
In this case both vpci_read and vpci_write take a read lock and when it comes
to modify_bars:
1. read_unlock(d->vpci_lock)
2. write_lock(d->vpci_lock)
3. Check that pdev->vpci is still available and is the same object:
if (pdev->vpci && (pdev->vpci == old_vpci) )
{
/* vpci structure is valid and can be used. */
}
else
{
/* vpci has gone, return an error. */
}
Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars
- readers and writers are NOT serialized
- NO need to carefully select read paths, so they are guaranteed not to lead
to lock upgrade use-cases
Cons:
- ???
2. per-device lock (pdev->vpci_lock) + d->overlap_chk_lock
==============================================================
In order to solve overlap ABBA, we introduce a per-domain helper
lock to protect the overlapping code in modify_bars:
old_vpci = pdev->vpci;
spin_unlock(pdev->vpci_lock);
spin_lock(pdev->domain->overlap_chk_lock);
spin_lock(pdev->vpci_lock);
if ( pdev->vpci && (pdev->vpci == old_vpci) )
for_each_pdev ( pdev->domain, tmp )
{
if ( tmp != pdev )
{
spin_lock(tmp->vpci_lock);
if ( tmp->vpci )
...
}
}
Pros:
- all accesses are independent, only the same device access is serialized
- no need to care about readers and writers wrt read lock upgrade issues
Cons:
- helper spin lock
3. Move overlap detection into process pending
==============================================================
There is a Roger's patch [1] which adds a possibility for vpci_process_pending
to perform different tasks rather than just map/unmap. With this patch extended
in a way that it can hold a request queue it is possible to delay execution
of the overlap code until no pdev->vpci_lock is held, but before returning to
a guest after vpci_{read|write} or similar.
Pros:
- no need to emulate read lock upgrade
- fully parallel read/write
- queue in the vpci_process_pending will later on be used by SR-IOV,
so this is going to help the future code
Cons:
- ???
4. Re-write overlap detection code
==============================================================
It is possible to re-write overlap detection code, so the information about the
mapped/unmapped regions is not read from vpci->header->bars[i] of each device,
but instead there is a per-domain structure which holds the regions and
implements reference counting.
Pros:
- solves ABBA
Cons:
- very complex code is expected
5. You name it
==============================================================
From all the above I would recommend we go with option 2 which seems to
reliably
solve ABBA and does not bring cons of the other approaches.
Thank you in advance,
Oleksandr
[1]
https://lore.kernel.org/all/5BABA6EF02000078001EC452@xxxxxxxxxxxxxxxxxxxxxxxx/T/#m231fb0586007725bfd8538bb97ff1777a36842cf
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |