
Re: [PATCH v4 6/7] xen/riscv: page table handling



On Thu, 2024-08-15 at 17:26 +0200, Jan Beulich wrote:
> On 15.08.2024 15:34, oleksii.kurochko@xxxxxxxxx wrote:
> > On Thu, 2024-08-15 at 14:16 +0200, Jan Beulich wrote:
> > > On 15.08.2024 13:21, oleksii.kurochko@xxxxxxxxx wrote:
> > > > On Thu, 2024-08-15 at 10:09 +0200, Jan Beulich wrote:
> > > > > On 14.08.2024 18:50, oleksii.kurochko@xxxxxxxxx wrote:
> > > > > > On Tue, 2024-08-13 at 12:31 +0200, Jan Beulich wrote:
> > > > > > > On 09.08.2024 18:19, Oleksii Kurochko wrote:
> > > > > > > > RISC-V detects superpages using `pte.x` and `pte.r`,
> > > > > > > > as there is no specific bit in the PTE for this
> > > > > > > > purpose. From the RISC-V spec:
> > > > > > > > ```
> > > > > > > >   ...
> > > > > > > >   4. Otherwise, the PTE is valid. If pte.r = 1 or
> > > > > > > >      pte.x = 1, go to step 5. Otherwise, this PTE is a
> > > > > > > >      pointer to the next level of the page table.
> > > > > > > >      ...
> > > > > > > >   5. A leaf PTE has been found.
> > > > > > > >      ...
> > > > > > > >   ...
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > The code doesn't support superpage shattering, so 4KB
> > > > > > > > pages are used by default.
> > > > > > > 
> > > > > > > ... you have this. Yet still callers expecting re-mapping
> > > > > > > in
> > > > > > > the
> > > > > > > (large)
> > > > > > > range they map can request small-page mappings right
> > > > > > > away.
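The leaf-vs-table-pointer check in the walk steps quoted above can be sketched as follows. This is an illustrative sketch only: `pte_t` and the bit macros are assumptions modelled on the Sv39 PTE layout (V=bit 0, R=bit 1, X=bit 3), not the exact definitions from the Xen RISC-V patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed names; bit positions follow the RISC-V Sv39 PTE layout. */
#define PTE_VALID      (1u << 0)
#define PTE_READABLE   (1u << 1)
#define PTE_EXECUTABLE (1u << 3)

typedef struct { uint64_t pte; } pte_t;

/*
 * Per the quoted walk: a valid PTE with R=1 or X=1 is a leaf (step 5);
 * a valid PTE with R=0 and X=0 is a pointer to the next-level table.
 * There is no dedicated "superpage" bit - a leaf at a non-last level
 * simply is a superpage mapping.
 */
static bool pte_is_leaf(pte_t p)
{
    return (p.pte & PTE_VALID) &&
           (p.pte & (PTE_READABLE | PTE_EXECUTABLE));
}
```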
> > > > > > I am not sure I fully understand what you mean by
> > > > > > "expecting re-mapping".
> > > > > 
> > > > > Right now you have callers pass PTE_BLOCK when they know
> > > > > that no small-page re-mappings are going to occur for an
> > > > > area. What I'm suggesting is that you invert this logic:
> > > > > have callers pass PTE_SMALL when there is a possibility
> > > > > that re-mapping requests may be issued later. Then, later,
> > > > > by simply grep-ing for PTE_SMALL you'll be able to easily
> > > > > find all candidates that can possibly be relaxed when
> > > > > super-page shattering starts being supported. That's going
> > > > > to be easier than finding all instances where PTE_BLOCK is
> > > > > _not_ used.
> > > > So if I understand correctly, nothing will actually change
> > > > in the algorithm of pt_update(); only PTE_SMALL should be
> > > > introduced instead of PTE_BLOCK. And if I know that some
> > > > memory would be better mapped as PTE_SMALL, to avoid
> > > > shattering in case of unmap (for example), I can just map
> > > > that memory as PTE_SMALL and that is it?
> > > 
> > > That is it.
> > > 
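The suggested inversion can be sketched as a helper inside pt_update() deciding whether a superpage mapping may be installed. All names here (`PTE_SMALL`, its bit position, `can_use_superpage` and its parameters) are hypothetical, chosen only to illustrate the idea from the discussion: callers set PTE_SMALL for ranges that may later be partially re-mapped, so future relaxation candidates can be found by grep-ing for PTE_SMALL.

```c
#include <stdbool.h>

#define PTE_SMALL (1u << 8)  /* hypothetical flag; bit position assumed */

/*
 * Sketch: a superpage of size sp_size may be used only when the caller
 * did not request small pages (i.e. does not expect future re-mapping)
 * and the current chunk is suitably aligned and large enough.
 */
static bool can_use_superpage(unsigned int flags, unsigned long va,
                              unsigned long pa, unsigned long remaining,
                              unsigned long sp_size)
{
    if ( flags & PTE_SMALL )        /* caller expects re-mapping later */
        return false;

    return !(va & (sp_size - 1)) && /* virtual address aligned ... */
           !(pa & (sp_size - 1)) && /* ... physical address aligned ... */
           remaining >= sp_size;    /* ... and enough left to map */
}
```

With this shape, mapping a range without PTE_SMALL and meeting the alignment constraints would use a superpage, while any caller anticipating later small-page re-mapping simply passes PTE_SMALL up front, since shattering is not supported yet.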
> > > > > > > > +    spin_unlock(&xen_pt_lock);
> > > > > > > 
> > > > > > > Does this really need to come after fence and flush?
> > > > > > I think yes, as the page table should be updated by only
> > > > > > one CPU at a time. And before giving another CPU the
> > > > > > ability to update the page table, we have to finish the
> > > > > > work on the current CPU.
> > > > > 
> > > > > Can you then explain to me, perhaps by way of an example,
> > > > > what will go wrong if the unlock is ahead of the flush?
> > > > > (I'm less certain about the fence, and that's also less
> > > > > expensive.)
> > > > pt_update() will be called for interleaved regions, for
> > > > example, by different CPUs:
> > > > 
> > > >                     pt_update():
> > > > CPU1:                            CPU2:
> > > >  ...                             spin_lock(&xen_pt_lock);
> > > > RISCV_FENCE(rw, rw);             ...
> > > > 
> > > > /* After the unlock below
> > > >    is executed, the
> > > >    following can happen: ----->  CPU2 starts to update page
> > > > spin_unlock(&xen_pt_lock);       table entries which were
> > > > ...                              partially created by CPU1,
> > > > ...                              but CPU2 doesn't know about
> > > > ...                              them yet because flush_tlb()
> > > > ...                              (sfence.vma) wasn't done. */
> > > > 
> > > > flush_tlb_range_va();
> > > 
> > > Not exactly: CPU2 knows about them as far as the memory used /
> > > modified goes, and that's all that matters for further page
> > > table modifications. CPU2 only doesn't know about the new page
> > > table entries yet when it comes to using them for a translation
> > > (by the hardware page walker). Yet this aspect is irrelevant
> > > here, if I'm not mistaken.
> > And it isn't an issue that CPU2 will add these new page table
> > entries again during execution of CPU2's pt_update()?
> 
> Add these page table entries again? That's only going to happen
> due to another bug somewhere, I suppose. And it would be as much
> (or as little) of an issue if that happened right after dropping
> the lock.
Yes, agree, it sounds more like a bug. Thanks.
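The ordering discussed above, in the shape the patch uses it (fence, then TLB flush, then unlock), can be sketched as follows. This is a stand-in sketch only: a C11 atomic flag replaces Xen's spinlock, the barrier macro replaces the real RISCV_FENCE, and flush_tlb_range_va() is a stub for the sfence.vma-based flush. Per the conclusion of the thread, the unlock could in principle come before the flush, since the flush only affects the hardware walker's view, not other CPUs' ability to modify the tables; keeping the order shown is simply conservative.

```c
#include <stdatomic.h>

/* Stand-in for Xen's xen_pt_lock spinlock. */
static atomic_flag xen_pt_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *l)
{
    while ( atomic_flag_test_and_set_explicit(l, memory_order_acquire) )
        ;  /* spin until the lock is free */
}

static void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-ins for the RISC-V primitives used in the patch. */
#define RISCV_FENCE(p, s) atomic_thread_fence(memory_order_seq_cst)
static void flush_tlb_range_va(void)
{
    /* sfence.vma over the updated range in the real code */
}

/* Tail of pt_update() in the order from the patch under discussion. */
static void pt_update_tail(void)
{
    RISCV_FENCE(rw, rw);     /* make the PTE writes globally visible */
    flush_tlb_range_va();    /* drop stale translations */
    spin_unlock(&xen_pt_lock); /* only then admit the next updater */
}
```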

~ Oleksii