
Re: [Xen-devel] [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror



On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote:
> @@ -854,12 +853,20 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>               r = -EPERM;
>               goto out_unlock;
>       }
> +     up_read(&mm->mmap_sem);
> +     timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> +
> +retry:
> +     range->notifier_seq = mmu_range_read_begin(&bo->notifier);
>  
> +     down_read(&mm->mmap_sem);
>       r = hmm_range_fault(range, 0);
>       up_read(&mm->mmap_sem);
> -
> -     if (unlikely(r < 0))
> +     if (unlikely(r <= 0)) {
> +             if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
> +                     goto retry;
>               goto out_free_pfns;
> +     }

I was reflecting on why this suddenly became necessary, and I think
what might be happening is that hmm_range_fault() is triggering
invalidations as it runs (ie it is faulting in pages or something),
and that in turn causes the mrn (mmu range notifier) to need a retry.
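FWIW, spelled out, the full collision-retry pattern that hunk is half
of looks roughly like the sketch below. This is a paraphrase, not the
exact driver code: the actual patch keeps the seq in
range->notifier_seq, and driver_lock here stands in for whatever lock
the driver's invalidate callback also takes:

        timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
again:
        seq = mmu_range_read_begin(&bo->notifier);

        down_read(&mm->mmap_sem);
        r = hmm_range_fault(range, 0);
        up_read(&mm->mmap_sem);
        if (r <= 0) {
                /* 0 / -EBUSY mean a collision; retry until timeout */
                if ((r == 0 || r == -EBUSY) &&
                    !time_after(jiffies, timeout))
                        goto again;
                return r;
        }

        mutex_lock(&driver_lock);
        if (mmu_range_read_retry(&bo->notifier, seq)) {
                /* Something invalidated the range while we were
                 * faulting - including any invalidation that
                 * hmm_range_fault() itself triggered. */
                mutex_unlock(&driver_lock);
                goto again;
        }
        /* the result stays valid until driver_lock is dropped */

Any invalidation that lands between read_begin and read_retry bumps
the notifier seq, which is why a synchronous invalidation from inside
hmm_range_fault() forces a spurious trip around the loop.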

The hmm version of this had a bug where a full
invalidate_range_start/end pair would not trigger a retry, so this
didn't happen.

This is unfortunate, as the retry is unnecessary, but at this time I
can't think of a good way to separate an ignorable synchronous
invalidation caused by hmm_range_fault() itself from an async one
that cannot be ignored.

A basic fix would be to not update the mrn seq in the notifier if the
invalidate is triggered by hmm_range_fault itself, but that seems
difficult to determine.
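To make "difficult" concrete: the naive approach would be something
like the hypothetical fragment below, where the fault path records
current in a made-up bo->faulting_task field before calling
hmm_range_fault(), and the invalidate callback skips the seq update
for invalidations arriving in that task's context. Every name here is
illustrative, not from the series:

        static bool amdgpu_mn_invalidate(struct mmu_range_notifier *mrn,
                                const struct mmu_notifier_range *range,
                                unsigned long cur_seq)
        {
                struct amdgpu_bo *bo =
                        container_of(mrn, struct amdgpu_bo, notifier);

                /* Hypothetical: invalidation triggered by our own
                 * hmm_range_fault(), so skip the seq update and the
                 * retry loop won't spin on it. */
                if (READ_ONCE(bo->faulting_task) == current)
                        return true;

                /* Normal path: update the mrn seq and tear down the
                 * mapping under the driver lock (made-up helper). */
                amdgpu_mn_invalidate_range(bo, range, cur_seq);
                return true;
        }

But even that check isn't safe: an invalidation delivered in the
faulting task's context (say, via direct reclaim) can still unmap
pages that hmm_range_fault() already filled in earlier in the same
call, so it can't just be ignored.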

Any thoughts, Jerome?

Jason
