Re: [Xen-devel] [PATCH v2 3/3] xen: use idle vcpus to scrub pages
On 07/23/2014 08:38 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 15, 2014 at 05:16:33PM +0800, Bob Liu wrote:
>> Hi Jan & Konrad,
>>
>> On 07/01/2014 08:59 PM, Jan Beulich wrote:
>>>>>> On 01.07.14 at 14:25, <bob.liu@xxxxxxxxxx> wrote:
>>>> On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>>>>>> On 30.06.14 at 15:39, <lliubbo@xxxxxxxxx> wrote:
>>>>>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>>>>>  {
>>>>>>      if ( !tainted )
>>>>>>      {
>>>>>> +        node_need_scrub[node] = 1;
>>>>>>          for ( i = 0; i < (1 << order); i++ )
>>>>>>              pg[i].count_info |= PGC_need_scrub;
>>>>>>      }
>>>>>
>>>>> Iirc it was more than this single place where you set
>>>>> PGC_need_scrub, and hence where you'd now need to set the
>>>>> other flag too.
>>>>
>>>> I'm afraid this is the only place where PGC_need_scrub is set.
>>>
>>> Ah, indeed - I misremembered; the others are all tests for the flag.
>>>
>>>> I'm sorry for all of the coding style problems.
>>>>
>>>> By the way, is there any script which can be used to check the code
>>>> before submitting? Something like ./scripts/checkpatch.pl under Linux.
>>>
>>> No, there isn't. But avoiding (or spotting) hard tabs should be easy
>>> enough, and other things you ought to simply inspect your patch for -
>>> after all, that's no different from what reviewers do.
>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    /* free percpu free list */
>>>>>> +    if ( !page_list_empty(local_free_list) )
>>>>>> +    {
>>>>>> +        spin_lock(&heap_lock);
>>>>>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>>>>>> +        {
>>>>>> +            order = PFN_ORDER(pg);
>>>>>> +            page_list_del(pg, local_free_list);
>>>>>> +            for ( i = 0; i < (1 << order); i++ )
>>>>>> +            {
>>>>>> +                pg[i].count_info |= PGC_state_free;
>>>>>> +                pg[i].count_info &= ~PGC_need_scrub;
>>>>>
>>>>> This needs to happen earlier - the scrub flag should be cleared right
>>>>> after scrubbing, and the free flag should imo be set when the page
>>>>> gets freed.
>>>>> That's for two reasons:
>>>>> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
>>>>> allocate memory regardless of the scrub flag's state.
>>>>
>>>> AFAIR, the reason I set those flags here is to avoid a panic.
>>>
>>> That's a pretty vague statement.
>>>
>>>>> 2) You still detain the memory on the local lists from allocation. On a
>>>>> many-node system, the 16Mb per node can certainly add up (which
>>>>> is not to say that I don't view the 16Mb on a single node as already
>>>>> problematic).
>>>>
>>>> Right, but we can adjust SCRUB_BATCH_ORDER.
>>>> Anyway, I'll retry as you suggested.
>>>
>>> You should really drop the idea of removing pages temporarily.
>>> All you need to do is make sure a page being allocated and getting
>>> simultaneously scrubbed by another CPU won't get passed to the
>>> caller until the scrubbing has finished. In particular, it's no problem
>>> if the allocating CPU occasionally ends up scrubbing a page already
>>> being scrubbed elsewhere.
>>>
>>
>> After many days I still haven't found a workable solution that doesn't
>> remove pages temporarily. The hardest part is iterating over the heap
>> free list without holding heap_lock, because holding the lock could
>> cause heavy lock contention.
>
> Why do you need to hold the heap_lock?
>
> I presume that the flag changes need a cmpxchg operation. In which case,
> could you add another flag, 'PG_scrubbing_now', so that the allocator
> could bypass all of the pages that have that flag set (and also
> PG_need_scrub)?

That's what I've been struggling with these days.

> If the allocator can't find enough pages to satisfy the demand, it goes
> back in, looks at the PG_need_scrub pages, and scrubs those.
>
> In parallel, another thread (threads?) picks the pages that have
> PG_need_scrub, changes them to PG_scrubbing_now, and scrubs them. When
> it has completed, it removes both flags.
> But there is still a race condition:

    A: alloc path                      B: idle scrub path
                                       page = list_entry(&heap_list);
    delist the page from heap_list
    (can happen at any time)
                                       if (PG_need_scrub)
                                           set PG_scrubbing_now
                                           scrub and clear both flags
                                       page = page_list_next(page);
                                       ^^^ panic: page is no longer on the list

Another problem Jan pointed out before: if an order=20 chunk is being
scrubbed by an idle vcpu, the alloc path will fail because of the
PG_scrubbing_now flag. This introduces a long window in which we have
plenty of free memory but can't allocate any page.

>> So do you think it's acceptable if all the other concerns about this
>> patch are fixed?
>>
>> And there is another idea, based on the assumption that only guest
>> allocations need scrubbed pages (it's safe for the Xen hypervisor to
>> use an unscrubbed page):
>> 1. Set the need_scrub flag in free_heap_pages().
>> 2. Allocate pages from the heap free list without scrubbing them and
>>    without clearing the need_scrub flag.
>> 3. When the guest accesses the real memory (machine pages) for the
>>    first time, a page fault should happen. We track this event and,
>>    before building the right PFN to MFN page table mapping (correct me
>>    if it doesn't work like this), scrub the page at that point if the
>>    need_scrub flag is set.
>>
>> This way only the pages the guest really uses need scrubbing, and the
>> scrubbing time is scattered.
>> What do you think of this solution?
>
> I think you mean that we can set the EPT (or shadow) entries of all of
> those PG_need_scrub pages to be r/o. And then when the guest accesses
> them we would trap into the hypervisor and scrub them. However, that
> means the guest can still read those pages without trapping - and that
> means we might be reading sensitive data from pages that came from
> another guest!

No, I mean that at the point where the EPT or shadow page table entry is
set up, we scrub the related machine page if it has the PG_need_scrub
flag. I think in PoD mode the place is p2m_set_entry(): scrub the page
behind the mfn if PG_need_scrub is set, before we set up the entry.
--
Regards,
-Bob

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel