On 23/11/13 20:09, Steven Haigh wrote:
> On 24/11/13 07:03, Andrew Cooper wrote:
>> On 23/11/13 19:56, Steven Haigh wrote:
>>> On 24/11/13 06:38, Steven Haigh wrote:
>>>> On 24/11/13 06:27, Olaf Hering wrote:
>>>>> On Sun, Nov 24, Steven Haigh wrote:
>>>>>
>>>>>> Running Xen 4.2.3 with all the current XSA fixes.
>>>>>
>>>>> How exactly did you start the guests?
>>>>
>>>> The DomUs were started with: xl create /etc/xen/<configfile>
>>>>
>>>>> Does 'ps faxu' show qemu processes for the listed domain_ids?
>>>>> What is the 'xenstore-ls -f | sort' output?
>>>>
>>>> I'll have to check this when I manage to reproduce it. So far, I have
>>>> been unable to get a reliable way to reproduce it. I managed to get a
>>>> system to do it every time a HVM DomU was shutdown OR restarted - but
>>>> after a reboot of the Dom0 I can't get it into that state again.
>>>>
>>>> As soon as I can get a system in this state again, I'll leave it to see
>>>> what information I can extract.
>>>
>>> Ha! As always, as soon as I send this, I notice it's happened on a Dom0.
>>>
>>> # xl list
>>> Name           ID   Mem VCPUs      State   Time(s)
>>> Domain-0        0  1579     2     r-----    2731.3
>>> planner.vm      1  1013     1     -b----     189.3
>>> (null)          2     0     1     --psrd     301.1
>>> tracker.vm      3  1013     2     -b----     834.4
>>>
>>> Attached is the output of:
>>> # xl debug-keys q
>>> # xl dmesg > xen-dmesg.log
>>> # gzip xen-dmesg.log
>>
>> Ok - from dmesg.
>>
>> (XEN) General information for domain 2:
>> (XEN) refcnt=1 dying=2 pause_count=2
>> (XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
>> (XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
>> (XEN) paging assistance: hap refcounts translate external
>> ...
>> (XEN) Memory pages belonging to domain 2:
>> (XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
>> (XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
>> (XEN) PoD entries=0 cachesize=0
>>
>>
>> So there are indeed two outstanding pages causing this domain to become
>> a zombie. They are normal pages, with 1 outstanding ref.
>>
>> Can you collect "xl debug-keys g" as well?
>
> Sure - attached.

(XEN)       -------- active --------       -------- shared --------
(XEN) [ref] localdom mfn      pin          localdom gmfn     flags
(XEN) grant-table for remote domain: 2 (v1)
(XEN) [16302]      0 0x0866e1 0x00000001          0 0x0064e1 0x19
(XEN) [16320]      0 0x0866e0 0x00000001          0 0x0064e0 0x19

Ok - so domain 2 has two outstanding grants. This explains why it
is a zombie.
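
The two grant entries and the two leftover pages appear to be the same frames: the mfn column in the grant dump matches the DomPage numbers from the earlier 'q' output. A minimal sketch of that cross-check, assuming the relevant lines from both dumps are pasted in as strings; the regexes and function names are illustrative only and not part of any Xen tooling.

```python
import re

# Excerpts copied from the 'q' and 'g' debug-key output quoted in this thread.
PAGE_DUMP = """\
(XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
(XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
"""
GRANT_DUMP = """\
(XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
(XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
"""

def outstanding_pages(dump):
    """Return the set of frame numbers listed as DomPage entries."""
    return {int(m.group(1), 16)
            for m in re.finditer(r"DomPage ([0-9a-f]+):", dump)}

def granted_frames(dump):
    """Return the set of mfns from the 'active' half of each grant entry."""
    # Layout assumed: [ref] localdom mfn pin ...
    pattern = r"\[\s*\d+\]\s+\d+\s+0x([0-9a-f]+)\s+0x[0-9a-f]+"
    return {int(m.group(1), 16) for m in re.finditer(pattern, dump)}

pages = outstanding_pages(PAGE_DUMP)
grants = granted_frames(GRANT_DUMP)

# True when every leftover page is accounted for by a grant entry.
print(pages == grants)
```

If the two sets ever differed, that would suggest a reference held by something other than the grant table.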

Both these grants are GTF_writing | GTF_reading | GTF_permit_access,
but seemingly unmapped.
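
For reference, this is how the 0x19 in the flags column breaks down. A small sketch, assuming the v1 grant entry bit layout from Xen's public grant_table.h (type in the low two bits, then the readonly, reading and writing bits); the decode_gtf helper is a made-up name for illustration.

```python
# Bit values as I understand them from Xen's public grant_table.h (v1 entries).
GTF_TYPE_MASK = 0x3
GTF_TYPES = {
    0: "GTF_invalid",
    1: "GTF_permit_access",
    2: "GTF_accept_transfer",
}
GTF_BITS = {
    1 << 2: "GTF_readonly",
    1 << 3: "GTF_reading",
    1 << 4: "GTF_writing",
}

def decode_gtf(flags):
    """Split a grant entry flags word into its type and status bit names."""
    names = [GTF_TYPES.get(flags & GTF_TYPE_MASK, "unknown type")]
    names += [name for bit, name in GTF_BITS.items() if flags & bit]
    return names

print(decode_gtf(0x19))
```

0x19 is 16 + 8 + 1, so the helper reports permit_access, reading and writing, and the readonly bit is clear.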

I will have to defer to someone who knows the grant code better. Is
it possible for a domain to be a zombie just because it has two
grants it hasn't manually invalidated?

~Andrew