
Re: [Xen-devel] null scheduler bug



Thank you for taking the time to look into this problem.
I did more testing just to be sure, and I also measured the time (using
the stopwatch on my phone, which isn't precise at all; I just wanted you
to get a feeling for the time intervals we are talking about).
Yes, I can confirm that the situation actually improves with Xen
4.10, which is why I'm going to continue to use it.

With Xen 4.9.2, after I create a guest and destroy it (note that it is
a guest with passthrough which blinks the GPIO PS LED), I can't
re-create it again. Never. Not even after 30 seconds, 2 minutes,
5 minutes, etc.

These are the test results with Xen 4.10:

1.) I created a guest, destroyed it and immediately after that tried
to create it again (manually, from the keyboard; I need maybe half a
second to a second to hit "arrow up" twice and then "enter"), and this
is the result:

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
root@uz3eg-iocc-2018-2:~# xl destroy bm1
root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
(XEN) IRQ 48 is already used by domain 27
libxl: error: libxl_create.c:1325:domcreate_launch_dm: Domain
28:failed give domain access to irq 48: Device or resource busy
libxl: error: libxl_domain.c:1000:libxl__destroy_domid: Domain
28:Non-existant domain
libxl: error: libxl_domain.c:959:domain_destroy_callback: Domain
28:Unable to destroy guest
libxl: error: libxl_domain.c:886:domain_destroy_cb: Domain
28:Destruction of domain failed

2.) Here I created a guest, destroyed it and then immediately ran xl
create twice, fast. For that I also need about half a second to a
second. Note that the guest isn't in any state; it should be in the
"running" state because I need that PS LED to blink.

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
root@uz3eg-iocc-2018-2:~# xl destroy bm1
root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
(XEN) IRQ 48 is already used by domain 32
libxl: error: libxl_create.c:1325:domcreate_launch_dm: Domain
33:failed give domain access to irq 48: Device or resource busy
libxl: error: libxl_domain.c:1000:libxl__destroy_domid: Domain
33:Non-existant domain
libxl: error: libxl_domain.c:959:domain_destroy_callback: Domain
33:Unable to destroy guest
libxl: error: libxl_domain.c:886:domain_destroy_cb: Domain
33:Destruction of domain failed
root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
root@uz3eg-iocc-2018-2:~# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   768     1     r-----    1936.2
bm1                                         34     8     1     ------       0.0

3.) Here I did the same thing as in 2.), except that I waited 6-7
seconds after the error appeared and then ran xl create, and the guest
worked fine (it is in the "running" state), so:

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
root@uz3eg-iocc-2018-2:~# xl destroy bm1
root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
(XEN) IRQ 48 is already used by domain 57
libxl: error: libxl_create.c:1325:domcreate_launch_dm: Domain
58:failed give domain access to irq 48: Device or resource busy
libxl: error: libxl_domain.c:1000:libxl__destroy_domid: Domain
58:Non-existant domain
libxl: error: libxl_domain.c:959:domain_destroy_callback: Domain
58:Unable to destroy guest
libxl: error: libxl_domain.c:886:domain_destroy_cb: Domain
58:Destruction of domain failed

/* waited for approximately 6-7 seconds and then ran the command below */

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
root@uz3eg-iocc-2018-2:~# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   768     1     r-----    3071.5
bm1                                         59     8     1     r-----       8.2

4.) Here I created a guest, destroyed it, waited for approximately
7 seconds and then ran xl create, and everything worked fine:

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
root@uz3eg-iocc-2018-2:~# xl destroy bm1

/* waited for approximately 7 seconds and then ran the command below */

root@uz3eg-iocc-2018-2:~# xl create bm1.cfg
Parsing config from bm1.cfg
root@uz3eg-iocc-2018-2:~# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   768     1     r-----    3641.1
bm1                                         70     8     1     r-----       7.1

It looks like the guest needs approximately 7 seconds to be fully
destroyed and to fully release the IRQ.
And yes, if you manage to produce a patch I will put it in my source
tree, build with it, test it and send you back the results.
I have attached the dmesg and xl dmesg output from Xen 4.10.
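
Until the delayed IRQ release is understood, the re-create step could be
wrapped in a small retry loop instead of being repeated by hand. The
following is only a minimal sketch: retry_cmd is a hypothetical helper
(not part of xl), the attempt count and delay are assumed values chosen
around the ~7 seconds observed above, and it assumes that xl create
exits with a non-zero status when it prints the errors shown in the
tests.

```shell
#!/bin/sh
# retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of xl): runs COMMAND until it exits 0,
# sleeping DELAY_SECONDS between attempts, and gives up after
# MAX_ATTEMPTS failed attempts.
retry_cmd() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    start=$(date +%s)
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            # Report how long it took, with one-second resolution.
            echo "succeeded on attempt $attempt after $(( $(date +%s) - start ))s" >&2
            return 0
        fi
        attempt=$((attempt + 1))
        # Only sleep if another attempt is still going to be made.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "giving up after $max_attempts attempts" >&2
    return 1
}

# Example with assumed values (10 attempts, 1 second apart):
#   xl destroy bm1
#   retry_cmd 10 1 xl create bm1.cfg
```

As test 2.) shows, a create that returns without errors can still leave
the guest in no state, so xl list should be checked afterwards even when
the loop reports success.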

On Thu, Sep 13, 2018 at 7:39 PM Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
>
> On Thu, 2018-09-13 at 17:18 +0200, Milan Boberic wrote:
> > Commits are there and I will definitely continue with 4.10 version.
> > But it didn't solve my problem entirely.
> >
> > I create my bare-metal application (with xl create) and destroy it
> > with xl destroy (it disappears from xl list) and when I try to create
> > it again same error pops but if I immediately run xl create command
> > again it creates it without error.
> > If I run xl create twice fast sometimes bare-metal application isn't
> > in any state (it should be in "running" state).
> > If I wait some time (approximately between 30 and 90 seconds) after
> > destruction of that bm app and then run xl create it will create it
> > without error.
> >
> Ok, thanks for trying and reporting back.
>
> If possible, help me understand things a bit better.
>
> So, can you confirm that the situation _actually_improves_ with Xen
> 4.10 ?
>
> Basically, as far as I've understood things, with Xen 4.9, you destroy
> a guest, and you can _never_ re-create it, not even after 30 seconds,
> 90 seconds, 2 minutes, 1 hour, ecc. Is that correct?
>
> With Xen 4.10, it may still fail, if you try to re-create it within ~30
> to 90 seconds, but after that, it works. Is that also correct?
>
> I need to know this, because I want to understand if the issue is, at
> least partially, cured by the RCU fixes, although having to wait 30
> seconds is definitely not what I was expecting (i.e., there might be
> something else).
>
> Another question, in case I manage to produce a debug patch, are you ok
> to put it in your source tree, build with it, and tell us what you see?
>
> Thanks again and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Software Engineer @ SUSE https://www.suse.com/

Attachment: dmesg XEN 4.10.txt
Description: Text document

Attachment: xl dmesg XEN 4.10.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel