RE: oxenstored performance issue when starting VMs in parallel
> -----Original Message-----
> From: Edwin Torok <edvin.torok@xxxxxxxxxx>
> Sent: 22 September 2020 14:29
> To: sstabellini@xxxxxxxxxx; Anthony Perard <anthony.perard@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; paul@xxxxxxx
> Cc: xen-users@xxxxxxxxxxxxxxxxxxxx; jerome.leseinne@xxxxxxxxx; julien@xxxxxxx
> Subject: Re: oxenstored performance issue when starting VMs in parallel
>
> On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
> > Hi,
> >
> > Edwin you rock! This call in qemu is effectively the culprit!
> > I have disabled this xen_bus_add_watch call and re-run the test on our big server:
> >
> > - oxenstored is now between 10% and 20% CPU usage (previously it was at 100% all the time)
> > - All our VMs are responsive
> > - All our VMs start in less than 10 seconds (before the fix some VMs could take more than one minute to be fully up)
> > - Dom0 is more responsive
> >
> > Disabling the watch may not be the ideal solution (I'll let the qemu experts answer this and the possible side effects),
>
> Hi,
>
> CC-ed the Qemu maintainer of the Xen code; please see this discussion about scalability issues with the backend watching code in qemu 4.1+.
>
> I think the scalability issue is due to this code in qemu, which causes an instance of qemu to see watches from all devices (even those belonging to other qemu instances), such that adding a single device causes N watches to be fired on each of N instances of qemu:
>
>     xenbus->backend_watch =
>         xen_bus_add_watch(xenbus, "", /* domain root node */
>                           "backend", xen_bus_backend_changed,
>                           &local_err);
>
> I can understand that for backwards compatibility you might need this code, but is there a way that an up-to-date (xl) toolstack could tell qemu what it needs to look at (e.g. via QMP, or other keys in xenstore) instead of relying on an overly broad watch?

I think this could be made more efficient. The call to "module_call_init(MODULE_INIT_XEN_BACKEND)" just prior to this watch will register backends that do auto-creation, so we could register individual watches for the various backend types instead of this single one.

  Paul
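For illustration only, a minimal sketch of what such per-backend-type watches might look like is shown below, reusing the xen_bus_add_watch() call quoted above. The list of type names, the backend_type_watch field and the assumption that the helper accepts a multi-component key such as "backend/qdisk" are all hypothetical; this is not actual QEMU code.

    /*
     * Sketch only -- not actual QEMU code.  Instead of the single broad
     * watch on "backend" under the domain root node, register one watch
     * per auto-created backend type.
     */
    static const char *const xen_backend_types[] = { "qdisk", "vif" };

    static void xen_bus_add_backend_type_watches(XenBus *xenbus, Error **errp)
    {
        Error *local_err = NULL;
        size_t i;

        for (i = 0; i < ARRAY_SIZE(xen_backend_types); i++) {
            char *key = g_strdup_printf("backend/%s", xen_backend_types[i]);

            /* Same handler as before, but the watch only covers one type. */
            xenbus->backend_type_watch[i] =
                xen_bus_add_watch(xenbus, "", /* domain root node */
                                  key, xen_bus_backend_changed, &local_err);
            g_free(key);

            if (local_err) {
                error_propagate(errp, local_err);
                return;
            }
        }
    }

Scoping each watch further (for example to the instance's own domain id) would reduce the fan-out even more; the sketch above only narrows it by backend type rather than watching the whole backend subtree.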
>
> Best regards,
> --Edwin
>
> > but in our scenario and usage this fixes the problem and dramatically boosts the performance.
> >
> > So far we haven't seen any side effects: all the xl commands work, the VMs are fully functional, no devices leak (like network vifs, for example), and once all the VMs are down a call to xenstore-ls shows that the store is indeed empty (dom0 excluded).
> >
> > We will continue additional testing and stressing, but in any case a huge thanks to you and Julien for your help on this issue!
> >
> > Jerome
> >
> > On Mon, 21 Sept 2020 at 18:57, Edwin Torok <edvin.torok@xxxxxxxxxx> wrote:
> > > On Mon, 2020-09-21 at 17:40 +0100, Julien Grall wrote:
> > > > [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe.
> > > >
> > > > On 21/09/2020 14:05, jerome leseinne wrote:
> > > > > Hello,
> > > >
> > > > Hello,
> > > >
> > > > I am only CCing Edwin, who is working on OXenStored. Hopefully, he will be able to give you some pointers.
> > > >
> > > > > We are developing a solution based on Xen 4.13 which is constantly creating / destroying VMs.
> > > > >
> > > > > To summarize our lifecycle:
> > > > >
> > > > > - xl restore vmX
> > > > > - xl cd-insert ....
> > > > > - We do our stuff for ~ 2 minutes
> > > > > - xl destroy vmX
> > > > >
> > > > > So our VMs have a life of approximately 2 minutes.
> > > > >
> > > > > The number of VMs we run in parallel depends on the underlying server.
> > > > >
> > > > > We are seeing the issue with our larger server, which is running 30 VMs (HVM) in parallel.
> > > > >
> > > > > On this server oxenstored is constantly running at 100% CPU usage and some VMs are almost stuck or unresponsive.
> > > > >
> > > > > This is not a hardware issue: 72 Xeon cores, 160 GB of memory and a very fast I/O subsystem.
> > > > > Everything else is running smoothly on the server.
> > > > >
> > > > > What we witness in xenstore-access.log is that the number of WATCH events matches the number of currently running VMs, so for example a single WRITE event is followed by around 30 WATCH events:
> > > > >
> > > > > [20200918T15:15:18.045Z] A41354 write /local/domain/0/backend/qdisk/1311/5632
> > > > > [20200918T15:15:18.046Z] A41248 w event backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
> > > > > [20200918T15:15:18.046Z] A41257 w event backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
> > > > > [20200918T15:15:18.046Z] A40648 w event backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
> > > > > [20200918T15:15:18.046Z] A40542 w event backend/qdisk/1311/5632 6a39c858-2fd4-46e4-a810-485a41328f8c
> > >
> > > Hi,
> > >
> > > Each A<id> here refers to a different unix domain socket connection. They have all registered a watch on this key (or a parent).
> > >
> > > To find out what key they are watching you can look for the watch registration:
> > >
> > >     grep 'A41248 .*watch' /var/log/xenstored-access.log
> > >
> > > You can find out all the xenstore clients connected to oxenstored: find all the numbers in the port column here:
> > >
> > >     ss -a --unix -p|grep xenstored
> > >
> > > Then look up those numbers, e.g. assuming you see 22269:
> > >
> > >     ss -a --unix -p|grep 22269
> > >
> > > This should tell you what the processes are.
> > >
> > > I have a suspicion that it is related to this code in qemu:
> > >
> > >     xenbus->backend_watch =
> > >         xen_bus_add_watch(xenbus, "", /* domain root node */
> > >                           "backend", xen_bus_backend_changed,
> > >                           &local_err);
> > >
> > > If I read the code right this seems to set up a watch on /local/domain/0/backend, so if you have N qemus running (one for each disk) then you get N^2 watch events fired in total.
> > >
> > > Best regards,
> > > --Edwin
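To put rough, illustrative numbers on that (the per-VM key count below is an assumption, not a measurement): with the 30 parallel VMs reported above there are about 30 qemu instances, each holding its own watch on /local/domain/0/backend, so a single write under that subtree fires roughly 30 watch events, one per watching connection, exactly as the log excerpt shows. If starting one VM writes on the order of 20 keys under the backend subtree, then starting 30 VMs in parallel generates roughly 30 x 30 x 20 = 18,000 watch events for oxenstored to deliver, which is where the quadratic load comes from.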
> > > > > [20200918T15:15:18.046Z] A41141 w event backend/qdisk/1311/5632 8762d552-b4b4-41ef-a2aa-23700f790ea2
> > > > > [20200918T15:15:18.046Z] A41310 w event backend/qdisk/1311/5632 4dc2a9ae-6388-4b0c-9c98-df3c897a832f
> > > > > [20200918T15:15:18.046Z] A40660 w event backend/qdisk/1311/5632 6abf244d-5939-4540-b176-4ec7d14b392c
> > > > > [20200918T15:15:18.046Z] A41347 w event backend/qdisk/1311/5632 ecb93157-9929-43e2-8ed4-f5e78ab2f37d
> > > > > [20200918T15:15:18.046Z] A41015 w event backend/qdisk/1311/5632 a1fec49f-e7cc-4059-87d3-ce43f386746e
> > > > > [20200918T15:15:18.046Z] A41167 w event backend/qdisk/1311/5632 e9419014-9fd2-47c0-b79d-30f99d9530d6
> > > > > [20200918T15:15:18.046Z] A41100 w event backend/qdisk/1311/5632 a2754a91-ecd6-4b6b-87ea-b68db8b888df
> > > > > [20200918T15:15:18.046Z] A41147 w event backend/qdisk/1311/5632 176a1c3c-add7-4710-a7ee-3b5548d7a56a
> > > > > [20200918T15:15:18.046Z] A41305 w event backend/qdisk/1311/5632 afe7933b-c92d-4403-8d6c-2e530558c937
> > > > > [20200918T15:15:18.046Z] A40616 w event backend/qdisk/1311/5632 35fa45e0-21e8-4666-825b-0c3d629f378d
> > > > > [20200918T15:15:18.046Z] A40951 w event backend/qdisk/1311/5632 230eb42f-d700-46ce-af61-89242847a978
> > > > > [20200918T15:15:18.046Z] A40567 w event backend/qdisk/1311/5632 39cc7ffb-5045-4120-beb7-778073927c93
> > > > > [20200918T15:15:18.046Z] A41363 w event backend/qdisk/1311/5632 9e42e74a-80fb-46e8-81f2-718628bf70f6
> > > > > [20200918T15:15:18.046Z] A40740 w event backend/qdisk/1311/5632 1a64af31-fee6-45be-b8d8-c98baa5e162f
> > > > > [20200918T15:15:18.046Z] A40632 w event backend/qdisk/1311/5632 466ef522-cb76-4117-8e93-42471897c353
> > > > > [20200918T15:15:18.046Z] A41319 w event backend/qdisk/1311/5632 19ea986b-e303-4180-b833-c691b2b32819
> > > > > [20200918T15:15:18.046Z] A40677 w event backend/qdisk/1311/5632 fb01629a-033b-41d6-8349-cec82e570238
> > > > > [20200918T15:15:18.046Z] A41152 w event backend/qdisk/1311/5632 84ce9e29-a5cc-42a1-a47b-497b95767885
> > > > > [20200918T15:15:18.047Z] A41233 w event backend/qdisk/1311/5632 ea944ad3-3af6-4688-8076-db1eac25d8e9
> > > > > [20200918T15:15:18.047Z] A41069 w event backend/qdisk/1311/5632 ce57e169-e1ea-4fb5-b97f-23e651f49d79
> > > > > [20200918T15:15:18.047Z] A41287 w event backend/qdisk/1311/5632 d31110c8-ae0b-4b9d-b71f-aa2985addd1a
> > > > > [20200918T15:15:18.047Z] A40683 w event backend/qdisk/1311/5632 f0e4b0a0-fad0-4bb7-b01e-b8a31107ba3d
> > > > > [20200918T15:15:18.047Z] A41177 w event backend/qdisk/1311/5632 9ff80e49-4cca-4ec9-901a-d30198104f29
> > > > > [20200918T15:15:18.047Z] D0 w event backend/qdisk/1311/5632 FFFFFFFF8276B520
> > > > > [20200918T15:15:18.047Z] A40513 w event backend/qdisk/1311/5632 d35a9a42-c15e-492c-a70d-d8b20bafec8f
> > > > > [20200918T15:15:18.047Z] A41354 w event backend/qdisk/1311/5632 e4456ca4-70f4-4afc-9ba1-4a1cfd74c8e6
> > > > >
> > > > > We are not sure this is the root cause of the issue, but this is the only real difference we can see in the log.
> > > > >
> > > > > We don't understand why the number of WATCH events is related to the number of concurrently running VMs.
> > > > > A watch event should be registered and only fired for the current domain ID, so a write for a specific node path should only trigger one watch event and not 30 in our case.
> > > > >
> > > > > Any ideas / comments?
> > > >
> > > > Cheers,
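The explanation earlier in the thread accounts for the count: xenstore delivers one event per registered watch, and a watch on a parent node matches writes to any node beneath it, so 30 qemu processes each watching /local/domain/0/backend receive one event apiece for every backend write. The small standalone program below is an illustration of that behaviour using the public libxenstore API (it is not part of the thread); run a copy in several dom0 shells and every copy prints an event for the same single write.

    /* watch_demo.c - build with: gcc -o watch_demo watch_demo.c -lxenstore
     *
     * Each running copy registers its own watch on a parent node; a single
     * write anywhere below that node then fires one event per running copy,
     * which is the fan-out seen in the oxenstored access log above.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    int main(void)
    {
        struct xs_handle *xsh = xs_open(0);

        if (!xsh) {
            perror("xs_open");
            return 1;
        }

        /* Watch the whole backend subtree, as qemu's broad watch does. */
        if (!xs_watch(xsh, "/local/domain/0/backend", "demo-token")) {
            perror("xs_watch");
            xs_close(xsh);
            return 1;
        }

        for (;;) {
            unsigned int num;
            char **ev = xs_read_watch(xsh, &num); /* blocks until an event */

            if (!ev) {
                break;
            }
            printf("event: path=%s token=%s\n",
                   ev[XS_WATCH_PATH], ev[XS_WATCH_TOKEN]);
            free(ev);
        }

        xs_close(xsh);
        return 0;
    }

With this running in, say, 30 shells, a single "xenstore-write /local/domain/0/backend/qdisk/test x" produces one printed event in each of the 30 processes (in addition to the initial event each copy receives when its watch is registered), matching the pattern in the log excerpt.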