Re: oxenstored performance issue when starting VMs in parallel
We tested on the latest 4.14: same issue.

We also tried an oxenstored replacement, lixs (https://github.com/cnplab/lixs).
This basically solves the problem: no more 100% CPU (only a few spikes), and
all the VMs are responsive! One problem though: during "xl destroy", xl
complains that it cannot delete the VIF interface. The leaked VIFs accumulate
and, after a few hours or days, dom0 starts complaining about network
interfaces and has to be rebooted. So lixs is not a solution, and it has not
been actively maintained for four years. A supported Xen solution or
workaround would be better.
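As a rough way to spot the leak from dom0 (the vif<domid>.<devid> naming and
the xenstore backend path below are the usual defaults; adjust as needed):

    # vif interfaces still present in dom0
    ip -o link show | grep -o 'vif[0-9]*\.[0-9]*' | sort -u
    # domains xl still knows about (leaked vifs point at dead domids)
    xl list
    # leftover vif backend entries in xenstore
    xenstore-ls /local/domain/0/backend/vif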
Jerome

On Mon, 21 Sep 2020 at 17:25, Fanny Dwargee <fdwargee6@xxxxxxxxx> wrote:
>
> On Mon, 21 Sep 2020 at 15:10, jerome leseinne
> (<jerome.leseinne@xxxxxxxxx>) wrote:
>>
>> Hello,
>>
>> We are developing a solution based on Xen 4.13 which is constantly
>> creating and destroying VMs.
>>
>> To summarize our lifecycle:
>>
>> - xl restore vmX
>> - xl cd-insert ....
>> - We do our work for ~2 minutes
>> - xl destroy vmX
>>
>> So our VMs have a lifetime of approximately 2 minutes.
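>> In script form, each worker slot does roughly the following (the save
>> file, the hdc device and the ISO path are placeholders):
>>
>>     while true; do
>>         xl restore /path/to/vmX.save
>>         xl cd-insert vmX hdc /path/to/payload.iso
>>         sleep 120    # our work runs inside the guest for ~2 minutes
>>         xl destroy vmX
>>     done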
>>
>> The number of VMs we run in parallel depends on the underlying server.
>> We are seeing the issue on our largest server, which runs 30 HVM VMs
>> in parallel.
>>
>> On this server oxenstored is constantly running at 100% CPU usage, and
>> some VMs are almost stuck or unresponsive.
>>
>> This is not a hardware issue: 72 Xeon cores, 160 GB of memory and a
>> very fast I/O subsystem. Everything else on the server runs smoothly.
>>
>> What we see in xenstore-access.log is that the number of watch events
>> matches the number of currently running VMs, so for example a single
>> WRITE is followed by around 30 watch events:
>>
>> [20200918T15:15:18.045Z] A41354 write /local/domain/0/backend/qdisk/1311/5632
>> [20200918T15:15:18.046Z] A41248 w event backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
>> [20200918T15:15:18.046Z] A41257 w event backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
>> [20200918T15:15:18.046Z] A40648 w event backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
>> [20200918T15:15:18.046Z] A40542 w event backend/qdisk/1311/5632 6a39c858-2fd4-46e4-a810-485a41328f8c
>> [20200918T15:15:18.046Z] A41141 w event backend/qdisk/1311/5632 8762d552-b4b4-41ef-a2aa-23700f790ea2
>> [20200918T15:15:18.046Z] A41310 w event backend/qdisk/1311/5632 4dc2a9ae-6388-4b0c-9c98-df3c897a832f
>> [20200918T15:15:18.046Z] A40660 w event backend/qdisk/1311/5632 6abf244d-5939-4540-b176-4ec7d14b392c
>> [20200918T15:15:18.046Z] A41347 w event backend/qdisk/1311/5632 ecb93157-9929-43e2-8ed4-f5e78ab2f37d
>> [20200918T15:15:18.046Z] A41015 w event backend/qdisk/1311/5632 a1fec49f-e7cc-4059-87d3-ce43f386746e
>> [20200918T15:15:18.046Z] A41167 w event backend/qdisk/1311/5632 e9419014-9fd2-47c0-b79d-30f99d9530d6
>> [20200918T15:15:18.046Z] A41100 w event backend/qdisk/1311/5632 a2754a91-ecd6-4b6b-87ea-b68db8b888df
>> [20200918T15:15:18.046Z] A41147 w event backend/qdisk/1311/5632 176a1c3c-add7-4710-a7ee-3b5548d7a56a
>> [20200918T15:15:18.046Z] A41305 w event backend/qdisk/1311/5632 afe7933b-c92d-4403-8d6c-2e530558c937
>> [20200918T15:15:18.046Z] A40616 w event backend/qdisk/1311/5632 35fa45e0-21e8-4666-825b-0c3d629f378d
>> [20200918T15:15:18.046Z] A40951 w event backend/qdisk/1311/5632 230eb42f-d700-46ce-af61-89242847a978
>> [20200918T15:15:18.046Z] A40567 w event backend/qdisk/1311/5632 39cc7ffb-5045-4120-beb7-778073927c93
>> [20200918T15:15:18.046Z] A41363 w event backend/qdisk/1311/5632 9e42e74a-80fb-46e8-81f2-718628bf70f6
>> [20200918T15:15:18.046Z] A40740 w event backend/qdisk/1311/5632 1a64af31-fee6-45be-b8d8-c98baa5e162f
>> [20200918T15:15:18.046Z] A40632 w event backend/qdisk/1311/5632 466ef522-cb76-4117-8e93-42471897c353
>> [20200918T15:15:18.046Z] A41319 w event backend/qdisk/1311/5632 19ea986b-e303-4180-b833-c691b2b32819
>> [20200918T15:15:18.046Z] A40677 w event backend/qdisk/1311/5632 fb01629a-033b-41d6-8349-cec82e570238
>> [20200918T15:15:18.046Z] A41152 w event backend/qdisk/1311/5632 84ce9e29-a5cc-42a1-a47b-497b95767885
>> [20200918T15:15:18.047Z] A41233 w event backend/qdisk/1311/5632 ea944ad3-3af6-4688-8076-db1eac25d8e9
>> [20200918T15:15:18.047Z] A41069 w event backend/qdisk/1311/5632 ce57e169-e1ea-4fb5-b97f-23e651f49d79
>> [20200918T15:15:18.047Z] A41287 w event backend/qdisk/1311/5632 d31110c8-ae0b-4b9d-b71f-aa2985addd1a
>> [20200918T15:15:18.047Z] A40683 w event backend/qdisk/1311/5632 f0e4b0a0-fad0-4bb7-b01e-b8a31107ba3d
>> [20200918T15:15:18.047Z] A41177 w event backend/qdisk/1311/5632 9ff80e49-4cca-4ec9-901a-d30198104f29
>> [20200918T15:15:18.047Z] D0 w event backend/qdisk/1311/5632 FFFFFFFF8276B520
>> [20200918T15:15:18.047Z] A40513 w event backend/qdisk/1311/5632 d35a9a42-c15e-492c-a70d-d8b20bafec8f
>> [20200918T15:15:18.047Z] A41354 w event backend/qdisk/1311/5632 e4456ca4-70f4-4afc-9ba1-4a1cfd74c8e6
>>
>> We are not sure this is the root cause of the issue, but it is the only
>> real difference we can see in the log.
>>
>> We don't understand why the number of watch events is related to the
>> number of concurrently running VMs. A watch should be registered and
>> fired only for the relevant domain ID, so a write to a specific node
>> path should trigger one watch event, not around 30 as in our case.
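>> For reference, xenstore watches fire for the watched node and every
>> node below it, so a watch registered on a shared parent directory
>> fires once per registration for any domain's write underneath it.
>> That fan-out is easy to reproduce by hand (the paths below are only
>> an example):
>>
>>     # shell 1: a deliberately broad watch on the whole qdisk tree
>>     xenstore-watch /local/domain/0/backend/qdisk
>>
>>     # shell 2: write a node belonging to a single domain
>>     xenstore-write /local/domain/0/backend/qdisk/1311/5632/test 1
>>
>> Shell 1 prints an event even though the write targets only one domain;
>> with ~30 such broad watches (one per backend connection) a single write
>> would produce ~30 watch events. Whether the qdisk backends really
>> register watches that broad is worth checking, but it would match the
>> pattern above.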
>>
>> Any ideas / comments?
>>
>> Thanks,
>>
>> Jerome Leseinne
>
> Jerome,
> we are experiencing very similar issues on Xen v4.12.3 (Debian 10.4) with
> a similar setup (128 GB RAM, 48 cores). In our case we start and stop
> dozens of HVM VMs in parallel, restoring each from a memory save file and
> automatically analyzing a piece of software's behaviour inside the guest
> for a few minutes.
>
> Any ideas/comments on improving oxenstored performance would be very
> welcome.