[Wg-openstack] Xen+Openstack CI Log analysis
Hi everyone,

Below is a small Markdown file containing the analysis of the CI log files, which I was meant to send after our last meeting. I am sending it a little later so that it is in sync with some patch series I sent that relate to some of the issues presented here.

Thanks!
Joao

--

Libvirt+Xen+Openstack CI log analysis
=====================================

These are some notes on the state of our CI loop and on what is missing to be able to do a full tempest run. The CI loop currently comprises at least 1260 tests, 112 of which are skipped. So what can the logs tell us?

Log Stats
---------

For the analysis, I've taken the following test runs: #1637 (Jul 39), #1666 (Jul 30) and #4205 (Sept 8). The numbers below are from the latest one, #4205. Compared to the earlier data sets, it differs by 3 tests related to bug entries.

* 66 tests are skipped for not having Heat (14), Neutron (41), Sahara (7) and Zaqar (4) support. These appear in the logs with the following formats:

      "XXXX support is required"
      "XXXX is required"

* 21 tests are skipped for not having Trove (4), Neutron (8) and Ironic (9) support. The reason is the same as before, but the log uses a different format:

      "XXXX is not available"
      "XXXX service must be available"
      "XXXX not available"

* 8 tests are skipped until the following bugs are closed: 1240043[0], 1014647[1], 1324348[2], 1310597[3], 1205344[4], 1480490[5], and 2 for 1455043[6]. Note that the last three are only present in #4205, as earlier runs didn't have them. Links to all of the launchpad bugs are further below.

* 4 tests are skipped because "Live migration not available":

      {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_iscsi_volume ... SKIPPED: Block Live migration not available
      {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration ... SKIPPED: Live migration not available
      {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused ... SKIPPED: Live migration not available
      {1} setUpClass (tempest.api.compute.test_live_block_migration_negative.LiveBlockMigrationNegativeTestJSON) ... SKIPPED: Live migration is not enabled

* The remaining 13 tests are skipped for various (and perhaps minor) reasons:

      "Instance validation tests are disabled" (5 of them)
      "Change password is not available"
      "VNC Console feature is disabled"
      "test_list_servers_detailed_filter_by_image ... SKIPPED: Only one image found"
      "test_list_servers_filter_by_image ... SKIPPED: Only one image found"
      "Cinder multi-backend feature disabled" (2 of them)
      "test_attach_detach_volume ... SKIPPED: SSH required for this test"
      "large_ops_number not set to multiple instances"

Summary and Comments
--------------------

Below is a small table to make reading easier and to summarize all of the above.

| Reason                       | Nr. of tests |
|------------------------------|--------------|
| Heat support is required     | 14           |
| Sahara support is required   | 7            |
| Zaqar support is required    | 4            |
| Trove support is required    | 4            |
| Ironic support is required   | 9            |
| Neutron support is required  | 49           |
| Live migration not available | 4            |
| Bug is open                  | 8            |
| Miscellaneous                | 13           |
|                              |              |
| Total                        | 112          |

Overall I believe that most of the issues seem to come from the CI setup: enabling Trove, Ironic, Heat, Sahara and Zaqar would hopefully get 38 tests passing. These components are not exactly Xen related, except Heat, which could use Ceilometer because of the autoscaling feature. Ceilometer is the statistics component and doesn't yet provide all the statistics from the compute node, as reported further below. Furthermore, Neutron is definitely the biggest portion of skipped tests, comprising a total of 49 tests. So if I did the math right, a total of 87 tests could potentially be solved by enabling components. Of the last 3 categories, 8 tests are skipped because of open bug entries.
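As a quick sanity check on the arithmetic above, the per-reason counts from the summary table can be tallied in a few lines of Python. This is only a sketch: the dictionary and variable names are made up for illustration, and the numbers are copied from the table for run #4205.

```python
# Skip counts per reason, copied from the summary table (run #4205).
skipped = {
    "Heat": 14,
    "Sahara": 7,
    "Zaqar": 4,
    "Trove": 4,
    "Ironic": 9,
    "Neutron": 49,
    "Live migration": 4,
    "Bug is open": 8,
    "Miscellaneous": 13,
}

# Components that are missing from the CI setup, Neutron counted separately.
other_components = ["Heat", "Sahara", "Zaqar", "Trove", "Ironic"]

total = sum(skipped.values())
from_other_components = sum(skipped[name] for name in other_components)
from_all_components = from_other_components + skipped["Neutron"]
remaining = total - from_all_components

print("total skipped:", total)
print("fixable by enabling Trove/Ironic/Heat/Sahara/Zaqar:", from_other_components)
print("fixable by enabling components incl. Neutron:", from_all_components)
print("left over (bugs, live migration, misc):", remaining)
```

The leftover figure is the sum of the last three table rows, which is where the open bugs, live migration and miscellaneous skips discussed next come from.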
To my understanding, only one of these bugs is Xen related, namely Bug #1240043 [0]. libvirt currently lacks support for some statistics (besides vCPU info), which leads to nova not being able to provide instance diagnostics. Nova currently uses the following libvirt APIs (Openstack Kilo) to extract statistics:

    virNodeGetCPUStats
    virDomainGetVcpus
    virDomainGetCPUStats (not implemented but only used in Domain-0)
    virDomainMemoryStats (not implemented)
    virDomainInterfaceStats (not implemented)
    virDomainBlockStats (not implemented)
    virDomainGetJobInfo (not implemented)
    virDomainGetJobStats (not implemented)

The first five functions are used for nova diagnostics (Bug #1240043) and the last two are used in live migration, but only on Kilo. Before Kilo, these were not used at all[7]. Early this morning, I submitted one patch series[8] that I have been working on and testing for a while already, and I think it can fix some of the issues there.

Of the remaining 17 tests, 13 look miscellaneous, but the other 4 live-migration tests look to be the big fish. I've been testing upstream libvirt and Openstack Kilo, and I found out that migration doesn't quite work without changing the libvirt APIs nova is using. Nova relies on the libvirt virDomainMigrateToURI{,2} APIs, which depend on P2P/Direct driver support, which is nonexistent in our libxl driver. So I've also submitted another series this morning to tackle this issue [9] and tested it together with the earlier series that I mentioned. Since nova doesn't quite handle the case of no support for JobInfo/JobStats, the live migration monitoring thread crashes and the domain is left in the "MIGRATING" state forever. Thus the two series together fix live migration without having to modify nova. Additionally, I have a nova patch to fix this erroneous behaviour of JobInfo.
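One way to see which of the per-domain calls listed above a given driver implements is to probe them and look for libvirt's "no support" error. The sketch below is an illustration only and is not taken from nova or from the patch series: the function name `probe_stats_support` and the device names are hypothetical, the method names are what I believe the libvirt-python bindings call these APIs, and the error code is hard-coded so that the snippet does not need libvirt installed.

```python
# Value of VIR_ERR_NO_SUPPORT in libvirt's virErrorNumber enum (assumption:
# with libvirt-python installed, use libvirt.VIR_ERR_NO_SUPPORT instead).
VIR_ERR_NO_SUPPORT = 3


def probe_stats_support(dom, iface="vif1.0", disk="xvda"):
    """Return {method name: bool} for the per-domain statistics calls.

    `dom` should behave like a libvirt-python virDomain object.
    `iface` and `disk` are placeholder device names for this sketch.
    """
    calls = {
        "vcpus": lambda: dom.vcpus(),                         # virDomainGetVcpus
        "getCPUStats": lambda: dom.getCPUStats(True),         # virDomainGetCPUStats
        "memoryStats": lambda: dom.memoryStats(),             # virDomainMemoryStats
        "interfaceStats": lambda: dom.interfaceStats(iface),  # virDomainInterfaceStats
        "blockStats": lambda: dom.blockStats(disk),           # virDomainBlockStats
        "jobInfo": lambda: dom.jobInfo(),                     # virDomainGetJobInfo
        "jobStats": lambda: dom.jobStats(),                   # virDomainGetJobStats
    }
    supported = {}
    for name, call in calls.items():
        try:
            call()
            supported[name] = True
        except Exception as exc:
            # libvirtError exposes get_error_code(); anything that is not a
            # "no support" error is a real failure and is re-raised.
            code = getattr(exc, "get_error_code", lambda: None)()
            if code != VIR_ERR_NO_SUPPORT:
                raise
            supported[name] = False
    return supported
```

With a real connection this would be called on a running domain, for example one returned by `conn.lookupByName(...)`. Errors other than "no support" (such as a wrong device name) are deliberately not swallowed.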
The skipped tests mention block_live_migration, though, and this means supporting VIR_MIGRATE_NON_SHARED_INC in libvirt, which is "migration with non-shared storage with incremental copy (same base image shared between source and destination)" (quoting the corresponding docs on virsh). BUT, this is not required for testing live migration. From the tempest test code it looks like block_live_migration is only used _IF_ supported by the test environment, which is good given that we don't support it.

Nova and libvirt
----------------

Moving on, I think libvirt looks good on the API side, as we can see from the tempest logs. But looking at how nova interacts with libvirt, we still have these issues:

* Monitoring of migration still remains to be implemented (e.g. knowing how much memory was sent out), just to give an idea to the client. Openstack makes use of that info (contained in the JobStats) to show the progress of migration.

* NUMA is not an issue, at least from the APIs perspective: it extracts the topology **but** it doesn't process any NUMA (and vNUMA-related) XML elements in the libvirt guest config. I believe this was pointed out in Jim's XenDevSummit presentation this year. Nova only pays attention to the guest config, and does not use any of the NUMA-related APIs on libvirt, e.g. setNumaParameters and getNumaParameters.

* Last but not least is snapshots, which, if I am not wrong, is already a work in progress on the libxl side [10].
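To illustrate the first point, here is a minimal sketch of how a client could turn job statistics into a progress figure. It is not nova's code: the function name is hypothetical, and the dictionary keys `memory_total` and `memory_remaining` are what I understand the libvirt-python `jobStats()` call to return, so treat them as assumptions.

```python
def migration_progress(stats):
    """Return the percentage of guest memory already sent, or None if unknown.

    `stats` is a dict shaped like the result of virDomain.jobStats()
    (assumed keys: "memory_total" and "memory_remaining", in bytes).
    """
    total = stats.get("memory_total")
    remaining = stats.get("memory_remaining")
    if not total or remaining is None:
        # Driver gave no usable numbers: report "unknown" instead of failing.
        return None
    sent = max(0, total - remaining)
    return min(100, 100 * sent // total)


# Example with made-up numbers: 1024 MiB guest, 256 MiB still to send.
mib = 1024 * 1024
print(migration_progress({"memory_total": 1024 * mib, "memory_remaining": 256 * mib}))
print(migration_progress({}))
```

Returning `None` when the numbers are missing keeps the caller from crashing on a driver that does not fill in these fields.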
References
----------

[0] https://bugs.launchpad.net/nova/+bug/1240043
[1] https://bugs.launchpad.net/tempest/+bug/1014647
[2] https://bugs.launchpad.net/nova/+bug/1324348
[3] https://bugs.launchpad.net/swift/+bug/1310597
[4] https://bugs.launchpad.net/tempest/+bug/1205344
[5] https://bugs.launchpad.net/ceilometer/+bug/1480490
[6] https://bugs.launchpad.net/cinder/+bug/1455043
[7] https://review.openstack.org/gitweb?p=openstack%2Fnova.git;a=commitdiff;h=c513c37385eba42d464e81a324f87d1ca9ceaa83
[8] https://www.redhat.com/archives/libvir-list/2015-September/msg00236.html
[9] https://www.redhat.com/archives/libvir-list/2015-September/msg00233.html
[10] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00889.html