Re: [Wg-openstack] Xen+Openstack CI Log analysis
Joao, thank you for the great analysis.

Lars

> On 8 Sep 2015, at 17:34, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>
> Hi everyone,
>
> Below is a small Markdown file containing the analysis of the CI log files,
> which I was meant to send as per our last meeting. I sent it a little later
> in order to be in sync with some patch series I sent, related to some of the
> issues presented here.
>
> Thanks!
> Joao
>
> --
>
> Libvirt+Xen+Openstack CI log analysis
> =====================================
>
> These are some notes regarding the state of our CI loop and what's missing
> to be able to do a full tempest run.
> The CI loop currently comprises at least 1260 tests, 112 of which are
> skipped.
> So what can the logs tell us?
>
> Log Stats
> ---------
>
> For the analysis, I've taken the following test runs: #1637 (Jul 39), #1666
> (Jul 30) and #4205 (Sept 8).
> For reference, the numbers are from the latest one, #4205. It differs from
> the earlier data sets by 3 tests related to bug entries.
>
> 66 tests are skipped for not having Heat(14), Neutron(41), Sahara(7) and
> Zaqar(4) support.
> These appear in the logs with the following format:
>
>     "XXXX support is required"
>     "XXXX is required"
>
> 21 tests are skipped for not having Trove(4), Neutron(8) and Ironic(9)
> support, for the same reason as before, but in the log they have a
> different format:
>
>     "XXXX is not available"
>     "XXXX service must be available"
>     "XXXX not available"
>
> 8 tests are skipped until the following bugs are closed: 1240043[0],
> 1014647[1], 1324348[2], 1310597[3], 1205344[4], 1480490[5], and 2 for
> 1455043[6]. Note that the last three are only present in #4205, as earlier
> runs didn't have them.
> Further below there are links to all of the launchpad bugs.
>
> 4 tests are skipped because "Live migration not available":
>
>     {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_iscsi_volume ... SKIPPED: Block Live migration not available
>     {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration ... SKIPPED: Live migration not available
>     {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused ... SKIPPED: Live migration not available
>     {1} setUpClass (tempest.api.compute.test_live_block_migration_negative.LiveBlockMigrationNegativeTestJSON) ... SKIPPED: Live migration is not enabled
>
> And the remaining 13 tests are skipped for various (and perhaps minor)
> reasons:
>
>     "Instance validation tests are disabled" (5 of them)
>     "Change password is not available"
>     "VNC Console feature is disabled"
>     "test_list_servers_detailed_filter_by_image ... SKIPPED: Only one image found"
>     "test_list_servers_filter_by_image ... SKIPPED: Only one image found"
>     "Cinder multi-backend feature disabled" (2 of them)
>     "test_attach_detach_volume ... SKIPPED: SSH required for this test"
>     "large_ops_number not set to multiple instances"
>
> Summary and Comments
> --------------------
>
> Below is a small table to ease reading and summarize all of the above.
>
> | Reason                       | Nr. of tests |
> |------------------------------|--------------|
> | Heat support is required     | 14           |
> | Sahara support is required   | 7            |
> | Zaqar support is required    | 4            |
> | Trove support is required    | 4            |
> | Ironic support is required   | 9            |
> | Neutron support is required  | 49           |
> | Live migration not available | 4            |
> | Bug is open                  | 8            |
> | Miscellaneous                | 13           |
> |                              |              |
> | Total                        | 112          |
>
> Overall I believe that most of the issues seem to come from the CI setup:
> enabling Trove, Ironic, Heat, Sahara and Zaqar would hopefully get 38 tests
> passing. These components are not exactly Xen related, except Heat, which
> could use Ceilometer because of the autoscaling feature.
> Ceilometer is the statistics component and doesn't yet provide all the
> statistics from the Compute node, as reported further below. Furthermore,
> Neutron is definitely the biggest portion of skipped tests, comprising a
> total of 49 tests. So if I did the math right, there are a total of 87
> tests that could potentially be solved by enabling components.
>
> From the last 3 categories, 8 tests are skipped because of open bug
> entries. To my understanding, only one bug is Xen related, namely
> Bug #1240043 [0]. libvirt currently lacks support for some statistics
> (besides vCPU info), which leads to nova not being able to provide
> instance diagnostics.
> Nova currently uses the following libvirt APIs (Openstack Kilo) to extract
> statistics:
>
>     virNodeGetCPUStats
>     virDomainGetVcpus
>     virDomainGetCPUStats (not implemented but only used in Domain-0)
>     virDomainMemoryStats (not implemented)
>     virDomainInterfaceStats (not implemented)
>     virDomainBlockStats (not implemented)
>     virDomainGetJobInfo (not implemented)
>     virDomainGetJobStats (not implemented)
>
> The first five functions are used for nova diagnostics (Bug #1240043) and
> the last two are used in live migration, but only on Kilo. Before Kilo,
> these were not used at all [7]. Early this morning, I submitted a patch
> series [8] that I have been working on and testing for a while already,
> and I think it can fix some of the issues there.
>
> From the remaining 17 tests, 13 look miscellaneous, but the other 4
> live-migration tests look to be the big fish. I've been testing upstream
> libvirt and Openstack Kilo and I found out that migration doesn't quite
> work without changing the libvirt APIs nova is using. Nova relies on the
> libvirt virDomainMigrateToURI{,2} APIs, which depend on P2P/Direct driver
> support, which is nonexistent in our libxl driver.
> So, I've also submitted another series this morning to tackle this
> issue [9] and tested it together with the earlier series that I mentioned.
> Since nova doesn't quite handle the case of no support for
> JobInfo/JobStats, the live migration monitoring thread crashes and the
> domain is left forever in the "MIGRATING" state. Thus the two series
> together fix live migration without having to modify nova. Additionally, I
> have a nova patch to fix this erroneous behaviour of JobInfo.
>
> The skipped tests mention block_live_migration though, and this means
> supporting VIR_MIGRATE_NON_SHARED_INC on libvirt, which is "migration with
> non-shared storage with incremental copy (same base image shared between
> source and destination)" (quoting the corresponding docs on virsh). BUT,
> this is not required for testing live migration. From the tempest test
> code it looks like block_live_migration is only used _IF_ supported by the
> test environment, which is good given that we don't support it.
>
> Nova and libvirt
> ----------------
>
> Moving on, I think libvirt looks good on the API side, as we can see from
> the tempest logs. But looking at the libvirt interaction in nova, we still
> have these issues:
>
> * Monitoring of migration still remains to be implemented (e.g. knowing
> how much memory was sent out), just to give an idea to the client.
> Openstack makes use of that info (contained in the JobStats) to show the
> progress of migration.
>
> * NUMA is not an issue, at least from the API perspective: it extracts the
> topology **but** it doesn't process any NUMA (and vNUMA-related) XML
> elements in the libvirt guest config. I believe this was pointed out in
> Jim's XenDevSummit presentation this year. Nova only pays attention to the
> guest config, and does not use any of the NUMA-related APIs on libvirt,
> e.g. setNumaParameters and getNumaParameters.
>
> * Last but not least is snapshots, which, if I am not wrong, is already a
> work in progress on the libxl side [10].
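To make the migration-monitoring point above a bit more concrete, here is a minimal sketch of how a client might turn JobStats into a progress figure. It is only an illustration, not nova's actual code: `migration_progress` is a hypothetical helper, and it assumes the stats arrive as a dict using the `memory_total` and `memory_remaining` keys, as the libvirt Python bindings return them from `jobStats()`. It returns `None` when the driver reports nothing, which is the case a caller has to handle when JobStats is not implemented.

```python
def migration_progress(job_stats):
    """Return the percentage of guest memory already sent, or None if unknown.

    job_stats is assumed to be a dict shaped like the result of the libvirt
    Python binding's jobStats() call (an assumption of this sketch).
    """
    total = job_stats.get("memory_total")
    remaining = job_stats.get("memory_remaining")
    # A driver without JobStats support gives us nothing to work with.
    if not total or remaining is None:
        return None
    sent = total - remaining
    # Clamp, since counters may be momentarily inconsistent during migration.
    return max(0.0, min(100.0, 100.0 * sent / total))


if __name__ == "__main__":
    print(migration_progress({"memory_total": 1000, "memory_remaining": 250}))
    print(migration_progress({}))
```

Returning `None` instead of raising keeps a monitoring loop alive when the statistics are missing, rather than leaving the domain stuck in a transitional state.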
>
> References
> ----------
>
> [0] https://bugs.launchpad.net/nova/+bug/1240043
> [1] https://bugs.launchpad.net/tempest/+bug/1014647
> [2] https://bugs.launchpad.net/nova/+bug/1324348
> [3] https://bugs.launchpad.net/swift/+bug/1310597
> [4] https://bugs.launchpad.net/tempest/+bug/1205344
> [5] https://bugs.launchpad.net/ceilometer/+bug/1480490
> [6] https://bugs.launchpad.net/cinder/+bug/1455043
> [7] https://review.openstack.org/gitweb?p=openstack%2Fnova.git;a=commitdiff;h=c513c37385eba42d464e81a324f87d1ca9ceaa83
> [8] https://www.redhat.com/archives/libvir-list/2015-September/msg00236.html
> [9] https://www.redhat.com/archives/libvir-list/2015-September/msg00233.html
> [10] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00889.html
>
> _______________________________________________
> Wg-openstack mailing list
> Wg-openstack@xxxxxxxxxxxxxxxxxxxx
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-openstack
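The per-reason tally in the "Log Stats" section could be reproduced with a small script along these lines. This is a rough sketch under a few assumptions: the function names and category labels are made up for illustration, the service patterns are the ones quoted in the analysis, and since the report does not show the wording of the bug-related skip messages, the sketch simply treats any reason mentioning "bug" as an open-bug skip. Only the two live-migration sample lines come from the logs quoted above; the others are hypothetical.

```python
import re
from collections import Counter

# Everything after "SKIPPED: " on a tempest result line is the skip reason.
SKIP_RE = re.compile(r"SKIPPED: (?P<reason>.+)$")

# Message formats quoted in the analysis; "svc" is the missing service.
SERVICE_RES = [
    re.compile(r"^(?P<svc>\w+) support is required"),
    re.compile(r"^(?P<svc>\w+) is required"),
    re.compile(r"^(?P<svc>\w+) service must be available"),
    re.compile(r"^(?P<svc>\w+) (?:is )?not available"),
]


def classify(reason):
    """Map a skip reason onto one of the categories of the summary table."""
    # Assumption: bug-related skips mention the word "bug" somewhere.
    if re.search(r"\bbug\b", reason, re.IGNORECASE):
        return "Bug is open"
    if "migration" in reason.lower():
        return "Live migration not available"
    for pattern in SERVICE_RES:
        match = pattern.match(reason)
        if match:
            return match.group("svc") + " support is required"
    return "Miscellaneous"


def tally(lines):
    """Count skipped tests per category over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[classify(match.group("reason").strip())] += 1
    return counts


SAMPLE = [
    "{0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_iscsi_volume ... SKIPPED: Block Live migration not available",
    "{1} setUpClass (tempest.api.compute.test_live_block_migration_negative.LiveBlockMigrationNegativeTestJSON) ... SKIPPED: Live migration is not enabled",
    # The lines below are hypothetical, for illustration only.
    "{0} setUpClass (hypothetical.test.Class) ... SKIPPED: Heat support is required",
    "{0} hypothetical.test.name ... SKIPPED: Trove is not available",
    "{0} hypothetical.test.name ... SKIPPED: Only one image found",
    "{0} hypothetical.test.name ... ok",
]

if __name__ == "__main__":
    for category, count in sorted(tally(SAMPLE).items()):
        print(category, count)
```

Checking for "migration" before the service patterns keeps a message such as "Live migration not available" out of the per-service counts; anything that matches no known format falls into "Miscellaneous".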