
Re: [Wg-openstack] Xen+Openstack CI Log analysis



Joao,
thank you for the great analysis. 
Lars

> On 8 Sep 2015, at 17:34, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> 
> Hi everyone,
> 
> Below is a small Markdown file containing the analysis of the CI log files,
> which I was meant to send after our last meeting. I am sending it a little
> later so that it is in sync with some patch series I sent, which relate to
> some of the issues presented here.
> 
> Thanks!
> Joao
> 
> --
> 
> Libvirt+Xen+Openstack CI log analysis
> =====================================
> 
> These are some notes on the state of our CI loop and what is missing to be
> able to do a full tempest run.
> The CI loop currently comprises at least 1260 tests, 112 of which are skipped.
> So what can the logs tell us?
> 
> Log Stats
> ---------
> 
> For the analysis, I've taken the following test runs: #1637 (Jul 29), #1666
> (Jul 30) and #4205 (Sept 8).
> For reference, the numbers are from the latest one, #4205. It differs from the
> earlier data sets by 3 tests related to bug entries.
> 
> 66 tests are skipped for lack of Heat (14), Neutron (41), Sahara (7) and
> Zaqar (4) support. These appear in the logs in the following formats:
> 
>       "XXXX support is required"
>       "XXXX is required"
> 
> 21 tests are skipped for lack of Trove (4), Neutron (8) and Ironic (9)
> support. The reason is the same as before, but the log format is different:
> 
>       "XXXX is not available"
>       "XXXX service must be available"
>       "XXXX not available"
> 
> 8 tests are skipped until the following bugs are closed: 1240043[0],
> 1014647[1], 1324348[2], 1310597[3], 1205344[4], 1480490[5], and 2 for
> 1455043[6]. Note that the last three are only present in #4205, as earlier
> runs didn't have them. Links to all of the Launchpad bugs are further below.
> 
> 4 tests are skipped because "Live migration not available":
> 
>       {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_iscsi_volume ... SKIPPED: Block Live migration not available
>       {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration ... SKIPPED: Live migration not available
>       {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused ... SKIPPED: Live migration not available
>       {1} setUpClass (tempest.api.compute.test_live_block_migration_negative.LiveBlockMigrationNegativeTestJSON) ... SKIPPED: Live migration is not enabled
> 
> And the remaining 13 tests are due to various (and perhaps minor) reasons:
> 
>    "Instance validation tests are disabled" (5 of them)
>    "Change password is not available"
>    "VNC Console feature is disabled"
>    "test_list_servers_detailed_filter_by_image ... SKIPPED: Only one image found"
>    "test_list_servers_filter_by_image ... SKIPPED: Only one image found"
>    "Cinder multi-backend feature disabled" (2 of them)
>    "test_attach_detach_volume ... SKIPPED: SSH required for this test"
>    "large_ops_number not set to multiple instances"
> Summary and Comments
> --------------------
> 
> The table below summarizes all of the above.
> 
> | Reason                       |  Nr. of tests |
> |------------------------------|---------------|
> | Heat support is required     |       14      |
> | Sahara support is required   |        7      |
> | Zaqar support is required    |        4      |
> | Trove support is required    |        4      |
> | Ironic support is required   |        9      |
> | Neutron support is required  |       49      |
> | Live migration not available |        4      |
> | Bug is open                  |        8      |
> | Miscellaneous                |       13      |
> |                              |               |
> | Total                        |      112      |
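
The rows of this table could in principle be regenerated from a raw tempest log. Below is a minimal Python sketch of that idea, not the script used for the analysis. It assumes one test result per log line, with the reason following `SKIPPED: `, and it assumes that bug-related skips mention the word "bug" in their reason (the exact wording is not shown in this mail). The function names `classify` and `tally_skips` are made up for illustration.

```python
import re
from collections import Counter

# Service names taken from the analysis above.
SERVICES = ("Heat", "Neutron", "Sahara", "Zaqar", "Trove", "Ironic")

# Phrases the analysis lists for "service missing" skips.
SERVICE_PHRASES = re.compile(
    r"is required|is not available|must be available|not available"
)


def classify(reason):
    """Map a SKIPPED reason string to a row of the summary table."""
    lowered = reason.lower()
    # Check live migration first: its reason also contains "not available".
    if "live migration" in lowered:
        return "Live migration not available"
    # Assumption: bug-related skips mention the word "bug" in the reason.
    if re.search(r"\bbug\b", lowered):
        return "Bug is open"
    if SERVICE_PHRASES.search(lowered):
        for service in SERVICES:
            if re.search(rf"\b{service}\b", reason, re.IGNORECASE):
                return f"{service} support is required"
    # Everything else, e.g. "Change password is not available".
    return "Miscellaneous"


def tally_skips(log_lines):
    """Count skipped tests per category from an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = re.search(r"SKIPPED: (.*)$", line.rstrip())
        if match:
            counts[classify(match.group(1))] += 1
    return counts
```

Something like `tally_skips(open("tempest.log"))` would then give the per-category counts, where `tempest.log` is a placeholder file name. The order of the checks matters, since "Live migration not available" would otherwise match the generic "XXXX not available" pattern.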
> 
> 
> Overall I believe that most of the issues seem to come from the CI setup:
> enabling Trove, Ironic, Heat, Sahara and Zaqar would hopefully get 38 tests
> passing. These components are not exactly Xen related, except Heat, which could
> use Ceilometer for its autoscaling feature. Ceilometer is the statistics
> component and doesn't yet provide all the statistics from the Compute node, as
> reported further below. Furthermore, Neutron is definitely the biggest portion
> of skipped tests, comprising a total of 49 tests. So if I did the math right,
> a total of 87 tests could potentially be resolved by enabling components.
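
The arithmetic in this paragraph can be checked against the summary table with a few lines of Python. The numbers are copied from the table. The variable names are only illustrative.

```python
# Skip counts copied from the summary table above.
skipped = {
    "Heat": 14,
    "Sahara": 7,
    "Zaqar": 4,
    "Trove": 4,
    "Ironic": 9,
    "Neutron": 49,
    "Live migration": 4,
    "Bug is open": 8,
    "Miscellaneous": 13,
}

# Components other than Neutron that are simply not enabled in the CI setup.
non_neutron = sum(skipped[s] for s in ("Heat", "Sahara", "Zaqar", "Trove", "Ironic"))
# Tests that might be recovered by enabling components, Neutron included.
by_enabling = non_neutron + skipped["Neutron"]
# What is left: live migration, open bugs and miscellaneous.
remaining = sum(skipped.values()) - by_enabling

print(non_neutron, by_enabling, remaining, sum(skipped.values()))
# prints: 38 87 25 112
```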
> 
> Of the last 3 categories, 8 tests are skipped because of open bug entries.
> To my understanding, only one of these bugs is Xen related, namely
> Bug #1240043 [0]. libvirt currently lacks support for some statistics (besides
> vCPU info), which leads to nova not being able to provide instance
> diagnostics. Nova currently uses the following libvirt APIs (Openstack Kilo)
> to extract statistics:
> 
>    virNodeGetCPUStats
>    virDomainGetVcpus
>    virDomainGetCPUStats    (not implemented but only used in Domain-0)
>    virDomainMemoryStats    (not implemented)
>    virDomainInterfaceStats (not implemented)
>    virDomainBlockStats     (not implemented)
>    virDomainGetJobInfo     (not implemented)
>    virDomainGetJobStats    (not implemented)
> 
> The first six functions are used for nova diagnostics (Bug #1240043) and the
> last two are used in live migration, but only on Kilo. Before Kilo, these
> were not used at all[7]. Early this morning, I submitted a patch series[8]
> that I have been working on and testing for a while already, and I think it
> can fix some of the issues there.
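
One way to see which of the per-domain calls listed above a given driver answers is to try each one through the Python bindings and record which of them raise. The sketch below is only an illustration under assumptions: `probe_stats_support` is a made-up helper, the device names `vif1.0` and `xvda` are placeholders, and the exception type is passed in so the function can be exercised without a hypervisor (in real use it would be `libvirt.libvirtError`). `virNodeGetCPUStats` is left out because it is a connection-level call, not a domain one.

```python
def probe_stats_support(dom, error_type, nic="vif1.0", disk="xvda"):
    """Return {api_name: bool} for the per-domain stats calls listed above.

    dom        -- a libvirt virDomain object (or anything with the same methods)
    error_type -- exception raised on failure; libvirt.libvirtError in real use
    nic, disk  -- hypothetical device names; replace with the guest's own
    """
    calls = {
        "virDomainGetVcpus": lambda: dom.vcpus(),
        "virDomainGetCPUStats": lambda: dom.getCPUStats(True),
        "virDomainMemoryStats": lambda: dom.memoryStats(),
        "virDomainInterfaceStats": lambda: dom.interfaceStats(nic),
        "virDomainBlockStats": lambda: dom.blockStats(disk),
        "virDomainGetJobInfo": lambda: dom.jobInfo(),
        "virDomainGetJobStats": lambda: dom.jobStats(),
    }
    supported = {}
    for name, call in calls.items():
        try:
            call()
            supported[name] = True
        except error_type:
            # Any error counts as "not usable" here. A stricter check would
            # compare get_error_code() with libvirt.VIR_ERR_NO_SUPPORT.
            supported[name] = False
    return supported
```

With a real connection this would be called as `probe_stats_support(dom, libvirt.libvirtError)` on a running domain. Treating every error as "unsupported" is a simplification, since a wrong device name would also raise.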
> 
> Of the remaining 17 tests, 13 look miscellaneous, but the other 4
> live-migration tests look to be the big fish. I've been testing upstream
> libvirt and Openstack Kilo and I found out that migration doesn't quite work
> without changing the libvirt APIs nova is using. Nova relies on the libvirt
> virDomainMigrateToURI{,2} APIs, which depend on P2P/Direct driver support,
> which is nonexistent in our libxl driver. So, I've also submitted another
> series this morning to tackle this issue [9], and tested it together with the
> earlier series that I mentioned. Since nova doesn't quite handle the case of no
> support for JobInfo/JobStats, the live migration monitoring thread crashes and
> the domain is left forever in the "MIGRATING" state. Thus the two series
> together fix live-migration without having to modify nova. Additionally, I have
> a nova patch to fix this erroneous handling of JobInfo.
> 
> The skipped tests mention block_live_migration, though, and this means
> supporting VIR_MIGRATE_NON_SHARED_INC in libvirt, which is "migration with
> non-shared storage with incremental copy (same base image shared between source
> and destination)" (quoting the corresponding virsh docs). However, this is not
> required for testing live migration. From the tempest test code it looks like
> block_live_migration is only used _if_ supported by the test environment, which
> is good given that we don't support it.
> 
> Nova and libvirt
> ----------------
> 
> Moving on, I think libvirt looks good on the API side, as we can see from the
> tempest logs. But looking at nova's interaction with libvirt, we still have
> these issues:
> 
> * Monitoring of migration remains to be implemented (e.g. knowing how much
> memory was sent out), just to give an idea to the client. Openstack makes use
> of that info (contained in the JobStats) to show the progress of migration.
> 
> * NUMA is not an issue, at least from the API perspective: it extracts the
> topology **but** it doesn't process any NUMA (and vNUMA-related) XML elements
> in the libvirt guest config. I believe this was pointed out in Jim's
> XenDevSummit presentation this year. Nova only pays attention to the guest
> config, and does not use any of the NUMA-related APIs on libvirt, e.g.
> setNumaParameters and getNumaParameters.
> 
> * Last but not least is snapshots, which, if I am not wrong, are already a work
> in progress on the libxl side [10].
> 
> References
> ----------
> 
> [0] https://bugs.launchpad.net/nova/+bug/1240043
> [1] https://bugs.launchpad.net/tempest/+bug/1014647
> [2] https://bugs.launchpad.net/nova/+bug/1324348
> [3] https://bugs.launchpad.net/swift/+bug/1310597
> [4] https://bugs.launchpad.net/tempest/+bug/1205344
> [5] https://bugs.launchpad.net/ceilometer/+bug/1480490
> [6] https://bugs.launchpad.net/cinder/+bug/1455043
> [7] https://review.openstack.org/gitweb?p=openstack%2Fnova.git;a=commitdiff;h=c513c37385eba42d464e81a324f87d1ca9ceaa83
> [8] https://www.redhat.com/archives/libvir-list/2015-September/msg00236.html
> [9] https://www.redhat.com/archives/libvir-list/2015-September/msg00233.html
> [10] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00889.html
> 
> _______________________________________________
> Wg-openstack mailing list
> Wg-openstack@xxxxxxxxxxxxxxxxxxxx
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-openstack




 

