
[Wg-openstack] Xen+Openstack CI Log analysis



Hi everyone,

Below is a small Markdown file containing the analysis of the CI log files, which
I was meant to send after our last meeting. I sent it a little later so that it
would be in sync with some patch series I sent that address some of the issues
presented here.

Thanks!
Joao

--

Libvirt+Xen+Openstack CI log analysis
=====================================

These are some notes on the state of our CI loop and what's missing to be able
to do a full tempest run.
The CI loop currently comprises at least 1260 tests, of which 112 are skipped.
So what can the logs tell us?

Log Stats
---------

For the analysis, I've taken the following test runs #1637 (Jul 39), #1666 (Jul
30) and #4205 (Sept 8).
For reference, the numbers are from the latest one, #4205. There is a difference
of 3 tests related to bug entries compared to the earlier data sets.

66 tests are skipped for lack of Heat (14), Neutron (41), Sahara (7) and Zaqar
(4) support.
These appear in the logs with the following format:

        "XXXX support is required"
        "XXXX is required"

21 tests are skipped for lack of Trove (4), Neutron (8) and Ironic (9) support,
the same reason as before, but the log messages have a different format:

        "XXXX is not available"
        "XXXX service must be available"
        "XXXX not available"

8 tests are skipped until the following bugs are closed: one each for
1240043[0], 1014647[1], 1324348[2], 1310597[3], 1205344[4] and 1480490[5], and
2 for 1455043[6]. Note that the last three tests are only present in #4205, as
earlier runs didn't have them. Links to all of the Launchpad bugs are listed
further below.

4 tests are skipped because live migration is not available:

        {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_iscsi_volume ... SKIPPED: Block Live migration not available
        {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration ... SKIPPED: Live migration not available
        {0} tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused ... SKIPPED: Live migration not available
        {1} setUpClass (tempest.api.compute.test_live_block_migration_negative.LiveBlockMigrationNegativeTestJSON) ... SKIPPED: Live migration is not enabled

And the remaining 13 tests are due to various (and perhaps minor) reasons:

    "Instance validation tests are disabled" (5 of them)
    "Change password is not available"
    "VNC Console feature is disabled"
    "test_list_servers_detailed_filter_by_image ... SKIPPED: Only one image found"
    "test_list_servers_filter_by_image ... SKIPPED: Only one image found"
    "Cinder multi-backend feature disabled" (2 of them)
    "test_attach_detach_volume ... SKIPPED: SSH required for this test"
    "large_ops_number not set to multiple instances"

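The grouping above can be reproduced mechanically. Below is a minimal Python sketch, assuming each skipped test leaves a `SKIPPED: <reason>` fragment in the log as in the excerpts quoted here. The category labels, the `classify` and `tally_skips` names, and the rule that any reason mentioning "bug" is a bug skip are my own assumptions, not something taken from tempest.

```python
import re
from collections import Counter

# Tail of a tempest result line, e.g. "... SKIPPED: Live migration not available".
SKIP_RE = re.compile(r"SKIPPED: (?P<reason>.+)$")

# Components whose absence causes skips, as listed in the analysis above.
COMPONENT_RE = re.compile(r"\b(Heat|Neutron|Sahara|Zaqar|Trove|Ironic)\b", re.I)


def classify(reason):
    """Map one skip reason to a category label (labels are illustrative)."""
    if re.search(r"live migration", reason, re.I):
        return "Live migration"
    # Assumption: bug-related skips mention the word "bug" in the reason.
    if re.search(r"\bbug\b", reason, re.I):
        return "Bug is open"
    match = COMPONENT_RE.search(reason)
    if match:
        return match.group(1).capitalize()
    return "Miscellaneous"


def tally_skips(lines):
    """Count skipped tests per category over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[classify(match.group("reason").strip())] += 1
    return counts
```

Feeding it an open log file (`tally_skips(open(path))`, with `path` being wherever the tempest console log lives) should give a per-category count that can be compared against the numbers above.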
Summary and Comments
--------------------

The table below summarizes all of the above.

| Reason                       |  Nr. of tests |
|------------------------------|---------------|
| Heat support is required     |       14      |
| Sahara support is required   |        7      |
| Zaqar support is required    |        4      |
| Trove support is required    |        4      |
| Ironic support is required   |        9      |
| Neutron support is required  |       49      |
| Live migration not available |        4      |
| Bug is open                  |        8      |
| Miscellaneous                |       13      |
|                              |               |
| Total                        |      112      |


Overall I believe that most of the issues seem to come from the CI setup:
enabling Trove, Ironic, Heat, Sahara and Zaqar would hopefully get 38 tests
passing. These components are not exactly Xen related, except Heat, which could
use Ceilometer for its autoscaling feature. Ceilometer is the statistics
component and doesn't yet provide all the statistics from the Compute node, as
reported further below. Furthermore, Neutron is definitely the biggest portion
of skipped tests, with a total of 49. So if I did the math right, a total of 87
tests could potentially be resolved by enabling components.
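As a quick check of the arithmetic, here is a small Python sketch that recomputes the totals from the per-reason counts in the summary table. The dictionary keys and variable names are just labels chosen for this example.

```python
# Per-reason skip counts, copied from the summary table.
skipped = {
    "Heat": 14,
    "Sahara": 7,
    "Zaqar": 4,
    "Trove": 4,
    "Ironic": 9,
    "Neutron": 49,
    "Live migration": 4,
    "Bug is open": 8,
    "Miscellaneous": 13,
}

# Tests that enabling the non-Neutron components might recover.
non_neutron = sum(skipped[c] for c in ("Trove", "Ironic", "Heat", "Sahara", "Zaqar"))

# Adding Neutron gives everything attributable to missing components.
component_total = non_neutron + skipped["Neutron"]

total = sum(skipped.values())

print(non_neutron)      # 38
print(component_total)  # 87
print(total)            # 112
```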

Of the last 3 categories, 8 tests are skipped because of open bug entries.
To my understanding, only one of these bugs is Xen related, namely
Bug #1240043 [0]. libvirt currently lacks support for some statistics (besides
vCPU info), which leads to nova not being able to provide instance diagnostics.
Nova currently uses the following libvirt APIs (OpenStack Kilo) to extract
statistics:

    virNodeGetCPUStats
    virDomainGetVcpus
    virDomainGetCPUStats    (not implemented but only used in Domain-0)
    virDomainMemoryStats    (not implemented)
    virDomainInterfaceStats (not implemented)
    virDomainBlockStats     (not implemented)
    virDomainGetJobInfo     (not implemented)
    virDomainGetJobStats    (not implemented)

The first five functions are used for nova diagnostics (Bug #1240043) and the
last two are used in live migration, but only on Kilo. Before Kilo, these
were not used at all[7]. Early this morning, I submitted one patch series[8]
that I have been working on and testing for a while already, and I think it can
fix some of the issues there.
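To see which of the per-domain calls a given driver actually implements, one can simply try each of them and record what fails. Below is a minimal Python sketch, assuming `dom` is a domain object from the libvirt Python bindings (for instance obtained through `conn.lookupByName(...)`). The `probe_domain_stats` name is made up for this example, and the interface and disk names passed in are whatever the guest uses. The host-level virNodeGetCPUStats is left out because it is called on the connection rather than on a domain.

```python
def probe_domain_stats(dom, iface=None, disk=None):
    """Try each per-domain stats call and report "ok" or the failure text.

    dom   -- a libvirt domain object (anything with the same methods works)
    iface -- guest interface name for interfaceStats, skipped when None
    disk  -- guest disk name for blockStats, skipped when None
    """
    probes = {
        "virDomainGetVcpus": lambda: dom.vcpus(),
        "virDomainGetCPUStats": lambda: dom.getCPUStats(True),
        "virDomainMemoryStats": lambda: dom.memoryStats(),
        "virDomainGetJobInfo": lambda: dom.jobInfo(),
        "virDomainGetJobStats": lambda: dom.jobStats(),
    }
    if iface is not None:
        probes["virDomainInterfaceStats"] = lambda: dom.interfaceStats(iface)
    if disk is not None:
        probes["virDomainBlockStats"] = lambda: dom.blockStats(disk)

    results = {}
    for name, call in probes.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # libvirt.libvirtError with the real bindings
            results[name] = "failed: %s" % exc
    return results
```

The broad `except Exception` is deliberate here: the sketch only reports which calls fail and does not try to tell "not supported by the driver" apart from other errors.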

Of the remaining 17 tests, 13 look miscellaneous, but the other 4
live-migration tests look to be the big fish. I've been testing upstream
libvirt and OpenStack Kilo, and I found that migration doesn't quite work
without changing the libvirt APIs nova is using. Nova relies on the libvirt
virDomainMigrateToURI{,2} APIs, which depend on P2P/Direct driver support, which
is nonexistent in our libxl driver. So I've also submitted another series this
morning to tackle this issue [9] and tested it together with the earlier series
that I mentioned. Since nova doesn't quite handle the case of no support for
JobInfo/JobStats, the live migration monitoring thread crashes and the domain is
left forever in the "MIGRATING" state. Thus the two series together fix
live migration without having to modify nova. Additionally, I have a nova patch
to fix this erroneous handling of JobInfo.
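To illustrate the failure mode, here is a minimal Python sketch of a monitoring loop that survives a missing JobInfo implementation. This is not the nova patch mentioned above and not nova's actual code. The `wait_for_migration` function and its callable parameters are hypothetical names used only to show the idea of treating a failed progress query as "progress unknown" instead of letting the exception end the monitor.

```python
import time


def wait_for_migration(get_job_info, is_finished, interval=0.5, sleep=time.sleep):
    """Poll until is_finished() is true, tolerating failing progress queries.

    get_job_info -- callable returning progress data; may raise if the
                    driver does not implement the underlying call
    is_finished  -- callable returning True once the migration has ended
    Returns the collected samples, with None where no data was available.
    """
    samples = []
    while not is_finished():
        try:
            samples.append(get_job_info())
        except Exception:
            # Progress is unknown, but the monitor keeps running so the
            # caller can still observe the end of the migration.
            samples.append(None)
        sleep(interval)
    return samples
```

With the libvirt Python bindings, `get_job_info` could be something like `dom.jobInfo`, while `is_finished` would have to come from whatever the caller uses to detect completion.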

The skipped tests mention block_live_migration, though, and this means
supporting VIR_MIGRATE_NON_SHARED_INC in libvirt, which is "migration with
non-shared storage with incremental copy (same base image shared between source
and destination)" (quoting the corresponding virsh docs). BUT this is not
required for testing live migration. From the tempest test code it looks like
block_live_migration is only used _IF_ supported by the test environment, which
is good given that we don't support it.

Nova and libvirt
----------------

Moving on, I think libvirt looks good on the API side, as we can see from the
tempest logs. But looking at how nova interacts with libvirt, we still have
these issues:

* Monitoring of migration still remains to be implemented (e.g. knowing how much
memory was sent out), just to give an idea to the client. OpenStack makes use of
that info (contained in the JobStats) to show the progress of migration.

* NUMA is not an issue, at least from the API perspective: libvirt extracts the
topology **but** it doesn't process any NUMA (and vNUMA-related) XML elements in
the libvirt guest config. I believe this was pointed out in Jim's XenDevSummit
presentation this year. Nova only pays attention to the guest config and does
not use any of the NUMA-related libvirt APIs, e.g. setNumaParameters and
getNumaParameters.

* Last but not least are snapshots, which, if I am not wrong, are already a work
in progress on the libxl side [10].

References
----------

[0] https://bugs.launchpad.net/nova/+bug/1240043
[1] https://bugs.launchpad.net/tempest/+bug/1014647
[2] https://bugs.launchpad.net/nova/+bug/1324348
[3] https://bugs.launchpad.net/swift/+bug/1310597
[4] https://bugs.launchpad.net/tempest/+bug/1205344
[5] https://bugs.launchpad.net/ceilometer/+bug/1480490
[6] https://bugs.launchpad.net/cinder/+bug/1455043
[7] https://review.openstack.org/gitweb?p=openstack%2Fnova.git;a=commitdiff;h=c513c37385eba42d464e81a324f87d1ca9ceaa83
[8] https://www.redhat.com/archives/libvir-list/2015-September/msg00236.html
[9] https://www.redhat.com/archives/libvir-list/2015-September/msg00233.html
[10] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00889.html
