
[Wg-test-framework] Minutes - Credativ/Xen 2015-12-17



Attendees:
 Xen: Ian Jackson
 Credativ: Martin Zobel-Helas, Felix Geyer, Yogesh Patel


I think I got everything but please let me know if not.


Tasks in the Marlborough colo, by ticket
----------------------------------------

65869   Cubietruck disks (in ARM crate)

        Now sorted out by Yogesh and Ian Campbell.  It appears that
        simply reseating connectors has fixed the problems.  Nodes
        handed back; Ian C is running recommissioning flights on some
        of them.

 ACTION: Credativ: close ticket

(none)  Rack rails for ARM crate
        (This was discussed on IRC, included in these minutes for
        completeness)

        One of the rails (left side, seen from the front) seems not to
        run properly, and Yogesh found a loose bearing ball on the
        machine below.  The machine is stable right now (not at
        risk of collapsing).

 ACTION: Ian Campbell to procure new rails

65871   3 machines suffering from boot order problems
        Have been removed from the rack by Yogesh.

 ACTION: Yogesh to try to deliver to All-net

66150   Colo access list
        This is all sorted out

 ACTION: Credativ: close ticket

67351   rimava0 failure
        (Also discussed on IRC)

        We discovered that the labels on rimava0 and rimava1 were not
        consistent with the documentation and software config; we
        swapped the labels to avoid changing the software.

        We also discovered that the layout document was not accurate.
        (More about this later.)

        Mysteriously rimava0 started working again, possibly due to
        PSU cable being reseated (felt slightly loose, says Yogesh).

 ACTION: Credativ: close ticket

67602   Colo rack inventory

        Ian J asked Yogesh to inventory the physical contents of the
        rack including the PDU connections, so that we can correct
        discrepancies with our documentation.

 Action (now done): Yogesh to email list to Ian J.
 ACTION: Credativ: close ticket
 ACTION: Ian to commit to our git and maybe fold into existing spreadsheet

(none)  Discussion of state of our rack

        Yogesh said he had seen better, but also seen worse.  He
        advised that he didn't see the need to spend a lot of time
        redoing and neatening the wiring.

        The serial connectors on rimava[01] had not been screwed in
        (see above), which Yogesh corrected.  The others probably
        aren't screwed in either, but we are not going to do that
        proactively as it probably risks more disruption.

(none)  Ticket workflow for colo tickets

        Yogesh asked if he should poll the Xen/Credativ ticket queue
        to look for relevant work.  Martin said that Credativ staff in
        Germany would be looking at that queue, so there was no need
        for Yogesh to poll the queue: relevant tickets would be
        assigned to Yogesh as necessary.

After the discussion of the Marlborough colo was completed, we excused
Yogesh.


Other itty-bitty bits
---------------------

65860   Password manager

        This is now set up.  From the Xen end, only Ian J is currently
        configured as an encryption recipient.

 ACTION: Credativ: close ticket
 ACTION: Ian J to enroll Ian Campbell's and Birin Sanchez's PGP keys.


(none)   Next meeting

         14th of January at the same time

 Action (now done): Martin/Felix to tell Yogesh

         Many people will be away over parts of the Christmas and New
         Year period.

(none)   Ticket system web access

         Credativ report that the ticket system web UI can only grant
         web access to tickets by a particular submitter, which would
         not be so useful.  It might be worth moving the ticket queue
         to a VM in Rackspace.

 ACTION: Felix/Martin to investigate


(none)   Report of hours used in support contract

         Ian J has not received this report (but maybe wasn't supposed
         to, as Lars is the contract contact).

 ACTION: Martin to talk to David Brauner (CC'd on contract mails)
   to check the email was sent.

         If it was sent and Ian J wants a copy, or this needs chasing,
         Ian J can liaise directly with David.


(none)   Admin VM has no DNS name

         The primary DNS zone is xenproject.org; its zonefile is in
         the standard place (in /etc) on the VM mail.xenproject.org.
         The reverse DNS is controlled via the RS panel.

 ACTION: Felix/Martin to add a DNS name (and update the reverse DNS)

         We discussed revision control: currently the zonefile is in
         git by virtue of etckeeper.  At some point we may want to
         move it to the gitolite in the admin VM.  But not right now.
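
         Once the name and the reverse DNS are both in place, a check
         along the following lines could confirm that they agree.
         This is only a sketch: the function name forward_confirmed
         and the host name in the usage comment are hypothetical, and
         the resolver functions are passed in as parameters so the
         check can be tried without touching real DNS.

```python
import socket


def forward_confirmed(name, resolve=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Return True if name -> address -> name round-trips.

    `resolve` maps a host name to an IPv4 address string; `reverse`
    maps an address to a (hostname, aliases, addresses) tuple, which
    is the shape socket.gethostbyaddr returns.
    """
    address = resolve(name)
    primary, aliases, _addresses = reverse(address)
    wanted = name.rstrip(".").lower()
    candidates = [primary] + list(aliases)
    # Compare case-insensitively and ignore a trailing root dot.
    return any(c.rstrip(".").lower() == wanted for c in candidates)


# Hypothetical usage, once the admin VM has a name:
#   forward_confirmed("admin-vm.example.org")
```

         With the default arguments this uses the system resolver, so
         it reflects whatever the machine running it can see.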

Test colo network access
------------------------

Credativ have not been properly introduced to the test colo service
machines, which ought to be subject to backup and monitoring.

 ACTION: Ian J to make sure Credativ have appropriate access, and to
   send an introductory email



Monitoring
----------

We lack individual tracking of which Rackspace VMs are properly set
up.

 ACTION: Credativ to create sub-tickets for each machine that they've
       been given access to


The wheezy+ VMs that Credativ have access to now have the monitoring
agent installed.  Nothing is talking to them yet, though.

 ACTION: Credativ to set up a new VM for the monitoring daemon and
       cause it to email Credativ


Several of the Rackspace VMs are squeeze.  They need to be upgraded
(the monitoring agent is not available in squeeze).

We need to coordinate the downtime with the community users.  We
mostly have existing channels for that, which depend on the service
and which Lars (and perhaps Ian J) will be able to advise on.

 ACTION: Credativ to consult Lars (CC Ian J) about communicating
          downtime
 ACTION: Credativ to then make appropriate plans for upgrading


The Rackspace VMs lack the Rackspace agent.  This agent would provide
an improved view in the Rackspace control panel.

 ACTION: Credativ to install the Rackspace agent on the VMs.


Not all of the Rackspace VMs have been properly handed over to
Credativ.

 ACTION: Ian J to check the machine list and previous emails,
     determine the state of all the remaining VMs, gain access as
     necessary, and hand them over to Credativ (or delete), as
     applicable


The test colo service machines (dom0's and VMs) ought to be subject to
monitoring too.  There was discussion of whether this should happen in
the dom0 or the infrastructure VM.  Ian J preferred to use the
infrastructure VM.  Of course the new monitoring VM at Rackspace would
need to be able to notice if the colo went dead.

 ACTION: Credativ to investigate after Ian J has provided access
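
As an illustration of "noticing if the colo went dead", the simplest
external check is a TCP connection attempt from the monitoring VM to
some service in the colo.  The sketch below is not the monitoring
agent discussed above; the function name is_reachable, the host name
and the port in the usage comment are all assumptions made for the
example.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


# Hypothetical usage from the monitoring VM:
#   if not is_reachable("colo-gateway.example.org", 22):
#       ... send an alert email ...
```

A single failed attempt is weak evidence, so a real check would
presumably retry a few times before raising an alert.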



Backups
-------

We discussed a variety of possible approaches.  Martin suggested that
we could perhaps back up the Rackspace VMs to the colo, and perhaps
vice versa.

The colo contains a number of service hosts (mostly VMs) most of whose
relevant state is configuration rather than data.  There is also a
PostgreSQL database, currently about 6Gby and growing at very roughly
3Gby/yr, which could be streamed using the Postgres replication
protocol (also providing a read-only view for reporting etc.).

 ACTION: Credativ to investigate after Ian J has provided access to
     the colo, and make a proposal
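
For sizing the backup target, the database figures above can be
extrapolated.  The sketch below assumes linear growth, which is a
simplification and was not something the meeting established; the
function name and default values are illustrative only, taken from
the rough numbers quoted (6 gigabytes now, about 3 gigabytes per
year).

```python
def projected_size_gb(years, current_gb=6.0, growth_gb_per_year=3.0):
    """Estimate database size after `years`, assuming linear growth."""
    return current_gb + growth_gb_per_year * years


# Rough estimates for the next few years (assumption: growth stays linear).
for years in (1, 3, 5):
    print(years, "yr:", projected_size_gb(years), "GB")
```

Even at five years this stays in the low tens of gigabytes, so the
database volume alone is unlikely to constrain the choice of backup
approach.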



Ian.



 

