Attendees:
Xen: Ian Jackson, Lars Kurth
Credativ: Felix Geyer, Michael Sprengel, Yogesh Patel, Martin Zobel-Helas
Tasks in the Marlborough colo, by ticket
----------------------------------------
CLOSED
65869 Cubietruck disks (in ARM crate)
65871 Machines have been given to AllNet
66150 Colo access list
67351 rimava0 failure
67602 Colo rack inventory
ACTION: Ian to commit the inventory to our git and maybe fold it into the existing spreadsheet
Done
(none) Discussion of state of our rack
(none) Ticket workflow for colo tickets
OPEN
(none) Rack rails for ARM crate
ACTION: Ian Campbell to procure new rails
NEW
(none) Order missing hard drives
ACTION: Ian to file a ticket with hard drive info; Yogesh can then send a price list
Other itty-bitty bits
---------------------
CLOSED
65860 Password manager
(none) Admin VM has no DNS name
OPEN / NEW
65860 Password manager
ACTION: Ian J to enroll Ian Campbell's and Birin Sanchez's PGP keys.
(not done)
(none) Ticket system web access
Credativ report that the ticket system web UI can only grant
web access to tickets by a particular submitter, which would
not be so useful. It might be worth moving the ticket queue
to a VM in Rackspace.
Nearly done, but not yet closed. The idea is that we have a generic e-mail address.
ACTION: Felix/Martin will get back via IRC re credentials
Ian: can be a xenproject.org alias
Ian: need to add the alias to MessageLabs
ACTION: Ian see above
(none) Report of hours used in support contract
Ian J has not received this report (but maybe wasn't supposed
to, as Lars is the contract contact).
Martin: spoke with Felix about how best to get this done. Waiting for Yogesh to commit hours; then we can get reports on demand.
Yogesh: will have this done by the end of the day
ACTION: Credativ to send report
NEW
Ian: general question to Credativ whether anyone has seen any new issues regarding the Rackspace VMs
Felix: Need to upgrade to wheezy because of package dependencies
Felix: Some (not sure which) machines are monitored
Michael: Some inventory has been done
Ian: Is there a list of new TODO items
Michael: Create child tickets for each host for the upgrade
ACTION: Credativ to create these tickets
Bugzilla is very old
ACTION: Lars to send a note to the list that we are planning to kill that bugzilla and see whether anyone needs the data. Could do a R/O archive view.
Ian: if nobody objects we should just kill it
Test colo network access
------------------------
DONE
ACTION: Ian J to make sure Credativ have appropriate access, and to
send an introductory email
NEW
ACTION: Ian to file a ticket to look at the firewall for the two main hosts in the colo, as he does not have much confidence in it
Monitoring
----------
DONE:
OPEN:
We lack individual tracking of which Rackspace VMs are properly set
up.
ACTION: Credativ to create sub-tickets for each machine that they've
been given access to and communicate when done to us.
The test colo service machines (dom0's and VMs) ought to be subject to
monitoring too. There was discussion of whether this should happen in
the dom0, or the infrastructure VM. Ian J preferred to use the
infrastructure VM. Of course the new monitoring VM at Rackspace would
need to be able to notice if the colo went dead.
ACTION: Credativ to investigate after Ian J has provided access
ACTION: Credativ to raise tickets for VMs where it is not obvious what they
are used for and for Ian/Lars to comment. Please add Lars and Ian
to the ticket.
ACTION: Credativ to set up a new VM for the monitoring daemon and
cause it to email Credativ
Q: Done. But should mails be sent to another e-mail address?
Ian: send to root@xxxxxxxxxxxxxx
ACTION: Credativ to add root@xxxxxxxxxxxxxx
Q: Can we create new VMs?
Ian: Could use the infra VM, but doesn't have a strong opinion;
would prefer it if we could avoid creating a VM just for that purpose.
ACTION: Install satellite agents into the infra VM in the colo and set up
the connection such that they will talk to the monitoring on
the relevant Rackspace VM.
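As a rough illustration of the point above that the Rackspace monitoring VM
would need to notice if the colo went dead, the sketch below shows one way an
independent probe could work. This is not part of the agreed actions; the
host/port list, the sender address and the use of a local MTA are assumptions,
and the recipient is the root@ alias agreed above.

    #!/usr/bin/env python
    # Stand-alone reachability probe: report any listed colo endpoint that
    # does not accept a TCP connection, by e-mail via the local MTA.
    import socket
    import smtplib
    from email.mime.text import MIMEText

    COLO_ENDPOINTS = [("infra.colo.example.org", 22)]  # hypothetical host/port pairs
    ALERT_TO = "root@xxxxxxxxxxxxxx"                   # alias agreed in the meeting
    ALERT_FROM = "monitoring@xxxxxxxxxxxxxx"           # hypothetical sender

    def is_reachable(host, port, timeout=10):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except (socket.error, socket.timeout):
            return False

    def send_alert(down):
        body = "Unreachable from the Rackspace monitoring VM:\n" + \
               "\n".join("%s:%d" % endpoint for endpoint in down)
        msg = MIMEText(body)
        msg["Subject"] = "colo reachability alert"
        msg["From"] = ALERT_FROM
        msg["To"] = ALERT_TO
        smtp = smtplib.SMTP("localhost")  # assumes a local MTA on the monitoring VM
        smtp.sendmail(ALERT_FROM, [ALERT_TO], msg.as_string())
        smtp.quit()

    if __name__ == "__main__":
        down = [e for e in COLO_ENDPOINTS if not is_reachable(*e)]
        if down:
            send_alert(down)

In practice this would be run from cron on the monitoring VM, separately from
the satellite agents, so an alert still goes out if the whole colo is unreachable.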
Several of the Rackspace VMs are squeeze. They need to be upgraded
(the monitoring agent is not available in squeeze).
We need to coordinate the downtime with the community users. We
mostly have existing channels for that, which depend on the service,
and which Lars (and perhaps Ian J) will be able to advise on.
ACTION: Credativ to consult Lars (CC Ian J) about communicating
downtime
Conversation on IRC. Lars added the notes to the ticket.
Critical impact on the community: list server and xenbits
Wiki is also risky and will affect users, but doesn't impact developers
(also runs wheezy)
Ian suggests starting with the mailing list server and xenbits
ACTION: Credativ to communicate a schedule of downtimes to Lars and Ian
Make sure that Lars is on the ticket
Note that Lars is travelling Jan 17-27 and Jan 30-Feb 1
(need 3-4 days lead time)
The Rackspace VMs lack the Rackspace agent. This agent would provide
an improved view in the Rackspace control panel.
ACTION: Credativ to install the Rackspace agent on the VMs.
Not all of the Rackspace VMs have been properly handed over to
Credativ
ACTION: Ian J to check the machine list and previous emails,
determine the state of all the remaining VMs, gain access as
necessary, and hand them over to Credativ (or delete), as
applicable
Backups
-------
OPEN
We discussed a variety of possible approaches. Martin suggested that
we could perhaps back up the Rackspace VMs to the colo, and perhaps
vice versa.
The colo contains a number of service hosts (mostly VMs) most of whose
relevant state is configuration rather than data. But also a
PostgreSQL database, currently 6 GB, growing at ~3 GB/yr, which could
be streamed using the Postgres replication protocol (also providing a
read-only view for reporting etc.)
ACTION: Credativ to investigate after Ian J has provided access to
the colo, and make a proposal
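For illustration only, a minimal sketch of the streaming-backup idea above,
pulling a dated base backup onto a Rackspace VM with pg_basebackup. This is
not the proposal itself; the host name, replication role and backup directory
are hypothetical placeholders, and a matching replication entry in pg_hba.conf
on the primary is assumed.

    #!/usr/bin/env python
    # Take a dated base backup of the colo PostgreSQL instance over the
    # streaming replication protocol using pg_basebackup.
    import datetime
    import subprocess

    PRIMARY_HOST = "db.colo.example.org"   # hypothetical colo DB host
    REPL_USER = "replication"              # hypothetical replication role
    BACKUP_ROOT = "/srv/backups/postgres"  # hypothetical target directory

    def take_base_backup():
        target = "%s/%s" % (BACKUP_ROOT, datetime.date.today().isoformat())
        # -X stream also copies the WAL needed to make the backup consistent.
        subprocess.check_call([
            "pg_basebackup",
            "-h", PRIMARY_HOST,
            "-U", REPL_USER,
            "-D", target,
            "-X", "stream",
            "-P",
        ])

    if __name__ == "__main__":
        take_base_backup()

A continuously streaming hot standby, which would also provide the read-only
view for reporting mentioned above, would be a separate and somewhat larger setup.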
Martin: not sure whether we have enough disk space for a 1-to-1 copy.
Ian: could probably add more disk space if needed
Martin: it is not clear what the backup is for
Ian: Suppose the colo rack catches fire or suffers a power spike
Martin: Not all of the colo VMs need to be backed up, as they look
like development VMs
Ian: At the moment our config management is very poor. The OSSTEST VM
contains a lot of stuff which does not need to be backed up
ACTION: Ian to set out some requirements by email and send to Martin
Ian: There are 4 x 1TB hard disks per machine in the colo and we do not seem
to be using them. There is no usage of DRBD.
ACTION: Credativ to double-check