Attendees:
Xen: Ian Jackson, Lars Kurth
Credativ: Felix Geyer, Michael Sprengel, Yogesh Patel, Martin Zobel-Helas
Tasks in the Marlborough colo, by ticket
----------------------------------------
CLOSED
65869 Cubietruck disks (in ARM crate)
65871 Machines have been given to AllNet
66150 Colo access list
67351 rimava0 failure
67602 Colo rack inventory
ACTION: Ian to commit the inventory to our git and maybe fold it into the existing spreadsheet
Done
(none) Discussion of state of our rack
(none) Ticket workflow for colo tickets
OPEN
(none) Rack rails for ARM crate
ACTION: Ian Campbell to procure new rails
NEW
(none) Order missing hard drives
ACTION: Ian to file a ticket with hard drive info; Yogesh can then send a price list
Other itty-bitty bits
---------------------
CLOSED
65860 Password manager
(none) Admin VM has no DNS name
OPEN / NEW
65860 Password manager
ACTION: Ian J to enroll Ian Campbell's and Birin Sanchez's PGP keys.
(not done)
(none) Ticket system web access
Credativ report that the ticket system web UI can only grant
web access to tickets by a particular submitter, which would
not be so useful. It might be worth moving the ticket queue
to a VM in Rackspace.
Nearly done, but not yet closed. The idea is that we have a generic e-mail address.
ACTION: Felix/Martin will get back via IRC re credentials
Ian: can be a xenproject.org alias
Ian: need to add the alias to MessageLabs
ACTION: Ian see above
(none) Report of hours used in support contract
Ian J has not received this report (but maybe wasn't supposed
to, as Lars is the contract contact).
Martin: spoke with Felix about how best to get this done. Waiting for Yogesh to commit hours; then we can get reports on demand.
Yogesh: will have this done by the end of the day
ACTION: Credativ to send report
NEW
Ian: general question to Credativ whether anyone has seen any new issues regarding the Rackspace VMs
Felix: Need to upgrade to wheezy because of package dependencies
Felix: Some (not sure which) machines are monitored
Michael: Some inventory has been done
Ian: Is there a list of new TODO items
Michael: Create child tickets for each host for the upgrade
ACTION: Credativ to create these tickets
Bugzilla is very old
ACTION: Lars to send a note to the list that we are planning to kill that bugzilla and see whether anyone needs the data. Could do a R/O archive view.
Ian: if nobody objects we should just kill it
Test colo network access
------------------------
DONE
ACTION: Ian J to make sure Credativ have appropriate access, and to
send an introductory email
NEW
ACTION: Ian to file a ticket to look at the firewall for the two main hosts in the colo, as he does not have much confidence in it
Monitoring
----------
DONE:
OPEN:
We lack individual tracking of which Rackspace VMs are properly set
up.
ACTION: Credativ to create sub-tickets for each machine that they've
been given access to and communicate when done to us.
The test colo service machines (dom0's and VMs) ought to be subject to
monitoring too. There was discussion of whether this should happen in
the dom0, or the infrastructure VM. Ian J preferred to use the
infrastructure VM. Of course the new monitoring VM at Rackspace would
need to be able to notice if the colo went dead.
ACTION: Credativ to investigate after Ian J has provided access
ACTION: Credativ to raise tickets for VMs where it is not obvious what they
are used for and for Ian/Lars to comment. Please add Lars and Ian
to the ticket.
ACTION: Credativ to set up a new VM for the monitoring daemon and
cause it to email Credativ
Q: Done. But should mails be sent to another e-mail address?
Ian: send to root@xxxxxxxxxxxxxx
ACTION: Credativ to add root@xxxxxxxxxxxxxx
Q: Can we create new VMs?
Ian: Could use the infra VM, but doesn't have a strong opinion;
would prefer it if we could avoid creating a VM just for that purpose.
ACTION: Install satellite agents into the infra VM in the colo and set up
the connection such that they will talk to the monitoring on
the relevant Rackspace VM.
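As a rough illustration of the point above that the Rackspace monitoring VM
would need to notice if the colo went dead, the sketch below shows one way an
independent probe could work. This is not part of the agreed actions; the
host/port list, the sender address and the use of a local MTA are assumptions,
and the recipient is the root@ alias agreed above.

    #!/usr/bin/env python
    # Stand-alone reachability probe: report any listed colo endpoint that
    # does not accept a TCP connection, by e-mail via the local MTA.
    import socket
    import smtplib
    from email.mime.text import MIMEText

    COLO_ENDPOINTS = [("infra.colo.example.org", 22)]  # hypothetical host/port pairs
    ALERT_TO = "root@xxxxxxxxxxxxxx"                   # alias agreed in the meeting
    ALERT_FROM = "monitoring@xxxxxxxxxxxxxx"           # hypothetical sender

    def is_reachable(host, port, timeout=10):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except (socket.error, socket.timeout):
            return False

    def send_alert(down):
        body = "Unreachable from the Rackspace monitoring VM:\n" + \
               "\n".join("%s:%d" % endpoint for endpoint in down)
        msg = MIMEText(body)
        msg["Subject"] = "colo reachability alert"
        msg["From"] = ALERT_FROM
        msg["To"] = ALERT_TO
        smtp = smtplib.SMTP("localhost")  # assumes a local MTA on the monitoring VM
        smtp.sendmail(ALERT_FROM, [ALERT_TO], msg.as_string())
        smtp.quit()

    if __name__ == "__main__":
        down = [e for e in COLO_ENDPOINTS if not is_reachable(*e)]
        if down:
            send_alert(down)

In practice this would be run from cron on the monitoring VM, separately from
the satellite agents, so an alert still goes out if the whole colo is unreachable.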
Several of the Rackspace VMs are squeeze. They need to be upgraded
(the monitoring agent is not available in squeeze).
We need to coordinate the downtime with the community users. We
mostly have existing channels for that, which depend on the service,
and which Lars (and perhaps Ian J) will be able to advise on.
ACTION: Credativ to consult Lars (CC Ian J) about communicating
downtime
Conversation on IRC. Lars added the notes to the ticket.
Critical impact on the community: list server and xenbits
Wiki is also risky and will affect users, but doesn't impact developers
(also runs wheezy)
Ian suggests starting with the mailing list server and xenbits
ACTION: Credativ to communicate a schedule of downtimes to Lars and Ian
Make sure that Lars is on the ticket
Note that Lars is travelling Jan 17-27 and Jan 30-Feb 1
(need 3-4 days lead time)
The Rackspace VMs lack the Rackspace agent. This agent would provide
an improved view in the Rackspace control panel.
ACTION: Credativ to install the Rackspace agent on the VMs.
Not all of the Rackspace VMs have been properly handed over to
Credativ
ACTION: Ian J to check the machine list and previous emails,
determine the state of all the remaining VMs, gain access as
necessary, and hand them over to Credativ (or delete), as
applicable
Backups
-------
OPEN
We discussed a variety of possible approaches. Martin suggested that
we could perhaps back up the Rackspace VMs to the colo, and perhaps
vice versa.
The colo contains a number of service hosts (mostly VMs) most of whose
relevant state is configuration rather than data. But also a
PostgreSQL database, currently 6 GB, growing at ~3 GB/yr, which could
be streamed using the Postgres replication protocol (also providing a
read-only view for reporting etc.)
ACTION: Credativ to investigate after Ian J has provided access to
the colo, and make a proposal
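For illustration only, a minimal sketch of the streaming-backup idea above,
pulling a dated base backup onto a Rackspace VM with pg_basebackup. This is
not the proposal itself; the host name, replication role and backup directory
are hypothetical placeholders, and a matching replication entry in pg_hba.conf
on the primary is assumed.

    #!/usr/bin/env python
    # Take a dated base backup of the colo PostgreSQL instance over the
    # streaming replication protocol using pg_basebackup.
    import datetime
    import subprocess

    PRIMARY_HOST = "db.colo.example.org"   # hypothetical colo DB host
    REPL_USER = "replication"              # hypothetical replication role
    BACKUP_ROOT = "/srv/backups/postgres"  # hypothetical target directory

    def take_base_backup():
        target = "%s/%s" % (BACKUP_ROOT, datetime.date.today().isoformat())
        # -X stream also copies the WAL needed to make the backup consistent.
        subprocess.check_call([
            "pg_basebackup",
            "-h", PRIMARY_HOST,
            "-U", REPL_USER,
            "-D", target,
            "-X", "stream",
            "-P",
        ])

    if __name__ == "__main__":
        take_base_backup()

A continuously streaming hot standby, which would also provide the read-only
view for reporting mentioned above, would be a separate and somewhat larger setup.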
Martin: not sure whether we have enough disk space for a 1-to-1 copy.
Ian: could probably add more disk space if needed
Martin: it is not clear what the backup is for
Ian: Suppose the colo rack catches fire or suffers a power spike
Martin: Not all of the colo VMs need to be backed up, as they look
like development VMs
Ian: At the moment our config management is very poor. The OSSTEST VM
contains a lot of stuff which does not need to be backed up
ACTION: Ian to set out some requirements by email and send to Martin
Ian: There are 4 x 1TB hard disks per machine in the colo and we do not seem
to be using them. There is no usage of DRBD.
ACTION: Credativ to double-check