[Wg-test-framework] Minutes - Credativ/Xen 2015-12-17
Attendees:
Xen: Ian Jackson
Credativ: Martin Zobel-Helas, Felix Geyer, Yogesh Patel
I think I got everything but please let me know if not.
Tasks in the Marlborough colo, by ticket
----------------------------------------
65869 Cubietruck disks (in ARM crate)
Now sorted out by Yogesh and Ian Campbell. It appears that
simply reseating connectors has fixed the problems. Nodes
handed back; Ian C is running recommissioning flights on some
of them.
ACTION: Credativ: close ticket
(none) Rack rails for ARM crate
(This was discussed on IRC, included in these minutes for
completeness)
One of the rails (left side, seen from the front) seems not to
run properly, and Yogesh found a ball from a ball bearing on the
machine below. The machine is stable right now (not at
risk of collapsing).
ACTION: Ian Campbell to procure new rails
65871 3 machines suffering from boot order problems
Have been removed from the rack by Yogesh.
ACTION: Yogesh to try to deliver to All-net
66150 Colo access list
This is all sorted out.
ACTION: Credativ: close ticket
67351 rimava0 failure
(Also discussed on IRC)
We discovered that the labels on rimava0 and rimava1 were not
consistent with the documentation and software config; we
swapped the labels to avoid changing the software.
We also discovered that the layout document was not accurate.
(More about this later.)
Mysteriously rimava0 started working again, possibly due to
PSU cable being reseated (felt slightly loose, says Yogesh).
ACTION: Credativ: close ticket
67602 Colo rack inventory
Ian J asked Yogesh to inventory the physical contents of the
rack including the PDU connections, so that we can correct
discrepancies with our documentation.
Action (now done): Yogesh to email list to Ian J.
ACTION: Credativ: close ticket
ACTION: Ian J to commit the list to our git and maybe fold it into the existing spreadsheet
(none) Discussion of state of our rack
Yogesh said he had seen better, but also seen worse. He
advised that he didn't see the need to spend a lot of time
redoing and neatening the wiring.
The serial connectors on rimava[01] had not been screwed in
(see above), which Yogesh corrected. The others probably
aren't screwed in either, but we are not going to do that
proactively as it probably risks more disruption.
(none) Ticket workflow for colo tickets
Yogesh asked if he should poll the Xen/Credativ ticket queue
to look for relevant work. Martin said that Credativ staff in
Germany would be looking at that queue, so there was no need
for Yogesh to poll the queue: relevant tickets would be
assigned to Yogesh as necessary.
After the discussion of the Marlborough colo was completed, we excused
Yogesh.
Other itty-bitty bits
---------------------
65860 Password manager
This is now set up. From the Xen end, only Ian J is currently
configured as an encryption recipient.
ACTION: Credativ: close ticket
ACTION: Ian J to enroll Ian Campbell's and Birin Sanchez's PGP keys.
(none) Next meeting
14th of January at the same time
Action (now done): Martin/Felix to tell Yogesh
Many people will be away over parts of the Christmas and New
Year period.
(none) Ticket system web access
Credativ report that the ticket system web UI can only grant
web access to tickets by a particular submitter, which would
not be so useful. It might be worth moving the ticket queue
to a VM in Rackspace.
ACTION: Felix/Martin to investigate
(none) Report of hours used in support contract
Ian J has not received this report (but maybe wasn't supposed
to, as Lars is the contract contact).
ACTION: Martin to talk to David Brauner (CC'd on contract mails)
to check the email was sent.
If it was sent and Ian J wants a copy, or this needs chasing,
Ian J can liaise directly with David.
(none) Admin VM has no DNS name
The primary DNS zone is xenproject.org; its zonefile is in the
standard place (in /etc) on the VM mail.xenproject.org. The
reverse DNS is controlled via the RS panel.
ACTION: Felix/Martin to add a DNS name (and update the reverse DNS)
We discussed revision control: currently the zonefile is in
git by virtue of etckeeper. At some point we may want to
move it to the gitolite in the admin VM. But not right now.
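Once a name has been added, one way to confirm that the forward and
reverse entries agree is a small check along these lines. This is only a
sketch and was not discussed in the meeting: the function name
forward_reverse_match is made up for illustration, and any hostname
passed to it (for example admin.example.org) is a placeholder. It relies
only on Python's standard socket module.

```python
import socket


def forward_reverse_match(hostname,
                          lookup_addr=socket.gethostbyname,
                          lookup_name=socket.gethostbyaddr):
    """Return True if hostname resolves to an address whose reverse
    lookup names the same hostname.

    lookup_addr and lookup_name default to the real resolver calls;
    they are parameters so the check can be exercised without DNS.
    """
    addr = lookup_addr(hostname)
    # gethostbyaddr returns (primary name, alias list, address list)
    ptr_name, aliases, _addresses = lookup_name(addr)
    names = {n.rstrip(".").lower() for n in [ptr_name, *aliases]}
    return hostname.rstrip(".").lower() in names


# Hypothetical usage (placeholder name, needs working DNS):
#   forward_reverse_match("admin.example.org")
```

A mismatch here would usually mean the reverse entry in the panel still
needs updating after the forward record was added.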
Test colo network access
------------------------
Credativ have not been properly introduced to the test colo service
machines, which ought to be subject to backup and monitoring.
ACTION: Ian J to make sure Credativ have appropriate access, and to
send an introductory email
Monitoring
----------
We lack individual tracking of which Rackspace VMs are properly set
up.
ACTION: Credativ to create sub-tickets for each machine that they've
been given access to
The wheezy+ VMs that Credativ have access to now have the monitoring
agent installed. Nothing is talking to them yet, though.
ACTION: Credativ to set up a new VM for the monitoring daemon and
cause it to email Credativ
Several of the Rackspace VMs are squeeze. They need to be upgraded
(the monitoring agent is not available in squeeze).
We need to coordinate the downtime with the community users. We
mostly have existing channels for that, which depend on the service
and which Lars (and perhaps Ian J) will be able to advise on.
ACTION: Credativ to consult Lars (CC Ian J) about communicating
downtime
ACTION: Credativ to then make appropriate plans for upgrading
The Rackspace VMs lack the Rackspace agent. This agent would provide
an improved view in the Rackspace control panel.
ACTION: Credativ to install the Rackspace agent on the VMs.
Not all of the Rackspace VMs have been properly handed over to
Credativ.
ACTION: Ian J to check the machine list and previous emails,
determine the state of all the remaining VMs, gain access as
necessary, and hand them over to Credativ (or delete), as
applicable
The test colo service machines (dom0's and VMs) ought to be subject to
monitoring too. There was discussion of whether this should happen in
the dom0, or the infrastructure VM. Ian J preferred to use the
infrastructure VM. Of course the new monitoring VM at Rackspace would
need to be able to notice if the colo went dead.
ACTION: Credativ to investigate after Ian J has provided access
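The requirement that the Rackspace monitoring VM notice a dead colo
could be met in several ways. As a rough illustration only (nothing here
was agreed in the meeting), the sketch below treats the colo as dead when
every colo host has been silent for longer than a threshold. The host
names, the 15 minute threshold and the function name colo_looks_dead are
all assumptions.

```python
from datetime import datetime, timedelta


def colo_looks_dead(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return True if no colo host has reported within max_silence.

    last_seen maps a host name to the datetime of its last report.
    An empty mapping is treated as dead, since nothing has reported.
    """
    if not last_seen:
        return True
    return all(now - seen > max_silence for seen in last_seen.values())


# Hypothetical usage with placeholder host names:
#   now = datetime.now()
#   colo_looks_dead({"dom0-a": now - timedelta(hours=1),
#                    "infra-vm": now - timedelta(minutes=2)}, now)
```

Requiring all hosts to be silent, rather than any one of them, is a
deliberate choice in this sketch: a single quiet host is a host problem,
while total silence suggests the colo or its uplink has gone away.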
Backups
-------
We discussed a variety of possible approaches. Martin suggested that
we could perhaps back up the Rackspace VMs to the colo, and perhaps
vice versa.
The colo contains a number of service hosts (mostly VMs), most of whose
relevant state is configuration rather than data. There is also a
PostgreSQL database, currently 6Gby and growing at ~3Gby/yr, which could
be streamed using the Postgres replication protocol (also providing a
read-only view for reporting etc.)
ACTION: Credativ to investigate after Ian J has provided access to
the colo, and make a proposal
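For sizing whatever ends up holding the database copy, the figures above
(about 6 GB now, about 3 GB per year) can be turned into a rough
projection. The sketch below assumes growth stays linear, which the
minutes do not claim; the function names are made up for illustration.

```python
def projected_size_gb(years, current_gb=6.0, growth_gb_per_year=3.0):
    """Projected database size after the given number of years,
    assuming constant (linear) growth."""
    return current_gb + growth_gb_per_year * years


def years_until(limit_gb, current_gb=6.0, growth_gb_per_year=3.0):
    """Years until the database reaches limit_gb under the same
    linear-growth assumption (0.0 if it is already that large)."""
    if limit_gb <= current_gb:
        return 0.0
    return (limit_gb - current_gb) / growth_gb_per_year
```

Under these assumptions the database would be about 21 GB after five
years, and would take eight years to reach 30 GB.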
Ian.
_______________________________________________
Wg-test-framework mailing list
Wg-test-framework@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-test-framework