[sepia] Re: Planned Outage this Wed August 19

19 Aug 2020

This work is complete.  Jobs/workers are writing to the new EC pool.

Patrick will begin migrating logs from the last month to the new pool now.

Thanks for your patience and thanks Patrick for your help!

On 8/17/20 2:24 PM, David Galloway wrote:
...
 As you may know, the Sepia Long Running Cluster has been hitting
 capacity limits over the past week or so.  This has resulted in service
 disruptions to teuthology runs, chacra.ceph.com,
 docker-mirror.front.sepia.ceph.com, and quay.ceph.io.

 We've managed so far by deleting/compressing logs more aggressively,
 but that is neither ideal nor sustainable.

 Patrick has created a new erasure coded pool/filesystem that will allow
 us to keep the same amount of logs but use less space.  In order to have
 teuthology workers start writing logs to that pool, we need to take an
 outage.

 At 0400 UTC 19AUG2020, I will instruct all teuthology workers to die
 after their running jobs finish.  At 1300 UTC, I will kill any jobs that
 are still running.  This gives the lab 9 hours to gracefully shut down.

 At that point, we will switch the mountpoint on teuthology.front over to
 the new EC pool and start storing new logs there.

 At the same time, Patrick will start migrating logs on the existing/old
 pool to the new pool.  This means that logs from 7/20 through 8/19 will
 be unavailable (you'll see 404s) via the Pulpito web UI and qa-proxy
 URLs until they're migrated to the new EC pool.

 Let me know if you have any questions/concerns.

 Thanks,
