Around 18JUN2020 0700 UTC, an errant `sudo rm -rf ceph` from the root
directory on a senta unfortunately wiped out almost all data on the Ceph
cluster in our upstream Sepia lab (AKA Long Running Cluster or LRC).
Only teuthology job logs were preserved.
My guess is that, because teuthology workers were actively writing job logs
and files, the /teuthology-archive directory didn't get entirely wiped out.
Here is a list of directories we lost:
bz
cephdrop (drop.ceph.com)
cephfs-perf
chacra (chacra.ceph.com)
containers (quay.ceph.io)
dgalloway
diskprediction_config.txt
doug-is-great
el8
filedump.ceph.com
firmware
home.backup01
home.gitbuilder-archive
job1.0.0
jspray.senta02.home.tar.gz
old.repos
post (files submitted using ceph-post-file)
sftp (drop.ceph.com/qa)
shaman
signer (signed upstream release packages)
tmp
traces
While I /did/ have backups of chacra.ceph.com binaries, the amount of
data (> 1TB) backed up was too much to keep snapshots of. My daily
backup script performs an `rsync --delete-delay` so if files are gone on
the source, they get deleted from the backup. This is fine (and
preferred) for backups we have snapshots of. However, the backup script
ran *after* the errant `rm -rf` so unfortunately everything on
chacra.ceph.com is gone. I have patched the backup script to *not* pass
--delete-delay for backups that we don't keep snapshots of.
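
For illustration only, the idea behind that change can be sketched roughly
as below. This is not the actual backup script: the function name, the
yes/no argument, and the host and paths in the usage comment are all made
up. The only real pieces are rsync's `-a` and `--delete-delay` options.

```shell
#!/bin/sh
# Hypothetical helper: print the rsync options to use for one backup target.
# $1 is "yes" when the destination is covered by snapshots, anything else
# when it is not.
backup_rsync_opts() {
    has_snapshots="$1"
    opts="-a"
    if [ "$has_snapshots" = "yes" ]; then
        # Mirroring deletions is safe here: older snapshots still hold
        # any file that disappears from the source.
        opts="$opts --delete-delay"
    fi
    # Without snapshots, never propagate deletions, so a wiped source
    # cannot empty the only copy of the backup.
    printf '%s\n' "$opts"
}

# Example usage (placeholder host and paths, not the real layout):
#   rsync $(backup_rsync_opts no) /srv/chacra/ backuphost:/backups/chacra/
```

The trade-off is that backups without snapshots will accumulate files that
were intentionally removed from the source and need occasional manual
pruning.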
I restored the vagrant and valgrind chacra.ceph.com repos because I saw
teuthology jobs failing without them. Kefu also
rebuilt and pushed ceph-libboost 1.72. (THANK YOU, KEFU!)
We started using the quay.ceph.io registry (instead of quay.io) on June
17. Containers pushed to that registry were stored on the LRC as well
so I had to delete the repo and start over this morning. Anything you
see in the web UI should pull without issue:
https://quay.ceph.io/repository/ceph-ci/ceph?tab=tags
To prevent data loss in the future, Patrick graciously set up new
filesystems and client credentials on the LRC. Because senta{02..04}
are considered developer playgrounds, all users have sudo access. The
sentas now mount /teuthology-archive read-only at /teuthology. If you
need to unzip and inspect log files on a senta, you can do so in
/scratch (another new filesystem on the LRC).
It will likely take weeks of "where did X go" e-mails to mailing lists,
job and build failures, bugs filed, IRC pings, etc. for me to find and
restore everything that was used on a regular basis. I appreciate your
patience and understanding in the meantime.
Take care & be well,
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway