Hello Magnus,
On Thu, Sep 3, 2020 at 11:55 PM Magnus HAGDORN <Magnus.Hagdorn(a)ed.ac.uk> wrote:
Hi there,
we reconfigured our ceph cluster yesterday to remove the cluster
network and things didn't quite go to plan. I am trying to figure out
what went wrong and also what to do next.
We are running nautilus 14.2.10 on Scientific Linux 7.8.
So, we are using a mixture of RBDs and cephfs. For the transition we
switched off all machines that are using the RBDs and switched off the
cephfs using
ceph fs set one down true
Once no more MDS were running we reconfigured ceph to remove the
cluster network and set various flags
ceph osd set noout
ceph osd set nodown
ceph osd set pause
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover
We then restarted the OSDs one host at a time. During this process ceph
was mostly happy, except for two PGs. After all OSDs had been restarted
we switched off the cluster network switches to make sure it was
totally gone. ceph was still happy. The PG error also disappeared. We
then unset all those flags and re-enabled the cephfs.
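For reference, unsetting the flags mirrors the `ceph osd set` commands above; a minimal sketch (using "one" as the fs name, as earlier in this thread) would be:

```shell
# Clear the maintenance flags set before the OSD restarts
# (the reverse of the "ceph osd set" commands above).
ceph osd unset noout
ceph osd unset nodown
ceph osd unset pause
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover

# Bring the filesystem back up ("one" is the fs name used above).
ceph fs set one down false
```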
We then switched on the servers using the RBDs with no issues. So far
so good.
We then started using the cephfs (we keep VM images on the cephfs). The
MDS were showing an error. I restarted the MDS but they didn't come
back. We then followed the instructions here:
https://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disas…
up to truncating the journal. The MDS started again. However, as soon
as we started writing to the cephfs, the MDS crashed. A scrub of the
cephfs revealed backtrace damage.
I'm confused why you started the disaster recovery procedure, since
the procedure you followed should have resulted in no damage to the
PGs (and subsequently CephFS). It'd be helpful to know what the
original error was.
Backtrace damage is usually resolved with a scrub.
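On Nautilus a repairing scrub can be started through the admin socket of the active MDS; a sketch (the daemon name "mds.a" is illustrative, substitute your active MDS):

```shell
# Ask the active MDS to scrub the whole tree recursively and repair
# any damaged backtraces it finds along the way.
ceph daemon mds.a scrub_path / recursive repair

# Check overall cluster health and any remaining damage entries.
ceph status
ceph tell mds.a damage ls
```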
We have now followed the remaining steps of the disaster recovery
procedure and are waiting for the cephfs-data-scan scan_extents to
complete.
It would be really helpful if you could give an indication of how long
this process will take (we have ~40TB in our cephfs) and how many
workers to use.
I don't have any recent data on how long it could take but you might
try using at least 8 workers.
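To parallelize cephfs-data-scan, you run one instance per worker, each claiming its shard via the --worker_n/--worker_m flags; a sketch with 8 workers ("cephfs_data" is a placeholder for your data pool name):

```shell
# Launch 8 scan_extents workers in parallel; each instance processes
# shard n of m. All instances must use the same --worker_m value.
for n in $(seq 0 7); do
  cephfs-data-scan scan_extents --worker_n $n --worker_m 8 cephfs_data &
done
wait   # block until every worker has finished its shard
```

The same pattern applies to the subsequent scan_inodes phase; do not start it until every scan_extents worker has completed.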
The other missing bit of documentation is the cephfs scrubbing. Is
that something we should run routinely?
CephFS scrubbing is usually done when something goes wrong or backing
metadata needs to be updated as part of an upgrade (e.g. the Mimic
snapshot format change). It's not considered necessary to do it on
a routine basis. RADOS PG scrubbing is sufficient for ensuring that
the backing data is routinely checked for correctness/redundancy.
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D