On Tue, Oct 1, 2019 at 5:25 AM Frank Schilder <frans(a)dtu.dk> wrote:
I'm running a CephFS with an 8+2 EC data pool. Disks are on 10 hosts and the failure
domain is host. The version is Mimic 13.2.2. Today I added a few OSDs to one of the hosts and
observed that a lot of PGs became inactive even though 9 out of 10 hosts were up the whole
time. After getting the 10th host and all its disks back up, I still ended up with a large
number of undersized PGs and degraded objects, which I don't understand since no OSD was
removed.
Here are some details about the steps taken on the host with the new disks; the main
questions are at the end:
- shut down OSDs (systemctl stop docker)
- reboot host (this is necessary due to OS deployment via warewulf)
Devices got renamed and not all disks came back up (4 OSDs remained down). This is
expected; I need to re-deploy the containers to adjust for the device name changes. Around
this point PGs started peering and some got stuck waiting for one of the down OSDs. I don't
understand why they didn't simply remain active with 9 out of 10 disks. Until the moment
some OSDs came back up, all PGs were active. With min_size=9 I would expect all PGs
to remain active as long as 9 of the 10 hosts are untouched.
- redeploy docker containers
- all disks/OSDs come up, including the 4 OSDs from above
- inactive PGs complete peering and become active
- now I have a lot of degraded objects and undersized PGs even though not a single OSD
was removed
I don't understand why I have degraded objects; I would expect only misplaced
objects:
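For what it's worth, the usual way to avoid churn during planned host maintenance like the steps above is to set the noout flag first, so the down OSDs are not marked out and no rebalancing starts while the host reboots (standard Ceph CLI commands; a sketch, not what was actually run here):

```shell
# Before taking the host down: prevent down OSDs from being marked out,
# so no data movement is triggered while the host is rebooting.
ceph osd set noout

# ... stop the OSD containers, reboot, redeploy ...

# After all OSDs are back up and in, re-enable normal out-marking:
ceph osd unset noout
```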
HEALTH_ERR
22995992/145698909 objects misplaced (15.783%)
Degraded data redundancy: 5213734/145698909 objects degraded (3.578%), 208 pgs degraded, 208 pgs undersized
Degraded data redundancy (low space): 169 pgs backfill_toofull
Note: backfill_toofull despite low utilization (usage: 38 TiB used, 1.5 PiB / 1.5 PiB
avail) is a known issue in Ceph (https://tracker.ceph.com/issues/39555).
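For reference, the thresholds behind backfill_toofull can be inspected, and temporarily raised if the warning is spurious as in the linked tracker issue, with standard Ceph commands (the ratio value below is illustrative, not taken from this cluster):

```shell
# Show the configured full / backfillfull / nearfull ratios
ceph osd dump | grep -E 'full_ratio'

# Temporarily raise the backfillfull threshold, e.g. to 0.91,
# if backfill_toofull is reported despite low actual utilization:
ceph osd set-backfillfull-ratio 0.91
```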
Also, I should be able to do whatever I want with 1 out of 10 hosts without losing data
access. What could be the problem here?
Questions summary:
Why does peering not succeed to keep all PGs active with 9 out of 10 OSDs up and in?
I would just double-check that min_size=9 is actually set on your pool. It should be
set to that, but that is the only reason I can think of that you are
seeing this problem.
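Checking (and if needed, setting) min_size is quick; the pool name fs-data below is just a placeholder for your EC data pool:

```shell
# Check min_size on the EC data pool
ceph osd pool get fs-data min_size

# For an 8+2 pool, min_size = k+1 = 9 keeps PGs active
# with one failure domain down:
ceph osd pool set fs-data min_size 9
```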
Why do undersized PGs arise even though all OSDs are up?
I've noticed on my cluster that sometimes when an OSD goes down, EC
considers the OSD missing when it comes back online and it needs to
resync. I'm not sure exactly what causes this, but it happens
more often than it should.
Why do degraded objects arise even though no OSD was removed?
If you were writing objects while the PGs were undersized (host/OSDs
down), those writes have to be synced back to the OSDs that were
down once they return. That is the count of degraded objects.
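You can watch this count drain as recovery proceeds; something like the following works with the standard Ceph CLI:

```shell
# List PGs currently degraded or undersized
ceph pg ls degraded
ceph pg ls undersized

# Overall cluster and recovery status
ceph -s
```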
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1