We've gotten a bit further: after evaluating how this remapped count was
determined (pg_temp), we found the PGs counted as being remapped:
root@ceph01:~# ceph osd dump |grep pg_temp
pg_temp 3.7af [93,1,29]
pg_temp 3.7bc [137,97,5]
pg_temp 3.7d9 [72,120,18]
pg_temp 3.7e8 [80,21,71]
pg_temp 3.7fd [74,51,8]
Looking at 3.7af:
root@ceph01:~# ceph pg map 3.7af
osdmap e15406 pg 3.7af (3.f) -> up [87,156,29] acting [87,156,29]
I'm unclear on why this is staying in pg_temp. Is there a way to clean this
up? I would have expected it to be cleaned up automatically, per the docs,
but I might be missing something here.
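In case it's useful to anyone else hitting this, here's what I'm considering
trying next (just a sketch, and happy to be told it's wrong; osd 87 is simply
the current primary for 3.7af from the pg map above, and my understanding is
that briefly marking the primary down forces a re-peer, which should drop a
stale pg_temp entry):
root@ceph01:~# ceph pg ls remapped            # what the cluster itself lists as remapped
root@ceph01:~# ceph osd down 87               # nudge the primary of 3.7af to re-peer
root@ceph01:~# ceph osd dump | grep pg_temp   # check whether the stale entry is gone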
On Thu, Aug 6, 2020 at 2:40 PM David Orman <ormandj(a)corenode.com> wrote:
Still haven't figured this out. We went ahead and upgraded the entire
cluster to Podman 2.0.4, and in the process did OS/kernel upgrades and
rebooted every node, one at a time. We've still got 5 PGs stuck in the
'remapped' state according to 'ceph -s', but 0 in the pg dump output in
that state. Does anybody have any suggestions on what to do about this?
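For reference, this is roughly how I'm comparing the two views (pgs_brief
just to keep the output small):
root@ceph01:~# ceph status | grep remapped
root@ceph01:~# ceph pg dump pgs_brief 2>/dev/null | grep -c remapped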
On Wed, Aug 5, 2020 at 10:54 AM David Orman <ormandj(a)corenode.com> wrote:
Hi,
We see that we have 5 'remapped' PGs, but are unclear why, or what to do
about it. We shifted some target ratios for the autobalancer and it
resulted in this state. While adjusting the ratios, we noticed two OSDs go
down, but we just restarted the containers for those OSDs with podman, and
they came back up. Here's the status output:
###################
root@ceph01:~# ceph status
INFO:cephadm:Inferring fsid x
INFO:cephadm:Inferring config x
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
cluster:
id: 41bb9256-c3bf-11ea-85b9-9e07b0435492
health: HEALTH_OK
services:
mon: 5 daemons, quorum ceph01,ceph04,ceph02,ceph03,ceph05 (age 2w)
mgr: ceph03.ytkuyr(active, since 2w), standbys: ceph01.aqkgbl,
ceph02.gcglcg, ceph04.smbdew, ceph05.yropto
osd: 168 osds: 168 up (since 2d), 168 in (since 2d); 5 remapped pgs
data:
pools: 3 pools, 1057 pgs
objects: 18.00M objects, 69 TiB
usage: 119 TiB used, 2.0 PiB / 2.1 PiB avail
pgs: 1056 active+clean
1 active+clean+scrubbing+deep
io:
client: 859 KiB/s rd, 212 MiB/s wr, 644 op/s rd, 391 op/s wr
root@ceph01:~#
###################
When I look at ceph pg dump, I don't see any marked as remapped:
###################
root@ceph01:~# ceph pg dump |grep remapped
INFO:cephadm:Inferring fsid x
INFO:cephadm:Inferring config x
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
dumped all
root@ceph01:~#
###################
Any idea what might be going on, or how to recover? All OSDs are up and
health is 'OK'. This is Ceph 15.2.4 deployed using Cephadm in containers,
on Podman 2.0.3.
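The next thing I'm planning to check is whether the 'remapped' count in
'ceph -s' comes from pg_temp entries in the osdmap rather than from per-PG
state in the dump; if I understand it right, something like the following
should show that (the pg ID is just a placeholder):
root@ceph01:~# ceph osd dump | grep pg_temp
root@ceph01:~# ceph pg map <pgid>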