Not sure which version you are on, but adding these to your /etc/ceph/ceph.conf file and restarting the OSD processes can go a long way toward reducing these really long blocked requests. It won't get rid of them completely, but it should help a lot.

osd op queue = wpq
osd op queue cut off = high
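
For reference, a minimal sketch of how that could look (the [osd] section placement and the per-OSD restart are assumptions based on a typical deployment; adjust to yours):

# /etc/ceph/ceph.conf (sketch)
[osd]
osd op queue = wpq
osd op queue cut off = high

# then restart the OSDs one at a time, e.g.
systemctl restart ceph-osd@<id>

Restarting them one by one and letting the cluster settle in between avoids taking too many OSDs down at once.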

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Sep 16, 2019 at 11:34 PM Thomas <74cmonty@gmail.com> wrote:
Hi,

I have defined pool hdd which is exclusively used by virtual disks of
multiple KVMs / LXCs.
Yesterday I ran these commands:
osdmaptool om --upmap out.txt --upmap-pool hdd
source out.txt
and Ceph started rebalancing this pool.
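
For reference, om is the osdmap exported beforehand; the full sequence looks roughly like this (the --upmap-max throttle is only an illustrative value, not necessarily what I used):

# export the current osdmap
ceph osd getmap -o om
# compute upmap entries for the hdd pool only; --upmap-max caps how many PGs get remapped per run
osdmaptool om --upmap out.txt --upmap-pool hdd --upmap-max 10
# out.txt contains 'ceph osd pg-upmap-items ...' commands; sourcing it starts the data movement
source out.txt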

However, since then no KVM / LXC is responding anymore.
If I try to start a new KVM, it hangs in the boot process.

This is the output of ceph health detail:
root@ld3955:/mnt/rbd# ceph health detail
HEALTH_ERR 28 nearfull osd(s); 1 pool(s) nearfull; Reduced data
availability: 1 pg inactive, 1 pg peering; Degraded data redundancy (low
space): 8 pgs backfill_toofull; 1 subtrees have overcommitted pool
target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio;
2 pools have too many placement groups; 672 slow requests are blocked >
32 sec; 4752 stuck requests are blocked > 4096 sec
OSD_NEARFULL 28 nearfull osd(s)
    osd.42 is near full
    osd.44 is near full
    osd.45 is near full
    osd.77 is near full
    osd.84 is near full
    osd.94 is near full
    osd.101 is near full
    osd.103 is near full
    osd.106 is near full
    osd.109 is near full
    osd.113 is near full
    osd.118 is near full
    osd.120 is near full
    osd.136 is near full
    osd.138 is near full
    osd.142 is near full
    osd.147 is near full
    osd.156 is near full
    osd.159 is near full
    osd.161 is near full
    osd.168 is near full
    osd.192 is near full
    osd.202 is near full
    osd.206 is near full
    osd.208 is near full
    osd.226 is near full
    osd.234 is near full
    osd.247 is near full
POOL_NEARFULL 1 pool(s) nearfull
    pool 'hdb_backup' is nearfull
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg peering
    pg 30.1b9 is stuck peering for 4722.750977, current state peering,
last acting [183,27,63]
PG_DEGRADED_FULL Degraded data redundancy (low space): 8 pgs
backfill_toofull
    pg 11.465 is active+remapped+backfill_wait+backfill_toofull, acting
[308,351,58]
    pg 11.5c4 is active+remapped+backfill_wait+backfill_toofull, acting
[318,336,54]
    pg 11.afd is active+remapped+backfill_wait+backfill_toofull, acting
[347,220,315]
    pg 11.b82 is active+remapped+backfill_toofull, acting [314,320,254]
    pg 11.1803 is active+remapped+backfill_wait+backfill_toofull, acting
[88,363,302]
    pg 11.1aac is active+remapped+backfill_wait+backfill_toofull, acting
[328,275,95]
    pg 11.1c09 is active+remapped+backfill_wait+backfill_toofull, acting
[55,124,278]
    pg 11.1e36 is active+remapped+backfill_wait+backfill_toofull, acting
[351,92,315]
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool
target_size_bytes
    Pools ['hdb_backup'] overcommit available storage by 1.708x due to
target_size_bytes    0  on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool
target_size_ratio
    Pools ['hdb_backup'] overcommit available storage by 1.708x due to
target_size_ratio 0.000 on pools []
POOL_TOO_MANY_PGS 2 pools have too many placement groups
    Pool hdd has 512 placement groups, should have 128
    Pool pve_cephfs_metadata has 32 placement groups, should have 4
REQUEST_SLOW 672 slow requests are blocked > 32 sec
    249 ops are blocked > 2097.15 sec
    284 ops are blocked > 1048.58 sec
    108 ops are blocked > 524.288 sec
    9 ops are blocked > 262.144 sec
    22 ops are blocked > 131.072 sec
    osd.9 has blocked requests > 524.288 sec
    osds 0,2,6,68 have blocked requests > 1048.58 sec
    osd.3 has blocked requests > 2097.15 sec
REQUEST_STUCK 4752 stuck requests are blocked > 4096 sec
    1431 ops are blocked > 67108.9 sec
    513 ops are blocked > 33554.4 sec
    909 ops are blocked > 16777.2 sec
    1809 ops are blocked > 8388.61 sec
    90 ops are blocked > 4194.3 sec
    osd.63 has stuck requests > 67108.9 sec


My interpretation is that Ceph
a) is busy with remapping PGs of pool hdb_backup
b) has identified several OSDs with either blocked or stuck requests.

All of these OSDs belong to pool hdd, though.
osd.9 belongs to node A; osd.63 and osd.68 belong to node C (there are
4 nodes serving OSDs in the cluster).
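
One way to cross-check which pool the blocked PGs and OSDs belong to (illustrative commands, using the stuck PG from the output above):

# pool names with their numeric IDs; the PG prefix (30 in 30.1b9, 11 in 11.465) is the pool ID
ceph osd pool ls detail
# up/acting OSDs for the stuck PG
ceph pg map 30.1b9
# host / CRUSH location of a blocked OSD
ceph osd find 63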

I have tried to fix this issue, but without success, with
- ceph osd set noout
- a restart of the relevant OSDs via systemctl restart ceph-osd@<id>
and finally a server reboot.
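
Roughly what I ran (sketch; <id> stands for the affected OSD IDs, and the health check is just how I verified afterwards):

ceph osd set noout
systemctl restart ceph-osd@<id>
# check whether the blocked/stuck requests clear after the restart
ceph health detail | grep -E 'REQUEST_(SLOW|STUCK)'
# remove the flag again afterwards
ceph osd unset noout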

I also tried to migrate the virtual disks to another pool, but this
fails, too.
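
By migrate I mean the usual Proxmox-style disk move, roughly like the commands below (the IDs, disk/volume names and target storage are only placeholders):

# move a KVM disk to another storage / pool
qm move_disk <vmid> <disk> <target-storage>
# same for an LXC container volume
pct move_volume <ctid> <volume> <target-storage>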

There were no changes on the server side (network, disks or anything else).

How can I resolve this issue?

THX
Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io