On 11/15/19 1:29 PM, Thomas Schneider wrote:
> This cluster has a long history of being unhealthy, which means this
> issue is not happening out of the blue.
>
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             noscrub,nodeep-scrub flag(s) set
>             Reduced data availability: 1 pg inactive, 1 pg down
>             1 subtrees have overcommitted pool target_size_bytes
>             1 subtrees have overcommitted pool target_size_ratio
>             18 slow requests are blocked > 32 sec
>             mons ld5505,ld5506 are low on available space
>
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2h)
>     mgr: ld5507(active, since 28h), standbys: ld5506, ld5505
>     mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
>     osd: 441 osds: 438 up, 438 in
I think this is the problem. You are missing a few OSDs (438 of 441 are
up) which are probably needed to bring that PG back online.
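To find out which OSDs the PG is waiting on, a diagnostic sequence like
the following may help (standard ceph CLI; the PG id 59.1c is taken from
the health output above):

```shell
# List OSDs that are currently down (441 total, 438 up, so 3 are missing)
ceph osd tree down

# Ask the PG itself why it is stuck; look at "recovery_state" and
# "blocked_by" in the JSON output for the OSDs it is waiting on
ceph pg 59.1c query

# Show which OSDs CRUSH currently maps the PG to
ceph pg map 59.1c
```

If the down OSDs hold data the PG needs, bringing them back up is the
safe fix; marking an OSD lost (`ceph osd lost <id>`) is a destructive
last resort that should only be used if the OSD is truly unrecoverable.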
>     flags noscrub,nodeep-scrub
>
>   data:
>     pools:   6 pools, 8432 pgs
>     objects: 63.28M objects, 241 TiB
>     usage:   723 TiB used, 796 TiB / 1.5 PiB avail
>     pgs:     0.012% pgs not active
>              8431 active+clean
>              1    creating+down
>
>   io:
>     client: 33 MiB/s rd, 14.20k op/s rd, 0 op/s wr
>
>
> On 15.11.2019 at 13:24, Wido den Hollander wrote:
>>
>> On 11/15/19 11:22 AM, Thomas Schneider wrote:
>>> Hi,
>>> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>>>
>>> root@ld3955:~# ceph health detail
>>> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
>>> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1
>>> subtrees have overcommitted pool target_size_bytes; 1 subtrees have
>>> overcommitted pool target_size_ratio; mons ld5505,ld5506 are low on
>>> available space
>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>> mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs, oldest
>>> blocked for 120721 secs
>>> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>>> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>>> pg 59.1c is creating+down, acting [426,438]
>>> MON_DISK_LOW mons ld5505,ld5506 are low on available space
>>> mon.ld5505 has 22% avail
>>> mon.ld5506 has 29% avail
>>>
>>> root@ld3955:~# ceph pg dump_stuck inactive
>>> ok
>>> PG_STAT  STATE          UP         UP_PRIMARY  ACTING     ACTING_PRIMARY
>>> 59.1c    creating+down  [426,438]  426         [426,438]  426
>>>
>>> How can I fix this?
>> Did you change anything to the cluster?
>>
>> Can you share this output:
>>
>> $ ceph status
>>
>> It seems that more things are wrong with this system. This doesn't
>> happen out of the blue. Something must have happened.
>>
>> Wido
>>
>>> THX
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>
>