On 11/15/19 1:29 PM, Thomas Schneider wrote:
> This cluster has a long history of being unhealthy, which means this
> issue is not happening out of the blue.
>
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             noscrub,nodeep-scrub flag(s) set
>             Reduced data availability: 1 pg inactive, 1 pg down
>             1 subtrees have overcommitted pool target_size_bytes
>             1 subtrees have overcommitted pool target_size_ratio
>             18 slow requests are blocked > 32 sec
>             mons ld5505,ld5506 are low on available space
>
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 2h)
>     mgr: ld5507(active, since 28h), standbys: ld5506, ld5505
>     mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
>     osd: 441 osds: 438 up, 438 in
I think this is the problem. You are missing a few OSDs (438 of 441 are
up) which are probably needed to bring that PG back online.
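To find out which OSDs the PG is waiting on, a diagnostic sequence like
the following may help (standard ceph CLI; the PG id 59.1c is taken from
the health output above):

```shell
# List OSDs that are currently down (441 total, 438 up, so 3 are missing)
ceph osd tree down

# Ask the PG itself why it is stuck; look at "recovery_state" and
# "blocked_by" in the JSON output for the OSDs it is waiting on
ceph pg 59.1c query

# Show which OSDs CRUSH currently maps the PG to
ceph pg map 59.1c
```

If the down OSDs hold data the PG needs, bringing them back up is the
safe fix; marking an OSD lost (`ceph osd lost <id>`) is a destructive
last resort that should only be used if the OSD is truly unrecoverable.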
>     flags noscrub,nodeep-scrub
>
>   data:
>     pools:   6 pools, 8432 pgs
>     objects: 63.28M objects, 241 TiB
>     usage:   723 TiB used, 796 TiB / 1.5 PiB avail
>     pgs:     0.012% pgs not active
>              8431 active+clean
>              1    creating+down
>
>   io:
>     client: 33 MiB/s rd, 14.20k op/s rd, 0 op/s wr
>
>
> On 15.11.2019 at 13:24, Wido den Hollander wrote:
>>
>> On 11/15/19 11:22 AM, Thomas Schneider wrote:
>>> Hi,
>>> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>>>
>>> root@ld3955:~# ceph health detail
>>> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
>>> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1
>>> subtrees have overcommitted pool target_size_bytes; 1 subtrees have
>>> overcommitted pool target_size_ratio; mons ld5505,ld5506 are low on
>>> available space
>>> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>>> mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs, oldest
>>> blocked for 120721 secs
>>> OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
>>> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>>> pg 59.1c is creating+down, acting [426,438]
>>> MON_DISK_LOW mons ld5505,ld5506 are low on available space
>>> mon.ld5505 has 22% avail
>>> mon.ld5506 has 29% avail
>>>
>>> root@ld3955:~# ceph pg dump_stuck inactive
>>> ok
>>> PG_STAT  STATE          UP         UP_PRIMARY  ACTING     ACTING_PRIMARY
>>> 59.1c    creating+down  [426,438]  426         [426,438]  426
>>>
>>> How can I fix this?
>> Did you change anything to the cluster?
>>
>> Can you share this output:
>>
>> $ ceph status
>>
>> It seems that more things are wrong with this system. This doesn't
>> happen out of the blue. Something must have happened.
>>
>> Wido
>>
>>> THX
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>
>