I'll power the cluster up today or tomorrow and take a look again, Dan, but
the initial problem is that many of the pgs can't be queried; the requests
time out. I don't know whether it's only the stale pgs, or also the unknown
ones, that can't be queried, but I'll investigate whether something is wrong
with the mgr. I normally have plenty of running mgrs.
Thanks for the advice on ignore_history. I'll avoid it for now.
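For reference, a sketch of the checks I'd run before digging into OSD logs (the PG id below is a placeholder, not one from my cluster):

```shell
# Confirm a mgr is active and standbys exist
ceph mgr stat

# Overall cluster health and PG state summary
ceph -s
ceph pg stat

# List PGs stuck in the problem states
ceph pg dump_stuck stale
ceph pg dump_stuck inactive

# Query one affected PG (replace 1.2f with a real PG id);
# if this hangs, the OSDs serving that PG are likely unreachable
ceph pg 1.2f query
```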
On Fri, Feb 5, 2021 at 6:52 AM Dan van der Ster <dan(a)vanderster.com> wrote:
Eeek! Don't run
`osd_find_best_info_ignore_history_les = true` -- that can lead to data
loss, even in ways you don't expect.
Are you sure all OSDs are up?
Query a PG to find out why it is unknown: `ceph pg <id> query`. Feel
free to share the output.
In fact, the 'unknown' state means the MGR doesn't know the state of
the PG -- is your MGR running correctly now?
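A quick sketch of the checks suggested above, in command form:

```shell
# Verify all OSDs are up and in
ceph osd stat
ceph osd tree down   # show only down OSDs, if any

# If the active mgr is wedged, fail over to a standby so PG
# state reporting refreshes
ceph mgr fail
```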
-- Dan
On Fri, Feb 5, 2021 at 4:49 PM Jeremy Austin <jhaustin(a)gmail.com> wrote:
I was in the middle of a rebalance on a small test cluster with about 1% of
pgs degraded, and shut the cluster entirely down for maintenance.
On startup, many pgs are entirely unknown, and most are stale. In fact, most
pgs can't be queried! No mon failures. Would osd logs tell me why pgs aren't
even moving to an inactive state?
I'm not concerned about data loss due to the shutdown (all activity to the
cluster had been stopped), so should I be setting some or all OSDs to
`osd_find_best_info_ignore_history_les = true`?
Thank you,
--
Jeremy Austin
jhaustin(a)gmail.com
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Jeremy Austin
jhaustin(a)gmail.com