I tried both several times. It looks like it just had to read through the
entire journal. I wish there were more progress reporting for journal
reading at debug levels below 10, because 10 is way too noisy. That
would give us an idea of how much longer there is left to go. It seems that
the MDS fell far behind on segments (~14,000) because of some naughty
clients, which caused the journal to explode and the MDS to eventually stop
responding to the monitors.
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Thu, Mar 12, 2020 at 12:48 AM Yan, Zheng <ukernel(a)gmail.com> wrote:
On Thu, Mar 12, 2020 at 1:41 PM Robert LeBlanc <robert(a)leblancnet.us> wrote:
This is the second time this has happened in a couple of weeks. The MDS
locks up and the standby can't take over, so the monitors blacklist them.
I try to un-blacklist them, but they still say this in the logs:
mds.0.1184394 waiting for osdmap 234947 (which blacklists prior instance)
Looking at a pg map, it looks like the osdmap epoch is already past that:
$ ceph pg map 3.756
osdmap e234953 pg 3.756 (3.756) -> up [113,180,115] acting [113,180,115]
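The epoch comparison can be sketched in shell. This is only an illustration that parses the two lines quoted above; the variable names are made up for the example, and it does not query the cluster itself.

```shell
# Sample lines copied from this message; in practice they would come
# from the MDS log and from the output of `ceph pg map`.
log_line='mds.0.1184394 waiting for osdmap 234947 (which blacklists prior instance)'
map_line='osdmap e234953 pg 3.756 (3.756) -> up [113,180,115] acting [113,180,115]'

# Epoch the MDS says it is waiting for (hypothetical variable name).
wanted=$(printf '%s\n' "$log_line" | sed -n 's/.*waiting for osdmap \([0-9][0-9]*\).*/\1/p')

# Epoch reported in the pg map output (hypothetical variable name).
current=$(printf '%s\n' "$map_line" | sed -n 's/^osdmap e\([0-9][0-9]*\) .*/\1/p')

if [ "$current" -ge "$wanted" ]; then
    echo "osdmap epoch $current is at or past awaited epoch $wanted"
else
    echo "osdmap epoch $current is still behind awaited epoch $wanted"
fi
```

With the two lines above, the first branch is taken, which matches what I see: the cluster is already past the epoch the MDS claims to be waiting for.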
Last time, it seemed to just recover by itself after about an hour.
Is there any way to speed this up?
Try restarting the standby MDS.
Thank you,
Robert LeBlanc
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io