On Fri, Oct 23, 2020 at 9:02 AM David C <dcsysengineer(a)gmail.com> wrote:
Success!
I remembered I had a server I'd taken out of the cluster to
investigate some issues, which had some good-quality 800GB Intel DC
SSDs. I dedicated an entire drive to swap, tuned up min_free_kbytes,
added an MDS to that server and let it run. It took 3 to 4 hours, but
it eventually came back online. It used the 128GB of RAM and about
250GB of the swap.
Dan, thanks so much for steering me down this path. I would more than
likely have started hacking away at the journal otherwise!
Frank, thanks for pointing me towards that other thread. I used your
min_free_kbytes tip.
I now need to consider updating. I wonder whether the risk-averse
CephFS operator would go for the latest Nautilus or the latest Octopus.
It used to be that the newest CephFS code was the most stable, but I
don't know if that's still the case.
You need to upgrade to Nautilus first in any case. An upgrade can span
at most two releases (n to n+2).
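
The n+2 rule can be sketched as a small check. This is only an
illustration: the function names are hypothetical, the release list
covers Luminous through Octopus only, and the starting release in the
usage note (Luminous) is an assumption, since the thread does not state
which release the cluster is running.

```python
# Ceph release names in order (Luminous = 12.x ... Octopus = 15.x).
CEPH_RELEASES = ["luminous", "mimic", "nautilus", "octopus"]

# The "n+2" rule: a single upgrade may span at most two releases.
MAX_DELTA = 2


def can_upgrade_directly(src, dst):
    """Return True if dst is newer than src and at most MAX_DELTA releases ahead."""
    i = CEPH_RELEASES.index(src.lower())
    j = CEPH_RELEASES.index(dst.lower())
    return 0 < j - i <= MAX_DELTA


def upgrade_path(src, dst):
    """Return a shortest list of releases from src to dst, each hop <= MAX_DELTA."""
    i = CEPH_RELEASES.index(src.lower())
    j = CEPH_RELEASES.index(dst.lower())
    if j < i:
        raise ValueError("downgrades are not supported")
    path = [CEPH_RELEASES[i]]
    while i < j:
        # Take the largest hop the rule allows, without overshooting dst.
        i = min(i + MAX_DELTA, j)
        path.append(CEPH_RELEASES[i])
    return path
```

Under the assumed starting point, upgrade_path("luminous", "octopus")
goes through Nautilus, while can_upgrade_directly("luminous", "octopus")
is False because that jump spans three releases.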
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D