On 29/10/2020 19:29, Zhenshi Zhou wrote:
Hi Alex,
We found that there were a huge number of keys in the "logm" and "osdmap"
tables while using ceph-monstore-tool. I think that could be the root cause.
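For reference, the check looks roughly like this (a sketch; the store path is
an assumption for a default deployment, and the store should be copied or the
mon stopped before inspecting it):

```shell
# Count keys per table in the mon store; assumes dump-keys prints
# one "<table> <key>" pair per line.
ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys \
    | awk '{print $1}' | sort | uniq -c | sort -rn
```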
But that is exactly how Ceph works. The cluster might need a very old OSDMap
to get all the PGs clean again, for example when an OSD that has been gone for
a very long time comes back and needs to catch up before its PGs can become
clean.
If not all PGs are active+clean, you can see the MON databases grow rapidly.
Therefore I always deploy 1TB SSDs in all Monitors. They are not expensive
anymore and they give breathing room.
I also always deploy dedicated physical machines for Monitors, precisely to
prevent cases like this.
Wido
> Well, some pages also say that disabling the 'insights' module can resolve
> this issue, but I checked our cluster and we don't have that module enabled.
> See <https://tracker.ceph.com/issues/39955>.
>
> Anyway, our cluster is unhealthy though; it just needs time to keep
> recovering data :)
>
> Thanks
>
> Alex Gracie <alexandergracie17(a)gmail.com> wrote on Thu, 29 Oct 2020 at 22:57:
>
>> We hit this issue over the weekend on our HDD backed EC Nautilus cluster
>> while removing a single OSD. We also did not have any luck using
>> compaction. The mon-logs filled up our entire root disk on the mon servers
>> and we were running on a single monitor for hours while we tried to finish
>> recovery and reclaim space. Over the past couple of weeks we also noticed
>> "pg not scrubbed in time" errors but are unsure whether they are related. I'm
>> still unsure of the exact cause of this (other than the general
>> misplaced/degraded objects) and what kind of growth is acceptable for these
>> store.db files.
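>> For anyone else trying this, the compaction attempts looked roughly like
>> the following (a sketch; the mon ID and store path are assumptions for a
>> default deployment):
>>
>> ```shell
>> # Online compaction through a running monitor:
>> ceph tell mon.a compact
>>
>> # Offline compaction with the mon stopped:
>> systemctl stop ceph-mon@a
>> ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db compact
>> systemctl start ceph-mon@a
>> ```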
>>
>> In order to get our downed mons restarted, we ended up backing up and
>> copying the /var/lib/ceph/mon/* contents to a remote host, setting up an
>> sshfs mount to that new host with large NVMe and SSD drives, ensuring the mount
>> paths were owned by ceph, then clearing up enough space on the monitor host
>> to start the service. This allowed our store.db directory to grow freely
>> until the misplaced/degraded objects could recover and monitors all
>> rejoined eventually.
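>> Roughly, the workaround was (a sketch; the hostnames, mon ID, and mount
>> options here are placeholders, not our actual values):
>>
>> ```shell
>> # Stop the mon and keep a backup of its store:
>> systemctl stop ceph-mon@mon1
>> cp -a /var/lib/ceph/mon /root/mon-backup
>>
>> # Copy the store to a host with large NVMe/SSD space, then replace the
>> # local directory with an sshfs mount of that copy, owned by ceph:
>> rsync -a /var/lib/ceph/mon/ bighost:/srv/ceph-mon/
>> rm -rf /var/lib/ceph/mon/*
>> sshfs -o allow_other,uid=$(id -u ceph),gid=$(id -g ceph) \
>>     bighost:/srv/ceph-mon /var/lib/ceph/mon
>>
>> # With space freed on the root disk, the mon can start again:
>> systemctl start ceph-mon@mon1
>> ```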
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>