I think that from the 3rd time the database just goes into compaction maintenance
Can you share some more details about what exactly you mean? Do you
mean that if I restart a MON three times it goes into compaction
maintenance, and that it's not related to timing? We tried the same
on a different MON and only did two tests:
- stopping a MON for less than 5 minutes, starting it again: sync
happens immediately
- stopping a MON for more than 5 minutes, starting it again: sync
takes 15 minutes
This doesn't feel related to the payload size or keys options, but
rather to some timing option.
Quoting Eugen Block <eblock(a)nde.ag>:
Thanks, Dan!
Yes that sounds familiar from the luminous and mimic days.
The workaround for zillions of snapshot keys at that time was to use:
ceph config set mon mon_sync_max_payload_size 4096
I actually did search for mon_sync_max_payload_keys, not bytes, so I
missed your thread, it seems. Thanks for pointing that out. So these
seem to be the defaults in Octopus:
"mon_sync_max_payload_keys": "2000",
"mon_sync_max_payload_size": "1048576",
So it could be in your case that the sync payload is just too small
to efficiently move 42 million osd_snap keys? Using debug_paxos and
debug_mon you should be able to understand what is taking so long,
and tune mon_sync_max_payload_size and mon_sync_max_payload_keys
accordingly.
I'm confused: if the payload size is too small, why would decreasing
it help? Or am I misunderstanding something? But it probably won't
hurt to try it with 4096 and see if anything changes. If not, we can
still turn on debug logs and take a closer look.
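For reference, this is roughly what I have in mind, sketched as commands (the monstore path below is just an example, ceph-monstore-tool needs the MON stopped or a copy of its store directory, and the debug levels are only a suggestion):

```shell
# Dan's workaround: shrink the sync payload size (default 1048576)
ceph config set mon mon_sync_max_payload_size 4096

# Raise mon/paxos debug levels before the next sync test ...
ceph config set mon debug_mon 10
ceph config set mon debug_paxos 10
# ... reproduce the slow sync, check the MON log, then revert:
ceph config rm mon debug_mon
ceph config rm mon debug_paxos

# Count keys per prefix in a stopped (or copied) MON store;
# /var/lib/ceph/mon/ceph-a is an example path:
ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys | awk '{print $1}' | sort | uniq -c
```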
And in addition to Dan's suggestion, an HDD is not a good choice for
RocksDB, which is most likely the reason for this thread. I think
that from the 3rd time the database just goes into compaction
maintenance
Believe me, I know... but there's not much they can currently do
about it, quite a long story... I have been telling them that for
months now. Anyway, I will make some suggestions and report back
whether it worked in this case as well.
Thanks!
Eugen
Quoting Dan van der Ster <dan.vanderster(a)clyso.com>:
> Hi Eugen!
>
> Yes that sounds familiar from the luminous and mimic days.
>
> Check this old thread:
>
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F3W2HXMYNF5…
> (that thread is truncated but I can tell you that it worked for Frank).
> Also the even older referenced thread:
>
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2…
>
> The workaround for zillions of snapshot keys at that time was to use:
> ceph config set mon mon_sync_max_payload_size 4096
>
> That said, that sync issue was supposed to be fixed by way of adding the
> new option mon_sync_max_payload_keys, which has been around since nautilus.
>
> So it could be in your case that the sync payload is just too small to
> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
> you should be able to understand what is taking so long, and tune
> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.
>
> Good luck!
>
> Dan
>
> ______________________________________________________
> Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com
>
>
>
> On Thu, Jul 6, 2023 at 1:47 PM Eugen Block <eblock(a)nde.ag> wrote:
>
>> Hi *,
>>
>> I'm investigating an interesting issue on two customer clusters (used
>> for mirroring) I've not solved yet, but today we finally made some
>> progress. Maybe someone has an idea where to look next, I'd appreciate
>> any hints or comments.
>> These are two (latest) Octopus clusters, main usage currently is RBD
>> mirroring with snapshot mode (around 500 RBD images are synced every
>> 30 minutes). They noticed very long startup times of MON daemons after
>> reboot, times between 10 and 30 minutes (reboot time already
>> subtracted). These delays are present on both sites. Today we got a
>> maintenance window and started to check in more detail by just
>> restarting the MON service (it joins quorum within seconds), then
>> stopping the MON service and waiting a few minutes (it still joins
>> quorum within seconds). Then we stopped the service and waited for
>> more than 5 minutes, simulating a reboot, and we were able to
>> reproduce the issue. The sync then takes around 15 minutes; we
>> verified this with other MONs as well. The MON store is around 2 GB
>> in size (on HDD). I
>> understand that the sync itself can take some time, but what is the
>> threshold here? I tried to find a hint in the MON config, searching
>> for timeouts with 300 seconds, there were only a few matches
>> (mon_session_timeout is one of them), but I'm not sure if they can
>> explain this behavior.
>> Investigating the MON store (ceph-monstore-tool dump-keys) I noticed
>> that there were more than 42 million osd_snap keys, which is quite a
>> lot and would explain the size of the MON store. But I'm also not sure
>> if it's related to the long syncing process.
>> Does that sound familiar to anyone?
>>
>> Thanks,
>> Eugen
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>