OSD bootstrap time - ceph-users

8 Jun 2021

Hi everyone,

Recently I've noticed that starting OSDs for the first time takes ages
(more than an hour) before the monitors even mark them "up" and they
start backfilling. I'm not sure whether this is new behavior or whether
it has always been that way. Either way, I'd like to understand why.

When I run `ceph daemon osd.X status`, it reports "state: preboot" and
I can see "newest_map" increase slowly. Apparently, a new OSD doesn't
simply fetch the latest OSD map and get to work; instead it fetches
hundreds of thousands of OSD maps from the mon, burning CPU while
parsing them.
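
In case anyone wants to measure the same thing, here is a rough Python
sketch of how the catch-up rate could be tracked. It assumes that
`ceph daemon osd.X status` prints JSON containing `state` and
`newest_map`, that `ceph osd dump --format json` contains the cluster's
current `epoch`, and that it runs on the OSD's host with access to the
admin socket. The function names are my own and purely illustrative.

```python
import json
import subprocess
import time


def parse_status(raw):
    """Pull state and newest_map out of `ceph daemon osd.X status` JSON."""
    status = json.loads(raw)
    return status["state"], int(status["newest_map"])


def estimate_seconds_remaining(first, last, target_epoch):
    """Linear estimate from two (timestamp, newest_map) samples.

    Returns None when no progress was observed between the samples.
    """
    (t0, e0), (t1, e1) = first, last
    if t1 <= t0 or e1 <= e0:
        return None
    rate = (e1 - e0) / (t1 - t0)  # OSD map epochs fetched per second
    return max(target_epoch - e1, 0) / rate


def run_json(cmd):
    """Run a ceph command and return its stdout (assumed to be JSON text)."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout


def watch(osd_id, interval=30.0):
    """Sample the OSD twice and print a rough ETA until it has caught up."""
    # Assumption: the cluster-wide current epoch is the "epoch" field here.
    target = int(json.loads(run_json(
        ["ceph", "osd", "dump", "--format", "json"]))["epoch"])
    cmd = ["ceph", "daemon", f"osd.{osd_id}", "status"]

    _, e0 = parse_status(run_json(cmd))
    t0 = time.monotonic()
    time.sleep(interval)
    state, e1 = parse_status(run_json(cmd))
    t1 = time.monotonic()

    eta = estimate_seconds_remaining((t0, e0), (t1, e1), target)
    print(f"state={state} newest_map={e1} target={target} eta_s={eta}")
```

Calling `watch(12)` for osd.12 would print one estimate. The estimate is
only linear, so it ignores new epochs created while the OSD is still
catching up.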

I wasn't able to find any good documentation on the OSDMap, in
particular why its historical versions need to be kept and why the OSD
seemingly needs so many of them. Can anybody point me in the right
direction? Or is something wrong with my cluster?

Best regards,
Jan-Philipp Litza