(adding the list back to the thread)
On Wednesday, March 27, 2024 12:54:34 PM EDT Daniel Brown wrote:
> John
> I got curious and was taking another quick look through the python script
> for cephadm.
That's always welcome. :-D
> This is probably too simple of a question to be asking, or maybe I should
> say, I’m not expecting that there’s a simple answer to what might seem like
> a simple question:
> Is there anything that notifies the cluster, or the other hosts in a
> cluster, when a host is going into maintenance mode, or is cephadm just
> doing systemctl commands behind the scenes to stop and later restart the
> appropriate ceph containers locally on that host?
> Maybe a better way to say it would be: what is differentiating between
> maintenance mode and a host simply crashing or going offline?
I'll paraphrase Adam King, tech lead for cephadm here:
If one runs the command from the cephadm binary directly, it only disables
and stops the systemd target. The intention is for users to use the
`ceph orch host maintenance ...` commands.
When you use the orch command (quoting Adam here):
```
when we put something into maintenance mode we
1) disable and stop the systemd target for the daemons on the host
2) set the noout flag for all the OSDs on that host
3) internally to cephadm mark the host as having a status of "maintenance"
which has some effects such as us not refreshing metadata on that host or
attempting to place/remove daemons from there
The main difference from that to a host going offline is the noout flag for the
OSDs, and that cephadm will not periodically try to check if the host is
alive, as it would do for an offline host.
I believe the noout flag stops it from trying to migrate all the data on those
OSDs to other OSDs as it shouldn't be necessary if they will be coming back
```
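To make the distinction concrete, here is a toy Python model of the three steps Adam lists and of the two differences from an offline host. This is not cephadm code: `HostState`, `enter_maintenance`, `go_offline` and the two helper predicates are names made up for illustration, and the real state lives in systemd, the OSD map and the cephadm mgr module.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical stand-in for what the cluster knows about one host."""
    name: str
    osd_ids: list
    target_active: bool = True   # systemd target for the daemons on the host
    status: str = ""             # cephadm's internal host status
    noout_osds: set = field(default_factory=set)


def enter_maintenance(host):
    """Model of the three steps in the quote above."""
    host.target_active = False           # 1) disable and stop the systemd target
    host.noout_osds = set(host.osd_ids)  # 2) noout for all OSDs on the host
    host.status = "maintenance"          # 3) cephadm marks the host internally


def go_offline(host):
    """A crash or outage: daemons stop, but no flags are set beforehand."""
    host.target_active = False
    host.status = "offline"


def cephadm_checks_host(host):
    """Per the quote, cephadm keeps checking offline hosts, not maintenance ones."""
    return host.status != "maintenance"


def osds_may_be_marked_out(host):
    """Without noout, a down OSD can be marked out and its data migrated."""
    return any(osd not in host.noout_osds for osd in host.osd_ids)


planned = HostState("node1", [0, 1, 2])
crashed = HostState("node2", [3, 4, 5])
enter_maintenance(planned)
go_offline(crashed)
```

In this model both hosts end up with their daemons stopped; only the planned one has noout set and is skipped by the periodic checks.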
The `cephadm host-maintenance enter` command is meant to be a component of the
`ceph orch host maintenance` workflow. It still has a bug: it always exits
with an error, which is wrong. Either way, you probably do not want to use it
directly.
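If you want to script the orchestrator route rather than calling the cephadm binary, a minimal Python sketch could look like the following. The helper names `maintenance_argv` and `run_maintenance` are hypothetical, the hostname is a placeholder, and it assumes the `ceph` CLI is on the PATH of an admin host. It defaults to a dry run so nothing is executed unless you ask for it.

```python
import subprocess


def maintenance_argv(action, hostname):
    """Build the `ceph orch host maintenance <enter|exit> <hostname>` command."""
    if action not in ("enter", "exit"):
        raise ValueError("action must be 'enter' or 'exit'")
    return ["ceph", "orch", "host", "maintenance", action, hostname]


def run_maintenance(action, hostname, dry_run=True):
    """Return the argv; only invoke the ceph CLI when dry_run is False."""
    argv = maintenance_argv(action, hostname)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(argv, check=True)
    return argv
```

For example, `run_maintenance("enter", "node1")` only returns the command it would run; pass `dry_run=False` to actually run it.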
Reference links:
https://docs.ceph.com/en/latest/cephadm/host-management/#maintenance-mode
https://docs.ceph.com/en/latest/dev/cephadm/host-maintenance/
> On Mar 22, 2024, at 6:26 AM, Daniel Brown <daniel.h.brown(a)thermify.cloud> wrote:
>
>
> Looks like it got OK’ed. I’ll put in something today.
>
>
> --
> Dan Brown
>
>> On Mar 21, 2024, at 13:44, John Mulligan <phlogistonjohn(a)asynchrono.us> wrote:
>>
>> On Thursday, March 21, 2024 11:43:19 AM EDT Daniel Brown wrote:
>>> Assuming I need admin approval to report this on tracker, how long does it
>>> take to get approved?? Signed up a couple days ago, but still seeing “Your
>>> account was created and is now pending administrator approval.”
>>
>> That's unfortunate. I pinged about your issue signing up on the ceph slack
>> channel for infrastructure. Hopefully, that'll get somebody's attention. If
>> you don't get access by tomorrow feel free to ping me again directly and
>> then *I'll* file the issue for you instead of having you wait around more.