Thank you Eugen!
After finding out what the target name actually was, it all worked like a charm.
Best regards, Mikael
On Wed, Jun 21, 2023 at 11:05 AM Eugen Block <eblock(a)nde.ag> wrote:
Hi,
> Will that try to be smart and just restart a few at a time to keep things
> up and available? Or will it just trigger a restart everywhere
> simultaneously?
Basically, that's what happens during an upgrade, for example, when
services are restarted. It's designed as a rolling procedure because
restarting all daemons of a specific service at the same time would
cause an interruption. So the daemons are scheduled to restart and the
mgr decides when it's safe to restart the next one (this is a test
cluster started on Nautilus, but it's on Quincy now):
nautilus:~ # ceph orch restart osd.osd-hdd-ssd
Scheduled to restart osd.5 on host 'nautilus'
Scheduled to restart osd.0 on host 'nautilus'
Scheduled to restart osd.2 on host 'nautilus'
Scheduled to restart osd.1 on host 'nautilus2'
Scheduled to restart osd.4 on host 'nautilus2'
Scheduled to restart osd.7 on host 'nautilus2'
Scheduled to restart osd.3 on host 'nautilus3'
Scheduled to restart osd.8 on host 'nautilus3'
Scheduled to restart osd.6 on host 'nautilus3'
When it comes to OSDs it's possible (or even likely) that multiple
OSDs are restarted at the same time, depending on the pools (and their
replication size) they are part of. But Ceph tries to avoid inactive
PGs, which is the critical part, of course. An edge case would be a
pool with size 1, where restarting an OSD would cause an inactive PG
until the OSD is up again. But since size 1 would be a bad idea anyway
(except for testing purposes) you'd have to live with that.
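
To check up front whether any pool falls into that size 1 edge case, something like the following could work. It's only a sketch: it assumes the JSON printed by `ceph osd pool ls detail --format json` is a list of objects with `pool_name` and `size` keys, and `pools_with_size_one` is a made-up helper name, not part of any Ceph tooling.

```python
import json


def pools_with_size_one(pool_detail_json):
    """Return the names of pools whose replication size is 1.

    `pool_detail_json` is expected to be the text printed by
    `ceph osd pool ls detail --format json` (assumed layout: a list of
    objects carrying "pool_name" and "size").
    """
    pools = json.loads(pool_detail_json)
    return [pool["pool_name"] for pool in pools if pool.get("size") == 1]
```

Any pool it returns would see inactive PGs while one of its OSDs restarts, so those are the ones to look at before restarting anything.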
If you have the option, I'd recommend creating a test cluster and playing
around with these things to get a better understanding, especially
when it comes to upgrade tests etc.
> I guess in my current scenario, restarting one host at a time makes most
> sense, with a
> systemctl restart ceph-{fsid}.target
> and then checking that "ceph -s" says OK before proceeding to the next
Yes, if your crush-failure-domain is host that should be safe, too.
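
The host-by-host procedure could be scripted roughly like this. Treat it as a sketch under assumptions, not a tested tool: it assumes the hosts are reachable via `ssh`, that `ceph -s --format json` reports the overall state under `health.status`, and that the polling interval and timeout (arbitrary values here) suit your cluster. The systemd target is passed in as a parameter because the actual unit name has to be looked up on the host first. `run_cmd`, `cluster_is_healthy` and `rolling_restart` are hypothetical names.

```python
import json
import subprocess
import time


def run_cmd(argv):
    """Run a command and return its stdout (raises on a non-zero exit code)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def cluster_is_healthy(run=run_cmd):
    """Return True if `ceph -s` reports HEALTH_OK."""
    status = json.loads(run(["ceph", "-s", "--format", "json"]))
    return status["health"]["status"] == "HEALTH_OK"


def rolling_restart(hosts, target, run=run_cmd, poll_seconds=10,
                    timeout_seconds=1800, sleep=time.sleep):
    """Restart `target` on one host at a time, waiting for HEALTH_OK in between."""
    for host in hosts:
        # Assumption: ssh access to each host with rights to restart the unit.
        run(["ssh", host, "systemctl", "restart", target])
        waited = 0
        while True:
            # Wait before each check so the cluster can notice the restart.
            sleep(poll_seconds)
            waited += poll_seconds
            if cluster_is_healthy(run):
                break
            if waited >= timeout_seconds:
                raise TimeoutError(f"no HEALTH_OK after restarting {host}")
```

It would be called with the host list and the target name found on the hosts, e.g. `rolling_restart(["host1", "host2"], "ceph-{fsid}.target")` with the real fsid filled in. Note that the check is strict: any unrelated warning keeps the loop waiting until the timeout, so it stops rather than moving on to the next host.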
Regards,
Eugen
Quoting Mikael Öhman <micketeer(a)gmail.com>:
> The documentation very briefly explains a few core commands for restarting
> things;
> https://docs.ceph.com/en/quincy/cephadm/operations/#starting-and-stopping-d…
> but I feel I'm lacking quite some details of what is safe to do.
>
> I have a system in production, clusters connected via CephFS and some
> shared block devices.
> We would like to restart some things due to some new network
> configurations. Going daemon by daemon would take forever, so I'm curious
> as to what happens if one tries the command;
>
> ceph orch restart osd
>
> Will that try to be smart and just restart a few at a time to keep things
> up and available? Or will it just trigger a restart everywhere
> simultaneously?
>
> I guess in my current scenario, restarting one host at a time makes most
> sense, with a
> systemctl restart ceph-{fsid}.target
> and then checking that "ceph -s" says OK before proceeding to the next
> host, but I'm still curious as to what the "ceph orch restart xxx" command
> would do (but not enough to try it out in production)
>
> Best regards, Mikael
> Chalmers University of Technology
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io