On Tue, Feb 4, 2020 at 15:19, <DHilsbos(a)performair.com> wrote:
Rodrigo;
Best bet would be to check logs. Check the OSD logs on the affected server. Check
cluster logs on the MONs. Check OSD logs on other servers.
Your Ceph version(s) and your OS distribution and version would also be useful to help
you troubleshoot this OSD flapping issue.
Looking at the logs I finally found the issue: when I said that there
were no changes in network topology, I was mistaken. I removed an
unused (or at least I thought so) network card from each server.
These servers had 2 network cards that I installed and configured so
I would have a "public network" and a "cluster network". That was
when I was first installing the ceph cluster.
After having some problems with this setup I was advised by members
of this list to not use this dual network setup as it could make
debugging much more difficult. I followed this advice, or at least
tried to.
To make a long story short, ceph was still trying to use the second
network for some OSDs. With a "ceph config rm global cluster_network"
and a general restart of the cluster, everything started working
again.
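
For anyone hitting the same thing, here is a minimal sketch of how one
could spot OSDs that are still bound to a network that no longer
exists. It is only an illustration: the function name
osds_outside_network, the addresses and the 192.168.10.0/24 subnet are
all made up, and in a real cluster the cluster addresses would have to
be collected by hand (for example from the output of "ceph osd dump").

```python
import ipaddress


def osds_outside_network(osd_cluster_addrs, network):
    """Return the sorted ids of OSDs whose cluster address is not in `network`.

    osd_cluster_addrs: dict mapping OSD id -> cluster address (string).
    network: the network the OSDs are expected to use, in CIDR notation.
    """
    net = ipaddress.ip_network(network)
    return sorted(
        osd_id
        for osd_id, addr in osd_cluster_addrs.items()
        if ipaddress.ip_address(addr) not in net
    )


# Hypothetical sample data, not taken from the real cluster:
# OSD id -> cluster address.
sample = {
    0: "192.168.10.11",
    1: "10.10.10.12",  # still pointing at the removed second network
    2: "192.168.10.13",
}

# Assumed public network; replace with the real one.
stale = osds_outside_network(sample, "192.168.10.0/24")
print(stale)
```

Any OSD id printed by this check would be a candidate for the kind of
leftover cluster_network setting described above.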
Thanks for the help and sorry for the confusion.
Regards,
Rodrigo