Hi Adam.
Thanks a lot for your feedback. I think I found the reason:
https://stackoverflow.com/a/75456529/10186921
It looks like some ports needed by Ceph were blocked by iptables.
The only strange thing is that there was no error that could help me understand
what was wrong - it just hung...
Anyhow, now it is resolved for me.
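For anyone hitting the same hang, a minimal sketch of the firewall rules involved, assuming the cluster uses Ceph's default ports (3300/6789 for the monitors, 6800-7300 for OSDs/MGR) - verify against your own deployment before applying:

```shell
# Print (not apply) iptables rules that open the default Ceph ports:
# 3300 (msgr2) and 6789 (msgr1) for the monitors, 6800-7300 for
# OSDs/MGR daemons. Pipe the output to `sudo sh` on each cluster host
# to apply. Port numbers are the Ceph defaults - an assumption here.
for spec in 3300 6789 6800:7300; do
  echo "iptables -I INPUT -p tcp --dport $spec -j ACCEPT"
done
```

On hosts running firewalld, enabling the predefined `ceph` and `ceph-mon` services should achieve the same effect.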
BR/Anton
________________________________
From: Adam King <adking(a)redhat.com>
Sent: Wednesday, February 15, 2023 9:32 PM
To: Anton Chivkunov <anton(a)conversant.com.sg>
Cc: ceph-users(a)ceph.io <ceph-users(a)ceph.io>
Subject: Re: [ceph-users] Ceph (cepadm) quincy: can't add osd from remote nodes.
If it got as far as running that ceph-volume command on the remote host, I wouldn't
think it was anything with the ssh connection. Do ceph commands generally hang on that
host when you run them manually there as well?
On Wed, Feb 15, 2023 at 11:19 AM Anton Chivkunov
<anton@conversant.com.sg> wrote:
Hello!
I'm stuck with a problem while trying to create a cluster of 3 nodes (AWS EC2 instances):
fa11 ~ # ceph orch host ls
HOST  ADDR           LABELS  STATUS
fa11  172.16.24.67   _admin
fa12  172.16.23.159  _admin
fa13  172.16.25.119  _admin
3 hosts in cluster
Each of them has 2 disks (all accepted by Ceph):
fa11 ~ # ceph orch device ls
HOST  PATH          TYPE  DEVICE ID                                        SIZE   AVAILABLE  REFRESHED  REJECT REASONS
fa11  /dev/nvme1n1  ssd   Amazon_Elastic_Block_Store_vol016651cf7f3b9c9dd  8589M  Yes        7m ago
fa11  /dev/nvme2n1  ssd   Amazon_Elastic_Block_Store_vol034082d7d364dfbdb  5368M  Yes        7m ago
fa12  /dev/nvme1n1  ssd   Amazon_Elastic_Block_Store_vol0ec193fa3f77fee66  8589M  Yes        3m ago
fa12  /dev/nvme2n1  ssd   Amazon_Elastic_Block_Store_vol018736f7eeab725f5  5368M  Yes        3m ago
fa13  /dev/nvme1n1  ssd   Amazon_Elastic_Block_Store_vol0443a031550be1024  8589M  Yes        84s ago
fa13  /dev/nvme2n1  ssd   Amazon_Elastic_Block_Store_vol0870412d37717dc2c  5368M  Yes        84s ago
fa11 is the first host, from which I manage the cluster.
Adding an OSD on fa11 itself works fine:
fa11 ~ # ceph orch daemon add osd fa11:/dev/nvme1n1
Created osd(s) 0 on host 'fa11'
But it doesn't work for the other 2 hosts (it hangs forever):
fa11 ~ # ceph orch daemon add osd fa12:/dev/nvme1n1
^CInterrupted
Logs on fa12 show that it hangs at the following step:
fa12 ~ # tail /var/log/ceph/a9ef6c26-ac38-11ed-9429-06e6bc29c1db/ceph-volume.log
...
[2023-02-14 07:38:20,942][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2023-02-14 07:38:20,964][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a51506c2-e910-4763-9a0c-f6c2194944e2
I'm not sure what the reason for this hang might be.
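Since the hanging `osd new` step has to contact the monitors, one quick check from the remote host is whether the mon ports on fa11 are reachable at all. A sketch using bash's built-in /dev/tcp (the mon IP is taken from the `host ls` output above; 3300/6789 as the default msgr2/msgr1 mon ports is an assumption):

```shell
# Probe a TCP port and report whether it is reachable. Uses bash's
# /dev/tcp pseudo-device, so no extra tools are needed on the host.
check_port() {  # usage: check_port HOST PORT
  if timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "port $2 open on $1"
  else
    echo "port $2 blocked on $1"
  fi
}

# Run from fa12: probe the monitor ports on fa11 (default mon ports
# assumed - adjust if your cluster uses non-default ports).
check_port 172.16.24.67 3300
check_port 172.16.24.67 6789
```

If these report "blocked", a daemon on the remote host cannot reach the mons and commands like the one in the log above will hang silently rather than fail.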
Additional details:
1) cephadm was installed using curl
(https://docs.ceph.com/en/quincy/cephadm/install/#curl-based-installation)
2) I use the user "ceph" instead of "root", and port 2222 instead of 22.
The first node was bootstrapped using the command below:
cephadm bootstrap --mon-ip 172.16.24.67 --allow-fqdn-hostname --ssh-user ceph
--ssh-config /home/anton/ceph/ssh_config --cluster-network 172.16.16.0/20
--skip-monitoring-stack
Content of /home/anton/ceph/ssh_config:
fa11 ~ # cat /home/anton/ceph/ssh_config
Host *
User ceph
Port 2222
IdentityFile /home/ceph/.ssh/id_rsa
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
3) Hosts fa12 and fa13 were added using the commands:
ceph orch host add fa12.testing.swiftserve.com 172.16.23.159 --labels _admin
ceph orch host add fa13.testing.swiftserve.com 172.16.25.119 --labels _admin
Thanks in advance!
BR/Anton
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io