Hello,
I am trying to set up a test cluster with the cephadm tool on Ubuntu 20.04 nodes.
Following the directions at
https://docs.ceph.com/en/octopus/cephadm/install/, I have set
up the monitor and manager on a management node, and added two hosts that I want to use
for storage. All storage devices present on those nodes are included in the output of
`ceph orch device ls`, and all are marked “available”. However, when I try to deploy OSDs
with `ceph orch apply osd -i spec.yml`, following the HDD+SSD storage spec example at
https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-simple-case, I see the new
service in the output of `ceph orch ls`, but it is not running anywhere (“0/2”), and no
OSDs get created. I am not sure how to debug this, and any pointers would be much
appreciated.
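So far, the only debugging commands I have found in the cephadm docs are the ones below (I am not sure whether `--dry-run` is supported by my Octopus release, so treat that one as a guess on my part):

```shell
# Watch cephadm/orchestrator log messages as they are emitted
ceph -W cephadm

# Show recent log entries from the cephadm module
ceph log last cephadm

# Preview which OSDs the spec would create, without applying it
# (assuming --dry-run exists in this release)
ceph orch apply osd -i spec.yml --dry-run
```

Is this the right place to look, or is there something more specific for OSD specs?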
Thank you,
Davor
Output:
```
# ceph orch host ls
INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3
INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
HOST      ADDR      LABELS  STATUS
sps-head  sps-head  mon
sps-st1   sps-st1   mon
sps-st2   sps-st2
# ceph orch device ls
INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3
INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
Hostname  Path          Type  Serial            Size   Health   Ident  Fault  Available
sps-head  /dev/nvme0n1  ssd   S5JXNS0N504446R   1024G  Unknown  N/A    N/A    Yes
sps-st1   /dev/nvme0n1  ssd   S5JXNS0N504948D   1024G  Unknown  N/A    N/A    Yes
sps-st1   /dev/nvme1n1  ssd   S5JXNS0N504958T   1024G  Unknown  N/A    N/A    Yes
sps-st1   /dev/sdb      hdd   5000cca28ed36018  14.0T  Unknown  N/A    N/A    Yes
sps-st1   /dev/sdc      hdd   5000cca28ed353e5  14.0T  Unknown  N/A    N/A    Yes
[…]
# cat /mnt/osd_spec.yml
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: 'sps-st[1-6]'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
[**After running `ceph orch apply osd -i spec.yml`:**]
# ceph orch ls
NAME                     RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                           IMAGE ID
alertmanager             1/1      9m ago     6h   count:1      docker.io/prom/alertmanager:v0.20.0  0881eb8f169f
crash                    3/3      9m ago     6h   *            docker.io/ceph/ceph:v15              5553b0cb212c
grafana                  1/1      9m ago     6h   count:1      docker.io/ceph/ceph-grafana:6.6.2    a0dce381714a
mgr                      2/2      9m ago     6h   count:2      docker.io/ceph/ceph:v15              5553b0cb212c
mon                      1/2      9m ago     3h   label:mon    docker.io/ceph/ceph:v15              5553b0cb212c
node-exporter            0/3      -          -    *            <unknown>                            <unknown>
osd.default_drive_group  0/2      -          -    sps-st[1-6]  <unknown>                            <unknown>
prometheus               1/1      9m ago     6h   count:1      docker.io/prom/prometheus:v2.18.1    de242295e225
[** I am not sure why neither “osd.default_drive_group” nor “node-exporter” is running
anywhere. How do I check that? **]
# ceph osd tree
INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3
INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
# ceph orch --version
ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)
```