To run a `ceph orch...` command (or really any command against the cluster) you should
first open a shell with `cephadm shell`. That will put you in a bash shell
inside a container that has the ceph packages matching the ceph version in
your cluster. If you just want to run a single command rather than an interactive
shell, you can also do `cephadm shell -- ceph orch...`.
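For example, something like this (just a sketch, substitute whatever command
you actually need):

    # interactive shell inside the ceph container
    cephadm shell
    ceph orch ps

    # or a single one-off command without the interactive shell
    cephadm shell -- ceph orch ps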
Also, this might not turn out to be an issue, but just thinking ahead: the
devices cephadm will typically allow you to put an OSD on should match what's
output by `ceph orch device ls` (which is populated by
`cephadm ceph-volume -- inventory --format=json-pretty`, if you want to dig
further). So I'd generally recommend checking that before creating any OSDs
through the orchestrator. I'd also generally recommend setting up OSDs through
drive group specs rather than `ceph orch daemon add osd...`, although that's a
tangent to what you're trying to do now.
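As a rough sketch of that workflow (the spec below is only an illustration;
`all: true` will consume every available device, so adjust the filters for
your setup):

    # from inside `cephadm shell`, check what the orchestrator sees as usable
    ceph orch device ls

    # osd_spec.yaml -- a minimal drive group spec
    service_type: osd
    service_id: all_available
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true

    # apply it
    ceph orch apply -i osd_spec.yaml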
On Wed, Nov 29, 2023 at 4:14 PM Francisco Arencibia Quesada <
arencibia.francisco(a)gmail.com> wrote:
Thanks so much Adam, that worked great. However, I cannot add any storage with:

sudo cephadm ceph orch daemon add osd node2-ceph:/dev/nvme1n1

root@node1-ceph:~# ceph status
  cluster:
    id:     9d8f1112-8ef9-11ee-838e-a74e679f7866
    health: HEALTH_WARN
            Failed to apply 1 service(s): osd.all-available-devices
            2 failed cephadm daemon(s)
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum node1-ceph (age 18m)
    mgr: node1-ceph.jitjfd(active, since 17m)
    osd: 0 osds: 0 up, 0 in (since 6m)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

root@node1-ceph:~#
Regards
On Wed, Nov 29, 2023 at 5:45 PM Adam King <adking(a)redhat.com> wrote:
I think I remember a bug that happened when there was a small mismatch
between the cephadm version being used for bootstrapping and the container
image. In this case, the cephadm binary used for bootstrap knows about the
ceph-exporter service but the container image being used does not. The
ceph-exporter was removed from quincy between 17.2.6 and 17.2.7, so I'd
guess the cephadm binary here is a bit older and it's pulling the 17.2.7
image. For now, I'd say just work around this by running bootstrap with the
`--skip-monitoring-stack` flag. If you want the other monitoring stack
services after bootstrap, you can just run `ceph orch apply <service>` for
the alertmanager, prometheus, node-exporter, and grafana services, and that
would get you to the same spot as if you hadn't provided the flag and
weren't hitting the issue.
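Roughly (a sketch, using the mon IP from your log):

    cephadm bootstrap --mon-ip 10.0.0.52 --skip-monitoring-stack

    # later, from inside `cephadm shell`, if you still want the monitoring stack
    ceph orch apply prometheus
    ceph orch apply alertmanager
    ceph orch apply node-exporter
    ceph orch apply grafana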
As an extra note, this failed bootstrap might be leaving things around
that could cause subsequent bootstraps to fail. If you run `cephadm ls` and
see things listed, you can grab the fsid from the output of that command
and run `cephadm rm-cluster --force --fsid <fsid>` to clean up the env
before bootstrapping again.
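Something like this (the fsid here is just the one from your bootstrap
output; use whatever `cephadm ls` reports):

    cephadm ls
    cephadm rm-cluster --force --fsid 4ce3a92a-8ddd-11ee-9b23-6341187f70c1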
On Wed, Nov 29, 2023 at 11:32 AM Francisco Arencibia Quesada <
arencibia.francisco(a)gmail.com> wrote:
> Hello guys,
>
> This situation is driving me crazy. I have tried to deploy a ceph cluster
> in every way possible, even with ansible, and at some point it breaks. I'm
> using Ubuntu 22.04. This is one of the errors I'm having, some problem
> with ceph-exporter. Please could you help me? I have been dealing with
> this for about 5 days.
> Kind regards
>
> root@node1-ceph:~# cephadm bootstrap --mon-ip 10.0.0.52
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit systemd-timesyncd.service is enabled and running
> Repeating the final host check...
> docker (/usr/bin/docker) is present
> systemctl is present
> lvcreate is present
> Unit systemd-timesyncd.service is enabled and running
> Host looks OK
> Cluster fsid: 4ce3a92a-8ddd-11ee-9b23-6341187f70c1
> Verifying IP 10.0.0.52 port 3300 ...
> Verifying IP 10.0.0.52 port 6789 ...
> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
> Internal network (--cluster-network) has not been provided, OSD
> replication
> will default to the public_network
> Pulling container image quay.io/ceph/ceph:v17...
> Ceph version: ceph version 17.2.7
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> mon is available
> Assimilating anything we can from ceph.conf...
> Generating new minimal ceph.conf...
> Restarting the monitor...
> Setting mon public_network to 10.0.0.1/32,10.0.0.0/24
> Wrote config to /etc/ceph/ceph.conf
> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
> Creating mgr...
> Verifying port 9283 ...
> Waiting for mgr to start...
> Waiting for mgr...
> mgr not available, waiting (1/15)...
> mgr not available, waiting (2/15)...
> mgr not available, waiting (3/15)...
> mgr not available, waiting (4/15)...
> mgr not available, waiting (5/15)...
> mgr is available
> Enabling cephadm module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 5...
> mgr epoch 5 is available
> Setting orchestrator backend to cephadm...
> Generating ssh key...
> Wrote public SSH key to /etc/ceph/ceph.pub
> Adding key to root@localhost authorized_keys...
> Adding host node1-ceph...
> Deploying mon service with default placement...
> Deploying mgr service with default placement...
> Deploying crash service with default placement...
> Deploying ceph-exporter service with default placement...
> Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
> CEPH_USE_RANDOM_NONCE=1 -v
> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
> apply ceph-exporter
> /usr/bin/ceph: stderr Error EINVAL: Usage:
> /usr/bin/ceph: stderr ceph orch apply -i <yaml spec> [--dry-run]
> /usr/bin/ceph: stderr ceph orch apply <service_type>
> [--placement=<placement_string>] [--unmanaged]
> /usr/bin/ceph: stderr
> Traceback (most recent call last):
>   File "/usr/sbin/cephadm", line 9653, in <module>
>     main()
>   File "/usr/sbin/cephadm", line 9641, in main
>     r = ctx.func(ctx)
>   File "/usr/sbin/cephadm", line 2205, in _default_image
>     return func(ctx)
>   File "/usr/sbin/cephadm", line 5774, in command_bootstrap
>     prepare_ssh(ctx, cli, wait_for_mgr_restart)
>   File "/usr/sbin/cephadm", line 5275, in prepare_ssh
>     cli(['orch', 'apply', t])
>   File "/usr/sbin/cephadm", line 5708, in cli
>     return CephContainer(
>   File "/usr/sbin/cephadm", line 4144, in run
>     out, _, _ = call_throws(self.ctx, self.run_cmd(),
>   File "/usr/sbin/cephadm", line 1853, in call_throws
>     raise RuntimeError('Failed command: %s' % ' '.join(command))
> RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
> CEPH_USE_RANDOM_NONCE=1 -v
> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
> apply ceph-exporter
>
> --
> *Francisco Arencibia Quesada.*
> *DevOps Engineer*
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
--
*Francisco Arencibia Quesada.*
*DevOps Engineer*