September 2023 - ceph-users

by Frank Schilder

Hi all, I replaced a disk in our octopus cluster and it is rebuilding. I noticed that since the replacement there is no scrubbing going on. Apparently, an OSD having a PG in backfill_wait state seems to block deep scrubbing all other PGs on that OSD as well - at least this is how it looks. Some numbers: the pool in question has 8192 PGs with EC 8+3 and ca 850 OSDs. A total of 144 PGs needed backfilling (were remapped after replacing the disk). After about 2 days we are down to 115 backfill_wait + 3 backfilling. It will take a bit more than a week to complete. There is plenty of time and IOP/s available to deep-scrub PGs on the side, but since the backfill started there is zero scrubbing/deep scrubbing going on and "PGs not deep scrubbed in time" messages are piling up. Is there a way to allow (deep) scrub in this situation? Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

8 months

2
2
0 0

After power outage, osd do not restart

by Patrick Begou

Hi, After a power outage on my test ceph cluster, 2 osd fail to restart. The log file show: 8e5f-00266cf8869c(a)osd.2.service: Failed with result 'timeout'. Sep 21 11:55:02 mostha1 systemd[1]: Failed to start Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c(a)osd.2.service: Service RestartSec=10s expired, scheduling restart. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c(a)osd.2.service: Scheduled restart job, restart counter is at 2. Sep 21 11:55:12 mostha1 systemd[1]: Stopped Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c(a)osd.2.service: Found left-over process 1858 (bash) in control group while starting unit. Ignoring. Sep 21 11:55:12 mostha1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c(a)osd.2.service: Found left-over process 2815 (podman) in control group while starting unit. Ignoring. This is not critical as it is a test cluster and it is actually rebalancing on other osd but I would like to know how to return to HEALTH_OK status. Smartctl show the HDD are OK. So is there a way to recover the osd from this state ? Version is 15.2.17 (juste moved from 15.2.13 to 15.2.17 yesterday, will try to move to latest versions as soon as this problem is solved) Thanks Patrick

8 months

3
6
0 0

ceph orch osd data_allocate_fraction does not work

by Boris Behrens

I have a use case where I want to only use a small portion of the disk for the OSD and the documentation states that I can use data_allocation_fraction [1] But cephadm can not use this and throws this error: /usr/bin/podman: stderr ceph-volume lvm batch: error: unrecognized arguments: --data-allocate-fraction 0.1 So, what I actually want to achieve: Split up a single SSD into: 3-5x block.db for spinning disks (5x 320GB or 3x 500GB regarding if I have 8TB HDDs or 16TB HDDs) 1x SSD OSD (100G) for RGW index / meta pools 1x SSD OSD (100G) for RGW gc pool because of this bug [2] My service definition looks like this: service_type: osd service_id: hdd-8tb placement: host_pattern: '*' crush_device_class: hdd spec: data_devices: rotational: 1 size: ':9T' db_devices: rotational: 0 limit: 5 size: '1T:2T' encrypted: true block_db_size: 320000000000 --- service_type: osd service_id: hdd-16tb placement: host_pattern: '*' crush_device_class: hdd spec: data_devices: rotational: 1 size: '14T:' db_devices: rotational: 0 limit: 1 size: '1T:2T' encrypted: true block_db_size: 500000000000 --- service_type: osd service_id: gc placement: host_pattern: '*' crush_device_class: gc spec: data_devices: rotational: 0 size: '1T:2T' encrypted: true data_allocate_fraction: 0.05 --- service_type: osd service_id: ssd placement: host_pattern: '*' crush_device_class: ssd spec: data_devices: rotational: 0 size: '1T:2T' encrypted: true data_allocate_fraction: 0.05 [1] https://docs.ceph.com/en/pacific/cephadm/services/osd/#ceph.deployment.driv… [2] https://tracker.ceph.com/issues/53585 -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groÃƒ¼en Saal.

8 months

2
1
0 0

OSD not starting after being mounted with ceph-objectstore-tool --op fuse

by Budai Laszlo

Hello, I have a problem with an OSD not starting after being mounted offline using the ceph-objectstore-tool --op fuse command. The cephadm orch ps now shows me the osd in error state: osd.0 storage1 error 2m ago 5h - 4096M <unknown> <unknown> <unknown> If I'm checkung the logs on the node I can see the following messages in the system journal: Sep 21 10:26:13 storage1 systemd[1]: Started Ceph osd.0 for 82eb0cee-583a-11ee-b10b-abe63a69ab28. Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0 Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-0 --no-mon-config --dev /dev/mapper/ceph--aac54f64--d2a7--42e6> Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--aac54f64--d2a7--42e6--a09d--1373e3524414-osd--block--57cfd62d--ae4d--4cae--8c64--be255837> Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0 Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/ln -s /dev/mapper/ceph--aac54f64--d2a7--42e6--a09d--1373e3524414-osd--block--57cfd62d--ae4d--4cae--8c64--be25583728fa /var/lib> Sep 21 10:26:14 storage1 bash[50983]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0 Sep 21 10:26:14 storage1 bash[50983]: --> ceph-volume raw activate successful for osd ID: 0 Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.607+0000 7f91c87cd540 0 set uid:gid to 167:167 (ceph:ceph) Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.607+0000 7f91c87cd540 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process ceph-osd, pid> Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.607+0000 7f91c87cd540 0 pidfile_write: ignore empty --pid-file Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bdev(0x55d79b319400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bdev(0x55d79b319400 /var/lib/ceph/osd/ceph-0/block) open size 10733223936 (0x27fc00000, 10 GiB) block> Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06 Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bdev(0x55d79b318c00 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bdev(0x55d79b318c00 /var/lib/ceph/osd/ceph-0/block) open size 10733223936 (0x27fc00000, 10 GiB) block> Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block size 10 GiB Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.611+0000 7f91c87cd540 1 bdev(0x55d79b318c00 /var/lib/ceph/osd/ceph-0/block) close Sep 21 10:26:14 storage1 bash[51214]: debug 2023-09-21T10:26:14.899+0000 7f91c87cd540 1 bdev(0x55d79b319400 /var/lib/ceph/osd/ceph-0/block) close Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.143+0000 7f91c87cd540 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.143+0000 7f91c87cd540 -1 Falling back to public interface Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.175+0000 7f91c87cd540 0 load: jerasure load: lrc Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.175+0000 7f91c87cd540 1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 -1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 -1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 63.00 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0114000 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.179+0000 7f91c87cd540 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000026 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 63.00 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 0 osd.0:0.OSDShard using op scheduler mClockScheduler Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 -1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 63.00 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0114000 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000026 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 0 osd.0:1.OSDShard using op scheduler mClockScheduler Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 -1 bdev(0x55d79c120000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 63.00 Sep 21 10:26:15 storage1 bash[51214]: debug 2023-09-21T10:26:15.183+0000 7f91c87cd540 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0114000 Sep 21 10:26:15 storage1 systemd[1]: ceph-82eb0cee-583a-11ee-b10b-abe63a69ab28(a)osd.0.service: Main process exited, code=exited, status=1/FAILURE Sep 21 10:26:16 storage1 systemd[1]: ceph-82eb0cee-583a-11ee-b10b-abe63a69ab28(a)osd.0.service: Failed with result 'exit-code'. Any idea how to bring the osd back into the cluster? Thank you, Laszlo

8 months

1
0
0 0

Error adding OSD

by Budai Laszlo

Hi all, I am trying to add an OSD using cephadm but it fails with the message found below. Do you have any ide what may be wrong? The given device used to be in the cluster but it has been removed, and now the device appears as available in the `ceph orch device ls`. Thank you, Laszlo root@monitor1:~# ceph orch device ls| grep storage3 storage3 /dev/sdb hdd ATA_QEMU_HARDDISK_QM00002 10.7G No 15m ago Insufficient space (<10 extents) on vgs, LVM detected, locked storage3 /dev/sdc hdd ATA_QEMU_HARDDISK_QM00003 10.7G Yes 15m ago storage3 /dev/sdd hdd ATA_QEMU_HARDDISK_QM00004 8589M No 15m ago locked root@monitor1:~# Here it is the error: Error EINVAL: Traceback (most recent call last): File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command return self.handle_command(inbuf, cmd) File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command return dispatch[cmd['prefix']].call(self, cmd, inbuf) File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call return self.func(mgr, **kwargs) File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda> wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731 File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper return func(*args, **kwargs) File "/usr/share/ceph/mgr/orchestrator/module.py", line 843, in _daemon_add_osd raise_if_exception(completion) File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception raise e RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/config/ceph.conf Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232 -e NODE_NAME=storage3 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8:/var/run/ceph:z -v /var/log/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8:/var/log/ceph:z -v /var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp4r8kteec:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmppw___l6k:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232 lvm batch --no-auto /dev/sdc --yes --no-systemd /usr/bin/docker: stderr --> passed data devices: 1 physical, 0 LVM /usr/bin/docker: stderr --> relative data size: 1.0 /usr/bin/docker: stderr Running command: /usr/bin/ceph-authtool --gen-print-key /usr/bin/docker: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d90dcdee-035c-4f3c-80f6-5d3eed25d598 /usr/bin/docker: stderr Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgcreate --force --yes ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e /dev/sdc /usr/bin/docker: stderr stdout: Physical volume "/dev/sdc" successfully created. /usr/bin/docker: stderr stdout: Volume group "ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e" successfully created /usr/bin/docker: stderr Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 2559 -n osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598 ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e /usr/bin/docker: stderr stdout: Logical volume "osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598" created. /usr/bin/docker: stderr Running command: /usr/bin/ceph-authtool --gen-print-key /usr/bin/docker: stderr Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-6 /usr/bin/docker: stderr Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e/osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598 /usr/bin/docker: stderr Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1 /usr/bin/docker: stderr Running command: /usr/bin/ln -s /dev/ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e/osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598 /var/lib/ceph/osd/ceph-6/block /usr/bin/docker: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-6/activate.monmap /usr/bin/docker: stderr stderr: got monmap epoch 3 /usr/bin/docker: stderr --> Creating keyring file for osd.6 /usr/bin/docker: stderr Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-6/keyring /usr/bin/docker: stderr Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-6/ /usr/bin/docker: stderr Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 6 --monmap /var/lib/ceph/osd/ceph-6/activate.monmap --keyfile - --osdspec-affinity None --osd-data /var/lib/ceph/osd/ceph-6/ --osd-uuid d90dcdee-035c-4f3c-80f6-5d3eed25d598 --setuser ceph --setgroup ceph /usr/bin/docker: stderr stderr: 2023-09-20T19:34:34.916+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6/) _read_fsid unparsable uuid /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.152+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.152+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.152+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.156+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.156+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.156+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.712+0000 7f77400f1540 -1 bluestore(/var/lib/ceph/osd/ceph-6//block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-6//block: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.712+0000 7f77400f1540 -1 bdev(0x563bb89db400 /var/lib/ceph/osd/ceph-6//block) open open got: (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.712+0000 7f77400f1540 -1 OSD::mkfs: ObjectStore::mkfs failed with error (13) Permission denied /usr/bin/docker: stderr stderr: 2023-09-20T19:34:35.712+0000 7f77400f1540 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-6/: (13) Permission denied /usr/bin/docker: stderr --> Was unable to complete a new OSD, will rollback changes /usr/bin/docker: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.6 --yes-i-really-mean-it /usr/bin/docker: stderr stderr: purged osd.6 /usr/bin/docker: stderr --> Zapping: /dev/ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e/osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598 /usr/bin/docker: stderr --> Unmounting /var/lib/ceph/osd/ceph-6 /usr/bin/docker: stderr Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-6 /usr/bin/docker: stderr stderr: umount: /var/lib/ceph/osd/ceph-6 unmounted /usr/bin/docker: stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e/osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598 bs=1M count=10 conv=fsync /usr/bin/docker: stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e /usr/bin/docker: stderr Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e /usr/bin/docker: stderr stderr: Removing ceph--cf156193--5f39--4bfd--91c0--4e1d50fe0e4e-osd--block--d90dcdee--035c--4f3c--80f6--5d3eed25d598 (253:1) /usr/bin/docker: stderr stderr: Archiving volume group "ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e" metadata (seqno 5). /usr/bin/docker: stderr Releasing logical volume "osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598" /usr/bin/docker: stderr stderr: Creating volume group backup "/etc/lvm/backup/ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e" (seqno 6). /usr/bin/docker: stderr stdout: Logical volume "osd-block-d90dcdee-035c-4f3c-80f6-5d3eed25d598" successfully removed /usr/bin/docker: stderr stderr: Removing physical volume "/dev/sdc" from volume group "ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e" /usr/bin/docker: stderr stdout: Volume group "ceph-cf156193-5f39-4bfd-91c0-4e1d50fe0e4e" successfully removed /usr/bin/docker: stderr Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc /usr/bin/docker: stderr stdout: Labels on physical volume "/dev/sdc" successfully wiped. /usr/bin/docker: stderr --> Zapping successful for OSD: 6 /usr/bin/docker: stderr Traceback (most recent call last): /usr/bin/docker: stderr File "/usr/sbin/ceph-volume", line 11, in <module> /usr/bin/docker: stderr load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')() /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__ /usr/bin/docker: stderr self.main(self.argv) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc /usr/bin/docker: stderr return f(*a, **kw) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main /usr/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch /usr/bin/docker: stderr instance.main() /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main /usr/bin/docker: stderr terminal.dispatch(self.mapper, self.argv) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch /usr/bin/docker: stderr instance.main() /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root /usr/bin/docker: stderr return func(*a, **kw) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 441, in main /usr/bin/docker: stderr self._execute(plan) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 460, in _execute /usr/bin/docker: stderr c.create(argparse.Namespace(**args)) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root /usr/bin/docker: stderr return func(*a, **kw) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create /usr/bin/docker: stderr prepare_step.safe_prepare(args) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare /usr/bin/docker: stderr self.prepare() /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root /usr/bin/docker: stderr return func(*a, **kw) /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 394, in prepare /usr/bin/docker: stderr osd_fsid, /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 119, in prepare_bluestore /usr/bin/docker: stderr db=db /usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 484, in osd_mkfs_bluestore /usr/bin/docker: stderr raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command))) /usr/bin/docker: stderr RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 6 --monmap /var/lib/ceph/osd/ceph-6/activate.monmap --keyfile - --osdspec-affinity None --osd-data /var/lib/ceph/osd/ceph-6/ --osd-uuid d90dcdee-035c-4f3c-80f6-5d3eed25d598 --setuser ceph --setgroup ceph Traceback (most recent call last): File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 9653, in <module> main() File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 9641, in main r = ctx.func(ctx) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 2153, in _infer_config return func(ctx) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 2069, in _infer_fsid return func(ctx) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 2181, in _infer_image return func(ctx) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 2056, in _validate_fsid return func(ctx) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 6254, in command_ceph_volume out, err, code = call_throws(ctx, c.run_cmd(), verbosity=CallVerbosity.QUIET_UNLESS_ERROR) File "/var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/cephadm.7ab03136237675497d535fb1b85d1d0f95bbe5b95f32cd4e6f3ca71a9f97bf3c", line 1853, in call_throws raise RuntimeError('Failed command: %s' % ' '.join(command)) RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232 -e NODE_NAME=storage3 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8:/var/run/ceph:z -v /var/log/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8:/var/log/ceph:z -v /var/lib/ceph/314d068c-56ee-11ee-87e2-cd6d389cbfb8/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp4r8kteec:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmppw___l6k:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232 lvm batch --no-auto /dev/sdc --yes --no-systemd

8 months

1
0
0 0

S3website range requests - possible issue

by Ondřej Kukla

Hello, In our deployment we are using the mix of s3 and s3website RGW. I’ve noticed strange behaviour when sending range requests to the s3website RGWs that I’m not able to replicate on the s3 ones. I’ve created a simple wrk LUA script to test sending range requests on tiny ranges so the issue is easily seen. When sending these requests against s3 RGW I can see that the amount of data read from Ceph is ± equivalent to what the RGW sends to the client. This change very dramatically when I’m doing the same test against s3website RGW. The read from Ceph is huge (3Gb/s compared to ~22Mb/s on s3 RGW) I seems to me like the RGW is reading the whole files and then sending just the range which is different then what s3 does. I do not understand why would s3website need to read that much from Ceph and I believe this is a bug - I was looking through the tracker and wasn’t able to find anything related to s3website and range requests. Did anyone else noticed this issue? You can replicate it by running this wrk command wrk -t56 -c500 -d5m http://${rgwipaddress}:8080/${bucket}/videos/ -s wrk-range-small.lua wrk script -- Initialize the pseudo random number generator math.randomseed( os.time()) math.random(); math.random(); math.random() i = 1 function request() if i == 8 then i = 1 end local nrangefrom = math.random() local nrangeto = math.random(100) local path = wrk.path url = path..i..".mp4" wrk.headers["Range"] = nrangefrom.."-"..nrangeto i = i+1 return wrk.format(nil, url) end Kind regards, Ondrej

8 months

1
1
0 0

cephfs mount 'stalls'

by Marc

I am still on nautilus and some clients are still on centos7 which mount the cephfs. These mounts stall at some point. Currently I am mounting with something like this in the fstab. id=cephfsclientid,client_mountpoint=/cephfs/test /mnt/test fuse.ceph noauto,_netdev,noatime,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,x-systemd.automount,x-systemd.idle-timeout=30 0 0 When the mount stalls I am fixing it with a umount -l, but it would be nicer of course when it would not behave like this. Can this be fixed on el7 and Nautilus, like with different mount options or so?

8 months

3
4
0 0

Re: S3website range requests - possible issue

by Ondřej Kukla

I was checking the tracker again and I found already fixed issue that seems to be connected with this issue. https://tracker.ceph.com/issues/44508 Here is the PR that fixes it https://github.com/ceph/ceph/pull/33807 What I’m still not understanding is why this is only happening when using s3website api. Is there someone who could shed some light on this? Regards, Ondrej

8 months

1
0
0 0

libceph: mds1 IP+PORT wrong peer at address

by Frank Schilder

Hi all, we seem to have hit a bug in the ceph fs kernel client and I just want to confirm what action to take. We get the error "wrong peer at address" in dmesg and some jobs on that server seem to get stuck in fs access; log extract below. I found these 2 tracker items https://tracker.ceph.com/issues/23883 and https://tracker.ceph.com/issues/41519, which don't seem to have fixes. My questions: - Is this harmless or does it indicate invalid/corrupted client cache entries? - How to resolve, ignore, umount+mount or reboot? Here an extract from the dmesg log, the error has survived a couple of MDS restarts already: [Mon Mar 6 12:56:46 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Mon Mar 6 13:05:18 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386 [Mon Mar 6 13:05:18 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Mon Mar 6 13:13:50 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386 [Mon Mar 6 13:13:50 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN) [Mon Mar 6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN) [Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start [Mon Mar 6 13:16:45 2023] ceph: mds1 reconnect start [Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success [Mon Mar 6 13:16:48 2023] ceph: mds1 reconnect success [Mon Mar 6 13:18:13 2023] ceph: update_snap_trace error -22 [Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN) [Mon Mar 6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN) [Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed [Mon Mar 6 13:18:23 2023] ceph: mds1 recovery completed [Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start [Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect start [Mon Mar 6 13:18:28 2023] ceph: mds7 reconnect success [Mon Mar 6 13:18:29 2023] ceph: mds7 reconnect success [Mon Mar 6 13:18:35 2023] ceph: update_snap_trace error -22 [Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed [Mon Mar 6 13:18:35 2023] ceph: mds7 recovery completed [Mon Mar 6 13:22:22 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [Mon Mar 6 13:22:22 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Mon Mar 6 13:30:54 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [...] [Thu Mar 9 09:37:24 2023] slurm.epilog.cl (31457): drop_caches: 3 [Thu Mar 9 09:38:26 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [Thu Mar 9 09:38:26 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Thu Mar 9 09:46:58 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [Thu Mar 9 09:46:58 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Thu Mar 9 09:55:30 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [Thu Mar 9 09:55:30 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address [Thu Mar 9 10:04:02 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347 [Thu Mar 9 10:04:02 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

8 months

4
6
0 0

No snap_schedule module in Octopus

by Patrick Begou

Hi, I'm working on a small POC for a ceph setup on 4 old C6100 power-edge. I had to install Octopus since latest versions were unable to detect the HDD (too old hardware ??). No matter, this is only for training and understanding Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noar… bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshot by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noar… says to use the snap_schedule module but this module does not exist. # ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful Have I missed something ? Is there some additional install steps to do for this module ? Thanks for your help. Patrick

8 months

3
8
0 0

2024

2023

2022

2021

2020

2019

ceph-users September 2023