March 2023 - ceph-users - lists.ceph.io

Re: Upgrade from 16.2.7. to 16.2.11 failing on OSDs

by Lo Re Giuseppe

To add, on this, the issue seemed related to a process (ceph-volume) which was doing check operations on all devices. The systemctl osd service was timing out because of that and the osd daemon was going into error state. We noticed that version 17.2.5 had a change related to ceph-volume, in particular https://tracker.ceph.com/issues/57627. We decided to skip 16.2.11 and jump to 17.2.5. This second attempt went well, so the issue is now solved. Note: the upgrade 16.2.7 -> 16.2.11 went smoothly in a TDS cluster with identical OS/software, but much smaller, 3 nodes with a couple of disk each, so the issue seems really to be about the number of devices and nodes. Regards, Giuseppe On 30.03.23, 16:56, "Lo Re Giuseppe" <giuseppe.lore(a)cscs.ch <mailto:giuseppe.lore@cscs.ch>> wrote: Dear all, On one of our clusters I started the upgrade process from 16.2.7 to 16.2.11. Mon and mgr and crash processes were done easily/quickly, then at the first attempt of upgrading a OSD container the upgrade process stopped because of the OSD process is not able to start after the upgrade. Does anyone have any hint on how to unblock the upgrade? Some details below: Regards, Giuseppe I started the upgrade process with the cephadm command: “”” [root@naret-monitor01 ~]# ceph orch upgrade start --ceph-version 16.2.11 Initiating upgrade to quay.io/ceph/ceph:v16.2.11 “”” After a short time: “”” [root@naret-monitor01 ~]# ceph orch upgrade status { "target_image": quay.io/ceph/ceph@sha256:1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add<mailto:quay.io/ceph/ceph@sha256:1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add>, "in_progress": true, "which": "Upgrading all daemon types on all hosts", "services_complete": [ "crash", "mon", "mgr" ], "progress": "64/2039 daemons upgraded", "message": "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.4 on host naret-osd01 failed.", "is_paused": true } “”” The ceph health command reports: “”” [root@naret-monitor01 ~]# ceph health detail HEALTH_WARN 1 failed cephadm daemon(s); 1 osds down; Degraded data redundancy: 2654362/6721382840 objects degraded (0.039%), 14 pgs degraded, 14 pgs undersized; Upgrading daemon osd.4 on host naret-osd01 failed. [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s) daemon osd.22 on naret-osd01 is in error state [WRN] OSD_DOWN: 1 osds down osd.4 (root=default,host=naret-osd01) is down [WRN] PG_DEGRADED: Degraded data redundancy: 2654362/6721382840 objects degraded (0.039%), 14 pgs degraded, 14 pgs undersized pg 28.88 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1373,1337,1508,852,2147483647,483] pg 28.528 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1063,793,2147483647,931,338,1777] pg 28.594 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1208,891,1651,364,2147483647,53] pg 28.8b4 is stuck undersized for 6m, current state active+undersized+degraded, last acting [521,1273,1238,138,1539,2147483647] pg 28.a90 is stuck undersized for 6m, current state active+undersized+degraded, last acting [237,1665,1836,2147483647,192,1410] pg 28.ad6 is stuck undersized for 6m, current state active+undersized+degraded, last acting [870,466,350,885,1601,2147483647] pg 28.b34 is stuck undersized for 6m, current state active+undersized+degraded, last acting [920,1596,2147483647,115,201,941] pg 28.c14 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1389,424,2147483647,268,1646,632] pg 28.dba is stuck undersized for 6m, current state active+undersized+degraded, last acting [1099,561,2147483647,1806,1874,1145] pg 28.ee2 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1621,1904,1044,2147483647,1545,722] pg 29.163 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1883,2147483647,1509,1697,1187,235] pg 29.1c1 is stuck undersized for 6m, current state active+undersized+degraded, last acting [122,1226,962,1254,1215,2147483647] pg 29.254 is stuck undersized for 6m, current state active+undersized+degraded, last acting [1782,1839,1545,412,196,2147483647] pg 29.2a1 is stuck undersized for 6m, current state active+undersized+degraded, last acting [370,2147483647,575,1423,1755,446] [WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.4 on host naret-osd01 failed. Upgrade daemon: osd.4: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.4 ... Non-zero exit code 1 from systemctl start ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4> systemctl: stderr Job for ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice> failed because a timeout was exceeded. systemctl: stderr See "systemctl status ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice>" and "journalctl -xe" for details. Traceback (most recent call last): File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 9248, in <module> main() File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 9236, in main r = ctx.func(ctx) File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 1990, in _default_image return func(ctx) File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 5041, in command_deploy ports=daemon_ports) File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 2952, in deploy_daemon c, osd_fsid=osd_fsid, ports=ports) File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 3197, in deploy_daemon_units call_throws(ctx, ['systemctl', 'start', unit_name]) File "/var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2", line 1657, in call_throws raise RuntimeError(f'Failed command: {" ".join(command)}: {s}') RuntimeError: Failed command: systemctl start ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4>: Job for ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice> failed because a timeout was exceeded. See "systemctl status ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice>" and "journalctl -xe" for details. “”” On the OSD server we have: “”” [root@naret-osd01 ~]# uname -a Linux naret-osd01 4.18.0-425.10.1.el8_7.x86_64 #1 SMP Wed Dec 14 16:00:01 EST 2022 x86_64 x86_64 x86_64 GNU/Linux [root@naret-osd01 ~]# podman -v podman version 4.2.0 [root@naret-osd01 ~]# ceph -v ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable) [root@naret-osd01 ~]# cat /etc/os-release NAME="Red Hat Enterprise Linux" VERSION="8.7 (Ootpa)" ID="rhel" ID_LIKE="fedora" VERSION_ID="8.7" PLATFORM_ID="platform:el8" PRETTY_NAME="Red Hat Enterprise Linux 8.7 (Ootpa)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos" HOME_URL=https://www.redhat.com/ <https://www.redhat.com/> DOCUMENTATION_URL=https://access.redhat.com/documentation/red_hat_enterpris… <https://access.redhat.com/documentation/red_hat_enterprise_linux/8/> BUG_REPORT_URL=https://bugzilla.redhat.com/ <https://bugzilla.redhat.com/> REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8" REDHAT_BUGZILLA_PRODUCT_VERSION=8.7 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="8.7" “”” Systemctl says: “”” systemctl status ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice> … ● ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice> - Ceph osd.4 for 63334166-d991-11eb-99de-40a6b72108d0 Loaded: loaded (/etc/systemd/system/ceph-63334166-d991-11eb-99de-40a6b72108d0@.service<mailto:/etc/systemd/system/ceph-63334166-d991-11eb-99de-40a6b72108d0@.service>; enabled; vendor preset: disabled) Active: failed (Result: timeout) since Mon 2023-03-27 15:34:29 CEST; 6min ago Process: 730621 ExecStopPost=/bin/rm -f /run/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid<mailto:/run/ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid> /run/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-cid<mailto:/run/ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-cid> (code=exited, status=0/SUCCESS) Process: 730209 ExecStopPost=/bin/bash /var/lib/ceph/63334166-d991-11eb-99de-40a6b72108d0/osd.4/unit.poststop (code=exited, status=0/SUCCESS) Process: 710355 ExecStartPre=/bin/rm -f /run/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid<mailto:/run/ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid> /run/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-cid<mailto:/run/ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-cid> (code=exited, status=0/SUCCESS) Main PID: 23025 (code=exited, status=0/SUCCESS) Tasks: 62 (limit: 1647878) Memory: 961.8M CGroup: /system.slice/system-ceph\x2d63334166\x2dd991\x2d11eb\x2d99de\x2d40a6b72108d0.slice/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice ├─libpod-payload-b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e │ └─754976 /usr/bin/ceph-osd -n osd.4 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug └─runtime └─754965 /usr/bin/conmon --api-version 1 -c b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e -u b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e/userdata -p /run/containers/storage/overlay-containers/b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e/userdata/pidfile -n ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4 --exit-dir /run/libpod/exits --full-attach -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e/userdata/oci-log --conmon-pidfile /run/ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid<mailto:/run/ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice-pid> --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg cni --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg b4f0ebebdfec38942b614756b6329b04d2939db29a0a9823e314b848680bc58e Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.886+0000 7f52e6ae1700 1 osd.4 pg_epoch: 821628 pg[28.dbas2( v 821618'4657799 (819107'4647770,821618'4657799] local-lis/les=749842/749843 n=239290 ec=130297/130290 lis/c=821623/749842 les/c/f=821624/749843/0 sis=821628 pruub=7.751406670s) [1099,561,4,1806,1874,1145]p1099(0) r=2 lpr=821628 pi=[749842,821628)/1 crt=821618'4657799 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.039081573s@ mbc={} ps=[4~6]] state<Start>: transitioning to Stray Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.886+0000 7f52e8ae5700 1 osd.4 pg_epoch: 821628 pg[29.163s1( v 821572'139334 (776804'129273,821572'139334] local-lis/les=749851/749852 n=65683 ec=130801/130801 lis/c=821623/749851 les/c/f=821624/749852/0 sis=821628 pruub=8.023463249s) [1883,4,1509,1697,1187,235]p1883(0) r=1 lpr=821628 pi=[749851,821628)/1 crt=821572'139334 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.311203003s@ mbc={}] start_peering_interval up [1883,4,1509,1697,1187,235] -> [1883,4,1509,1697,1187,235], acting [1883,2147483647,1509,1697,1187,235] -> [1883,4,1509,1697,1187,235], acting_primary 1883(0) -> 1883, up_primary 1883(0) -> 1883, role -1 -> 1, features acting 4540138297136906239 upacting 4540138297136906239 Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.886+0000 7f52e72e2700 1 osd.4 pg_epoch: 821628 pg[29.2a1s1( v 821500'140649 (776804'130601,821500'140649] local-lis/les=749849/749850 n=65848 ec=130801/130801 lis/c=821623/749849 les/c/f=821624/749850/0 sis=821628 pruub=7.845988274s) [370,4,575,1423,1755,446]p370(0) r=1 lpr=821628 pi=[749849,821628)/1 crt=821500'140649 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.133728981s@ mbc={}] start_peering_interval up [370,4,575,1423,1755,446] -> [370,4,575,1423,1755,446], acting [370,2147483647,575,1423,1755,446] -> [370,4,575,1423,1755,446], acting_primary 370(0) -> 370, up_primary 370(0) -> 370, role -1 -> 1, features acting 4540138297136906239 upacting 4540138297136906239 Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.887+0000 7f52e8ae5700 1 osd.4 pg_epoch: 821628 pg[29.163s1( v 821572'139334 (776804'129273,821572'139334] local-lis/les=749851/749852 n=65683 ec=130801/130801 lis/c=821623/749851 les/c/f=821624/749852/0 sis=821628 pruub=8.023443222s) [1883,4,1509,1697,1187,235]p1883(0) r=1 lpr=821628 pi=[749851,821628)/1 crt=821572'139334 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.311203003s@ mbc={}] state<Start>: transitioning to Stray Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.887+0000 7f52e72e2700 1 osd.4 pg_epoch: 821628 pg[29.2a1s1( v 821500'140649 (776804'130601,821500'140649] local-lis/les=749849/749850 n=65848 ec=130801/130801 lis/c=821623/749849 les/c/f=821624/749850/0 sis=821628 pruub=7.845966339s) [370,4,575,1423,1755,446]p370(0) r=1 lpr=821628 pi=[749849,821628)/1 crt=821500'140649 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.133728981s@ mbc={}] state<Start>: transitioning to Stray Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.887+0000 7f52e72e2700 1 osd.4 pg_epoch: 821628 pg[28.8b4s5( v 821618'2906095 (817032'2896088,821618'2906095] local-lis/les=749842/749843 n=239377 ec=130295/130290 lis/c=821623/749842 les/c/f=821624/749843/0 sis=821628 pruub=8.158309937s) [521,1273,1238,138,1539,4]p521(0) r=5 lpr=821628 pi=[749842,821628)/1 crt=821618'2906095 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.446221352s@ mbc={} ps=[4~6]] start_peering_interval up [521,1273,1238,138,1539,4] -> [521,1273,1238,138,1539,4], acting [521,1273,1238,138,1539,2147483647] -> [521,1273,1238,138,1539,4], acting_primary 521(0) -> 521, up_primary 521(0) -> 521, role -1 -> 5, features acting 4540138297136906239 upacting 4540138297136906239 Mar 27 15:36:56 naret-osd01 ceph-63334166-d991-11eb-99de-40a6b72108d0-osd-4[754965]: debug 2023-03-27T13:36:56.887+0000 7f52e72e2700 1 osd.4 pg_epoch: 821628 pg[28.8b4s5( v 821618'2906095 (817032'2896088,821618'2906095] local-lis/les=749842/749843 n=239377 ec=130295/130290 lis/c=821623/749842 les/c/f=821624/749843/0 sis=821628 pruub=8.158291817s) [521,1273,1238,138,1539,4]p521(0) r=5 lpr=821628 pi=[749842,821628)/1 crt=821618'2906095 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 12.446221352s@ mbc={} ps=[4~6]] state<Start>: transitioning to Stray Mar 27 15:39:36 naret-osd01 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice>: Start request repeated too quickly. Mar 27 15:39:36 naret-osd01 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0(a)osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice<mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv <mailto:ceph-63334166-d991-11eb-99de-40a6b72108d0@osd.4.serv>ice>: Failed with result 'timeout'. Mar 27 15:39:36 naret-osd01 systemd[1]: Failed to start Ceph osd.4 for 63334166-d991-11eb-99de-40a6b72108d0. “”” _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io <mailto:ceph-users@ceph.io> To unsubscribe send an email to ceph-users-leave(a)ceph.io <mailto:ceph-users-leave@ceph.io>

1 year, 1 month

1
0
0 0

Re: cephadm automatic sizing of WAL/DB on SSD

by Calhoun, Patrick

I think that the backported fix for this issue made it into ceph v16.2.11. https://ceph.io/en/news/blog/2023/v16-2-11-pacific-released/ "ceph-volume: Pacific backports (pr#47413, Guillaume Abrioux, Zack Cerza, Arthur Outhenin-Chalandre)" https://github.com/ceph/ceph/pull/47413/commits/4252cc44211f0ccebf388374744… -Patrick

1 year, 1 month

1
0
0 0

ceph osd new: possible inconsistency whether UUID is a mandatory argument

by Oliver Schmidt

Hi everyone, I discovered a documentation inconsistency in Ceph Nautilus and would like to know whether this is still the case in the latest ceph release before reporting a bug. Unfortunately, I only have access to a Nautilus cluster right now. The quincy docs state [1]: > Create the OSD. If no UUID is given, it will be set automatically when the OSD starts up. The following command will output the OSD number, which you will need for subsequent steps: > >ceph osd create [{uuid} [{id}]] But the man pages [2] state that `ceph osd create` is deprecated in favour of `ceph osd new {<uuid>} {<id>} -i {<params.json>}`, with both uuid and id still being marked as optional parameters. But when actually running `ceph osd new` without a specified UUID, I get ``` Invalid command: missing required parameter uuid(<uuid>) osd new <uuid> {<osdname (id|osd.id)>} : Create a new OSD. If supplied, the `id` to be replaced needs to exist and have been previously destroyed. Reads secrets from JSON file via `-i <file>` (see man page). Error EINVAL: invalid command ``` under Nautilus. Is this still the case under Quincy, can someone reproduce this for me? Best regards Oliver Schmidt [1] https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-osds/ [2] https://docs.ceph.com/en/quincy/man/8/ceph/#osd -- Oliver Schmidt · os(a)flyingcircus.io · Systems Engineer Flying Circus Internet Operations GmbH · http://flyingcircus.io Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

1 year, 1 month

1
0
0 0

5 host setup with NVMe's and HDDs

by Tino Todino

Hi folks. Just looking for some up to date advice please from the collective on how best to set up CEPH on 5 Proxmox hosts each configured with the following: AMD Ryzen 7 5800X CPU 64GB RAM 2x SSD (as ZFS boot disk for Proxmox) 1x 500GB NVMe for DB/WAL 1x 1TB NVMe as an OSD 1x 16TB SATA HDD as an OSD 2x 10GB NIC (One for Public and one for Cluster networks) 1 GB NIC for management interface The CEPH solution will be used primarily for storage of another Proxmox cluster's virtual machines and their data. We'd like a fast pool using the NVMe's for critical VMs and a slower HDD based pool for VM's that don't require such fast disk access and perhaps require more storage capacity. To expand in the future we will probably add more hosts in the same sort of configuration and/or replace NVMe/HDDs OSDs with more capacious ones. Ideas for configuration welcome please. Many thanks Tino Coastsense Ltd This E-mail is intended solely for the person or organisation to which it is addressed. It may contain privileged or confidential information and, if you are not the intended recipient, you must not copy, distribute or take any action in reliance upon it. Any views or opinions presented are solely those of the author and do not necessarily represent those of Marlan Maritime Technologies Ltd. If you have received this E-mail in error, please notify us as soon as possible and delete it from your computer. Marlan Maritime Technologies Ltd Registered in England & Wales 323 Mariners House, Norfolk Street, Liverpool. L1 0BG Company No. 08492427.

1 year, 1 month

2
1
0 0

orphan multipart objects in Ceph cluster

by Ramin Najjarbashi

I hope this email finds you well. I wanted to share a recent experience I had with our Ceph cluster and get your feedback on a solution I came up with. Recently, we had some orphan objects stuck in our cluster that were not visible by any client like s3cmd, boto3, and mc. This caused some confusion for our users, as the sum of all objects in their buckets was much less than what we showed in the panel. We made some adjustments for them, but the issue persisted. As we have billions of objects in our cluster, using normal tools to find orphans was impossible. So, I came up with a tricky way to handle the situation. I created a bash script that identifies and removes the orphan objects using radosgw-admin and rados commands. Here is the script: https://gist.github.com/RaminNietzsche/b9baa06b69fc5f56d907f3c953769182 I am hoping to get some feedback from the community on this solution. Have any of you faced similar challenges with orphan objects in your Ceph clusters? Do you have any suggestions or improvements for my script? Thank you for your time and help.

1 year, 1 month

2
1
0 0

osd_mclock_max_capacity_iops_ssd && multiple osd by nvme ?

by DERUMIER, Alexandre

Hi, I would like to advise to correctly tune osd_mclock_max_capacity_iops_ssd when you have multiple osd by nvme ? Does it need simply to divide the total iops of the nvme by number of osd ? But maybe it'll impact performance if more read/write are done on one of the osd ? I really don't known if we still need multiple osd by nvme in quincy ? Maybe it could be easier with a simple osd ? Regards, Alexandre

1 year, 1 month

1
0
0 0

RGW can't create bucket

by kamil.madac＠gmail.com

Hi, One of my customers had a correctly working RGW cluster with two zones in one zonegroup and since a few days ago users are not able to create buckets and are always getting Access denied. Working with existing buckets works (like listing/putting objects into existing bucket). The only operation which is not working is bucket creation. We also tried to create a new user, but the behavior is the same, and he is not able to create the bucket. We tried s3cmd, python script with boto library and also Dashboard as admin user. We are always getting Access Denied. Zones are in-sync. Has anyone experienced such behavior? Thanks in advance, here are some outputs: $ s3cmd -c .s3cfg_python_client mb s3://test ERROR: Access to bucket 'test' was denied ERROR: S3 error: 403 (AccessDenied) Zones are in-sync: Primary cluster: # radosgw-admin sync status realm 5429b434-6d43-4a18-8f19-a5720a89c621 (solargis-prod) zonegroup 00e4b3ff-1da8-4a86-9f52-4300c6d0f149 (solargis-prod-ba) zone 6067eec6-a930-45c7-af7d-a7ef2785a2d7 (solargis-prod-ba-dc) metadata sync no sync (zone is master) data sync source: e84fd242-dbae-466c-b4d9-545990590995 (solargis-prod-ba-hq) syncing full sync: 0/128 shards incremental sync: 128/128 shards data is caught up with source Secondary cluster: # radosgw-admin sync status realm 5429b434-6d43-4a18-8f19-a5720a89c621 (solargis-prod) zonegroup 00e4b3ff-1da8-4a86-9f52-4300c6d0f149 (solargis-prod-ba) zone e84fd242-dbae-466c-b4d9-545990590995 (solargis-prod-ba-hq) metadata sync syncing full sync: 0/64 shards incremental sync: 64/64 shards metadata is caught up with master data sync source: 6067eec6-a930-45c7-af7d-a7ef2785a2d7 (solargis-prod-ba-dc) syncing full sync: 0/128 shards incremental sync: 128/128 shards data is caught up with source

1 year, 1 month

1
0
0 0

Adding new server to existing ceph cluster - with separate block.db on NVME

by Robert W. Eckert

Hi, I am trying to add a new server to an existing cluster, but cannot get the OSDs to create correctly When I try Cephadm ceph-volume lvm create, it returns nothing but the container info. [root@hiho ~]# cephadm ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3 Inferring fsid fe3a7cb0-69ca-11eb-8d45-c86000d08867 Using ceph image with id 'cc65afd6173a' and tag '<none>' created on 2022-10-17 23:41:41 +0000 UTC quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346<mailto:quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346> so I tried cephadm shell, and ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3 Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 41dafd4d-0579-4119-acca-6db31586a10f stderr: 2023-03-28T03:32:27.436+0000 7fa5d6253700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-03-28T03:32:27.436+0000 7fa5d6253700 -1 AuthRegistry(0x7fa5d0060d70) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-03-28T03:32:27.436+0000 7fa5d6253700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-03-28T03:32:27.436+0000 7fa5d6253700 -1 AuthRegistry(0x7fa5d0063da0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-03-28T03:32:27.437+0000 7fa5d6253700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-03-28T03:32:27.437+0000 7fa5d6253700 -1 AuthRegistry(0x7fa5d6251ea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-03-28T03:32:27.451+0000 7fa5ceffd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-03-28T03:32:27.453+0000 7fa5cf7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-03-28T03:32:27.473+0000 7fa5cffff700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-03-28T03:32:27.474+0000 7fa5d6253700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication stderr: [errno 13] RADOS permission denied (error connecting to the cluster) --> RuntimeError: Unable to create a new OSD id I then copy the key ring file into the container using scp, but by that time the orchestrator created OSDs on the drives, so I have to delete the OSDs and start over. Then if I get the timing just right, I get this (from within cephadm shell): [ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3 Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e6e316d4-670d-4a9b-a50c-bc14d57394a3 Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgcreate --force --yes ceph-4d95584a-df28-4e21-9480-09a13f1fb804 /dev/sdd stdout: Physical volume "/dev/sdd" successfully created. stdout: Volume group "ceph-4d95584a-df28-4e21-9480-09a13f1fb804" successfully created Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 953861 -n osd-block-e6e316d4-670d-4a9b-a50c-bc14d57394a3 ceph-4d95584a-df28-4e21-9480-09a13f1fb804 stdout: Logical volume "osd-block-e6e316d4-670d-4a9b-a50c-bc14d57394a3" created. Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 119209 -n osd-db-9fc4f199-2c95-4ca7-a35c-ef4b08c86804 ceph-948a633c-420e-4f55-8515-b33e1c0ef18c stderr: Volume group "ceph-948a633c-420e-4f55-8515-b33e1c0ef18c" has insufficient free space (0 extents): 119209 required. --> Was unable to complete a new OSD, will rollback changes Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.12 --yes-i-really-mean-it stderr: purged osd.12 --> RuntimeError: Unable to find any LV for zapping OSD: 12 [ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3 --> RuntimeError: Device /dev/sdd has a filesystem. [ceph: root@hiho bootstrap-osd]# ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p3 Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b93e1a8a-af88-431c-b705-f49d717b050f Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgcreate --force --yes ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83 /dev/sdd stdout: Physical volume "/dev/sdd" successfully created. stdout: Volume group "ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83" successfully created Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 953861 -n osd-block-b93e1a8a-af88-431c-b705-f49d717b050f ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83 stdout: Logical volume "osd-block-b93e1a8a-af88-431c-b705-f49d717b050f" created. Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgcreate --force --yes ceph-4abdb6f8-c891-43f8-8135-dd8470f80130 /dev/nvme0n1p3 stdout: Physical volume "/dev/nvme0n1p3" successfully created. stdout: Volume group "ceph-4abdb6f8-c891-43f8-8135-dd8470f80130" successfully created Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvcreate --yes -l 119209 -n osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 ceph-4abdb6f8-c891-43f8-8135-dd8470f80130 stdout: Wiping ceph_bluestore signature on /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13. stdout: Logical volume "osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13" created. Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12 Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5 Running command: /usr/bin/ln -s /dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f /var/lib/ceph/osd/ceph-12/block Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-12/activate.monmap stderr: got monmap epoch 33 --> Creating keyring file for osd.12 Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/keyring Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/ Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 Running command: /usr/bin/chown -R ceph:ceph /dev/dm-8 Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --keyfile - --bluestore-block-db-path /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 --osd-data /var/lib/ceph/osd/ceph-12/ --osd-uuid b93e1a8a-af88-431c-b705-f49d717b050f --setuser ceph --setgroup ceph stderr: 2023-03-28T03:23:21.180+0000 7f0a40fd63c0 -1 bluestore(/var/lib/ceph/osd/ceph-12/) _read_fsid unparsable uuid --> ceph-volume lvm prepare successful for: /dev/sdd Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12 Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f --path /var/lib/ceph/osd/ceph-12 --no-mon-config Running command: /usr/bin/ln -snf /dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f /var/lib/ceph/osd/ceph-12/block Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-12/block Running command: /usr/bin/chown -R ceph:ceph /dev/dm-5 Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12 Running command: /usr/bin/ln -snf /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 /var/lib/ceph/osd/ceph-12/block.db Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 Running command: /usr/bin/chown -R ceph:ceph /dev/dm-8 Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-12/block.db Running command: /usr/bin/chown -R ceph:ceph /dev/dm-8 Running command: /usr/bin/systemctl enable ceph-volume@lvm-12-b93e1a8a-af88-431c-b705-f49d717b050f stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume(a)lvm-12-b93e1a8a-af88-431c-b705-f49d717b050f.service<mailto:/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-12-b93e1a8a-af88-431c-b705-f49d717b050f.service> -> /usr/lib/systemd/system/ceph-volume@.service<mailto:/usr/lib/systemd/system/ceph-volume@.service>. Running command: /usr/bin/systemctl enable --runtime ceph-osd@12 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd(a)12.service<mailto:/run/systemd/system/ceph-osd.target.wants/ceph-osd@12.service> -> /usr/lib/systemd/system/ceph-osd@.service<mailto:/usr/lib/systemd/system/ceph-osd@.service>. Running command: /usr/bin/systemctl start ceph-osd@12 stderr: Failed to connect to bus: No such file or directory --> Was unable to complete a new OSD, will rollback changes Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.12 --yes-i-really-mean-it stderr: purged osd.12 --> Zapping: /dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f --> Unmounting /var/lib/ceph/osd/ceph-12 Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-12 stderr: umount: /var/lib/ceph/osd/ceph-12 unmounted Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83/osd-block-b93e1a8a-af88-431c-b705-f49d717b050f bs=1M count=10 conv=fsync stderr: 10+0 records in 10+0 records out stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0771422 s, 136 MB/s --> Only 1 LV left in VG, will proceed to destroy volume group ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83 Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83 stderr: Removing ceph--e91c86b4--613f--45b0--b9d1--3bb76ed10f83-osd--block--b93e1a8a--af88--431c--b705--f49d717b050f (253:5) stderr: Releasing logical volume "osd-block-b93e1a8a-af88-431c-b705-f49d717b050f" Archiving volume group "ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83" metadata (seqno 5). stdout: Logical volume "osd-block-b93e1a8a-af88-431c-b705-f49d717b050f" successfully removed. stderr: Removing physical volume "/dev/sdd" from volume group "ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83" stdout: Volume group "ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83" successfully removed stderr: Creating volume group backup "/etc/lvm/backup/ceph-e91c86b4-613f-45b0-b9d1-3bb76ed10f83" (seqno 6). Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdd stdout: Labels on physical volume "/dev/sdd" successfully wiped. --> Zapping: /dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130/osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13 bs=1M count=10 conv=fsync stderr: 10+0 records in 10+0 records out stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0131332 s, 798 MB/s --> Only 1 LV left in VG, will proceed to destroy volume group ceph-4abdb6f8-c891-43f8-8135-dd8470f80130 Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-4abdb6f8-c891-43f8-8135-dd8470f80130 stderr: Removing ceph--4abdb6f8--c891--43f8--8135--dd8470f80130-osd--db--5bab4d99--e2e5--48e7--aee6--d1a2103b9d13 (253:8) stderr: Releasing logical volume "osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13" Archiving volume group "ceph-4abdb6f8-c891-43f8-8135-dd8470f80130" metadata (seqno 5). stdout: Logical volume "osd-db-5bab4d99-e2e5-48e7-aee6-d1a2103b9d13" successfully removed. stderr: Removing physical volume "/dev/nvme0n1p3" from volume group "ceph-4abdb6f8-c891-43f8-8135-dd8470f80130" stdout: Volume group "ceph-4abdb6f8-c891-43f8-8135-dd8470f80130" successfully removed stderr: Creating volume group backup "/etc/lvm/backup/ceph-4abdb6f8-c891-43f8-8135-dd8470f80130" (seqno 6). Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/nvme0n1p3 stdout: Labels on physical volume "/dev/nvme0n1p3" successfully wiped. --> Zapping successful for OSD: 12 --> RuntimeError: command returned non-zero exit status: 1 I have 3 other servers with the split data and block db that I was able to install just using cephadm, so I am not sure what is off on this one. Is there any way to either add the block.db to the already created OSDs or get around the missing bootstrap.osd/ceph or manually configure the disk and block db? I have also tried ceph orch apply osd --all-available-devices --unmanaged=true to stop ceph from trying to take the OSD, but it still does Thanks, Rob

1 year, 1 month

2
3
0 0

s3-select introduction blog / Trino integration

by Gal Salomon

Hi https://ceph.io/en/news/blog/2022/s3select-intro/ Recently I published a blog on s3-select. the Blog discusses what it is, and why it is required. the last paragraph discusses the Trino(analytic SQL utility) and its integration with Ceph/s3select. that integration is still on-work, and it is quite promising. Trino does not just provides a comprehensive SQL but also provides scalable processing for the SQL statements. we will be glad to hear your ideas and comments. Gal.

1 year, 1 month

1
0
0 0

Re: Ceph cluster out of balance after adding OSDs

by Robert Sander

On 27.03.23 16:34, Pat Vaughan wrote: > Yes, all the OSDs are using the SSD device class. Do you have multiple CRUSH rules by chance? Are all pools using the same CRUSH rule? Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Amtsgericht Berlin-Charlottenburg - HRB 220009 B Geschäftsführer: Peer Heinlein - Sitz: Berlin

1 year, 1 month

2
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2023