Good day all,
I have an issue with a few OSDs (on two different nodes) that attempt to
start but fail / crash quite quickly. They are all LVM disks.
I've tried upgrading the software and running health checks on the hardware
(nodes and disks), and there don't seem to be any issues there.
Recently I've had a few "other" disks physically fail in the cluster, and I
now have one PG down which is blocking some IO on CephFS.
I've added the output of the osd journalctl and the osd log below in case
it's helpful to identify anything obvious.
I also set debug bluefs = 20, which I saw suggested in another post.
I recently manually upgraded this node to 17.2.0 before the problem
began, and later to 17.2.5. The other OSDs on this node start and run fine.
The other node (15.2.17) also has a few OSDs that will not start and some
that run without issue.
Could anyone point me in the right direction to investigate and solve my
osd issues.
https://pastebin.com/3PkCabdf
https://pastebin.com/BT9bnhSb
Production system mainly used for CephFS
OS: Ubuntu 20.04.5 LTS
Ceph versions: 15.2.17 - Octopus (one OSD node manually upgraded to 17.2.5
- Quincy)
Erasure data pool (K=4, M=2) - the journals for each OSD are co-located
on each drive
Kind regards
Geoffrey Rhodes
Hello All,
In Ceph Quincy I am not able to find the rbd_mirror_journal_max_fetch_bytes
config option for rbd-mirror.
I configured a Ceph cluster of almost 400 TB and enabled rbd-mirror. In the
starting stage I was able to achieve almost 9 GB/s, but after the replication
of all the existing images completed, the rbd-mirror speed automatically
dropped to between 4 and 5 Mbps.
On my primary cluster we are continuously writing 50 to 400 Mbps of data, but
the replication speed we get is only 4 to 5 Mbps, even though we have 10 Gbps
of replication network bandwidth.
Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes,
but I am not able to find this option in the configuration. When I try to
set it from the command line, it shows an error:
command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432
error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
Cluster version:
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
Please suggest an alternative way to configure this option, or how I can
improve the replication network speed.
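One way to check whether a given option exists at all in your running release is to query `ceph config help` (a sketch of the idea, not a fix for the speed itself; the command prints the option's description when it is known and fails otherwise):

```shell
# Ask the cluster whether this option is known to the running Ceph version.
ceph config help rbd_mirror_journal_max_fetch_bytes \
    || echo "option not present in this version"

# List the rbd_mirror_* options this version does know about.
ceph config ls | grep '^rbd_mirror'
```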
Hello,
If buffered_io is enabled, is there a way to know exactly how much physical
memory each OSD is using?
What I've found is dump_mempools, whose last entries are the following, but
do these bytes reflect the real physical memory usage?
"total": {
    "items": 60005205,
    "bytes": 995781359
}
Also, which metric exposes this value? I haven't found one.
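For reference, a minimal sketch of pulling the mempool total out of the dump_mempools output (assuming the JSON layout of recent releases; the sample data here stands in for the real output of `ceph daemon osd.<id> dump_mempools`):

```python
import json

# Sketch: parse dump_mempools output and report the total bytes tracked
# by Ceph's mempools. Assumption: the JSON has a top-level "mempool"
# object containing a "total" entry, as in recent Ceph releases.
sample = '''
{
  "mempool": {
    "by_pool": {
      "bluestore_cache_data": {"items": 1000, "bytes": 4096000}
    },
    "total": {"items": 60005205, "bytes": 995781359}
  }
}
'''

def mempool_total_bytes(dump: str) -> int:
    data = json.loads(dump)
    # Some releases wrap the pools in a "mempool" key; handle both layouts.
    pools = data.get("mempool", data)
    return pools["total"]["bytes"]

print(mempool_total_bytes(sample))  # 995781359
```

Note that mempool byte counts only cover allocations attributed to Ceph's mempools, so they need not equal the OSD process's physical RSS.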
Thank you
Hi,
I wonder if it is possible to define a host pattern which includes the
host names ceph01…ceph19, but no other hosts, especially not ceph00. That
means this pattern is wrong: ceph[01][0-9], since it also matches ceph00.
Not really a problem, but it seems that the statement "“host-pattern” is a
regex that matches against hostnames and returns only matching hosts"¹ is
not defined more precisely in the docs.
1) https://docs.ceph.com/en/latest/cephadm/host-management/
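One regex that covers exactly ceph01…ceph19 is an alternation over the two tens digits. A quick sketch in Python (assuming, per the docs statement quoted above, that the pattern is an ordinary regex matched against the hostname; anchors included for safety):

```python
import re

# Matches ceph01 through ceph19, but neither ceph00 nor ceph20.
# 0[1-9] covers ceph01..ceph09, 1[0-9] covers ceph10..ceph19.
pattern = re.compile(r"^ceph(0[1-9]|1[0-9])$")

hosts = ["ceph00"] + [f"ceph{i:02d}" for i in range(1, 20)] + ["ceph20"]
matches = [h for h in hosts if pattern.match(h)]
print(matches)  # ceph01 .. ceph19 only
```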
> >
> > Hi all,
> >
> > we are observing a problem on a libvirt virtualisation cluster that
> might come from ceph rbd clients. Something went wrong during execution
> of a live-migration operation and as a result we have two instances of
> the same VM running on 2 different hosts, the source- and the
> destination host. What we observe now is that the exclusive lock of the
> RBD disk image moves between these two clients periodically (every few
> minutes the owner flips).
>
> Hi Frank,
>
> If you are talking about RBD exclusive lock feature ("exclusive-lock"
> under "features" in "rbd info" output) then this is expected. This
> feature provides automatic cooperative lock transitions between clients
> to ensure that only a single client is writing to the image at any
> given time. It's there to protect internal per-image data structures
> such as the object map, the journal or the client-side PWL (persistent
> write log) cache from concurrent modifications in case the image is
> opened by two or more clients. The name is confusing but it's NOT
> about preventing other clients from opening and writing to the image.
> Rather it's about serializing those writes.
>
I remember asking this quite some time ago as well. Maybe this is helpful:
https://www.wogri.at/scripts/ceph-libvirt-locking/
Hello Team,
Please help me. I deployed two Ceph clusters in a 6-node configuration with almost 800 TB of capacity, set up as DC-DR for data high availability. I enabled RGW and RBD block device mirroring for data replication. We have a 10 Gbps fiber replication network.
When we first started rbd-mirror from our DC to DR and replicated our existing data, we got almost 8 Gbps replication speed and it worked fine. Once all the existing image data had been replicated, we began facing a replication speed issue: now we only get 5 to 10 Mbps. We tried to find options like rbd_journal_max_payload_bytes and rbd_mirror_journal_max_fetch_bytes; we tried increasing the max payload size but saw no change in speed, and the rbd_mirror_journal_max_fetch_bytes option is not available in our Ceph version. I also tried to modify and increase some other values, such as:
rbd_mirror_memory_target
rbd_mirror_memory_cache_min
You can also find some references regarding increasing these values for performance.
Eugen
[1]
https://tracker.ceph.com/projects/ceph/repository/revisions/1ef12ea0d29f955…
[2]
https://github.com/ceph/ceph/pull/27670
Information about my Ceph cluster:
Version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
rbd-mirror daemon version: 17.2.5
Mirror mode: pool
Max images mirrored at a time: 5
Replication network: 10 Gbps (dedicated)
Client: on the DC cluster we are continuously writing 50 to 400 Mbps of
data, but replication is only 5 to 10 Mbps.
Issue: we only get 4 to 5 Mbps of replication speed, even though we have
10 Gbps of replication network bandwidth.
Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes,
but I am not able to find this option in the configuration. When I try to
set it from the command line, it shows an error:
command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432
error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
We encountered the following problems while trying to perform
maintenance on a Ceph cluster:
The cluster consists of 7 Nodes with 10 OSDs each.
There are 4 pools on it: 3 of them are replicated pools with 3/2
size/min_size and one is an erasure coded pool with m=2 and k=5.
The following global flags were set:
* noout
* norebalance
* nobackfill
* norecover
Then, after those flags were set, all OSDs were stopped via the command
ceph osd stop, which seems to have caused the issue.
After maintenance was done, all OSDs were started again via systemctl.
Only about half of the 70 OSDs in total started at first - while the
other half started, but got killed after a few seconds with the
following log messages:
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff3fcf8d700 -1 osd.51
12161 map says i am stopped by admin. shutting down.
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 received
signal: Interrupt from Kernel ( Could be generated by pthread_kill(),
raise(), abort(), alarm() ) UID: 0
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Got signal Interrupt ***
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Immediate shutdown (osd_fast_shutdown=true) ***
And indeed, when looking into the osd map via ceph osd dump, the
remaining OSDs seem to be marked as stopped:
osd.50 down out weight 0 up_from 9213 up_thru 9416 down_at 9760
last_clean_interval [9106,9207)
[v2:10.0.1.61:6813/6211,v1:10.0.1.61:6818/6211]
[v2:10.0.0.61:6814/6211,v1:10.0.0.61:6816/6211] exists,stop
9a2590c4-f50b-4550-bfd1-5aafb543cb59
We were able to restore some of the remaining OSDs by running
ceph osd out XX
ceph osd in XX
and then starting the service again (via systemctl start). This did work
for most OSDs, except for the OSDs located on one specific host. Some
OSDs required several restarts until they stopped killing themselves a
few seconds after starting.
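For the record, the out/in/restart sequence above can be sketched as a small loop (an assumption-laden sketch: the OSD ids and the ceph-osd@<id> systemd unit names must be adapted to your deployment):

```shell
#!/bin/sh
# Sketch: try to clear the "stop" state for a list of OSD ids by marking
# them out and back in, then restarting the systemd unit.
for id in 50 51 52; do          # replace with your affected OSD ids
    ceph osd out "$id"
    ceph osd in "$id"
    systemctl restart "ceph-osd@$id"
    # The osdmap entry should no longer show "stop" for this OSD.
    ceph osd dump | grep "^osd.$id "
done
```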
This whole issue seems to be caused by the OSDs being marked as stopped
in the OSD map [1]. Apparently this state should get reset when
re-starting the OSD again [2], but for some reason this doesn't happen
for some of the OSDs. This behavior seems to have been introduced via
the following pull request [3]. We have also found the following commit
where the logic regarding stop seemed to have been introduced [4].
We were looking into commands that reset the stopped status of the OSD
in the OSD map, but did not find any way of forcing this.
Since we are out of ideas on how to proceed with the remaining 10 OSDs
that cannot get brought up: How does one recover from this situation? It
seems like by running ceph osd stop the cluster got in a state that
seems irrecoverable with the normal CLI commands available. We even
looked into the possibility of manually manipulating the osdmap via the
osdmaptool, but there doesn't seem to be a way to edit the start/stopped
status and it also seems like a very invasive procedure. There does not
seem to be any way we can see of recovering from this, apart from
rebuilding all the OSDs - which we refrained from for now.
Kind Regards
Hanreich Stefan
[1]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[2]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[3] https://github.com/ceph/ceph/pull/43664
[4]
https://github.com/ceph/ceph/commit/5dbae13ce0f5b0104ab43e0ccfe94f832d0e1268
Dear Ceph-Users,
I am struggling to replace a disk. My Ceph cluster is not replacing the old OSD even though I ran:
ceph orch osd rm 232 --replace
OSD 232 is still shown in the OSD list, but the new HDD is being deployed as a new OSD. This wouldn't bother me much if the new OSD were also placed on the BlueStore DB / NVMe, but it isn't.
My steps:
"ceph orch osd rm 232 --replace"
Remove the failed HDD.
Add the new one.
Convert the disk within the server's BIOS so that the node can have direct access to it.
It shows up as /dev/sdt.
Enter maintenance mode.
Reboot the server.
The drive is now /dev/sdm (which the old drive had).
"ceph orch device zap node-x /dev/sdm"
A new OSD is placed on the cluster.
Can you give me a hint, where did I take a wrong turn? Why is the disk not being used as OSD 232?
Best
Ken
P.S. Sorry for sending this message twice; somehow this mail address was no longer subscribed to the list.