- ceph-users - lists.ceph.io

by Vahideh Alinouri

Hi guys, I need setup Ceph over RDMA, but I faced many issues! The info regarding my cluster: Ceph version is Reef Network cards are Broadcom RDMA. RDMA connection between OSD nodes are OK. I just found ms_type = async+rdma config in document and apply it using ceph config set global ms_type async+rdma After this action the cluster crashes. I tried to cluster back, and I did: Put ms_type async+posix in ceph.conf Restart all MON services The cluster is back, but I don't have any active mgr. All OSDs are down too. Is there any order to do for setting up Ceph over RDMA? Thanks

2 weeks, 6 days

1
1
0 0

MDS crash

by alexey.gerasimov＠opencascade.com

Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and controlled by the Proxmox) with 3 nodes based on physical servers rented from a cloud provider. CephFS is installed also. Yesterday we discovered that some of the applications stopped working. During the investigation we recognized that we have the problem with Ceph, more precisely with СephFS - MDS daemons suddenly crashed. We tried to restart them and found that they crashed again immediately after the start. The crash information: 2024-04-17T17:47:42.841+0000 7f959ced9700 1 mds.0.29134 recovery_done -- successful recovery! 2024-04-17T17:47:42.853+0000 7f959ced9700 1 mds.0.29134 active_start 2024-04-17T17:47:42.881+0000 7f959ced9700 1 mds.0.29134 cluster recovered. 2024-04-17T17:47:43.825+0000 7f959aed5700 -1 ./src/mds/OpenFileTable.cc: In function 'void OpenFileTable::commit(MDSContext*, uint64_t, int)' thread 7f959aed5700 time 2024-04-17T17:47:43.831243+0000 ./src/mds/OpenFileTable.cc: 549: FAILED ceph_assert(count > 0) Next hours we read the tons of articles, studied the documentation, and checked the common state of Ceph cluster by the various diagnostic commands – but didn’t find anything wrong. At evening we decided to upgrade it up to v16, and finally to v17.2.7. Unfortunately, it didn’t solve the problem, MDS continue to crash with the same error. The only difference that we found is “1 MDSs report damaged metadata” in the output of ceph -s – see it below. I supposed that it may be the well-known bug, but couldn’t find the same one on https://tracker.ceph.com - there are several bugs associated with file OpenFileTable.cc but not related to ceph_assert(count > 0) We tried to check the source code of OpenFileTable.cc also, here is a fragment of it, in function OpenFileTable::_journal_finish int omap_idx = anchor.omap_idx; unsigned& count = omap_num_items.at(omap_idx); ceph_assert(count > 0); So, we guess that the object map is empty for some object in Ceph, and it is unexpected behavior. But again, we found nothing wrong in our cluster… Next, we started with https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/ article – tried to reset the journal (despite that it was Ok all the time) and wipe the sessions using cephfs-table-tool all reset session command. No result… Now I decided to continue following this article and run cephfs-data-scan scan_extents command, it is working just now. But I have a doubt that it will solve the issue because of no problem with our objects in Ceph. Is it the new bug? or something else? Any idea is welcome! The important outputs: ----- ceph -s cluster: id: 4cd1c477-c8d0-4855-a1f1-cb71d89427ed health: HEALTH_ERR 1 MDSs report damaged metadata insufficient standby MDS daemons available 83 daemons have recently crashed 3 mgr modules have recently crashed services: mon: 3 daemons, quorum asrv-dev-stor-2,asrv-dev-stor-3,asrv-dev-stor-1 (age 22h) mgr: asrv-dev-stor-2(active, since 22h), standbys: asrv-dev-stor-1 mds: 1/1 daemons up osd: 18 osds: 18 up (since 22h), 18 in (since 29h) data: volumes: 1/1 healthy pools: 5 pools, 289 pgs objects: 29.72M objects, 5.6 TiB usage: 21 TiB used, 47 TiB / 68 TiB avail pgs: 287 active+clean 2 active+clean+scrubbing+deep io: client: 2.5 KiB/s rd, 172 KiB/s wr, 261 op/s rd, 195 op/s wr -----ceph fs dump e29480 enable_multiple, ever_enabled_multiple: 0,1 default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} legacy client fscid: 1 Filesystem 'cephfs' (1) fs_name cephfs epoch 29480 flags 12 joinable allow_snaps allow_multimds_snaps created 2022-11-25T15:56:08.507407+0000 modified 2024-04-18T16:52:29.970504+0000 tableserver 0 root 0 session_timeout 60 session_autoclose 300 max_file_size 1099511627776 required_client_features {} last_failure 0 last_failure_osd_epoch 14728 compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} max_mds 1 in 0 up {0=156636152} failed damaged stopped data_pools [5] metadata_pool 6 inline_data disabled balancer standby_count_wanted 1 [mds.asrv-dev-stor-1{0:156636152} state up:active seq 6 laggy since 2024-04-18T16:52:29.970479+0000 addr [v2:172.22.2.91:6800/2487054023,v1:172.22.2.91:6801/2487054023] compat {c=[1],r=[1],i=[7ff]}] -----cephfs-journal-tool --rank=cephfs:0 journal inspect Overall journal integrity: OK -----ceph pg dump summary version 41137 stamp 2024-04-18T21:17:59.133536+0000 last_osdmap_epoch 0 last_pg_scan 0 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG sum 29717605 0 0 0 0 6112544251872 13374192956 28493480 1806575 1806575 OSD_STAT USED AVAIL USED_RAW TOTAL sum 21 TiB 47 TiB 21 TiB 68 TiB -----ceph pg dump pools POOLID OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG 8 31771 0 0 0 0 131337887503 2482 140 401246 401246 7 839707 0 0 0 0 3519034650971 736 61 399328 399328 6 1319576 0 0 0 0 421044421 13374189738 28493279 206749 206749 5 27526539 0 0 0 0 2461702171417 0 0 792165 792165 2 12 0 0 0 0 48497560 0 0 6991 6991

2 weeks, 6 days

5
7
0 0

Cephadm stacktrace on copying ceph.conf

by Jesper Agerbo Krogh [JSKR]

Hi. We're currently getting these errors - and I seem to be missing a clear overview over the cause and how to debug. 3/26/24 9:38:09 PM[ERR]executing _write_files((['dkcphhpcadmin01', 'dkcphhpcmgt028', 'dkcphhpcmgt029', 'dkcphhpcmgt031', 'dkcphhpcosd033', 'dkcphhpcosd034', 'dkcphhpcosd035', 'dkcphhpcosd036', 'dkcphhpcosd037', 'dkcphhpcosd038', 'dkcphhpcosd039', 'dkcphhpcosd040', 'dkcphhpcosd041', 'dkcphhpcosd042', 'dkcphhpcosd043', 'dkcphhpcosd044'],)) failed. Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/ssh.py", line 240, in _write_remote_file await asyncssh.scp(f.name, (conn, tmp_path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp await source.run(srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run await self._send_files(path, b'') File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files await self._send_file(srcpath, dstpath, attrs) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file await self._make_cd_request(b'C', attrs, size, srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request self._fs.basename(path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request raise exc asyncssh.sftp.SFTPFailure: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/utils.py", line 79, in do_work return f(*arg) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1088, in _write_files self._write_client_files(client_files, host) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1107, in _write_client_files self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid) File "/usr/share/ceph/mgr/cephadm/ssh.py", line 261, in write_remote_file host, path, content, mode, uid, gid, addr)) File "/usr/share/ceph/mgr/cephadm/module.py", line 615, in wait_async return self.event_loop.get_result(coro) File "/usr/share/ceph/mgr/cephadm/ssh.py", line 56, in get_result return asyncio.run_coroutine_threadsafe(coro, self._loop).result() File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result return self.__get_result() File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result raise self._exception File "/usr/share/ceph/mgr/cephadm/ssh.py", line 249, in _write_remote_file raise OrchestratorError(msg) orchestrator._interface.OrchestratorError: Unable to write dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied 3/26/24 9:38:09 PM[ERR]Unable to write dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/ssh.py", line 240, in _write_remote_file await asyncssh.scp(f.name, (conn, tmp_path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp await source.run(srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run await self._send_files(path, b'') File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files self.handle_error(exc) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error raise exc from None File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files await self._send_file(srcpath, dstpath, attrs) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file await self._make_cd_request(b'C', attrs, size, srcpath) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request self._fs.basename(path)) File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request raise exc asyncssh.sftp.SFTPFailure: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied 3/26/24 9:38:09 PM[INF]Updating dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf It seem to be related to the permissions that the manager writes the files with and the process copying them around. $ sudo ceph -v [sudo] password for adminjskr: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable) Best regards, Jesper Agerbo Krogh Director Digitalization Digitalization Topsoe A/S Haldor Topsøes Allé 1 2800 Kgs. Lyngby Denmark Phone (direct): 27773240     Read more attopsoe.com Topsoe A/S and/or its affiliates. This e-mail message (including attachments, if any) is confidential and may be privileged. It is intended only for the addressee. Any unauthorised distribution or disclosure is prohibited. Disclosure to anyone other than the intended recipient does not constitute waiver of privilege. If you have received this email in error, please notify the sender by email and delete it and any attachments from your computer system and records.

3 weeks

2
1
0 0

Re: Recoveries without any misplaced objects?

by Hector Martin

On 29/05/2023 20.55, Anthony D'Atri wrote: > Check the uptime for the OSDs in question I restarted all my OSDs within the past 10 days or so. Maybe OSD restarts are somehow breaking these stats? > >> On May 29, 2023, at 6:44 AM, Hector Martin <marcan(a)marcan.st> wrote: >> >> Hi, >> >> I'm watching a cluster finish a bunch of backfilling, and I noticed that >> quite often PGs end up with zero misplaced objects, even though they are >> still backfilling. >> >> Right now the cluster is down to 6 backfilling PGs: >> >> data: >> volumes: 1/1 healthy >> pools: 6 pools, 268 pgs >> objects: 18.79M objects, 29 TiB >> usage: 49 TiB used, 25 TiB / 75 TiB avail >> pgs: 262 active+clean >> 6 active+remapped+backfilling >> >> But there are no misplaced objects, and the misplaced column in `ceph pg >> dump` is zero for all PGs. >> >> If I do a `ceph pg dump_json`, I can see `num_objects_recovered` >> increasing for these PGs... but the misplaced count is still 0. >> >> Is there something else that would cause recoveries/backfills other than >> misplaced objects? Or perhaps there is a bug somewhere causing the >> misplaced object count to be misreported as 0 sometimes? >> >> # ceph -v >> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy >> (stable) >> >> - Hector >> _______________________________________________ >> ceph-users mailing list -- ceph-users(a)ceph.io >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > > - Hector

3 weeks, 1 day

3
3
0 0

Slow/blocked reads and writes

by Fábio Sato

Hello all, I am trying to troubleshoot a ceph cluster version 18.2.2 having users reporting slow and blocked reads and writes. When running "ceph status" I am seeing many warnings about its health state: cluster: id: cc881230-e0dd-11ee-aa9e-37c4e4e5e14b health: HEALTH_WARN 6 clients failing to respond to capability release 2 clients failing to advance oldest client/flush tid 1 MDSs report slow requests 1 MDSs behind on trimming Too many repaired reads on 11 OSDs Degraded data redundancy: 2 pgs degraded 105 pgs not deep-scrubbed in time 109 pgs not scrubbed in time 1 mgr modules have recently crashed 12 slow ops, oldest one blocked for 97678 sec, daemons [osd.11,osd.12,osd.15,osd.16,osd.19,osd.20,osd.28,osd.3,osd.32,osd.34]... have slow ops. services: mon: 3 daemons, quorum file03-xx,file04-xx,file05-xx (age 17h) mgr: file03-xx.xxxxxx(active, since 2w), standbys: file04-xx.xxxxxx mds: 1/1 daemons up, 1 standby osd: 44 osds: 44 up (since 17h), 44 in (since 39h); 492 remapped pgs data: volumes: 1/1 healthy pools: 3 pools, 2065 pgs objects: 66.44M objects, 140 TiB usage: 281 TiB used, 304 TiB / 586 TiB avail pgs: 16511162/134215883 objects misplaced (12.302%) 1508 active+clean 487 active+remapped+backfill_wait 53 active+clean+scrubbing+deep 8 active+clean+scrubbing 5 active+remapped+backfilling 2 active+recovering+degraded+repair 2 active+recovering+repair io: recovery: 47 MiB/s, 37 objects/s When checking the output of `ceph -w` I am flooded with crc error messages like the examples below: 2024-04-24T19:15:40.430334+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head 2024-04-24T19:15:40.430507+0000 osd.39 [ERR] 3.270 full-object read crc 0xa1bc3a1e != expected 0xffffffff on 3:0e44aa2f:::1000265a625.00000003:head 2024-04-24T19:15:40.494249+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head 2024-04-24T19:15:40.529771+0000 osd.32 [ERR] 3.566 full-object read crc 0xa5da7fa != expected 0xffffffff on 3:66a8d8f5:::10001c72400.00000007:head 2024-04-24T19:15:40.582128+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head 2024-04-24T19:15:40.583350+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head 2024-04-24T19:15:40.662945+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head 2024-04-24T19:15:40.698197+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head 2024-04-24T19:15:40.699389+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head 2024-04-24T19:15:40.769191+0000 osd.28 [ERR] 3.469 full-object read crc 0x8e757c06 != expected 0xffffffff on 3:962852f4:::10001c72c2d.00000001:head 2024-04-24T19:15:40.834344+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head 2024-04-24T19:15:40.835513+0000 osd.19 [ERR] 3.4b full-object read crc 0x9222aec != expected 0xffffffff on 3:d20bddca:::100026fdb01.00000006:head I suspect this is the main issue affecting the cluster health state and performance so I am trying to address this first. The "expected 0xffffffff" crc seems like a bug to me and I found an open ticket (https://tracker.ceph.com/issues/53240) with similar error messages but I am not sure this is related to my case. Could someone point me to the steps to solve these errors? Cheers, -- Fabio

3 weeks, 1 day

1
0
0 0

Orchestrator not automating services / OSD issue

by Michael Baer

Hi, This problem started with trying to add a new storage server into a quincy v17.2.6 ceph cluster. Whatever I did, I could not add the drives on the new host as OSDs: via dashboard, via cephadm shell, by setting osd unmanaged to false. But what I started realizing is that orchestrator will also no longer automatically manage services. I.e. if a service is set to manage by labels, removing and adding labels to different hosts for that service has no affect. Same if I set a service to be manage via hostnames. Same if I try to drain a host (the services/podman containers just keep running). Although, I am able to add/rm services via 'cephadm shell ceph orch daemon add/rm'. But Ceph will not manage automatically using labels/hostnames. This apparently includes OSD daemons. I can not create and on the new host either automatically or manually, but I'm hoping the services/OSD issues are related and not two issues. I haven't been able to find any obvious errors in /var/log/ceph, /var/log/syslog, logs <container>, etc. I have been able to get 'slow ops' errors on monitors by trying to add OSDs manually (and having to restart the monitor). I've also gotten cephadm shell to hang. And had to restart managers. I'm not an expert and it could be something obvious, but I haven't been able to figure out a solution. If anyone has any suggestions, I would greatly appreciate them. Thanks, Mike -- Michael Baer ceph(a)mikesoffice.com

3 weeks, 1 day

2
2
0 0

Re: ceph-users Digest, Vol 118, Issue 85

by duluxoz

Hi Eugen, Thank you for a viable solution to our underlying issue - I'll attempt to implement it shortly. :-) However, with all the respect in world, I believe you are incorrect when you say the doco is correct (but I will be more than happy to be proven wrong). :-) The relevant text (extracted from the document page (the last couple of paragraphs)) says: ~~~ If a client already has a capability for file-system name |a| and path |dir1|, running |fs authorize| again for FS name |a| but path |dir2|, instead of modifying the capabilities client already holds, a new cap for |dir2| will be granted: cephfsauthorizeaclient.x/dir1rw cephauthgetclient.x [client.x] key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA== caps mds = "allow rw fsname=a path=/dir1" caps mon = "allow r fsname=a" caps osd = "allow rw tag cephfs data=a" cephfsauthorizeaclient.x/dir2rw updated caps for client.x cephauthgetclient.x [client.x] key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA== caps mds = "allow rw fsname=a path=dir1, allow rw fsname=a path=dir2" caps mon = "allow r fsname=a" caps osd = "allow rw tag cephfs data=a" ~~~ The above *seems* to me to say (as per the 2nd `cephauthgetclient.x` example) that a 2nd directory (dir2) *will* be added to the `client.x` authorisation. HOWEVER, this does not work in practice - hence my original query. This is what we originally attempted to do (word for word, only substituting our CechFS name for "a") and we got the error in the original post. So if the doco says that something can be done *and* gives a working example, but an end-user (admin) cannot achieve the same results but gets an error instead when following the exact same commands, then either the doco is incorrect *or* there is something else wrong. BUT your statement ("running 'ceph fs authorize' will overwrite the existing caps, it will not add more caps to the client") is in direct contradiction to the documentation ("If a client already has a capability for file-system name |a| and path |dir1|, running |fs authorize| again for FS name |a| but path |dir2|, instead of modifying the capabilities client already holds, a new cap for |dir2| will be granted"). So there's some sort of "disconnect" there. :-) Cheers On 24/04/2024 17:33, ceph-users-request(a)ceph.io wrote: > Send ceph-users mailing list submissions to > ceph-users(a)ceph.io > > To subscribe or unsubscribe via email, send a message with subject or > body 'help' to > ceph-users-request(a)ceph.io > > You can reach the person managing the list at > ceph-users-owner(a)ceph.io > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of ceph-users digest..." > > Today's Topics: > > 1. Re: Latest Doco Out Of Date? (Eugen Block) > 2. Re: stretched cluster new pool and second pool with nvme > (Eugen Block) > 3. Re: Latest Doco Out Of Date? (Frank Schilder) > > _______________________________________________ > ceph-users mailing list --ceph-users(a)ceph.io > To unsubscribe send an email toceph-users-leave(a)ceph.io > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

3 weeks, 1 day

1
0
0 0

Latest Doco Out Of Date?

by duluxoz

Hi All, In reference to this page from the Ceph documentation: https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of that page it says that you can run the following commands: ~~~ ceph fs authorize a client.x /dir1 rw ceph fs authorize a client.x /dir2 rw ~~~ This will allow `client.x` to access both `dir1` and `dir2`. So, having a use case where we need to do this, we are, HOWEVER, getting the following error on running the 2nd command on a Reef 18.2.2 cluster: `Error EINVAL: client.x already has fs capabilities that differ from those supplied. To generate a new auth key for client.x, first remove client.x from configuration files, execute 'ceph auth rm client.x', then execute this command again.` Something we're doing wrong, or is the doco "out of date" (mind you, that's from the "latest" version of the doco, and the "reef" version), or is something else going on? Thanks in advance for the help Cheers Dulux-Oz

3 weeks, 1 day

4
9
0 0

List of bridges irc/slack/discord

by Alvaro Soto

(Last update) - https://github.com/orgs/opensource-latinamerica/discussions/3 ~~~ Adding a few unofficial/unregistered Ceph IRC channels (cephadm, crimson) IRC -> slack.oss.lat OFTC: starlingx -> slack: starlingx OFTC: openstack-latinamerica -> slack: stack-latinamerica OFTC: openstack-freezer -> slack: stack-freezer OFTC: ceph -> slack: ceph OFTC: sepia -> slack: sepia OFTC: cephfs -> slack: cephfs OFTC: ceph-dashboard -> slack: ceph-dashboard OFTC: ceph-devel -> slack: ceph-devel OFTC:cephadm -> slack:cephadm OFTC:crimson -> slack:crimson IRC -> ceph-storage.slack.com OFTC: ceph -> slack: ceph OFTC: sepia -> slack: sepia OFTC: cephfs -> slack: cephfs OFTC: ceph-dashboard -> slack: ceph-dashboard OFTC: ceph-devel -> slack: ceph-devel OFTC:cephadm -> slack:cephadm OFTC:crimson -> slack:crimson IRC -> discord(a)ceph.io OFTC: ceph -> discord(a)ceph.io: ceph OFTC: sepia -> discord(a)ceph.io: sepia OFTC: cephfs -> discord(a)ceph.io: cephfs OFTC: ceph-dashboard -> discord(a)ceph.io: ceph-dashboard OFTC: ceph-devel -> discord(a)ceph.io: ceph-devel OFTC:cephadm -> discord@ceph.io:cephadm OFTC:crimson -> discord@ceph.io:crimson Ceph invite URL: https://discord.gg/vacj9cZSmm ~~~ Cheers! -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* ---------------------------------------------------------- Great people talk about ideas, ordinary people talk about things, small people talk... about other people.

3 weeks, 2 days

1
0
0 0

Best practice and expected benefits of using separate WAL and DB devices with Bluestore

by Niklaus Hofer

Dear all We have an HDD ceph cluster that could do with some more IOPS. One solution we are considering is installing NVMe SSDs into the storage nodes and using them as WAL- and/or DB devices for the Bluestore OSDs. However, we have some questions about this and are looking for some guidance and advice. The first one is about the expected benefits. Before we undergo the efforts involved in the transition, we are wondering if it is even worth it. How much of a performance boost one can expect when adding NVMe SSDs for WAL-devices to an HDD cluster? Plus, how much faster than that does it get with the DB also being on SSD. Are there rule-of-thumb number of that? Or maybe someone has done benchmarks in the past? The second question is of more practical nature. Are there any best-practices on how to implement this? I was thinking we won't do one SSD per HDD - surely an NVMe SSD is plenty fast to handle the traffic from multiple OSDs. But what is a good ratio? Do I have one NVMe SSD per 4 HDDs? Per 6 or even 8? Also, how should I chop-up the SSD, using partitions or using LVM? Last but not least, if I have one SSD handle WAL and DB for multiple OSDs, losing that SSD means losing multiple OSDs. How do people deal with this risk? Is it generally deemed acceptable or is this something people tend to mitigate and if so how? Do I run multiple SSDs in RAID? I do realize that for some of these, there might not be the one perfect answer that fits all use cases. I am looking for best practices and in general just trying to avoid any obvious mistakes. Any advice is much appreciated. Sincerely Niklaus Hofer -- stepping stone AG Wasserwerkgasse 7 CH-3011 Bern Telefon: +41 31 332 53 63 www.stepping-stone.ch niklaus.hofer(a)stepping-stone.ch

3 weeks, 2 days

7
6
0 0

2024

2023

2022

2021

2020

2019

ceph-users