Hi.
We're currently getting these errors, and I seem to be missing a clear overview of the cause and how to debug it.
3/26/24 9:38:09 PM [ERR] executing _write_files((['dkcphhpcadmin01', 'dkcphhpcmgt028', 'dkcphhpcmgt029', 'dkcphhpcmgt031', 'dkcphhpcosd033', 'dkcphhpcosd034', 'dkcphhpcosd035', 'dkcphhpcosd036', 'dkcphhpcosd037', 'dkcphhpcosd038', 'dkcphhpcosd039', 'dkcphhpcosd040', 'dkcphhpcosd041', 'dkcphhpcosd042', 'dkcphhpcosd043', 'dkcphhpcosd044'],)) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 240, in _write_remote_file
    await asyncssh.scp(f.name, (conn, tmp_path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
    await source.run(srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
    await self._send_files(path, b'')
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
    self.handle_error(exc)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
    raise exc from None
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
    await self._send_file(srcpath, dstpath, attrs)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
    await self._make_cd_request(b'C', attrs, size, srcpath)
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
    self._fs.basename(path))
  File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
    raise exc
asyncssh.sftp.SFTPFailure: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 79, in do_work
    return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1088, in _write_files
    self._write_client_files(client_files, host)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1107, in _write_client_files
    self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 261, in write_remote_file
    host, path, content, mode, uid, gid, addr))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 615, in wait_async
    return self.event_loop.get_result(coro)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 56, in get_result
    return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 249, in _write_remote_file
    raise OrchestratorError(msg)
orchestrator._interface.OrchestratorError: Unable to write dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied
3/26/24 9:38:09 PM [ERR] Unable to write dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf: scp: /tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf.new: Permission denied
3/26/24 9:38:09 PM [INF] Updating dkcphhpcmgt028:/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config/ceph.conf
It seems to be related to the permissions with which the manager writes the files and the process that copies them around.
$ sudo ceph -v
[sudo] password for adminjskr:
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
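From the traceback, cephadm stages the file under /tmp/<final path>.new before moving it into place, and the scp of that staging file is what gets "Permission denied". A quick way to test whether the SSH user cephadm connects as can actually create that staging path is a small probe like the following (the path is copied from the log above; the helper itself is a generic sketch, not cephadm's own code):

```python
import os
import tempfile

def can_create_under(path: str) -> bool:
    """Return True if the current user can create a file under `path`,
    creating intermediate directories as an scp to /tmp/<dest>.new would need."""
    try:
        os.makedirs(path, exist_ok=True)
        fd, name = tempfile.mkstemp(dir=path)
        os.close(fd)
        os.unlink(name)
        return True
    except OSError as exc:
        print(f"cannot write under {path}: {exc}")
        return False

# Path taken from the traceback; run this on the affected host as the
# user cephadm connects with.
print(can_create_under("/tmp/var/lib/ceph/5c384430-da91-11ed-af9c-c780a5227aff/config"))
```

If this prints False, check the ownership of the leftover tree with `ls -ld /tmp/var /tmp/var/lib/ceph`; a stale /tmp/var tree created by a different user (e.g. root vs. the cephadm SSH user) would produce exactly this error.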
Best regards,
Jesper Agerbo Krogh
Director Digitalization
Digitalization
Topsoe A/S
Haldor Topsøes Allé 1
2800 Kgs. Lyngby
Denmark
Phone (direct): 27773240
   
Read more at topsoe.com
On 29/05/2023 20.55, Anthony D'Atri wrote:
> Check the uptime for the OSDs in question
I restarted all my OSDs within the past 10 days or so. Maybe OSD
restarts are somehow breaking these stats?
>
>> On May 29, 2023, at 6:44 AM, Hector Martin <marcan(a)marcan.st> wrote:
>>
>> Hi,
>>
>> I'm watching a cluster finish a bunch of backfilling, and I noticed that
>> quite often PGs end up with zero misplaced objects, even though they are
>> still backfilling.
>>
>> Right now the cluster is down to 6 backfilling PGs:
>>
>> data:
>> volumes: 1/1 healthy
>> pools: 6 pools, 268 pgs
>> objects: 18.79M objects, 29 TiB
>> usage: 49 TiB used, 25 TiB / 75 TiB avail
>> pgs: 262 active+clean
>> 6 active+remapped+backfilling
>>
>> But there are no misplaced objects, and the misplaced column in `ceph pg
>> dump` is zero for all PGs.
>>
>> If I do a `ceph pg dump_json`, I can see `num_objects_recovered`
>> increasing for these PGs... but the misplaced count is still 0.
>>
>> Is there something else that would cause recoveries/backfills other than
>> misplaced objects? Or perhaps there is a bug somewhere causing the
>> misplaced object count to be misreported as 0 sometimes?
>>
>> # ceph -v
>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
>> (stable)
>>
>> - Hector
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
- Hector
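For cross-checking Hector's observation, the counters he mentions can be pulled out of `ceph pg dump_json` programmatically. A sketch like the following flags PGs that are in a backfilling state while reporting zero misplaced objects (the `pg_map`/`pg_stats`/`stat_sum` field names are taken from the standard pg dump JSON layout; verify them against your release):

```python
def backfilling_with_zero_misplaced(dump: dict) -> list:
    """Return pgids that are backfilling but report 0 misplaced objects."""
    suspicious = []
    for pg in dump["pg_map"]["pg_stats"]:
        if "backfilling" in pg["state"] and pg["stat_sum"]["num_objects_misplaced"] == 0:
            suspicious.append(pg["pgid"])
    return suspicious

# Tiny hand-made sample standing in for `ceph pg dump_json` output:
sample = {"pg_map": {"pg_stats": [
    {"pgid": "2.1a", "state": "active+remapped+backfilling",
     "stat_sum": {"num_objects_misplaced": 0, "num_objects_recovered": 1234}},
    {"pgid": "2.1b", "state": "active+clean",
     "stat_sum": {"num_objects_misplaced": 0, "num_objects_recovered": 0}},
]}}
print(backfilling_with_zero_misplaced(sample))  # ['2.1a']
```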
Hi,
as the documentation sends mixed signals in
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv…
"Note
Binding to IPv4 is enabled by default, so if you just add the option to
bind to IPv6 you’ll actually put yourself into dual stack mode."
and
https://docs.ceph.com/en/latest/rados/configuration/msgr2/#address-formats
"Note
The ability to bind to multiple ports has paved the way for dual-stack
IPv4 and IPv6 support. That said, dual-stack operation is not yet
supported as of Quincy v17.2.0."
just the quick questions:
Is dual-stack networking with IPv4 and IPv6 now supported or not?
From which version on is it considered stable?
Are OSDs now able to register themselves with two IP addresses in the
cluster map? MONs too?
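For reference, the bind options the first documentation note refers to are `ms_bind_ipv4` and `ms_bind_ipv6`; the dual-stack configuration it describes (both enabled) would look like this in ceph.conf, and whether that mode is actually supported per release is exactly the question above:

```
[global]
ms_bind_ipv4 = true
ms_bind_ipv6 = true
```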
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hello,
there is a user in our Ceph cluster who is suddenly unable to write to
one of his buckets.
Reading works fine.
All other buckets work fine.
If we copy the bucket to another bucket on the same cluster, the error
persists: writing is not possible in the new bucket either.
Interesting: If we copy the contents of the bucket to a bucket in
another Ceph cluster the error is gone.
So now we know how to work around it, but we cannot find the root cause.
I checked the policies, lifecycle and versioning.
Nothing. The user has FULL_CONTROL. Same settings for the user's other
buckets he can still write to.
When setting the debug level higher, all I can see is something like
this while trying to write to the bucket:
s3:put_obj reading permissions
s3:put_obj init op
s3:put_obj verifying op mask
s3:put_obj verifying op permissions
op->ERRORHANDLER: err_no=-13 new_err_no=-13
cache get: name=default.rgw.log++script.postrequest. : hit (negative entry)
s3:put_obj op status=0
s3:put_obj http status=403
1 ====== req done req=0x7fe8bb60a710 op status=0 http_status=403
latency=0.000000000s ======
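One small decoding hint: the `err_no=-13` in that trace is a negated POSIX errno, i.e. EACCES ("Permission denied"), which is consistent with RGW rejecting the request at the permission-verification step rather than, say, a quota or I/O problem:

```python
import errno
import os

code = 13  # from "err_no=-13" in the RGW debug log
print(errno.errorcode[code])  # EACCES
print(os.strerror(code))      # the human-readable message for errno 13
```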
I still think this is something with a policy or similar. When we copy
the bucket to another bucket in the same cluster, writing to the new
bucket works at first, but at some point while the copy job progresses,
writing becomes impossible again.
But what is it?
Best,
Malte
Hi everyone,
On behalf of the Ceph Foundation Board, I would like to announce the
creation of, and cordially invite you to, the first of a recurring series
of meetings focused solely on gathering feedback from the users of
Ceph. The overarching goal of these meetings is to elicit feedback from the
users, companies, and organizations who use Ceph in their production
environments. You can find more details about the motivation behind this
effort in our user survey [1] that we highly encourage all of you to take.
This is an extension of the Ceph User Dev Meeting with concerted focus on
Performance (led by Vincent Hsu, IBM) and Orchestration/Deployment (led by
Matt Leonard, Bloomberg), to start off with. We would like to kick off this
series of meetings on March 21, 2024. The survey will be open until March
18, 2024.
Looking forward to hearing from you!
Thanks,
Neha
[1]
https://docs.google.com/forms/d/15aWxoG4wSQz7ziBaReVNYVv94jA0dSNQsDJGqmHCLM…
Hi,
I'm running Ceph Quincy (17.2.6) with a RADOS Gateway. I have multiple
tenants, for example:
- Tenant1$manager
- Tenant1$readwrite
I would like to set a policy on a bucket (backups for example) owned by
*Tenant1$manager* to allow *Tenant1$readwrite* access to that bucket. I
can't find any documentation that discusses this scenario.
Does anyone know how to specify the Principal and Resource sections of a
policy.json file? Or is there any other configuration that I might be missing?
I've tried some variations on Principal and Resource, including and
excluding tenant information, but no luck yet.
For example:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/Tenant1$readwrite"]},
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
    "Resource": [
      "arn:aws:s3:::Tenant1/backups"
    ]
  }]
}
I'm using s3cmd for testing, so:
s3cmd --config s3cfg.manager setpolicy policy.json s3://backups/
Returns:
s3://backups/: Policy updated
And then testing:
s3cmd --config s3cfg.readwrite ls s3://backups/
ERROR: Access to bucket 'backups' was denied
ERROR: S3 error: 403 (AccessDenied)
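Two things worth checking against the RGW bucket-policy documentation: the tenant usually goes into the account field of the IAM ARN rather than into the user name, and GetObject/PutObject need an object-level ARN in addition to the bucket ARN for ListBucket. A variant worth trying (treat the exact ARN forms as an assumption to verify against the docs for your release) would be:

```
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]},
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
    "Resource": [
      "arn:aws:s3:::backups",
      "arn:aws:s3:::backups/*"
    ]
  }]
}
```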
Thanks,
Tom
Hi All,
I've been battling this for a while and I'm not sure where to go from
here. I have a Ceph health warning as such:
# ceph -s
cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_WARN
1 MDSs report slow requests
1 MDSs behind on trimming
services:
mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
mds: 1/1 daemons up, 2 standby
osd: 46 osds: 46 up (since 9h), 46 in (since 2w)
data:
volumes: 1/1 healthy
pools: 4 pools, 1313 pgs
objects: 260.72M objects, 466 TiB
usage: 704 TiB used, 424 TiB / 1.1 PiB avail
pgs: 1306 active+clean
4 active+clean+scrubbing+deep
3 active+clean+scrubbing
io:
client: 123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr
And the specifics are:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked >
30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
max_segments: 250, num_segments: 13884
That "num_segments" number slowly keeps increasing. I suspect I just
need to tell the MDS servers to trim faster but after hours of googling
around I just can't figure out the best way to do it. The best I could
come up with was to decrease "mds_cache_trim_decay_rate" from 1.0 to .8
(to start), based on this page:
https://www.suse.com/support/kb/doc/?id=000019740
But it doesn't seem to help, maybe I should decrease it further? I am
guessing this must be a common issue...? I am running Reef on the MDS
servers, but most clients are on Quincy.
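Journal trimming often stalls because the same stuck requests behind the MDS_SLOW_REQUEST warning are pinning old log segments, so chasing the blocked ops can be more productive than tuning the trim rate. A sketch of where to look (the mds name is taken from the output above; verify the exact admin-socket command names against your release):

```
# on the host running the active MDS: list in-flight/slow requests
ceph daemon mds.slugfs.pr-md-01.xdtppo ops

# optionally raise the segment ceiling while investigating, so the
# warning threshold is not the thing you are fighting
ceph config set mds mds_log_max_segments 500
```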
Thanks for any advice!
cheers,
erich
Hello Ceph List,
I'd like to formally let the wider community know of some work I've been
involved with for a while now: adding Managed SMB Protocol Support to Ceph.
SMB being the well known network file protocol native to Windows systems and
supported by macOS (and Linux). The other key word, "managed", means
integrating with Ceph management tooling - in this particular case cephadm for
orchestration and, eventually, a new MGR module for managing SMB shares.
The effort is still in its very early stages. We have a PR adding initial
support for Samba Containers to cephadm [1] and a prototype for an smb MGR
module [2]. We plan on using container images based on the samba-container
project [3] - a team I am already part of. What we're aiming for is a feature
set similar to the current NFS integration in Ceph, but with a focus on
bridging non-Linux/Unix clients to CephFS using a protocol built into those
systems.
A few major features we have planned include:
* Standalone servers (internally defined users/groups)
* Active Directory Domain Member Servers
* Clustered Samba support
* Exporting Samba stats via Prometheus metrics
* A `ceph` cli workflow loosely based on the nfs mgr module
I wanted to share this information in case there's wider community interest in
this effort. I'm happy to take your questions / thoughts / suggestions in this
email thread, via Ceph slack (or IRC), or feel free to attend a Ceph
Orchestration weekly meeting! I try to attend regularly, and we sometimes discuss
design aspects of the smb effort there. It's on the Ceph Community Calendar.
Thanks!
[1] - https://github.com/ceph/ceph/pull/55068
[2] - https://github.com/ceph/ceph/pull/56350
[3] - https://github.com/samba-in-kubernetes/samba-container/
Thanks for reading,
--John Mulligan
I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04 hosts in it, each with 4 OSDs attached. The first 2 servers, which host the mgrs, have 32 GB of RAM each, and the remaining hosts have 24 GB.
For some reason I am unable to identify, the first host in the cluster appears to constantly be trying to set the osd_memory_target variable to roughly half of the calculated minimum for the cluster. I see the following spamming the logs constantly:
Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing value: Value '480485376' is below minimum 939524096
Default is set to 4294967296.
I did double-check, and osd_memory_base (805306368) + osd_memory_cache_min (134217728) adds up exactly to the minimum.
osd_memory_target_autotune is currently enabled, but I cannot for the life of me figure out how it is arriving at 480485376 as a value for that particular host, which even has the most RAM. Neither the cluster nor the host is anywhere near maximum memory utilization, so it's not as if processes are competing for resources.
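Since the autotuner's target is roughly (host memory x autotune ratio, minus reservations for non-OSD daemons) divided by the number of OSDs on the host (a simplified sketch of cephadm's formula, not the exact code), it can help to invert the observed value: 480485376 x 4 OSDs is only about 1.8 GiB, far below what 32 GiB at the default ratio of 0.7 should yield, which points at cephadm either detecting much less memory on that host than expected or reserving most of it for the colocated mon/mgr daemons:

```python
GIB = 1024 ** 3

observed_target = 480_485_376   # value from the log
num_osds = 4                    # OSDs on the host

# Budget the autotuner apparently split among the OSDs:
implied_budget = observed_target * num_osds
print(f"implied OSD budget: {implied_budget / GIB:.2f} GiB")  # ~1.79 GiB

# What we'd naively expect before per-daemon reservations
# (0.7 is the default mgr/cephadm/autotune_memory_target_ratio):
host_mem = 32 * GIB
ratio = 0.7
print(f"expected budget: {host_mem * ratio / GIB:.1f} GiB")   # 22.4 GiB
```

Comparing `cephadm gather-facts` memory figures and the list of daemons on that host against this arithmetic should show which of the two inputs is off.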