Hi,
After looking into OSD memory usage (which seems to be fine) on a v16.2.13 cluster running with cephadm on EL8, it turns out that the kernel itself is using a lot of memory.
# smem -t -w -k
Area                          Used      Cache   Noncache
firmware/hardware                0          0          0
kernel image                     0          0          0
kernel dynamic memory        65.0G      18.6G      46.4G
userspace memory             50.1G     260.5M      49.9G
free memory                   9.9G       9.9G          0
----------------------------------------------------------
                            125.0G      28.8G      96.3G
Comparing with a similar cluster, same OS, same Ceph version, but running packages instead of containers (and the machines have a little more memory):
# smem -t -w -k
Area                          Used      Cache   Noncache
firmware/hardware                0          0          0
kernel image                     0          0          0
kernel dynamic memory        52.8G      50.5G       2.4G
userspace memory            123.9G     198.5M     123.7G
free memory                  10.6G      10.6G          0
----------------------------------------------------------
                            187.3G      61.3G     126.0G
Does anyone have an idea why the kernel needs so much more memory when the daemons run in containers with podman?
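In case it helps with comparing the two machines, the kernel dynamic memory can be broken down a bit further with something like this (a sketch using only generic /proc/meminfo fields and slabtop, nothing Ceph- or podman-specific):

# grep -E 'Slab|SReclaimable|SUnreclaim|VmallocUsed|Percpu' /proc/meminfo
# slabtop -o -s c | head -20     # top slab caches sorted by cache size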
Luis Domingues
Proton AG
Dear all,
How are you?
I have a cluster on Pacific with 3 hosts, each one with 1 mon, 1 mgr
and 12 OSDs.
One of the hosts, darkside1, has been out of quorum according to ceph
status.
Systemd showed 4 services dead, two mons and two mgrs.
I managed to systemctl restart one mon and one mgr, but even after
several attempts, the remaining mon and mgr services, when asked to
restart, keep returning to a failed state after a few seconds. They try
to auto-restart and then go into a failed state where systemd requires
me to manually set them to "reset-failed" before trying to start again.
But they never stay up. There are no clear messages about the issue in
/var/log/ceph/cephadm.log.
The host is still out of quorum.
I have failed to "turn on debug" as per
https://docs.ceph.com/en/pacific/rados/troubleshooting/log-and-debug/.
It seems I do not know the proper incantation for "ceph daemon X config
show"; no string for X seems to satisfy this command. I have tried
adding this:
[mon]
debug mon = 20
to my ceph.conf, but no additional log lines are written to
/var/log/ceph/cephadm.log,
so I'm sorry I can't provide more details.
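For reference, these are the invocations I believe the documentation means on a cephadm deployment (a sketch; mon.darkside1 is my guess at the daemon name, and I have not confirmed these on this cluster):

# cephadm ls | grep -i mon                  # daemon names as cephadm knows them
# cephadm enter --name mon.darkside1        # open a shell inside the mon container
# ceph daemon mon.darkside1 config show     # only works from inside that container
# ceph config set mon debug_mon 20          # cluster-wide alternative to editing ceph.conf
# journalctl -u ceph-<fsid>@mon.darkside1   # daemon logs go to the journal, not cephadm.log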
Could someone help me debug this situation? I am sure that if I just
reboot the machine, it will start up the services properly, as it always
has done, but I would prefer to fix this without resorting to a reboot.
Cordially,
Renata.
Hi Ceph users,
I have a Ceph 16.2.7 cluster that so far has been replicated over the `host` failure domain.
All `hosts` have been chosen to be in different `datacenter`s, so that was sufficient.
Now I wish to add more hosts, including some in already-used data centers, so I'm planning to use CRUSH's `datacenter` failure domain instead.
My problem is that when I add the `datacenter`s into the CRUSH tree, Ceph decides that it should now rebalance the entire cluster.
This seems unnecessary, and wrong.
Before, `ceph osd tree` (some OSDs omitted for legibility):
ID   CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         440.73514  root default
 -3         146.43625      host node-4
  2    hdd   14.61089          osd.2        up   1.00000  1.00000
  3    hdd   14.61089          osd.3        up   1.00000  1.00000
 -7         146.43625      host node-5
 14    hdd   14.61089          osd.14       up   1.00000  1.00000
 15    hdd   14.61089          osd.15       up   1.00000  1.00000
-10         146.43625      host node-6
 26    hdd   14.61089          osd.26       up   1.00000  1.00000
 27    hdd   14.61089          osd.27       up   1.00000  1.00000
After assigning the `datacenter` CRUSH buckets:
ID   CLASS  WEIGHT     TYPE NAME                STATUS  REWEIGHT  PRI-AFF
 -1         440.73514  root default
-18         146.43625      datacenter FSN-DC16
 -7         146.43625          host node-5
 14    hdd   14.61089              osd.14           up   1.00000  1.00000
 15    hdd   14.61089              osd.15           up   1.00000  1.00000
-17         146.43625      datacenter FSN-DC18
-10         146.43625          host node-6
 26    hdd   14.61089              osd.26           up   1.00000  1.00000
 27    hdd   14.61089              osd.27           up   1.00000  1.00000
-16         146.43625      datacenter FSN-DC4
 -3         146.43625          host node-4
  2    hdd   14.61089              osd.2            up   1.00000  1.00000
  3    hdd   14.61089              osd.3            up   1.00000  1.00000
This shows that the tree is essentially unchanged; it just "gained a level".
In `ceph status` I now get:
pgs: 1167541260/1595506041 objects misplaced (73.177%)
If I remove the `datacenter` level again, then the misplacement disappears.
On a minimal testing cluster, this misplacement issue did not appear.
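For what it's worth, a way to check the effect of such a change offline, before committing it, would be something along these lines (a sketch; the file names are arbitrary and the rule id 0 / replica count 3 are assumptions about the pool):

# ceph osd getcrushmap -o crush.orig
# crushtool -d crush.orig -o crush.txt     # decompile, then add the datacenter buckets by hand
# crushtool -c crush.txt -o crush.new
# crushtool -i crush.orig --test --show-mappings --rule 0 --num-rep 3 > before.txt
# crushtool -i crush.new  --test --show-mappings --rule 0 --num-rep 3 > after.txt
# diff before.txt after.txt | wc -l        # rough count of changed PG mappings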
Why does Ceph think that these objects are misplaced when I add the datacenter level?
Is there a more correct way to do this?
Thanks!
Hello.
I am using Rook Ceph and have 20 MDSs in use: 10 hold ranks 0-9 and 10 are standby.
I have one Ceph filesystem, and 2 MDSs are trimming.
Under that one filesystem, there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3 in ACTIVE.
For some reason, for the last 36 hours the MDSs in RESOLVE have been stuck trimming, and so have the ones in REPLAY.
I've also tried failing each MDS, but to no avail.
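(By "failing" I mean roughly the following; the rank number and daemon name are just placeholders:)

# ceph mds fail 3                    # fail one rank so a standby takes over
# ceph fs status                     # watch whether the rank moves past resolve/replay
# ceph daemon mds.<name> status      # on the MDS pod: internal state of that daemon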
I think something should change when the MDS in REPLAY goes to RESOLVE, but I don't know what.
Even looking at the logs of the REPLAY MDS, it's hard to see anything other than that it is terminated every 11 minutes.
I'm desperate for someone's help.
At 01:27 this morning I received the first email about "MDS cache is too large" (a mail is sent every 15 minutes while something is wrong). Looking into it, it was again a standby-replay daemon that had stopped working.
At 01:00 a few rsync processes start in parallel on a client machine. They copy data from an NFS share to a CephFS share to sync the latest changes (we want to switch to CephFS in the near future).
This crashing of the standby-replay MDS has happened a couple of times now, so I think it would be good to get some help. Where should I look next?
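For reference, the cache limit involved can be checked and, as an experiment, raised with something like this (a sketch; the 16 GiB value is only an example, not a recommendation):

# ceph config get mds mds_cache_memory_limit
# ceph config set mds mds_cache_memory_limit 17179869184    # 16 GiB, example value only
# ceph tell mds.atlassian-prod.mds4.qlvypn cache status      # current cache usage of the affected daemon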
Some cephfs information
----------------------------------
# ceph fs status
atlassian-opl - 8 clients
=============
RANK  STATE           MDS                         ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active          atlassian-opl.mds5.zsxfep   Reqs:    0 /s  7830   7803    635   3706
 0-s  standby-replay  atlassian-opl.mds6.svvuii   Evts:    0 /s  3139   1924    461      0
          POOL              TYPE      USED   AVAIL
cephfs.atlassian-opl.meta   metadata  2186M  1161G
cephfs.atlassian-opl.data   data      23.0G  1161G
atlassian-prod - 12 clients
==============
RANK  STATE           MDS                          ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active          atlassian-prod.mds1.msydxf   Reqs:    0 /s  2703k  2703k   905k  1585
 1    active          atlassian-prod.mds2.oappgu   Reqs:    0 /s   961k   961k   317k   622
 2    active          atlassian-prod.mds3.yvkjsi   Reqs:    0 /s  2083k  2083k   670k   443
 0-s  standby-replay  atlassian-prod.mds4.qlvypn   Evts:    0 /s   352k   352k   102k     0
 1-s  standby-replay  atlassian-prod.mds5.egsdfl   Evts:    0 /s   873k   873k   277k     0
 2-s  standby-replay  atlassian-prod.mds6.ghonso   Evts:    0 /s  2317k  2316k   679k     0
          POOL               TYPE      USED   AVAIL
cephfs.atlassian-prod.meta   metadata  58.8G  1161G
cephfs.atlassian-prod.data   data      5492G  1161G
MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Looking at the log on the MDS server, I see the following:
2023-07-21T01:21:01.942+0000 7f668a5e0700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2023-07-21T01:23:13.856+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5671 from mon.1
2023-07-21T01:23:18.369+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5672 from mon.1
2023-07-21T01:23:31.719+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5673 from mon.1
2023-07-21T01:23:35.769+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5674 from mon.1
2023-07-21T01:28:23.764+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5675 from mon.1
2023-07-21T01:29:13.657+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5676 from mon.1
2023-07-21T01:33:43.886+0000 7f6688ddd700 1 mds.atlassian-prod.pwsoel13143.qlvypn Updating MDS map to version 5677 from mon.1
(and another 20 lines about updating MDS map)
Alert mailings:
Mail at 01:27
----------------------------------
HEALTH_WARN
--- New ---
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large (13GB/9GB); 0 inodes in use by clients, 0 stray files
=== Full health status ===
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large (13GB/9GB); 0 inodes in use by clients, 0 stray files
Mail at 03:27
----------------------------------
HEALTH_OK
--- Cleared ---
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large (14GB/9GB); 0 inodes in use by clients, 0 stray files
=== Full health status ===
Mail at 04:12
----------------------------------
HEALTH_WARN
--- New ---
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large (15GB/9GB); 0 inodes in use by clients, 0 stray files
=== Full health status ===
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large (15GB/9GB); 0 inodes in use by clients, 0 stray files
Best regards,
Sake
Hello dear Ceph users and developers,
we're dealing with a strange problem. We have a 12-node Alma Linux 9 cluster,
initially installed with Ceph 15.2.16 and later upgraded to 17.2.5. It runs a bunch
of KVM virtual machines accessing volumes using RBD.
Everything is working well, but there is a strange and, for us, quite serious issue:
the speed of write operations (both sequential and random) constantly degrades
drastically to almost unusable numbers (in ~1 week it drops from ~70k 4k writes/s
from 1 VM to ~7k writes/s).
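(For context, those numbers come from 4k write benchmarks run inside a VM; the fio invocation below is only an illustration of that kind of test, not necessarily the exact one we used:)

# fio --name=4k-randwrite --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
      --iodepth=32 --numjobs=4 --size=8G --runtime=60 --time_based --group_reporting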
When I restart all OSD daemons, the numbers immediately return to normal.
The volumes are stored on a replicated pool with 4 replicas, on top of 7 * 12 = 84
INTEL SSDPE2KX080T8 NVMe drives.
I updated the cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896,
as restarting OSDs is quite painful when half of them crash.
I don't see anything suspicious: node load is quite low, there are no errors in the
logs, and network latency and throughput are OK too.
Is anyone having a similar issue?
I'd like to ask for hints on what I should check further.
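In case it is useful, the per-OSD data I can gather for comparison looks roughly like this (a sketch; osd.0 is just an example id):

# ceph osd perf                                      # commit/apply latencies of all OSDs
# ceph tell osd.0 perf dump > osd0-perf.json         # full performance counters of one OSD
# ceph tell osd.0 bluestore allocator score block    # rough BlueStore fragmentation score
# ceph tell osd.0 compact                            # online RocksDB compaction, as an experiment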
We're running lots of 14.2.x and 15.2.x clusters and none of them shows a similar
issue, so I suspect this is something related to Quincy.
thanks a lot in advance
with best regards
nikola ciprich
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava
tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: servis(a)linuxbox.cz
-------------------------------------