Hello all! I hope somebody can help us.
The starting point: a Ceph cluster v15.2 (installed and managed by Proxmox) with 3
nodes running on physical servers rented from a cloud provider. The volumes are provided by Ceph
via both CephFS and RBD. We run 2 MDS daemons with max_mds=1, so one daemon was in the
active state and the other in standby.
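For clarity, that layout corresponds to the usual filesystem settings - shown here only as an illustration, the values match the fs dump at the end of this mail:
ceph fs set cephfs max_mds 1
ceph fs set cephfs standby_count_wanted 1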
On Thursday some of our applications stopped working. After investigating, it became clear
that we had a problem with Ceph, more precisely with CephFS: both MDS daemons had suddenly
crashed. We tried to restart them and found that they crashed again immediately after
starting. The crash information:
2024-04-17T17:47:42.841+0000 7f959ced9700 1 mds.0.29134 recovery_done -- successful
recovery!
2024-04-17T17:47:42.853+0000 7f959ced9700 1 mds.0.29134 active_start
2024-04-17T17:47:42.881+0000 7f959ced9700 1 mds.0.29134 cluster recovered.
2024-04-17T17:47:43.825+0000 7f959aed5700 -1 ./src/mds/OpenFileTable.cc: In function
'void OpenFileTable::commit(MDSContext*, uint64_t, int)' thread 7f959aed5700 time
2024-04-17T17:47:43.831243+0000
./src/mds/OpenFileTable.cc: 549: FAILED ceph_assert(count > 0)
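For reference, the full crash reports can also be pulled from the cluster's crash module (the crash ID below is a placeholder):
ceph crash ls
ceph crash info <crash-id>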
Over the next hours we read tons of articles, studied the documentation, and checked the
overall cluster status with various diagnostic commands - but didn't find anything wrong.
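By "various diagnostic commands" I mean the standard status checks, for example:
ceph health detail
ceph fs status
ceph osd df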
In the evening we decided to upgrade our Ceph cluster, first to v16 and finally
to v17.2.7. Unfortunately, it didn't solve the problem: the MDS daemons continue to crash with the
same error. The only difference we found is the "1 MDSs report damaged
metadata" warning in the output of ceph -s - see it below.
I assumed it might be a well-known bug, but couldn't find a matching one on
https://tracker.ceph.com - there are several bugs associated with OpenFileTable.cc,
but none related to ceph_assert(count > 0).
We also checked the source code of OpenFileTable.cc; here is the fragment containing the
failing assert (the crash output above points at OpenFileTable::commit, line 549):
int omap_idx = anchor.omap_idx;
unsigned& count = omap_num_items.at(omap_idx);  // number of entries the MDS believes are stored in this openfiles omap object
ceph_assert(count > 0);
So we guess that the object map of some object in Ceph is unexpectedly empty - i.e. the MDS
expects entries in an openfiles omap object that holds none. But again, we found nothing wrong in our cluster...
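In case it helps: the OpenFileTable is stored as omap data on objects in the metadata pool, so the actual key counts can be checked directly with rados (the pool name is a placeholder - our metadata pool is pool id 6 in the dump below):
rados -p <metadata pool> ls | grep openfiles
rados -p <metadata pool> listomapkeys mds0_openfiles.0 | wc -l    # repeat for each mdsN_openfiles.* object found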
Next, we started with the
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
article: we tried to reset the journal (despite it being OK the whole time) and to wipe the
sessions using the cephfs-table-tool all reset session command. No result...
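For completeness, the commands from that article take this form (rank 0 of the 'cephfs' filesystem; backup.bin is just an example name for the journal export the article recommends making first):
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-table-tool all reset session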
Now I have decided to continue following this article and run the cephfs-data-scan scan_extents
command. We started it on Friday and it is still running (2 of 3 workers have finished, so
I'm waiting for the last one; maybe I need more workers for the next command,
cephfs-data-scan scan_inodes, which I plan to run afterwards). But I doubt it will solve
the issue because, again, we believe the problem is not with the objects in Ceph but
with the metadata only...
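For reference, the parallel form of those scans described in the article looks like this (the data pool name is a placeholder, 4 workers as an example; each worker is a separate process):
cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>    # repeat with worker_n 1..3
cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 <data pool>     # planned next step, same worker pattern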
Is this a new bug, or something else? What else should we try to get our MDS
daemons running again? Any idea is welcome!
The important outputs:
ceph -s
  cluster:
    id:     4cd1c477-c8d0-4855-a1f1-cb71d89427ed
    health: HEALTH_ERR
            1 MDSs report damaged metadata
            insufficient standby MDS daemons available
            83 daemons have recently crashed
            3 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum asrv-dev-stor-2,asrv-dev-stor-3,asrv-dev-stor-1 (age 22h)
    mgr: asrv-dev-stor-2(active, since 22h), standbys: asrv-dev-stor-1
    mds: 1/1 daemons up
    osd: 18 osds: 18 up (since 22h), 18 in (since 29h)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 289 pgs
    objects: 29.72M objects, 5.6 TiB
    usage:   21 TiB used, 47 TiB / 68 TiB avail
    pgs:     287 active+clean
             2   active+clean+scrubbing+deep

  io:
    client: 2.5 KiB/s rd, 172 KiB/s wr, 261 op/s rd, 195 op/s wr
ceph fs dump
e29480
enable_multiple, ever_enabled_multiple: 0,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned
encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file
layout v2,10=snaprealm v2}
legacy client fscid: 1
Filesystem 'cephfs' (1)
fs_name cephfs
epoch 29480
flags 12 joinable allow_snaps allow_multimds_snaps
created 2022-11-25T15:56:08.507407+0000
modified 2024-04-18T16:52:29.970504+0000
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
required_client_features {}
last_failure 0
last_failure_osd_epoch 14728
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default
file layouts on dirs,4=dir inode in separate object,5=mds uses versioned
encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file
layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=156636152}
failed
damaged
stopped
data_pools [5]
metadata_pool 6
inline_data disabled
balancer
standby_count_wanted 1
[mds.asrv-dev-stor-1{0:156636152} state up:active seq 6 laggy since
2024-04-18T16:52:29.970479+0000 addr
[v2:172.22.2.91:6800/2487054023,v1:172.22.2.91:6801/2487054023] compat
{c=[1],r=[1],i=[7ff]}]
cephfs-journal-tool --rank=cephfs:0 journal inspect
Overall journal integrity: OK
ceph pg dump summary
version 41137
stamp 2024-04-18T21:17:59.133536+0000
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT  OBJECTS   MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES          OMAP_BYTES*  OMAP_KEYS*  LOG      DISK_LOG
sum      29717605  0                   0         0          0        6112544251872  13374192956  28493480    1806575  1806575
OSD_STAT USED AVAIL USED_RAW TOTAL
sum 21 TiB 47 TiB 21 TiB 68 TiB
ceph pg dump pools
POOLID  OBJECTS   MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES          OMAP_BYTES*  OMAP_KEYS*  LOG     DISK_LOG
8       31771     0                   0         0          0        131337887503   2482         140         401246  401246
7       839707    0                   0         0          0        3519034650971  736          61          399328  399328
6       1319576   0                   0         0          0        421044421      13374189738  28493279    206749  206749
5       27526539  0                   0         0          0        2461702171417  0            0           792165  792165
2       12        0                   0         0          0        48497560       0            0           6991    6991
---
Best regards,
Alexey Gerasimov
System Manager
www.opencascade.com
www.capgemini.com