Hello,

I've been facing some issues with a single-node Ceph cluster (Mimic). I know an environment like this shouldn't be in production, but the server ended up handling operational workloads for the last two years.

Some users reported issues in CephFS: some files were not accessible, and listing the contents of the affected folders hung the node.

I also noticed a heavy memory load on the server: main memory was mostly consumed by cache, and a fair amount of swap was in use.

The command "ceph health detail" reported some inactive PGs. Those PGs didn't exist.
After rebooting the node, an fsck was run on the 3 affected OSDs:
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-1/

Unfortunately, all of them crashed with a core dump, and now they don't start anymore.
The logs report messages like:
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/version_set.cc:3088] Recovering from manifest file: MANIFEST-004059
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all background work
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2019-08-28 03:00:12.999 7f21d787c240 -1 rocksdb: NotFound:
2019-08-28 03:00:12.999 7f21d787c240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
2019-08-28 03:00:12.999 7f21d787c240  1 bluefs umount
2019-08-28 03:00:12.999 7f21d787c240  1 stupidalloc 0x0x5650c5255800 shutdown
2019-08-28 03:00:12.999 7f21d787c240  1 bdev(0x5650c5604a80 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.247 7f21d787c240  1 bdev(0x5650c5604700 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.479 7f21d787c240 -1 osd.0 0 OSD:init: unable to mount object store
2019-08-28 03:00:13.479 7f21d787c240 -1  ** ERROR: osd init failed: (5) Input/output error


I'm not sure if the fsck has introduced additional damage.

After that, I tried to mark the unfound objects as lost with the following commands:

ceph pg 4.1e mark_unfound_lost revert
ceph pg 9.1d mark_unfound_lost revert
ceph pg 13.3 mark_unfound_lost revert
ceph pg 13.e mark_unfound_lost revert
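
For reference, the unfound objects and recovery state of each of those PGs can be inspected with something like:
ceph pg 4.1e query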


Currently, since there are 3 OSDs down, there are:
316 unclean PGs
76 inactive PGs

root@ceph-s01:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
-2        0.43599 root ssd
-4        0.43599     disktype ssd_disk
12   ssd  0.43599         osd.12            up  1.00000 1.00000
-1       60.03792 root default
-5       60.03792     disktype hdd_disk
 0   hdd        0         osd.0           down  1.00000 1.00000
 1   hdd  5.45799         osd.1           down        0 1.00000
 2   hdd  5.45799         osd.2             up  1.00000 1.00000
 3   hdd  5.45799         osd.3             up  1.00000 1.00000
 4   hdd  5.45799         osd.4             up  1.00000 1.00000
 5   hdd  5.45799         osd.5             up  1.00000 1.00000
 6   hdd  5.45799         osd.6             up  1.00000 1.00000
 7   hdd  5.45799         osd.7           down        0 1.00000
 8   hdd  5.45799         osd.8             up  1.00000 1.00000
 9   hdd  5.45799         osd.9             up  1.00000 1.00000
10   hdd  5.45799         osd.10            up  1.00000 1.00000
11   hdd  5.45799         osd.11            up  1.00000 1.00000


After running the following command, a MANIFEST file appeared in the db/lost folder. I guess that the repair moved it there.

# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-7 --out-dir osd7/

...
db/LOCK
db/MANIFEST-000001
db/OPTIONS-018543
db/OPTIONS-018581
db/lost/
db/lost/MANIFEST-018578
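
If it's relevant: as far as I understand, db/CURRENT names the MANIFEST that RocksDB expects to open, so it can be compared with what was actually exported, e.g.:
cat osd7/db/CURRENT
ls osd7/db/ osd7/db/lost/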


Any ideas? Suggestions?

Thank you.

Regards,

Jordi