Hi Derek,
First of all, a brief overview of the BlueStore design to make sure we're
on the same page.
BlueFS doesn't keep all of the BlueStore data, just the RocksDB part of it.
In your case BlueFS shares the same device with BlueStore user data.
A space rebalance procedure runs periodically to make sure BlueFS has
enough space to keep the DB's data.
Hence there is a primary BlueStore space allocator, which tracks the
whole volume's space, and there is a BlueFS allocator, which is gifted
space by the primary allocator depending on its needs.
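To make the two-allocator arrangement concrete, here is a toy Python model. It is not Ceph code: the class names, the byte-granular accounting and the "ask for exactly the shortfall" policy are simplifying assumptions. It only illustrates why, on a shared device, BlueFS can hit ENOSPC once user data has consumed everything the primary allocator tracks.

```python
# Toy model, NOT Ceph code: a primary allocator owns the whole shared device
# and gifts space to BlueFS whenever the RocksDB side needs more room.


class NoSpaceError(Exception):
    """Raised when an allocator cannot satisfy a request (think ENOSPC)."""


class PrimaryAllocator:
    def __init__(self, device_size: int) -> None:
        self.free = device_size  # bytes not yet handed out to anyone

    def allocate(self, size: int) -> int:
        if size > self.free:
            raise NoSpaceError(f"want {size:#x}, available {self.free:#x}")
        self.free -= size
        return size


class BlueFSAllocator:
    def __init__(self, primary: PrimaryAllocator) -> None:
        self.primary = primary
        self.owned = 0  # space gifted so far by the primary allocator
        self.used = 0   # space actually holding DB data

    def allocate(self, size: int) -> None:
        shortfall = self.used + size - self.owned
        if shortfall > 0:
            # Ask the primary allocator for the missing space; this raises
            # NoSpaceError when user data has already consumed the device.
            self.owned += self.primary.allocate(shortfall)
        self.used += size


primary = PrimaryAllocator(device_size=100)
bluefs = BlueFSAllocator(primary)
bluefs.allocate(10)   # BlueFS is gifted 10 units
primary.allocate(90)  # user data fills the rest of the device
try:
    bluefs.allocate(5)  # DB growth now has nowhere to go
except NoSpaceError as err:
    print("BlueFS:", err)  # BlueFS: want 0x5, available 0x0
```

The model collapses the periodic rebalance and BlueFS's on-demand expansion into a single on-demand call and ignores allocation-unit rounding; it is only meant to show the dependency between the two allocators.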
Some observations on your case:
1) bluefs-bdev-sizes reports the total device size and the usage of the
BlueFS (!) part of it only. I.e. the 22GiB are for BlueFS alone; it
provides no insight into overall space usage.
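The figures in Derek's bluefs-bdev-sizes output for osd.681 (quoted further down) can be converted to see how little of the device the tool actually describes. The constants are copied from that output; the rest is plain arithmetic.

```python
# Constants copied from the osd.681 bluefs-bdev-sizes output.
GIB = 1 << 30

device_size = 0x1A80000000  # "device size": the whole block device
bluefs_own = 0x582550000    # total printed after "=": space owned by BlueFS
bluefs_used = 0x56D090000   # "using": part of that used by RocksDB

print(f"device:              {device_size / GIB:6.2f} GiB")
print(f"owned by BlueFS:     {bluefs_own / GIB:6.2f} GiB "
      f"({bluefs_own / device_size:.0%} of the device)")
print(f"used by BlueFS:      {bluefs_used / GIB:6.2f} GiB")
print(f"not owned by BlueFS: {(device_size - bluefs_own) / GIB:6.2f} GiB")
```

Roughly 22 GiB of the 106 GiB device is owned by BlueFS; the remaining ~84 GiB is tracked by the primary allocator and is not broken down by this command at all.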
2) It looks like the BlueStore allocator complains about a lack of free
space, which means BlueFS plus user data took all the space. See:
-14> 2020-03-15 14:43:47.572 7f32925dd700 -1 bluestore(/var/lib/ceph/osd/ceph-681) _do_alloc_write failed to allocate 0x400000 allocated 0x 3ac000 min_alloc_size 0x4000 available 0x 0
-13> 2020-03-15 14:43:47.572 7f32925dd700 -1 bluestore(/var/lib/ceph/osd/ceph-681) _do_write _do_alloc_write failed with (28) No space left on device
-12> 2020-03-15 14:43:47.572 7f32925dd700 -1 bluestore(/var/lib/ceph/osd/ceph-681) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 4, counting from 0)
-11> 2020-03-15 14:43:47.572 7f32925dd700 -1 bluestore(/var/lib/ceph/osd/ceph-681) ENOSPC from bluestore, misconfigured cluster
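The hex fields in the first message are easier to read once decoded. The sketch below uses a hypothetical helper, parse_alloc_failure, whose regex is written against this one message only (including the blank that appears in "0x 3ac000" as quoted); it is not a general Ceph log parser.

```python
import re

# Regex written against the single "_do_alloc_write failed to allocate"
# message quoted above; "0x\s*" tolerates the blank seen in "0x 3ac000".
ALLOC_FAILURE = re.compile(
    r"failed to allocate 0x\s*(?P<want>[0-9a-f]+)"
    r" allocated 0x\s*(?P<got>[0-9a-f]+)"
    r" min_alloc_size 0x\s*(?P<min_alloc>[0-9a-f]+)"
    r" available 0x\s*(?P<avail>[0-9a-f]+)"
)


def parse_alloc_failure(line: str) -> dict:
    """Hypothetical helper: return the message's hex fields as integers."""
    match = ALLOC_FAILURE.search(line)
    if match is None:
        raise ValueError("not a _do_alloc_write allocation-failure message")
    return {name: int(value, 16) for name, value in match.groupdict().items()}


line = ("bluestore(/var/lib/ceph/osd/ceph-681) _do_alloc_write failed to "
        "allocate 0x400000 allocated 0x 3ac000 min_alloc_size 0x4000 "
        "available 0x 0")
fields = parse_alloc_failure(line)
print(f"requested {fields['want']:#x}, allocated {fields['got']:#x}, "
      f"short by {fields['want'] - fields['got']:#x}, "
      f"available {fields['avail']:#x}")
```

Decoded, the write wanted 0x400000 bytes (4 MiB), only 0x3ac000 could be allocated, leaving it 0x54000 bytes short, and the allocator reported nothing available.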
3) repair suffers from a lack of space in both allocators. The BlueFS
one tries to acquire some additional space from the primary allocator,
which fails to provide it:
2020-03-15 23:55:14.816 7f0d3fac2c00 -1 bluestore(/var/lib/ceph/osd/ceph-709) allocate_bluefs_freespace failed to allocate on 0xb000000 min_size 0xb000000 > allocated total 0x80000 bluefs_shared_alloc_size 0x10000 allocated 0x80000 available 0x 8000
2020-03-15 23:55:14.816 7f0d3fac2c00 -1 bluefs _allocate failed to expand slow device to fit +0xaffa895
2020-03-15 23:55:14.816 7f0d3fac2c00 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0xaffa895
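The sizes in this log are consistent with BlueFS rounding its request up to bluefs_shared_alloc_size: 0xaffa895 bytes rounded up to a multiple of 0x10000 is exactly the logged min_size of 0xb000000 (176 MiB), while only 0x8000 bytes, half of one allocation unit, were left. That the rounding happens this way is an inference from the numbers, not something taken from the Ceph source.

```python
# Values copied from the osd.709 repair log above.
needed = 0xAFFA895    # bytes BlueFS tried to fit ("+0xaffa895")
alloc_unit = 0x10000  # bluefs_shared_alloc_size (64 KiB)
available = 0x8000    # free space reported by the primary allocator


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


min_size = round_up(needed, alloc_unit)
print(hex(min_size))                     # 0xb000000, the logged min_size
print(min_size // 2**20, "MiB requested")
print(available < alloc_unit)            # True: not even one unit is free
```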
OSD.709 has already been expanded, right?
What's the error reported by fsck?
4) OSD.681 has a number of checksum verification errors when reading DB
data:
2020-03-15 14:03:52.890 7f6311ffa700 3 rocksdb: [table/block_based_table_reader.cc:1117] Encountered error while reading data from compression dictionary block Corruption: block checksum mismatch: expected 0, got 2324967102 in db/012948.sst offset 18446744073709551615 size 18446744073709551615
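One detail in this message: the offset and size are both 18446744073709551615, which is 2^64 - 1 (every bit set, or -1 as a signed 64-bit value). All-ones values like this are typically used as "unset/invalid" markers, so they probably do not point to a real location in db/012948.sst; that reading is an assumption, the log itself doesn't say so. A quick check of the arithmetic:

```python
# Value printed as both "offset" and "size" in the RocksDB message above.
value = 18446744073709551615

print(value == 2**64 - 1)  # True
print(hex(value))          # 0xffffffffffffffff
# Reinterpreting the same 8 bytes as a signed 64-bit integer gives -1.
as_signed = int.from_bytes(value.to_bytes(8, "little"), "little", signed=True)
print(as_signed)           # -1
```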
I can't say whether this is related to the space shortage or not. Have
other OSDs reported (or are they reporting) something similar?
Thanks,
Igor
On 3/16/2020 7:15 AM, Derek Yarnell wrote:
Hi,
We have a production cluster that just suffered an issue with several
of our NVMe OSDs. More than 12 of them, across 4 nodes, died with
'ENOSPC from bluestore, misconfigured cluster' errors, i.e. they no
longer had space. These are all simple single-device BlueStore OSDs.
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
This is an example[0] of one of the logs. In this case, each of the 8
NVMe OSDs on a node has 106GB of space allocated to it. The
ceph-bluestore-tool bluefs-bdev-sizes output lists only 22GiB for
osd 681. I extended the BlueStore space on a few of the OSDs via LVM
and then ran the bluefs-bdev-expand command. This worked for some and
not for others.
Some of the ones it did work for recovered for a while, then re-entered
the error state, and extending the allocation again didn't help after
that. When they failed again I ran fsck, which reported that it found
1 error; running repair then produced a rather long stack trace[1].
# ceph-bluestore-tool --log-level 30 --command bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-681
inferring bluefs devices from bluestore path
slot 1 /var/lib/ceph/osd/ceph-681/block -> /dev/dm-33
1 : device size 0x1a80000000 : own
0x[2480000~10000,24a0000~10000,2520000~60000,25f0000~c0000,2720000~50000,28a0000~110000,2a20000~230000,2cc0000~260000,2f30000~220000,31c0000~6b0000,38a0000~10000,3990000~3e0000,3d80000~530000,42d0000~590000,48d0000~400000,4d00000~7d0000,54f0000~c50000,6150000~10000,6190000~150000,6350000~c0000,6480000~160000,6640000~1e0000,6870000~c0000,6a00000~30000,6a40000~240000,6dd0000~310000,7210000~b0000,73a0000~b0000,76a0000~180000,7830000~80000,78e0000~240000,7b70000~90000,7c50000~b0000,7ef0000~140000,8040000~30000,8180000~250000,8440000~50000,84b0000~110000,8610000~c0000,9e20000~20000,9e50000~b0000,9f10000~60000,9f80000~30000,dd80000~180000,df70000~6a0000,e620000~5ae0000,15200000~3510000,187f0000~bf0000,19490000~1070000,1ab70000~4c0000,1b400000~7d0000,1bbe0000~c20000,1cd10000~340000,1d3a0000~860000,1dd00000~2e00000,20c00000~3f00000,24d00000~700000,25600000~700000,26100000~200000,26400000~300000,26b00000~600000,27400000~400000,27ba0000~6e0000,28500000~1d00000,2a400000~700000,2ac00000~100000,
2b100000~300000,2b470000~120000,2b700000~500000,2c000000~200000,2c400000~400000,2ca00000~100000,2cf00000~300000,2d340000~39b0000,30d00000~1f00000,32e00000~4bf0000,380a0000~3c0000,38500000~c0000,38bd0000~400000,390b0000~340000,39400000~100000,39900000~1000000,3ac00000~5d00000,40b90000~400000,41280000~db50000,4ee00000~700000,4f900000~4500000,54390000~100000,54e00000~18400000,6d800000~20d0000,6f8f0000~1a10000,71400000~4500000,76100000~300000,766e0000~6860000,7dd00000~c00000,7eac0000~a0000,7ef90000~f190000,8e1f0000~80000,8e410000~60000,8e480000~20000,8e4b0000~20000,8e5c0000~50000,8e7e0000~50000,8f160000~60000,8f240000~a0000,90000000~15e90000,a6200000~c3a0000,b25d0000~630000,b3000000~c00000,b3ee0000~90000,b4200000~d00000,b5a70000~160000,b63f0000~2a0000,b6720000~2820000,bab00000~400000,bbf60000~10ad0000,ccb90000~2300000,cf000000~2b00000,d1ca0000~10000,d1e00000~1400000,d3230000~1df0000,d5200000~1a00000,d6d00000~800000,d75e0000~6f0000,d7f00000~d00000,d9100000~400000,d9900000~d00000,
da800000~600000,daf10000~400000,db700000~1600000,dd280000~20000,dd670000~390000,dda30000~400000,de190000~70000,de2a0000~370000,de660000~20000,de700000~14770000,f3600000~700000,f3db0000~960000,f49e0000~5b00000,fa600000~c00000,fb300000~510000,fbb00000~100000,fbeb0000~450000,fc400000~2b0000,fd400000~400000,fde00000~c00000,ff0b0000~50000,ff200000~800000,ffd60000~10000,fff00000~a0000,100200000~300000,101600000~100000,101750000~300000,102120000~1e0000,1027f0000~a00000,103600000~330000,103b00000~200000,103e60000~4a0000,104310000~c00000,105030000~1200000,106800000~100000,106b20000~400000,107000000~300000,1073e0000~400000,107950000~86b0000,110140000~d0000,110350000~2e0000,110e20000~20000,110eb0000~a0000,110f60000~60000,110fd0000~1f0000,1112a0000~f0000,111420000~30000,1115b0000~30000,111620000~150000,111790000~40000,112560000~180000,112730000~180000,1129b0000~50000,112f90000~4a0000,113840000~c0000,113ea0000~40000,113fb0000~130000,114100000~310000,114470000~10000,114620000~120000,114810000~120000,
114a00000~20000,114a90000~f0000,114c60000~e0000,114e80000~20000,114f70000~140000,1150c0000~50000,1151f0000~320000,1155f0000~10000,115670000~226f0000,137e30000~800000,138b50000~400000,139400000~1500000,13ae00000~4500000,13f480000~400000,13f950000~1b6e0000,15b700000~400000,15c000000~300000,15c600000~700000,15d9e0000~1820000,15f400000~c00000,160400000~d00000,1613a0000~630000,1619e0000~d20000,162800000~1b00000,164600000~7550000,170d00000~1800000,172580000~100000,172d70000~1c190000,18f100000~40e0000,193700000~400000,193c90000~6970000,19aa90000~188d0000,1b3770000~c220000,1bff00000~1200000,1c12f0000~400000,1c2c00000~400000,1c3f80000~400000,1c5300000~4200000,1c95c0000~b9a40000,283800000~1800000,289000000~3800000,28d000000~6000000,293800000~2000000,296000000~e000000,2a4800000~465d0000,4b6100000~4c560000,502800000~32500000,cb8500000~10f600000,1139800000~7a000000,1668d30000~3d290000,1794000000~3e800000,191c000000~3ee00000,1a49800000~4110000,1a4f800000~40f0000,1a70800000~4100000,1a76000000~3c10000]
= 0x582550000 : using 0x56d090000(22 GiB)
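To cross-check the total printed after "=" against the extent list, the list can be parsed and its lengths summed. The "0x[offset~length,...]" format is inferred from this output rather than from any specification, and parse_extents / total_length are hypothetical helpers; only the first three extents are used here to keep the example short.

```python
import re

# Format inferred from the bluefs-bdev-sizes output above: "0x[off~len,...]",
# offsets and lengths in hex without a 0x prefix.
EXTENT = re.compile(r"([0-9a-f]+)~([0-9a-f]+)")


def parse_extents(text: str) -> list:
    """Hypothetical helper: return [(offset, length), ...] as integers."""
    body = text.strip()
    if not (body.startswith("0x[") and body.endswith("]")):
        raise ValueError("expected an extent list of the form 0x[...]")
    return [(int(off, 16), int(length, 16))
            for off, length in EXTENT.findall(body[3:-1])]


def total_length(extents: list) -> int:
    """Sum of extent lengths, comparable with the total printed after '='."""
    return sum(length for _, length in extents)


# First three extents of the osd.681 list, to keep the example short.
sample = "0x[2480000~10000,24a0000~10000,2520000~60000]"
extents = parse_extents(sample)
print(extents[0])                  # (38273024, 65536)
print(hex(total_length(extents)))  # 0x80000
```

Feeding the full bracketed list to parse_extents gives a sum that can be compared with the "= 0x582550000" figure; line breaks between comma-separated extents are harmless because the regex only picks out offset~length pairs.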
Any help here would be appreciated. I have stopped our CephFS file
system, but our radosgw is also impacted.
[0] ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.681.log
[1] ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.709.repair
Thanks,
derek