Hello Igor,
i updated all servers to latest 4.19.97 kernel but this doesn't fix the
situation.
I can provide you with all those logs - any idea where to upload / how
to sent them to you?
Greets,
Stefan
Am 20.01.20 um 13:12 schrieb Igor Fedotov:
> Hi Stefan,
>
> these lines are result of transaction dump performed on a failure during
> transaction submission (which is shown as
>
> "submit_transaction error: Corruption: block checksum mismatch code = 2"
>
> Most probably they are out of interest (checksum errors are unlikely to
> be caused by transaction content) and hence we need earlier stuff to
> learn what caused that
>
> checksum mismatch.
>
> It's hard to give any formal overview of what you should look for, from
> my troubleshooting experience generally one may try to find:
>
> - some previous error/warning indications (e.g. allocation, disk access,
> etc)
>
> - prior OSD crashes (sometimes they might have different causes/stack
> traces/assertion messages)
>
> - any timeout or retry indications
>
> - any uncommon log patterns which aren't present during regular running
> but happen each time before the crash/failure.
>
> Anyway I think the inspection depth should be much(?) deeper than
> presumably it is (from what I can see from your log snippets).
>
> Ceph keeps last 10000 log events with an increased log level and dumps
> them on crash with negative index starting at -9999 up to -1 as a prefix.
>
> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>
>
> It would be great If you share several log snippets for different
> crashes containing these last 10000 lines.
>
>
> Thanks,
>
> Igor
>
>
> On 1/19/2020 9:42 PM, Stefan Priebe - Profihost AG wrote:
>> Hello Igor,
>>
>> there's absolutely nothing in the logs before.
>>
>> What do those lines mean:
>> Put( Prefix = O key =
>>
0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff6f00120000'x'
>>
>> Value size = 480)
>> Put( Prefix = O key =
>>
0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff'o'
>>
>> Value size = 510)
>>
>> on the right size i always see 0xfffffffffffffffeffffffffffffffff on all
>> failed OSDs.
>>
>> greets,
>> Stefan
>> Am 19.01.20 um 14:07 schrieb Stefan Priebe - Profihost AG:
>>> Yes, except that this happens on 8 different clusters with different
>>> hw but same ceph version and same kernel version.
>>>
>>> Greets,
>>> Stefan
>>>
>>>> Am 19.01.2020 um 11:53 schrieb Igor Fedotov <ifedotov(a)suse.de>de>:
>>>>
>>>> So the intermediate summary is:
>>>>
>>>> Any OSD in the cluster can experience interim RocksDB checksum
>>>> failure. Which isn't present after OSD restart.
>>>>
>>>> No HW issues observed, no persistent artifacts (except OSD log)
>>>> afterwards.
>>>>
>>>> And looks like the issue is rather specific to the cluster as no
>>>> similar reports from other users seem to be present.
>>>>
>>>>
>>>> Sorry, I'm out of ideas other then collect all the failure logs and
>>>> try to find something common in them. May be this will shed some
>>>> light..
>>>>
>>>> BTW from my experience it might make sense to inspect OSD log prior
>>>> to failure (any error messages and/or prior restarts, etc) sometimes
>>>> this might provide some hints.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>>
>>>>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>>>>> HI Igor,
>>>>>
>>>>>> Am 17.01.20 um 12:10 schrieb Igor Fedotov:
>>>>>> hmmm..
>>>>>>
>>>>>> Just in case - suggest to check H/W errors with dmesg.
>>>>> this happens on around 80 nodes - i don't expect all of those
have not
>>>>> identified hw errors. Also all of them are monitored - no dmesg
>>>>> outpout
>>>>> contains any errors.
>>>>>
>>>>>> Also there are some (not very much though) chances this is
another
>>>>>> incarnation of the following bug:
>>>>>>
https://tracker.ceph.com/issues/22464
>>>>>>
https://github.com/ceph/ceph/pull/24649
>>>>>>
>>>>>> The corresponding PR works around it for main device reads (user
data
>>>>>> only!) but theoretically it might still happen
>>>>>>
>>>>>> either for DB device or DB data at main device.
>>>>>>
>>>>>> Can you observe any bluefs spillovers? Are there any correlation
>>>>>> between
>>>>>> failing OSDs and spillover presence if any, e.g. failing OSDs
always
>>>>>> have a spillover. While OSDs without spillovers never face the
>>>>>> issue...
>>>>>>
>>>>>> To validate this hypothesis one can try to monitor/check (e.g.
once a
>>>>>> day for a week or something)
"bluestore_reads_with_retries"
>>>>>> counter over
>>>>>> OSDs to learn if the issue is happening
>>>>>>
>>>>>> in the system. Non-zero values mean it's there for user
data/main
>>>>>> device and hence is likely to happen for DB ones as well (which
>>>>>> doesn't
>>>>>> have any workaround yet).
>>>>> OK i checked bluestore_reads_with_retries on 360 osds but all of
>>>>> them say 0.
>>>>>
>>>>>
>>>>>> Additionally you might want to monitor memory usage as the above
>>>>>> mentioned PR denotes high memory pressure as potential trigger
for
>>>>>> these
>>>>>> read errors. So if such pressure happens the hypothesis becomes
>>>>>> more valid.
>>>>> we already do this heavily and have around 10GB of memory per OSD.
>>>>> Also
>>>>> no of those machines show any io pressure at all.
>>>>>
>>>>> All hosts show a constant rate of around 38GB to 45GB mem available
in
>>>>> /proc/meminfo.
>>>>>
>>>>> Stefan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>> PS. Everything above is rather a speculation for now. Available
>>>>>> information is definitely not enough for extensive
troubleshooting
>>>>>> the
>>>>>> cases which happens that rarely.
>>>>>>
>>>>>> You might want to start collecting failure-related information
>>>>>> (including but not limited to failure logs, perf counter dumps,
>>>>>> system
>>>>>> resource reports etc) for future analysis.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> answers inline.
>>>>>>>
>>>>>>> Am 16.01.20 um 21:34 schrieb Igor Fedotov:
>>>>>>>> you may want to run fsck against failing OSDs. Hopefully
it will
>>>>>>>> shed
>>>>>>>> some light.
>>>>>>> fsck just says everything fine:
>>>>>>>
>>>>>>> # ceph-bluestore-tool --command fsck --path
>>>>>>> /var/lib/ceph/osd/ceph-27/
>>>>>>> fsck success
>>>>>>>
>>>>>>>
>>>>>>>> Also wondering if OSD is able to recover (startup and
proceed
>>>>>>>> working)
>>>>>>>> after facing the issue?
>>>>>>> no recover needed. It just runs forever after restarting.
>>>>>>>
>>>>>>>> If so do you have any one which failed multiple times? Do
you
>>>>>>>> have logs
>>>>>>>> for these occurrences?
>>>>>>> may be but there are most probably weeks or month between
those
>>>>>>> failures
>>>>>>> - most probably logs are already deleted.
>>>>>>>
>>>>>>>> Also please note that patch you mentioned doesn't fix
previous
>>>>>>>> issues
>>>>>>>> (i.e. duplicate allocations), it prevents from new ones
only.
>>>>>>>>
>>>>>>>> But fsck should show them if any...
>>>>>>> None showed.
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG
wrote:
>>>>>>>>> Hi Igor,
>>>>>>>>>
>>>>>>>>> ouch sorry. Here we go:
>>>>>>>>>
>>>>>>>>> -1> 2020-01-16 01:10:13.404090 7f3350a14700
-1 rocksdb:
>>>>>>>>> submit_transaction error: Corruption: block checksum
mismatch
>>>>>>>>> code = 2
>>>>>>>>> Rocksdb transaction:
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>>
0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 97)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>>
0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 93)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>>
0x0000000000000916'.0000823257.00000000000073922044' Value size
>>>>>>>>> = 196)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>>
0x0000000000000916'.0000823257.00000000000073922045' Value size
>>>>>>>>> = 184)
>>>>>>>>> Put( Prefix = M key =
0x0000000000000916'._info' Value size = 899)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 418)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 474)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 392)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 317)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 521)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 558)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 649)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 449)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 580)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 435)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 569)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 465)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 710)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 599)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 372)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 130)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 540)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 47)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 1731)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 675)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 395)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>>
0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 1328)
>>>>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size =
17)
>>>>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size =
14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size =
14)
>>>>>>>>> 0> 2020-01-16 01:10:13.413759 7f3350a14700
-1
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In
function 'void
>>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f3350a14700
time 2020-01-16
>>>>>>>>> 01:10:13.404113
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808:
FAILED
>>>>>>>>> assert(r == 0)
>>>>>>>>>
>>>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous
(stable)
>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char
const*, int,
>>>>>>>>> char
>>>>>>>>> const*)+0x102) [0x55c9a712d232]
>>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5)
[0x55c9a6fb54b5]
>>>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd)
[0x55c9a6ff608d]
>>>>>>>>> 4: (()+0x7494) [0x7f33615f9494]
>>>>>>>>> 5: (clone()+0x3f) [0x7f3360680acf]
>>>>>>>>>
>>>>>>>>> I already picked those:
>>>>>>>>>
https://github.com/ceph/ceph/pull/28644
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>> Am 16.01.20 um 17:00 schrieb Igor Fedotov:
>>>>>>>>>> Hi Stefan,
>>>>>>>>>>
>>>>>>>>>> would you please share log snippet prior the
assertions? Looks
>>>>>>>>>> like
>>>>>>>>>> RocksDB is failing during transaction
submission...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Igor
>>>>>>>>>>
>>>>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost
AG wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> does anybody know a fix for this ASSERT /
crash?
>>>>>>>>>>>
>>>>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In
function 'void
>>>>>>>>>>> BlueStore::_kv_sync_thread()' thread
7f8c3f5ab700 time
>>>>>>>>>>> 2020-01-16
>>>>>>>>>>> 02:02:31.304993
>>>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc:
8808: FAILED assert(r
>>>>>>>>>>> == 0)
>>>>>>>>>>>
>>>>>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213)
luminous (stable)
>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char
const*, char const*,
>>>>>>>>>>> int, char
>>>>>>>>>>> const*)+0x102) [0x55e6df9d9232]
>>>>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5)
[0x55e6df8614b5]
>>>>>>>>>>> 3:
(BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>>>>> 4: (()+0x7494) [0x7f8c50190494]
>>>>>>>>>>> 5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>>>>>
>>>>>>>>>>> all bluestore OSDs are randomly crashing
sometimes (once a
>>>>>>>>>>> week).
>>>>>>>>>>>
>>>>>>>>>>> Greets,
>>>>>>>>>>> Stefan
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>> ceph-users(a)lists.ceph.com
>>>>>>>>>>>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users(a)lists.ceph.com
>>>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>