Hi Jan,
indeed, the fsck logs for the OSDs other than osd.0 look good, so it would
be interesting to see the OSD startup logs for them, preferably for
multiple (e.g. 3-4) OSDs to get the pattern.
The original upgrade log(s) would be nice to see as well.
You might want to use Google Drive or any other publicly available
file-sharing site for that.
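If journald is your log target, a sketch along these lines might do for
collecting the per-OSD startup logs (this assumes cephadm-managed OSDs
whose systemd units follow the ceph-<fsid>@osd.N pattern; the fsid below is
taken from your osd paths, and the OSD ids are just an example):

    fsid=2c565e24-7850-47dc-a751-a6357cbbaf2a
    for i in 1 2 3 4; do
        # dump the journal entries for one OSD service since the upgrade date
        journalctl -u "ceph-${fsid}@osd.${i}" --since "2023-12-22" > osd.${i}.startup.log
    done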
Thanks,
Igor
On 05/01/2024 10:25, Jan Marek wrote:
> Hi Igor,
>
> I've tried to start only osd.1, which seemed to pass fsck OK, but
> it crashed :-(
>
> I searched the logs and found that I have logs from 22.12.2023,
> when I did the upgrade (I have logging set to journald).
>
> Would you be interested in those logs? The file is 30 MB in
> bzip2 format; how can I share it with you?
>
> It also contains the crash log from starting osd.1, but I can cut
> that out and send it to the list...
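>
> One way I could carve just that piece out might be something like the
> following (the unit name assumes the cephadm naming scheme; the time
> window is only an example and needs adjusting):
>
>     journalctl -u ceph-2c565e24-7850-47dc-a751-a6357cbbaf2a@osd.1 \
>         --since "2024-01-05 09:00" | bzip2 > osd.1-crash.log.bz2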
>
> Sincerely
> Jan Marek
>
> On Thu, Jan 04, 2024 at 02:43:48 CET, Jan Marek wrote:
>> Hi Igor,
>>
>> I've run this one-liner:
>>
>> for i in {0..12}; do \
>>   export CEPH_ARGS="--log-file osd.${i}.log --debug-bluestore 5/20"; \
>>   ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck; \
>> done
>>
>> On osd.0 it crashed very quickly; on osd.1 it is still running.
>>
>> I've sent those logs in one e-mail.
>>
>> But!
>>
>> I've tried to list the disk devices in the monitor view, and I got a
>> very interesting screenshot - I've highlighted some parts with red
>> rectangles.
>>
>> I've got the JSON from syslog that was part of a cephadm call, and
>> there it seems to be correct (to my eyes).
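>>
>> For a cross-check of what the orchestrator itself sees, something like
>> "ceph orch device ls --format json-pretty" (run from a mon/mgr host), or
>> "ceph-volume inventory" run directly on the OSD node (e.g. via cephadm
>> shell), might be worth comparing against that screenshot.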
>>
>> Could this be connected to the problem, or is it just a coincidence?
>>
>> Sincerely
>> Jan Marek
>>
>> On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
>>> Hi Jan,
>>>
>>> may I see the fsck logs from all the failing OSDs, to look for a pattern?
>>> IIUC the whole node is suffering from the issue, right?
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 1/2/2024 10:53 AM, Jan Marek wrote:
>>>> Hello once again,
>>>>
>>>> I've tried this:
>>>>
>>>> export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
>>>> ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck
>>>>
>>>> I'm attaching the /tmp/osd.0.log file.
>>>>
>>>> Sincerely
>>>> Jan Marek
>>>>
>>>> On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
>>>>> Hi Jan,
>>>>>
>>>>> this doesn't look like RocksDB corruption but rather like some BlueStore
>>>>> metadata inconsistency. Also, the assertion backtrace in the new log looks
>>>>> completely different from the original one. So, in an attempt to find any
>>>>> systematic pattern, I'd suggest running fsck with verbose logging for every
>>>>> failing OSD. Relevant command line:
>>>>>
>>>>> CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20"
>>>>> bin/ceph-bluestore-tool --path <path-to-osd> --command fsck
>>>>>
>>>>> This is unlikely to fix anything; it's rather a way to collect logs to get
>>>>> better insight.
>>>>>
>>>>>
>>>>> Additionally, you might want to run a similar fsck for a couple of healthy
>>>>> OSDs - I'm curious whether it succeeds, as I have a feeling that the problem
>>>>> with the crashing OSDs had been hidden before the upgrade and was revealed
>>>>> rather than caused by it.
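>>>>>
>>>>> (One practical note: ceph-bluestore-tool needs exclusive access to the OSD
>>>>> data, so a healthy OSD would have to be stopped first - e.g. for a cephadm
>>>>> deployment something like "systemctl stop ceph-<fsid>@osd.N" on the node,
>>>>> and started again after the fsck; the unit name pattern is an assumption
>>>>> based on a standard cephadm setup.)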
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Igor
>>>>>
>>>>> On 12/29/2023 3:28 PM, Jan Marek wrote:
>>>>>> Hello Igor,
>>>>>>
>>>>>> I'm attaching a part of the syslog created while starting osd.0.
>>>>>>
>>>>>> Many thanks for help.
>>>>>>
>>>>>> Sincerely
>>>>>> Jan Marek
>>>>>>
>>>>>> On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> IIUC the attached log is for ceph-kvstore-tool, right?
>>>>>>>
>>>>>>> Can you please share full OSD startup log as well?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>> On 12/27/2023 4:30 PM, Jan Marek wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have a problem: my Ceph cluster has 3x mon nodes and 6x OSD nodes;
>>>>>>>> every OSD node has 12 rotational disks and one NVMe device for the
>>>>>>>> BlueStore DB. Ceph was installed by the ceph orchestrator and the OSDs
>>>>>>>> use BlueFS storage.
>>>>>>>>
>>>>>>>> I started the upgrade process from version 17.2.6 to 18.2.1 by
>>>>>>>> invoking:
>>>>>>>>
>>>>>>>> ceph orch upgrade start --ceph-version 18.2.1
>>>>>>>>
>>>>>>>> After upgrading the mon and mgr processes, the orchestrator tried to
>>>>>>>> upgrade the first OSD node, but its OSDs keep going down.
>>>>>>>>
>>>>>>>> I've stopped the upgrade process, but I have one OSD node
>>>>>>>> completely down.
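>>>>>>>>
>>>>>>>> (For reference, the orchestrator upgrade can be inspected and halted
>>>>>>>> with the standard commands, e.g.:
>>>>>>>>
>>>>>>>> ceph orch upgrade status
>>>>>>>> ceph orch upgrade stop
>>>>>>>> )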
>>>>>>>>
>>>>>>>> After the upgrade I got some error messages and found
>>>>>>>> /var/lib/ceph/crashxxxx directories; I'm attaching to this message
>>>>>>>> the files I found there.
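>>>>>>>>
>>>>>>>> (The same crash reports should also be listed by the crash module,
>>>>>>>> e.g. via "ceph crash ls" and "ceph crash info <crash-id>", assuming
>>>>>>>> the module is enabled - that may be an easier way to pull the
>>>>>>>> backtraces than the raw directories.)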
>>>>>>>>
>>>>>>>> Please, can you advise what I can do now? It seems that RocksDB
>>>>>>>> is either incompatible or corrupted :-(
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> Sincerely
>>>>>>>> Jan Marek
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>>>>> --
>>>>>>> Igor Fedotov
>>>>>>> Ceph Lead Developer
>>>>>>>
>>>>>>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>>>>>>
>>>>>>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>>>>>> CEO: Martin Verges - VAT-ID: DE310638492
>>>>>>> Com. register: Amtsgericht Munich HRB 231263
>>>>>>> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>>>>>>>
>> --
>> Ing. Jan Marek
>> University of South Bohemia
>> Academic Computer Centre
>> Phone: +420389032080
>> http://www.gnu.org/philosophy/no-word-attachments.cs.html