Status update:

Finally we have the first patch to fix the issue in master:

https://github.com/ceph/ceph/pull/35201

And the ticket has been updated with a root-cause analysis:

https://tracker.ceph.com/issues/45613

On 5/21/2020 2:07 PM, Igor Fedotov wrote:
@Chris - unfortunately it looks like the corruption is permanent, since
the valid WAL data have presumably been overwritten with other data.
Hence I don't know of any way to recover. Perhaps you can try cutting
the WAL file off, which should allow the OSD to start, at the cost of
losing the most recent ops. One could use the exported BlueFS files as a
drop-in replacement for the regular DB volume, but I'm not aware of the
details. The above is just speculation - I can't say for sure whether it
will help...

I can't explain why the WAL doesn't have a zero block in your case,
though. There is a small chance this is a different issue. Just in case,
could you please search the whole file for 32K zero blocks (one way to
do this is sketched below)? And the same for the other OSD?
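
A minimal sketch of such a scan in bash, assuming GNU coreutils; the
file name db.wal/002040.log comes from the export step further down in
this thread (adjust it for the other OSD's export):

    #!/usr/bin/env bash
    # Scan a file for aligned 32 KiB blocks that are entirely zero.
    F=db.wal/002040.log
    SZ=$(stat -c%s "$F")
    BS=32768
    for ((off = 0; off + BS <= SZ; off += BS)); do
        # Compare each 32K block against 32K of zeros; -s keeps cmp quiet.
        if dd if="$F" bs=$BS skip=$((off / BS)) count=1 2>/dev/null \
                | cmp -s - <(head -c $BS /dev/zero); then
            printf 'all-zero 32K block at offset 0x%x\n' "$off"
        fi
    done

(A trailing partial block, if any, is not checked.)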
Thanks,
Igor
> Short update on the issue:
>
> We're finally able to reproduce the issue in master (not Octopus);
> investigating further...
>
> @Chris - to make sure you're facing the same issue, could you please
> check the content of the broken file? To do so:
>
> 1) run "ceph-bluestore-tool --path <path-to-osd> --our-dir <target
> dir> --command bluefs-export
>
> This will export the BlueFS files to <target dir>.
>
> 2) Check the content of the file db.wal/002040.log at offset 0x470000
> (one way to do this is shown below).
>
> This will presumably contain 32K of zero bytes. Is this the case?
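>
> For example, as a sketch (hexdump collapses repeated identical lines
> into a single "*", so 32K of zeros shows up as one all-zero line
> followed by "*"):
>
>   # dump 32 KiB starting at offset 0x470000 of the exported WAL file
>   hexdump -C -s 0x470000 -n 32768 <target dir>/db.wal/002040.log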
>
>
> No hurry as I'm just making sure symptoms in Octopus are the same...
>
>
> Thanks,
>
> Igor
>
> On 5/20/2020 5:24 PM, Igor Fedotov wrote:
>> Chris,
>>
>> got them, thanks!
>>
>> Investigating....
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 5/20/2020 5:23 PM, Chris Palmer wrote:
>>> Hi Igor
>>> I've sent you these directly as they're a bit chunky. Let me know if
>>> you haven't got them.
>>> Thx, Chris
>>>
>>> On 20/05/2020 14:43, Igor Fedotov wrote:
>>>> Hi Chris,
>>>>
>>>> could you please share the full log prior to the first failure?
>>>>
>>>> Also, if possible, please set debug_bluestore and debug_bluefs to 20
>>>> and collect another log for the failed OSD startup (one way to set
>>>> these is sketched below).
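>>>>
>>>> For example, assuming the cluster's centralized config and the osd.9
>>>> id from the log below:
>>>>
>>>>   # raise BlueStore/BlueFS logging for the affected OSD
>>>>   ceph config set osd.9 debug_bluestore 20/20
>>>>   ceph config set osd.9 debug_bluefs 20/20
>>>>   # then retry the startup to capture the verbose log
>>>>   systemctl start ceph-osd@9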
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>>
>>>> On 5/20/2020 4:39 PM, Chris Palmer wrote:
>>>>> I'm getting similar errors after rebooting a node. Cluster was
>>>>> upgraded 15.2.1 -> 15.2.2 yesterday. No problems after rebooting
>>>>> during upgrade.
>>>>>
>>>>> On the node I just rebooted, 2/4 OSDs won't restart. Similar logs
>>>>> from both. Logs from one below.
>>>>> Neither OSD has compression enabled, although there is a
>>>>> compression-related error in the log.
>>>>> Both are replicated x3. One has data on HDD with a separate WAL/DB
>>>>> on an NVMe partition; the other has everything on a single NVMe
>>>>> partition.
>>>>>
>>>>> Feeling kinda nervous here - advice welcomed!!
>>>>>
>>>>> Thx, Chris
>>>>>
>>>>>
>>>>>
>>>>> 2020-05-20T13:14:00.837+0100 7f2e0d273700 3 rocksdb:
>>>>> [table/block_based_table_reader.cc:1117] Encountered error while
>>>>> reading data from compression dictionary block Corruption: block
>>>>> checksum mismatch: expected 0, got 3423870535 in db/000304.sst
>>>>> offset 18446744073709551615 size 18446744073709551615
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
>>>>> [db/version_set.cc:3757] Recovered from manifest
>>>>> file:db/MANIFEST-000312 succeeded,manifest_file_number is 312,
>>>>> next_file_number is 314, last_sequence is 22320582, log_number is
>>>>> 309,prev_log_number is 0,max_column_family is
>>>>> 0,min_log_number_to_keep is 0
>>>>>
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
>>>>> [db/version_set.cc:3766] Column family [default] (ID 0), log
>>>>> number is 309
>>>>>
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb: EVENT_LOG_v1
>>>>> {"time_micros": 1589976840843199, "job": 1, "event":
>>>>> "recovery_started", "log_files": [313]}
>>>>> 2020-05-20T13:14:00.841+0100 7f2e1957ee00 4 rocksdb:
>>>>> [db/db_impl_open.cc:583] Recovering log #313 mode 0
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb:
>>>>> [db/db_impl_open.cc:518] db.wal/000313.log: dropping 9044 bytes;
>>>>> Corruption: error in middle of record
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 3 rocksdb:
>>>>> [db/db_impl_open.cc:518] db.wal/000313.log: dropping 86 bytes;
>>>>> Corruption: missing start of fragmented record(2)
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb:
>>>>> [db/db_impl.cc:390] Shutdown: canceling all background work
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 4 rocksdb:
>>>>> [db/db_impl.cc:563] Shutdown complete
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1 rocksdb: Corruption:
>>>>> error in middle of record
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 -1
>>>>> bluestore(/var/lib/ceph/osd/ceph-9) _open_db erroring opening db:
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bluefs umount
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 fbmap_alloc
>>>>> 0x55daf2b3a900 shutdown
>>>>> 2020-05-20T13:14:00.937+0100 7f2e1957ee00 1 bdev(0x55daf3838700
>>>>> /var/lib/ceph/osd/ceph-9/block) close
>>>>> 2020-05-20T13:14:01.093+0100 7f2e1957ee00 1 bdev(0x55daf3838000
>>>>> /var/lib/ceph/osd/ceph-9/block) close
>>>>> 2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 osd.9 0 OSD:init:
>>>>> unable to mount object store
>>>>> 2020-05-20T13:14:01.341+0100 7f2e1957ee00 -1 ** ERROR:
>>>>> osd init failed: (5) Input/output error