Simon, Harry,
so from the log in the ticket I can see a huge (400+ MB) bluefs log kept
over many small non-adjacent extents.
Presumably this was caused by a small bluefs_alloc_size setting, by high
disk space fragmentation, or by both. Now I'd like some more details on
your OSDs.
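It would also help to know what bluefs_alloc_size the affected OSDs are
running with. Something like one of these should show it (the osd id is
just a placeholder, and the first form only works while the OSD is up):

ceph daemon osd.<id> config get bluefs_alloc_size
ceph config get osd.<id> bluefs_alloc_size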
Could you please collect an OSD startup log with debug_bluefs set to 20?
Also please run the following commands for a broken OSD (I need the
results only, no need to collect the log unless they fail):
ceph-bluestore-tool --path <path-to-osd> --command bluefs-bdev-sizes
ceph-bluestore-tool --path <path-to-osd> --command free-score
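
For the startup log, one way is to raise the debug level in ceph.conf
(or pass it on the command line) before starting the failing OSD,
roughly like this (the id and paths are just examples, adjust for your
setup):

[osd]
    debug bluefs = 20/20

systemctl restart ceph-osd@<id>

and then grab /var/log/ceph/ceph-osd.<id>.log once it hits the assert.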
Thanks,
Igor
On 5/29/2020 1:05 PM, Simon Leinen wrote:
> Colleague of Harry's here...
>
> Harald Staub writes:
>> This is again about our bad cluster, with too many objects, and the
>> HDD OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3
>> GB usable). Now several OSDs do not come up any more.
>> Typical error message:
>> /build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED
>> ceph_assert(h->file->fnode.ino != 1)
> The context of that line is "we should never run out of log space here":
>
> // previously allocated extents.
> bool must_dirty = false;
> if (allocated < offset + length) {
>   // we should never run out of log space here; see the min runway check
>   // in _flush_and_sync_log.
>   ceph_assert(h->file->fnode.ino != 1);
>
> So I guess we are violating that "should", and the BlueStore code
> doesn't handle that case. And the "min runway" check may not be
> reliable. Should we file a bug?
>
> Again, help on how to proceed would be greatly appreciated...