Hi,
On Tue, Sep 8, 2020 at 11:20 PM David Orman <ormandj(a)corenode.com> wrote:
Every time we look at them, we see the same checksum (0x6706be76):
This looks a lot like:
https://tracker.ceph.com/issues/22464
Some more context on this, since I built the work-around for this issue:
* the checksum is for a block of all zeroes
* this seemed to happen when memory ran low
* it is *NOT* related to swap: this happened on systems with swap disabled
and no file-backed mmapped memory (BlueStore-only servers w/o non-OSD disks)
* only showed up on some kernel versions
* retrying the read resolved it; it was very rare to see two consecutive read
failures, and we never saw it persist through 3 retries
* root cause was never found, as I never managed to reliably reproduce this
on test setups where I could play around with bisecting the kernel :(
Here's the patch that added the read retries:
https://github.com/ceph/ceph/pull/23273/files
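The logic of that patch is roughly the following (a simplified Python sketch for illustration, not the actual BlueStore C++ code; the names read_block, verify_csum, and ChecksumError are hypothetical):

```python
# Sketch of retry-on-checksum-mismatch: re-issue the read a configurable
# number of times before surfacing the error. BlueStore's equivalent knob
# is bluestore_retry_disk_reads (default 3).
RETRIES = 3


class ChecksumError(Exception):
    pass


def read_with_retries(read_block, verify_csum, retries=RETRIES):
    """Call read_block() up to retries+1 times until verify_csum(data) passes."""
    last_data = None
    for attempt in range(retries + 1):
        data = read_block()
        if verify_csum(data):
            # A successful retry is what the bluestore_reads_with_retries
            # performance counter would record in the real code.
            return data
        last_data = data
    raise ChecksumError("checksum mismatch persisted after %d retries" % retries)
```

The key property, matching what Paul observed, is that a single transient bad read (e.g. a spurious all-zero block) is absorbed by the first retry and never reaches the caller.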
What you can do is:
1. check the performance counter bluestore_reads_with_retries on affected
OSDs; it should be non-zero
2. increase the setting bluestore_retry_disk_reads (default 3) to see if
that helps
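The perf counters come back as JSON from the OSD admin socket, so a small script can flag affected OSDs. The sample below is an illustrative, heavily trimmed stand-in for real `perf dump` output, which contains many more sections:

```python
import json

# Trimmed, illustrative sample of what an OSD perf dump might contain;
# real output has many sections beyond "bluestore".
sample = json.loads("""
{"bluestore": {"bluestore_reads_with_retries": 2}}
""")

# A non-zero value means this OSD hit checksum mismatches that a retry fixed.
retries = sample.get("bluestore", {}).get("bluestore_reads_with_retries", 0)
if retries > 0:
    print("OSD needed %d read retries" % retries)
```

On a live cluster you would feed this the output of the admin-socket perf dump for each OSD rather than a hard-coded sample.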
Anyway, what you are seeing might be something completely different from
whatever caused this bug... but it's worth playing around with the retry
option.
Paul
That said, we've got the following versions in play (cluster was created
with 15.2.3):
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
This is a containerized cephadm installation, in case it's relevant.
Distribution is Ubuntu 18.04.4, and the kernel is the HWE kernel:
Linux ceph02 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24
UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
A repair operation 'fixes' it. These are occurring across many PGs on
various servers, and we see no indication of any hardware-related issues.
Any ideas what to do next?
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io