On Fri, Jan 8, 2021 at 2:19 PM Gaël THEROND <gael.therond(a)bitswalk.com> wrote:
Hi everyone!
I'm facing a weird issue with one of my CEPH clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using erasure code profile (k=3, m=2)
When I try to format one of my RBD images (client side), I get the
following kernel messages multiple times, with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I suspected a faulty disk, but the monitoring system is not
showing anything faulty, so I ran manual tests on all my OSDs to check
disk health using smartctl etc.
None of them is marked as unhealthy: they show no counters for faulty
sectors or for read/write errors, and the wear level is 99%.
So, the only peculiarity of this image is that it is an 80 TB image, but
that shouldn't be an issue, as we already use images of that size on
another pool.
If anyone has a clue as to how I could sort this out, I'll be more than happy.
Hi Gaël,
What command are you running to format the image?
Is it persistent? After the first formatting attempt fails, do the
following attempts fail too?
Is it always the same set of sectors?
Could you please attach the output of "rbd info" for that image and the
entire kernel log from the time that image is mapped?
Thanks,
Ilya