pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch" - ceph-users

21 Feb 2024

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report hash mismtach for 698 
objects.
Ceph pg repair doesn't fix it, because if you run deep-srub on the PG 
after repair is finished, it still report scrub errors.

Why can't ceph pg repair repair this, it has 4 out of 6 should be able 
to reconstruct the corrupted shards?
Is there a way to fix this? Like delete object s0 and s2 so it's forced 
to recreate them?

Long detailed summary

A short backstory.
* This is aftermath of problems with mclock, post "17.2.7: Backfilling 
deadlock / stall / stuck / standstill" [1].
   - 4 OSDs had a few bad sectors, set all 4 out and cluster stopped.
   - Solution was to swap from mclock to wpq and restart alle OSD.
   - When all backfilling was finished all 4 OSD was replaced.
   - osd.223 and osd.269 was 2 of the 4 OSDs that was replaced.

PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after the osd.223 og osd.269 was replaced, deep-scub was run and 
reported errors
     ceph status
     -----------
     HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg 
inconsistent
     [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
     [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
         pg 404.bc is active+clean+inconsistent, acting 
[223,297,269,276,136,197]

I then run repair
     ceph pg repair 404.bc

And ceph status showed this
     ceph status
     -----------
     HEALTH_WARN Too many repaired reads on 2 OSDs
     [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
         osd.223 had 698 reads repaired
         osd.269 had 698 reads repaired

But osd.223 and osd.269 is new disks and the disks has no SMART error or 
any I/O error in OS logs.
So I tried to run deep-scrub again on the PG.
     ceph pg deep-scrub 404.bc

And got this result.

     ceph status
     -----------
     HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; 
Possible data damage: 1 pg inconsistent
     [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
     [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
         osd.223 had 698 reads repaired
         osd.269 had 698 reads repaired
     [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
         pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, 
acting [223,297,269,276,136,197]

698 + 698 = 1396 so the same amount of errors.

Run repair again on 404.bc and ceph status is

     HEALTH_WARN Too many repaired reads on 2 OSDs
     [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
         osd.223 had 1396 reads repaired
         osd.269 had 1396 reads repaired

So even when repair finish it doesn't fix the problem since they 
reappear again after a deep-scrub.

The log for osd.223 and osd.269 contain "got incorrect hash on read" and 
"candidate had an ec hash mismatch" for 698 unique objects.
But i only show the logs for 1 of the 698 object, the log is the same 
for the other 697 objects.

     osd.223 log (only showing 1 of 698 object named 
2021-11-08T19%3a43%3a50,145489260+00%3a00)
     -----------
     Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 
231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] 
local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 
les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=0 
lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918 
active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0:  REQ_SCRUB ] 
  MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned REQ_SCRUB] _scan_list  
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

got incorrect hash on read 0xc5d1dd1b !=  expected 0x7c2f86d7
     Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) 
log [ERR] : 404.bc shard 223(0) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

: candidate had an ec hash mismatch
     Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) 
log [ERR] : 404.bc shard 269(2) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

: candidate had an ec hash mismatch
     Feb 20 10:31:01 ceph-hd-003 
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 
2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log 
[ERR] : 404.bc shard 223(0) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

: candidate had an ec hash mismatch
     Feb 20 10:31:01 ceph-hd-003 
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 
2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster) log 
[ERR] : 404.bc shard 269(2) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

: candidate had an ec hash mismatch

     osd.269 log (only showing 1 of 698 object named 
2021-11-08T19%3a43%3a50,145489260+00%3a00)
     -----------
     Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch: 
231235 pg[404.bcs2( v 231235'1636919 (231078'1632435,231235'1636919] 
local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 
les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=2 
lpr=226263 luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active 
mbc={}] _scan_list  
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head

got incorrect hash on read 0x7c0871dc !=  expected 0xcf6f4c58

The log for the other osd in the PG osd.297, osd.276, osd.136 and 
osd.197 doesn't show any error.

If I try to get the object it failes
     $ s3cmd s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
     download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00' -> 
'./2021-11-08T19:43:50,145489260+00:00'  [1 of 1]
     ERROR: Download of './2021-11-08T19:43:50,145489260+00:00' failed 
(Reason: 500 (UnknownError))
     ERROR: S3 error: 500 (UnknownError)

And the RGW log show this
     Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new 
request req=0x7f94b744d660 =====
     Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING: set_req_state_err 
err_no=5 resorting to 500
     Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new 
request req=0x7f94b6e41660 =====
     Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done 
req=0x7f94b744d660 op status=-5 http_status=500 latency=0.020000568s 
======
     Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660: 
110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +0000] "GET 
/benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1" 500 
226 - - - latency=0.020000568s

[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5A…

--
Kai Stian Olstad