Hi,
> I'm debating with myself if I should
> 1. Stop both OSD 223 and 269,
> 2. Just one of them.

I understand your struggle. I think I would stop them both, just to
rule out replicating corrupted data.
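
Roughly what I have in mind, as an untested sketch -- the pool name, the
old min_size value and the daemon stop commands are assumptions, adjust
them to your deployment:

  # allow I/O and recovery with only 4 of the 6 shards up
  ceph osd pool set default.rgw.buckets.data min_size 4

  # stop both suspect OSDs (cephadm; plain systemctl works as well)
  ceph orch daemon stop osd.223
  ceph orch daemon stop osd.269

  # once pg 404.bc is active+clean again, re-check it
  ceph pg deep-scrub 404.bc
  ceph health detail

  # and raise min_size again afterwards (k+1 = 5 is the usual value for 4+2)
  ceph osd pool set default.rgw.buckets.data min_size 5
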
Quoting Kai Stian Olstad <ceph+list(a)olstad.com>:
> Hi Eugen, thank you for the reply.
>
> The OSDs were drained over the weekend, so OSD 223 and 269 now hold
> only the problematic PG 404.bc.
>
> I don't think moving the PG would help, since I don't have any empty
> OSDs to move it to, and a move would not fix the hash mismatch.
> The reason I want only the problematic PG left on those OSDs is to
> reduce recovery time.
> I would need to set min_size to 4 on the EC 4+2 pool, and stop them both
> at the same time to force a rebuild of the corrupted part of the PG that
> is on OSD 223 and 269, since repair doesn't fix it.
>
> I'm debating with myself if I should
> 1. Stop both OSD 223 and 269,
> 2. Just one of them.
>
> Stopping them both, I'm guaranteed that the parts of the PG on 223 and
> 269 are rebuilt from the 4 others, 297, 276, 136 and 197, which don't
> have any errors.
>
> OSD 223 is the primary in the EC: pg 404.bc acting [223,297,269,276,136,197]
> So maybe just stop that one, wait for recovery and then run a
> deep-scrub to check if things look better.
> But would it then use the corrupted data on OSD 269 for the rebuild?
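>
> For what it's worth, one way to see exactly which shards the scrub flags
> before deciding (a rough sketch; it needs a fairly recent deep-scrub so
> the results are still available):
>
>   rados list-inconsistent-obj 404.bc --format=json-pretty
>
> I would expect the per-shard "errors" in that output to show something
> like ec_hash_mismatch only for shard 0 (osd.223) and shard 2 (osd.269).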
>
>
> -
> Kai Stian Olstad
>
>
>
> On 26.02.2024 10:19, Eugen Block wrote:
>> Hi,
>>
>> I think your approach makes sense. But I'm wondering if moving only
>> the problematic PG to different OSDs could have an effect as well.
>> I assume that moving those 2 PG shards is much quicker than moving
>> all BUT those 2. If that doesn't work you could still fall
>> back to draining the entire OSDs (except for the problematic PG).
>>
>> Regards,
>> Eugen
>>
>> Quoting Kai Stian Olstad <ceph+list(a)olstad.com>:
>>
>>> Hi,
>>>
>>> Does no one have any comment at all?
>>> I'm not picky, so any speculation, guesses, "I would", "I wouldn't",
>>> "it should work" and so on would be highly appreciated.
>>>
>>>
>>> Since 4 out of 6 shards in the EC 4+2 are OK and ceph pg repair
>>> doesn't solve it, I think the following might work.
>>>
>>> pg 404.bc acting [223,297,269,276,136,197]
>>>
>>> - Use pgremapper to move all PGs on OSD 223 and 269 except 404.bc
>>>   to other OSDs.
>>> - Set min_size to 4: ceph osd pool set default.rgw.buckets.data min_size 4
>>> - Stop OSD 223 and 269
>>>
>>> What I hope will happen is that Ceph then recreates 404.bc shards
>>> s0 (osd.223) and s2 (osd.269), since they are now down, from the
>>> remaining shards
>>> s1 (osd.297), s3 (osd.276), s4 (osd.136) and s5 (osd.197).
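>>>
>>> To be concrete about the first step: as far as I understand, pgremapper
>>> does this via pg-upmap-items, so a manual rough sketch would be
>>> something like (<pgid> and the target osd 300 are just placeholders):
>>>
>>>   # for each PG on osd.223 (and likewise osd.269) except 404.bc
>>>   ceph osd pg-upmap-items <pgid> 223 300
>>>
>>>   # verify nothing but 404.bc is left on them
>>>   ceph pg ls-by-osd 223
>>>   ceph pg ls-by-osd 269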
>>>
>>>
>>> _Any_ comment is highly appreciated.
>>>
>>> -
>>> Kai Stian Olstad
>>>
>>>
>>> On 21.02.2024 13:27, Kai Stian Olstad wrote:
>>>> Hi,
>>>>
>>>> Short summary
>>>>
>>>> PG 404.bc is an EC 4+2 where s0 and s2 report a hash mismatch for
>>>> 698 objects.
>>>> Ceph pg repair doesn't fix it, because if you run deep-scrub on
>>>> the PG after the repair is finished, it still reports scrub errors.
>>>>
>>>> Why can't ceph pg repair fix this? With 4 out of 6 shards intact it
>>>> should be able to reconstruct the corrupted ones.
>>>> Is there a way to fix this? Like deleting the s0 and s2 copies of the
>>>> objects so they're forced to be recreated?
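>>>>
>>>> (If manual surgery is the answer, I guess it would look something like
>>>> the following with the OSD stopped -- the data path here is for a plain
>>>> deployment, cephadm keeps it under /var/lib/ceph/<fsid>/osd.223/ -- but
>>>> I haven't dared to try it:
>>>>
>>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-223 \
>>>>     --pgid 404.bcs0 '<object>' remove
>>>>
>>>> and then let recovery/repair rebuild that shard.)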
>>>>
>>>>
>>>> Long detailed summary
>>>>
>>>> A short backstory.
>>>> * This is the aftermath of problems with mclock, see the post "17.2.7:
>>>> Backfilling deadlock / stall / stuck / standstill" [1].
>>>> - 4 OSDs had a few bad sectors; all 4 were set out and the cluster stopped.
>>>> - The solution was to switch from mclock to wpq and restart all OSDs.
>>>> - When all backfilling was finished, all 4 OSDs were replaced.
>>>> - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.
>>>>
>>>>
>>>> PG / pool 404 is the EC 4+2 pool default.rgw.buckets.data
>>>>
>>>> 9 days after osd.223 and osd.269 were replaced, a deep-scrub was
>>>> run and reported errors
>>>> ceph status
>>>> -----------
>>>> HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg inconsistent
>>>> [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
>>>> [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
>>>> pg 404.bc is active+clean+inconsistent, acting
>>>> [223,297,269,276,136,197]
>>>>
>>>> I then ran a repair
>>>> ceph pg repair 404.bc
>>>>
>>>> And ceph status showed this
>>>> ceph status
>>>> -----------
>>>> HEALTH_WARN Too many repaired reads on 2 OSDs
>>>> [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
>>>> osd.223 had 698 reads repaired
>>>> osd.269 had 698 reads repaired
>>>>
>>>> But osd.223 and osd.269 are new disks, and the disks have no SMART
>>>> errors or any I/O errors in the OS logs.
>>>> So I tried to run deep-scrub again on the PG.
>>>> ceph pg deep-scrub 404.bc
>>>>
>>>> And got this result.
>>>>
>>>> ceph status
>>>> -----------
>>>> HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2
>>>> OSDs; Possible data damage: 1 pg inconsistent
>>>> [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
>>>> [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
>>>> osd.223 had 698 reads repaired
>>>> osd.269 had 698 reads repaired
>>>> [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
>>>> pg 404.bc is
>>>> active+clean+scrubbing+deep+inconsistent+repair, acting
>>>> [223,297,269,276,136,197]
>>>>
>>>> 698 + 698 = 1396, so the same number of errors.
>>>>
>>>> Ran repair again on 404.bc, and ceph status is
>>>>
>>>> HEALTH_WARN Too many repaired reads on 2 OSDs
>>>> [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
>>>> osd.223 had 1396 reads repaired
>>>> osd.269 had 1396 reads repaired
>>>>
>>>> So even when the repair finishes, it doesn't fix the problem, since
>>>> the errors reappear after a deep-scrub.
>>>>
>>>> The logs for osd.223 and osd.269 contain "got incorrect hash on
>>>> read" and "candidate had an ec hash mismatch" for 698 unique
>>>> objects.
>>>> But I only show the logs for 1 of the 698 objects; the log is the
>>>> same for the other 697 objects.
>>>>
>>>> osd.223 log (only showing 1 of the 698 objects, named
>>>> 2021-11-08T19%3a43%3a50,145489260+00%3a00)
>>>> -----------
>>>> Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223
>>>> pg_epoch: 231235 pg[404.bcs0( v 231235'1636919
>>>> (231078'1632435,231235'1636919] local-lis/les=226263/226264
>>>> n=296580 ec=36041/27862 lis/c=226263/226263
>>>> les/c/f=226264/230954/0 sis=226263)
>>>> [223,297,269,276,136,197]p223(0) r=0 lpr=226263
>>>> crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918
>>>> active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0:
>>>> REQ_SCRUB ] MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned
>>>> REQ_SCRUB] _scan_list
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> got incorrect hash on read 0xc5d1dd1b != expected
>>>> 0x7c2f86d7
>>>> Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]:
>>>> log_channel(cluster) log [ERR] : 404.bc shard 223(0) soid
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> : candidate had an ec hash
>>>> mismatch
>>>> Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]:
>>>> log_channel(cluster) log [ERR] : 404.bc shard 269(2) soid
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> : candidate had an ec hash
>>>> mismatch
>>>> Feb 20 10:31:01 ceph-hd-003
>>>> ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]:
>>>> 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster)
>>>> log [ERR] : 404.bc shard 223(0) soid
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> : candidate had an ec hash
>>>> mismatch
>>>> Feb 20 10:31:01 ceph-hd-003
>>>> ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]:
>>>> 2024-02-20T10:31:01.117+0000 7f128a88d700 -1 log_channel(cluster)
>>>> log [ERR] : 404.bc shard 269(2) soid
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> : candidate had an ec hash
>>>> mismatch
>>>>
>>>> osd.269 log (only showing 1 of the 698 objects, named
>>>> 2021-11-08T19%3a43%3a50,145489260+00%3a00)
>>>> -----------
>>>> Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269
>>>> pg_epoch: 231235 pg[404.bcs2( v 231235'1636919
>>>> (231078'1632435,231235'1636919] local-lis/les=226263/226264
>>>> n=296580 ec=36041/27862 lis/c=226263/226263
>>>> les/c/f=226264/230954/0 sis=226263)
>>>> [223,297,269,276,136,197]p223(0) r=2 lpr=226263 luod=0'0
>>>> crt=231235'1636919 mlcod 231235'1636919 active mbc={}]
>>>> _scan_list
>>>>
>>>> 404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head
>>>> got incorrect hash on read 0x7c0871dc != expected
>>>> 0xcf6f4c58
>>>>
>>>> The logs for the other OSDs in the PG, osd.297, osd.276, osd.136 and
>>>> osd.197, don't show any errors.
>>>>
>>>> If I try to get the object it fails
>>>> $ s3cmd s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
>>>> download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00'
>>>> -> './2021-11-08T19:43:50,145489260+00:00' [1 of 1]
>>>> ERROR: Download of './2021-11-08T19:43:50,145489260+00:00'
>>>> failed (Reason: 500 (UnknownError))
>>>> ERROR: S3 error: 500 (UnknownError)
>>>>
>>>> And the RGW log shows this
>>>> Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new
>>>> request req=0x7f94b744d660 =====
>>>> Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING:
>>>> set_req_state_err err_no=5 resorting to 500
>>>> Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== starting new
>>>> request req=0x7f94b6e41660 =====
>>>> Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: ====== req done
>>>> req=0x7f94b744d660 op status=-5 http_status=500
>>>> latency=0.020000568s ======
>>>> Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast:
>>>> 0x7f94b744d660: 110.2.0.46 - test1 [21/Feb/2024:08:27:06.021
>>>> +0000] "GET
>>>> /benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00
>>>> HTTP/1.1" 500 226 - - - latency=0.020000568s
>>>>
>>>> [1]
>>>>
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5A…
>>>>
>>>> --
>>>> Kai Stian Olstad
>>
>>