Hi,
I have a Rook cluster running Ceph 12.2.7 that has been up for almost a year.
Recently some PVCs could not be attached, failing with the error below:
Warning FailedMount 7m19s kubelet, 192.168.34.119 MountVolume.SetUp
failed for volume "pvc-8f4ca7ac-42ab-11ea-99d7-005056b84936" : mount command
failed, status: Failure, reason: failed to mount volume /dev/rbd1 [ext4] to
/var/lib/kubelet/plugins/rook.io/rook-ceph/mounts/pvc-8f4ca7ac-42ab-11ea-99d7-005056b84936,
error 'fsck' found errors on device /dev/rbd1 but could not correct them: fsck
from util-linux 2.23.2
/dev/rbd1: recovering journal
/dev/rbd1 contains a file system with errors, check forced.
/dev/rbd1: Inode 393244, end of extent exceeds allowed value
(logical block 512, physical block 12091904, len 4388)
After mapping the image to a block device and running "fsck -y" on it, fsck
reported that the filesystem was clean. But when I retried mounting the volume,
it still failed with the same error as above.
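For reference, this is roughly what I did; the pool and image names below are placeholders, not my real ones:

```shell
# Map the RBD image backing the PVC to a local block device
# ("replicapool" is a placeholder pool name).
rbd map replicapool/pvc-8f4ca7ac-42ab-11ea-99d7-005056b84936
# Run a full repair pass on the mapped device -- this reported "clean" for me.
fsck -y /dev/rbd1
# Unmap before letting kubelet retry the mount.
rbd unmap /dev/rbd1
```

After this, kubelet's mount attempt still hit the identical fsck error.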
The output of "ceph status" is as below:
  cluster:
    id:     54a729b6-7b59-4e5b-bc09-7dc99109cbad
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            Degraded data redundancy: 50711/152133 objects degraded (33.333%),
            100 pgs degraded, 100 pgs undersized
            mons rook-ceph-mon41,rook-ceph-mon44 are low on available space

  services:
    mon: 3 daemons, quorum rook-ceph-mon44,rook-ceph-mon47,rook-ceph-mon41
    mgr: a(active)
    osd: 3 osds: 3 up, 3 in
         flags noscrub,nodeep-scrub

  data:
    pools:   1 pools, 100 pgs
    objects: 50711 objects, 190 GB
    usage:   383 GB used, 1111 GB / 1495 GB avail
    pgs:     50711/152133 objects degraded (33.333%)
             100 active+undersized+degraded

  io:
    client: 78795 B/s wr, 0 op/s rd, 11 op/s wr
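One thing I noticed: the degraded count appears to be exactly one missing replica per object, assuming this is a replicated pool with size 3 (which I have not confirmed):

```shell
# 50711 objects x 3 replicas should give the total copy count reported
# in "50711/152133 objects degraded (33.333%)".
objects=50711
total=$((objects * 3))
echo "$total"    # prints 152133
```

If that reading is right, every PG is missing one copy, which would match the 100 active+undersized+degraded PGs on a 3-OSD cluster.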
From this output, why is scrubbing disabled, and can I trigger it manually?
Also, how can I check which PGs or objects are corrupted?
Any advice for fixing the issue?
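In case it helps, these are the commands I was planning to try; please correct me if any of them are unsafe on 12.2.7 (the pool name and `<pgid>` below are placeholders):

```shell
# Re-enable regular and deep scrubbing (clears the flags shown in "ceph status").
ceph osd unset noscrub
ceph osd unset nodeep-scrub

# Look for inconsistent/damaged PGs in the health report.
ceph health detail

# List PGs with recorded inconsistencies, then inspect one PG's objects
# ("replicapool" is a placeholder pool name).
rados list-inconsistent-pg replicapool
rados list-inconsistent-obj <pgid> --format=json-pretty

# Manually trigger a deep scrub of a single PG.
ceph pg deep-scrub <pgid>
```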
Thanks for your help.
Jared