Data Missing with RBD-Mirror - ceph-users

16 Feb 2021

Hi Friends,

We have a very weird issue with rbd-mirror replication. As per the command
output, we are in sync but the OSD usage on DR side doesn't match the Prod
Side.

On Prod, we are using close to 52TB but on DR side we are only 22TB.

We took a snap on Prod and mounted the snap on DR side and compared the data
and we found lot of missing data. Please see the output below.

Please help us resolve this issue or point us in right direction.

Thanks,

-Vikas

DR# rbd --cluster cephdr mirror pool status cifs --verbose

health: OK

images: 1 total

    1 replaying

research_data:

  global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a

  state:       up+replaying

  description: replaying, master_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], entries_behind_master=0

  last_update: 2021-01-29 15:10:13

DR# ceph osd pool ls detail

pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
application rbd

        removed_snaps [1~5]

PROD# ceph df detail

POOLS:

    NAME        ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED
MAX AVAIL     OBJECTS     DIRTY     READ        WRITE       RAW USED

    cifs        17     N/A               N/A             26.0TiB     30.10
60.4TiB     6860550     6.86M      873MiB      509MiB      52.1TiB

DR# ceph df detail

POOLS:

    NAME        ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED
MAX AVAIL     OBJECTS     DIRTY     READ        WRITE       RAW USED

    cifs        5      N/A               N/A             11.4TiB     15.78
60.9TiB     3043260     3.04M     2.65MiB      431MiB      22.8TiB

PROD#:/vol/research_data# du -sh *

11T     Flab1

346G    KLab

1.5T    More

4.4T    ReLabs

4.0T    WLab

DR#:/vol/research_data# du -sh *

2.6T    Flab1

14G     KLab

52K     More

8.0K    RLabs

202M    WLab