On Thu, Feb 18, 2021 at 03:28:11PM +0000, Eugen Block wrote:
> Hi,
> was there an interruption between those sites?
> last_update: 2021-01-29 15:10:13
> If there was an interruption you'll probably need to resync those images.
If your results shown below are not from that past, then yes, it looks
like rbd-mirror (at least the image replayer) got stuck for some
reason a long time ago. In that case, though, I can't see how you
could have mounted a newly created snap, because it would not have
been replayed.
Probably you previously had a snapshot with that name, it was
replayed, then rbd-mirror got stuck, the snapshot was deleted on
the primary, and a new one was created recently. And on the secondary
you were still seeing and mounting the old snapshot?
This would also explain why you were able to mount it -- if data were
really missing, I would expect you not to be able to mount the fs at
all due to corruption.
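One way to check that hypothesis (just a sketch, using the pool/image
names and the cephdr cluster name from your output) is to list the
snapshots on both sides and compare what is actually there:

PROD# rbd snap ls cifs/research_data
DR# rbd --cluster cephdr snap ls cifs/research_data

Depending on your version, the output also includes a snapshot
creation timestamp, which should make it obvious whether the snap on
the DR side is an old one.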
If rbd-mirror just got stuck then you probably don't need to resync:
restarting rbd-mirror should make it start replaying again. Though,
given how long it has not been replaying, if the journal is very
large a resync might actually be faster.
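For reference, on the DR side that would be something like the
following (a sketch only -- the rbd-mirror systemd instance name
depends on how the daemon was deployed):

DR# systemctl restart ceph-rbd-mirror@<instance>
DR# rbd --cluster cephdr mirror image resync cifs/research_data

The first command just restarts the daemon so it resumes replaying;
the second flags the image for a full resync, i.e. the DR copy is
re-replicated from the primary.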
You can try:
rbd journal info -p cifs --image research_data
to see how large the journal currently is (the difference between the
master and the rbd-mirror client positions).
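If I remember correctly, the registered journal clients and their
commit positions can also be shown with:

rbd journal status -p cifs --image research_data

which makes it easy to see how far behind the rbd-mirror client is.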
And if it really is the case that rbd-mirror got stuck, any
additional info you could provide (rbd-mirror logs, a core dump)
might be helpful for fixing the bug. It can be reported directly to
the tracker.
What version are you running BTW?
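For example (on reasonably recent releases), something like:

PROD# ceph versions
DR# ceph versions

should show the daemon versions on both clusters, and
rbd-mirror --version on the mirror host shows the daemon binary itself.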
--
Mykola Golub
> Quoting Vikas Rana <vrana(a)vtiersys.com>:
>
> > Hi Friends,
> >
> >
> >
> > We have a very weird issue with rbd-mirror replication. As per the command
> > output we are in sync, but the OSD usage on the DR side doesn't match the
> > Prod side.
> >
> > On Prod we are using close to 52TB, but on the DR side only 22TB.
> >
> > We took a snap on Prod, mounted it on the DR side, and compared the data;
> > we found a lot of data missing. Please see the output below.
> >
> >
> >
> > Please help us resolve this issue or point us in the right direction.
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > DR# rbd --cluster cephdr mirror pool status cifs --verbose
> >
> > health: OK
> >
> > images: 1 total
> >
> > 1 replaying
> >
> >
> >
> > research_data:
> >
> > global_id: 69656449-61b8-446e-8b1e-6cf9bd57d94a
> >
> > state: up+replaying
> >
> > description: replaying, master_position=[object_number=390133, tag_tid=4,
> > entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4,
> > entry_tid=447832541], entries_behind_master=0
> >
> > last_update: 2021-01-29 15:10:13
> >
> >
> >
> > DR# ceph osd pool ls detail
> >
> > pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
> > pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
> > application rbd
> >
> > removed_snaps [1~5]
> >
> >
> >
> >
> >
> > PROD# ceph df detail
> >
> > POOLS:
> >
> > NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED
> > MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
> >
> > cifs 17 N/A N/A 26.0TiB 30.10
> > 60.4TiB 6860550 6.86M 873MiB 509MiB 52.1TiB
> >
> >
> >
> > DR# ceph df detail
> >
> > POOLS:
> >
> > NAME ID QUOTA OBJECTS QUOTA BYTES USED %USED
> > MAX AVAIL OBJECTS DIRTY READ WRITE RAW USED
> >
> > cifs 5 N/A N/A 11.4TiB 15.78
> > 60.9TiB 3043260 3.04M 2.65MiB 431MiB 22.8TiB
> >
> >
> >
> >
> >
> >
> >
> > PROD#:/vol/research_data# du -sh *
> >
> > 11T Flab1
> >
> > 346G KLab
> >
> > 1.5T More
> >
> > 4.4T ReLabs
> >
> > 4.0T WLab
> >
> >
> >
> > DR#:/vol/research_data# du -sh *
> >
> > 2.6T Flab1
> >
> > 14G KLab
> >
> > 52K More
> >
> > 8.0K RLabs
> >
> > 202M WLab
> >
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io