Hello Mykola/Eugen,
Here's the output. We also restarted the rbd-mirror process.
# rbd journal info -p cifs --image research_data
rbd journal '11cb6c2ae8944a':
header_oid: journal.11cb6c2ae8944a
object_oid_prefix: journal_data.17.11cb6c2ae8944a.
order: 24 (16MiB objects)
splay_width: 4
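For reference, the journal's registered clients and their commit positions
can also be inspected directly (a sketch; the exact output format varies by
release):

# rbd journal status -p cifs --image research_data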
We restarted the rbd-mirror process on the DR side
# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
1 replaying
research_data:
global_id: 69656449-61b8-446e-8b1e-6cf9bd57d94a
state: up+replaying
description: replaying, master_position=[object_number=396351, tag_tid=4,
entry_tid=455084955], mirror_position=[object_number=396351, tag_tid=4,
entry_tid=455084955], entries_behind_master=0
last_update: 2021-02-19 15:36:30
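As a further consistency check (a sketch, not something we have run yet;
the snapshot name is hypothetical and a full export is slow), the same
snapshot could be exported and checksummed on both clusters once it has
replayed to the DR side:

# rbd snap create cifs/research_data@verify1
# rbd export cifs/research_data@verify1 - | md5sum
# rbd --cluster cephdr export cifs/research_data@verify1 - | md5sum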
Thanks,
-Vikas
-----Original Message-----
From: Vikas Rana <vrana(a)vtiersys.com>
Sent: Friday, February 19, 2021 2:00 PM
To: 'Mykola Golub' <to.my.trociny(a)gmail.com>; 'Eugen Block' <eblock(a)nde.ag>
Cc: ceph-users(a)ceph.io
Subject: [ceph-users] Re: Data Missing with RBD-Mirror
Hello Mykola and Eugen,
There was no interruption, and we are on a campus with a 10G backbone.
We are on 12.2.10 I believe.
We wanted to check the data on the DR side, so we created a snapshot on the
primary, and it was available on the DR side very quickly. That gave me the
feeling that rbd-mirror is not stuck.
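For concreteness, that check looked roughly like this (the snapshot name
here is illustrative):

# rbd snap create cifs/research_data@dr-check
# rbd --cluster cephdr snap ls cifs/research_data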
I will run those commands, restart the rbd-mirror, and report back.
Thanks,
-Vikas
-----Original Message-----
From: Mykola Golub <to.my.trociny(a)gmail.com>
Sent: Thursday, February 18, 2021 2:51 PM
To: Vikas Rana <vrana(a)vtiersys.com>; Eugen Block <eblock(a)nde.ag>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Re: Data Missing with RBD-Mirror
On Thu, Feb 18, 2021 at 03:28:11PM +0000, Eugen Block wrote:
> Hi,
>
> was there an interruption between those sites?
>
> > last_update: 2021-01-29 15:10:13
>
> If there was an interruption you'll probably need to resync those images.
If the results shown below are not from the past, then yes, it looks like
the rbd-mirror (at least the image replayer) got stuck for some reason a
long time ago. In that case, though, I can't see how you could mount a
newly created snap, because it would not have been replayed.
Probably you previously had a snapshot with the same name; it was replayed,
then the rbd-mirror got stuck, the snapshot was deleted on the primary, and
a new one was created recently. On the secondary, were you still seeing and
mounting the old snapshot?
This would also explain why you were able to mount it -- if data were
really missing, I would expect the fs mount to fail due to corruption.
If the rbd-mirror just got stuck, then you probably don't need to resync;
just restarting the rbd-mirror should make it start replaying again. Though
given how long it was not replaying, if the journal is very large, a resync
might be faster.
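For concreteness, both options would look roughly like this, run on the DR
side (the systemd unit name depends on how the daemon was deployed):

# systemctl restart ceph-rbd-mirror@<instance>

or, to force a full image resync instead:

# rbd --cluster cephdr mirror image resync cifs/research_data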
You can try:
rbd journal info -p cifs --image research_data
to see how large the journal currently is (the difference between the
master and the rbd-mirror client positions).
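For example, with order 24 (16MiB journal objects) the backlog is roughly

(master object_number - mirror object_number) * 16MiB

so equal positions, as in the status output quoted below (object_number
390133 on both sides, entries_behind_master=0), mean no backlog by that
measure.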
And if this really is a case of rbd-mirror getting stuck, any additional
info you could provide (rbd-mirror logs, a core dump) might be helpful for
fixing the bug. It can be reported directly to the tracker.
What version are you running BTW?
--
Mykola Golub
Zitat von Vikas Rana <vrana(a)vtiersys.com>:
> Hi Friends,
>
> We have a very weird issue with rbd-mirror replication. As per the
> command output, we are in sync, but the OSD usage on the DR side
> doesn't match the Prod side.
>
> On Prod, we are using close to 52TB, but on the DR side we are only
> using 22TB.
>
> We took a snap on Prod, mounted it on the DR side, and compared the
> data, and we found a lot of missing data. Please see the output below.
>
> Please help us resolve this issue or point us in the right direction.
>
> Thanks,
>
> -Vikas
>
> DR# rbd --cluster cephdr mirror pool status cifs --verbose
>
> health: OK
>
> images: 1 total
>
> 1 replaying
>
> research_data:
>
> global_id: 69656449-61b8-446e-8b1e-6cf9bd57d94a
>
> state: up+replaying
>
> description: replaying, master_position=[object_number=390133,
> tag_tid=4, entry_tid=447832541],
> mirror_position=[object_number=390133, tag_tid=4,
> entry_tid=447832541], entries_behind_master=0
>
> last_update: 2021-01-29 15:10:13
>
> DR# ceph osd pool ls detail
>
> pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 1294 flags hashpspool
> stripe_width 0 application rbd
>
> removed_snaps [1~5]
>
> PROD# ceph df detail
>
> POOLS:
>
>     NAME     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS     DIRTY     READ        WRITE      RAW USED
>     cifs     17     N/A               N/A             26.0TiB     30.10     60.4TiB       6860550     6.86M     873MiB      509MiB     52.1TiB
>
> DR# ceph df detail
>
> POOLS:
>
>     NAME     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS     DIRTY     READ        WRITE      RAW USED
>     cifs     5      N/A               N/A             11.4TiB     15.78     60.9TiB       3043260     3.04M     2.65MiB     431MiB     22.8TiB
> PROD#:/vol/research_data# du -sh *
> 11T   Flab1
> 346G  KLab
> 1.5T  More
> 4.4T  ReLabs
> 4.0T  WLab
>
> DR#:/vol/research_data# du -sh *
> 2.6T  Flab1
> 14G   KLab
> 52K   More
> 8.0K  RLabs
> 202M  WLab
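For a per-image comparison that could narrow this down (a sketch; with the
fast-diff feature enabled, rbd du is fast, otherwise it scans the whole
image):

# rbd du cifs/research_data
# rbd --cluster cephdr du cifs/research_data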