OK, thanks. What could be causing this issue, and how can I fix it?

On Fri, 23 Aug 2019, 17:18 Jason Dillaman, <jdillama@redhat.com> wrote:
On Fri, Aug 23, 2019 at 7:38 AM Ajitha Robert <ajitharobert01@gmail.com> wrote:
>
> Sir,
>
> I have a working DR setup with Ceph, and I did the same for another two sites. There is a direct L2 link between the sites. I am getting this error repeatedly:
>
> rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout

That is just indicative that you are having issues talking to your
local cluster. Assuming you only have a single rbd-mirror daemon
running, it seems like it cannot even send a message through the OSD
to itself within 5 seconds. Perhaps your cluster is too slow to
respond?

>
> It appears continuously, so nothing is replicated to the other site. Is direct L2 connectivity a concern? Does rbd-mirror expect an L3 link between the two sites?
>
> On Wed, Jul 24, 2019 at 12:42 AM Ajitha Robert <ajitharobert01@gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> Regarding RBD mirroring, can you please check the logs for RBD image creation? The second one [2] started syncing and made no further progress.
>>
>> 1) Log for manual RBD image creation
>>
>> http://paste.openstack.org/show/754766/
>>
>>
>> 2) Log for a 16 GB volume created from Cinder; the Cinder volume status is available
>>
>> http://paste.openstack.org/show/754767/
>>
>>
>> 3) Log for a 100 GB volume created from Cinder; the Cinder volume status is error
>>
>> http://paste.openstack.org/show/754769/
>>
>>
>> On Tue, Jul 23, 2019 at 1:13 AM Jason Dillaman <jdillama@redhat.com> wrote:
>>>
>>> On Mon, Jul 22, 2019 at 3:26 PM Ajitha Robert <ajitharobert01@gmail.com> wrote:
>>> >
>>> > Thanks for your reply
>>> >
>>> > 1) In scenario 1, I didn't attempt to delete the Cinder volume. Please find the Cinder volume log here:
>>> > http://paste.openstack.org/show/754731/
>>>
>>> It might be better to ping Cinder folks about that one. It doesn't
>>> really make sense to me from a quick glance.
>>>
>>> >
>>> > 2) In scenario 2, I will try with debug enabled. But I have a test setup with one OSD on the primary and one OSD on the secondary. The distance between the two Ceph clusters is 300 km.
>>> >
>>> >
>>> > 3) I have disabled Ceph authentication entirely, including for the rbd-mirror daemon. I also deployed the Ceph cluster using ceph-ansible. Will either of these cause any issues for the setup?
>>>
>>> Not to my knowledge.
>>>
>>> > 4) The image that was syncing showed read-only status on the secondary.
>>>
>>> Mirrored images are either primary or non-primary. It is the expected
>>> (documented) behaviour that non-primary images are read-only.
>>>
>>> > 5) In a presentation I read that the journaling feature causes poor I/O performance and that journaling can be skipped for mirroring. Is that possible, by enabling mirroring on the entire Cinder pool in pool mode instead of image mode? And could we then skip the replication_enabled is true spec in the Cinder volume type?
>>>
>>> Journaling is required for RBD mirroring.
>>>
>>> >
>>> >
>>> >
>>> > On Mon, Jul 22, 2019 at 11:13 PM Jason Dillaman <jdillama@redhat.com> wrote:
>>> >>
>>> >> On Mon, Jul 22, 2019 at 10:49 AM Ajitha Robert <ajitharobert01@gmail.com> wrote:
>>> >> >
>>> >> > There are no errors in the rbd-mirror log, except for a connection timeout that occurred once.
>>> >> > Scenario 1:
>>> >> >   When I create a 100 GB bootable volume from a Glance image, the image gets downloaded, and then the Cinder volume log reports "volume is busy deleting volume that has snapshot". The image was created with exclusive-lock, journaling, layering, object-map, fast-diff and deep-flatten enabled.
>>> >> > The Cinder volume is in the error state; the RBD image is created on the primary but not on the secondary.
>>> >>
>>> >> Any chance you know where in Cinder that error is being thrown? A
>>> >> quick grep of the code doesn't reveal that error message. If the image
>>> >> is being synced to the secondary site when you attempt to delete it,
>>> >> it's possible you could hit this issue. Providing debug log messages
>>> >> from librbd on the Cinder controller might also be helpful for this.
>>> >>
>>> >> > Scenario 2:
>>> >> > But when I create a 50 GB volume from another Glance image, the volume gets created, and in the backend I can see the RBD images on both the primary and the secondary.
>>> >> >
>>> >> > From rbd mirror image status, I found that the secondary cluster starts copying and then the sync gets stuck at around 14%. It stays at 14% with no progress at all. Should I set any parameters for this, such as a timeout?
>>> >> >
>>> >> > I manually ran rbd --cluster primary object-map check <object-name>. It returned no results and the command hung. That is why I was worried about the failed to map object key log message. I couldn't even rebuild the object map.
>>> >>
>>> >> It sounds like one or more of your primary OSDs are not reachable from
>>> >> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
>>> >> rbd = 20", you should be able to see the last object it attempted to
>>> >> copy. From that, you could use "ceph osd map" to figure out the
>>> >> primary OSD for that object.
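
As a rough sketch of the last step above (mapping an object to its primary OSD), the snippet below wraps "ceph osd map" and reads the primary from its JSON output. The helper names, the JSON key names ("pgid", "acting", "acting_primary") and the sample output are assumptions for illustration; the sample is not taken from this thread, so check the keys against what your release actually prints.

```python
import json
import subprocess


def parse_osd_map(json_text):
    """Pull the placement group and OSD set out of `ceph osd map` JSON output.

    The key names used here are assumptions; .get() keeps the function from
    raising if a release names them differently.
    """
    info = json.loads(json_text)
    return {
        "pgid": info.get("pgid"),
        "acting": info.get("acting", []),
        "primary": info.get("acting_primary"),
    }


def primary_osd(cluster, pool, object_name):
    """Run `ceph osd map` against the named cluster and return the primary OSD id."""
    cmd = ["ceph", "--cluster", cluster, "osd", "map", pool, object_name,
           "--format", "json"]
    out = subprocess.check_output(cmd, text=True)
    return parse_osd_map(out)["primary"]


# Hypothetical sample output, for illustration only.
SAMPLE = """{"epoch": 13, "pool": "rbd", "pool_id": 0, "objname": "rbd_data.example",
 "raw_pgid": "0.7fc1f406", "pgid": "0.6", "up": [1, 0], "up_primary": 1,
 "acting": [1, 0], "acting_primary": 1}"""

print(parse_osd_map(SAMPLE))
```

On a real cluster, primary_osd("primary", "<pool>", "<object-name>") would return the OSD id to test for reachability from the secondary site; the pool and object name are placeholders taken from the debug log.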
>>> >>
>>> >> > The image that was syncing showed read-only status on the secondary.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, 22 Jul 2019, 17:36 Jason Dillaman, <jdillama@redhat.com> wrote:
>>> >> >>
>>> >> >> On Sun, Jul 21, 2019 at 8:25 PM Ajitha Robert <ajitharobert01@gmail.com> wrote:
>>> >> >> >
>>> >> >> >  I have an RBD mirroring setup with primary and secondary clusters as peers, and I have a pool with mirroring enabled in image mode. In this pool I created an RBD image with journaling enabled.
>>> >> >> >
>>> >> >> > But whenever I enable mirroring on the image, I get errors in the OSD log that I couldn't trace. Please guide me in solving this.
>>> >> >> >
>>> >> >> > I think it initially worked fine, but after a restart of the Ceph processes these errors started appearing.
>>> >> >> >
>>> >> >> >
>>> >> >> > Secondary.osd.0.log
>>> >> >> >
>>> >> >> > 2019-07-22 05:36:17.371771 7ffbaa0e9700  0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap key: client_a5c76849-ba16-480a-a96b-ebfdb7f6ac65
>>> >> >> > 2019-07-22 05:36:17.388552 7ffbaa0e9700  0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> > 2019-07-22 05:36:17.413102 7ffbaa0e9700  0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap key: order
>>> >> >> > 2019-07-22 05:36:23.341490 7ffbab8ec700  0 <cls> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id for global id '9e36b9f8-238e-4a54-a055-19b19447855e': (2) No such file or directory
>>> >> >> >
>>> >> >> >
>>> >> >> > primary-osd.0.log
>>> >> >> >
>>> >> >> > 2019-07-22 05:16:49.287769 7fae12db1700  0 log_channel(cluster) log [DBG] : 1.b deep-scrub ok
>>> >> >> > 2019-07-22 05:16:54.078698 7fae125b0700  0 log_channel(cluster) log [DBG] : 1.1b scrub starts
>>> >> >> > 2019-07-22 05:16:54.293839 7fae125b0700  0 log_channel(cluster) log [DBG] : 1.1b scrub ok
>>> >> >> > 2019-07-22 05:17:04.055277 7fae12db1700  0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> >
>>> >> >> > 2019-07-22 05:33:21.540986 7fae135b2700  0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> > 2019-07-22 05:35:27.447820 7fae12db1700  0 <cls> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id for global id '8a61f694-f650-4ba1-b768-c5e7629ad2e0': (2) No such file or directory
>>> >> >>
>>> >> >> Those don't look like errors, but the log level should probably be
>>> >> >> reduced for those OSD cls methods. If you look at your rbd-mirror
>>> >> >> daemon log, do you see any errors? That would be the important place
>>> >> >> to look.
>>> >> >>
>>> >> >> >
>>> >> >> > --
>>> >> >> > Regards,
>>> >> >> > Ajitha R
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Jason



--
Jason