By default it should take ~25 seconds to detect a network failure. The
config option that controls this is "osd heartbeat grace" (default 20
seconds, but it takes a little longer than that before the failure is
actually detected).
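If you want the test to converge faster, the grace period can be lowered
for the duration of the test. This is only a sketch and the value below is
purely illustrative; the same option can also go into ceph.conf instead of
being injected at runtime:

# runtime-only change; both the OSDs and the mons evaluate this grace period
ceph tell osd.* injectargs '--osd_heartbeat_grace 10'
ceph tell mon.* injectargs '--osd_heartbeat_grace 10'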
Check ceph -w while performing the test.
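Something along these lines should show the detection timeline and also
answer the min_size question; the pool name is taken from your mail, the
rest are just the usual status commands:

# watch failure reports and OSDs being marked down in real time
ceph -w
# in a second shell: see which OSDs are up/down per host
ceph osd tree
# confirm the pool's min_size (you expect k+1 = 5 here)
ceph osd pool get pool_jerasure_4_3_reed_sol_van min_size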
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Fri, Nov 29, 2019 at 8:14 AM majia xiao <xiaomajia.st(a)gmail.com> wrote:
>
> Hello,
>
>
> We have a Ceph cluster (version 12.2.4) with 10 hosts, and there are 21 OSDs on each host.
>
>
> An EC pool is created with the following commands:
>
>
> ceph osd erasure-code-profile set profile_jerasure_4_3_reed_sol_van \
>     plugin=jerasure \
>     k=4 \
>     m=3 \
>     technique=reed_sol_van \
>     packetsize=2048 \
>     crush-device-class=hdd \
>     crush-failure-domain=host
>
> ceph osd pool create pool_jerasure_4_3_reed_sol_van 2048 2048 erasure profile_jerasure_4_3_reed_sol_van
>
>
>
> Here are my questions:
>
> The EC pool is created with k=4, m=3, and crush-device-class=hdd, so we simply
> disable the network interfaces of some hosts (using the "ifdown" command) to verify
> the functionality of the EC pool while running 'rados bench'.
> However, the IO rate drops to 0 immediately when a single host goes offline, and it
> takes a long time (~100 seconds) for the IO rate to return to normal.
> As far as I know, the default value of min_size is k+1, i.e. 5 in this case, which
> means the EC pool should still be able to serve IO even with two hosts offline.
> Is there something wrong with my understanding?
> According to our observations, the IO rate seems to return to normal once Ceph has
> detected all OSDs belonging to the failed host.
> Is there any way to reduce the time Ceph needs to detect all failed OSDs?
>
>
>
> Thanks for any help.
>
>
> Best regards,
>
> Majia Xiao
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io