Have you confirmed that all OSD hosts can see each other (on both the front
and back networks, if you use split networks)? If there isn't full
connectivity, that can lead to the issues you're seeing here.
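A quick way to check is a full-mesh ping between the hosts on both
subnets, plus a comparison of the configured networks and MTUs. Something
like the following, run from each OSD host (hostnames are taken from your
osd tree below; the back-network addresses and MTU size are assumptions
you'd adjust to your setup):

  for h in server-2001 server-2002 server-2003 server-2004 \
           server-2005 server-2006 server-2007; do
      ping -c 3 "$h" >/dev/null && echo "ok: $h" || echo "FAILED: $h"
  done

  ceph config get osd public_network    # confirm both networks are set
  ceph config get osd cluster_network   # (or check ceph.conf on older setups)
  ip -d link show                       # compare MTUs on old vs. new hosts
  ping -M do -s 8972 server-2004        # only if you run a 9000-byte MTU

An MTU mismatch between the bonded 10G hosts and the new 25G hosts would
show up as heartbeat timeouts rather than a hard network failure.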
Checking the logs on the mons can be helpful, as they will usually indicate
why a given OSD is being marked down (e.g. which OSDs are reporting it as
failed). The OSD logs may also be helpful.
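For example (log paths assume a default package install; osd.5 is just one
of the down OSDs from your tree):

  grep "reported failed" /var/log/ceph/ceph.log    # which peers reported it
  ceph log last 100                                # recent cluster log via CLI
  grep -i heartbeat_check /var/log/ceph/ceph-osd.5.log

Lines like "osd.X reported failed by osd.Y" tell you which pairs of OSDs
can't reach each other, which maps directly onto a network problem.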
Josh
On Thu, Jul 8, 2021 at 5:18 AM Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
wrote:
Hi,
I've added 4 NVMe hosts (2 OSDs per NVMe drive) to my cluster, and it made
all the SSD OSDs flap; I don't understand why.
Everything is under the same root, but split into 2 different device
classes, nvme and ssd. The pools are on the ssd class; there is nothing on
the nvme class at the moment.
The only way to bring the SSD OSDs back up is to shut down the NVMe hosts.
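For reference, the class split can be verified from the shadow trees (both
are standard Ceph CLI commands):

  ceph osd crush class ls              # should list: nvme, ssd
  ceph osd crush tree --show-shadow    # shows the default~ssd / default~nvme roots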
The new NVMe servers have 25 Gbit/s NICs; the old servers and the mons have
10 Gbit/s NICs, but in aggregated (bonded) mode.
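In case it matters, this is how the bonds, link speeds and MTUs can be
compared between the old and new hosts (interface names below are examples):

  cat /proc/net/bonding/bond0       # bond mode and per-slave link state
  ethtool eno1 | grep -i speed      # negotiated speed of each interface
  ip -d link show                   # MTU; must match across all OSD hosts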
This is the crush rule dump:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ssd",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -21,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_nvme",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -10,
                "item_name": "default~nvme"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
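The JSON above is presumably the output of "ceph osd crush rule dump"; for
reference, the negative "take" items can be resolved to their shadow
buckets with standard CLI calls:

  ceph osd crush dump | grep -B1 -A3 '"id": -21'   # should show default~ssd
  ceph osd crush dump | grep -B1 -A3 '"id": -10'   # should show default~nvme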
This is the osd tree:
 ID  CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-19         561.15057  root default
 -1          38.03099      host server-2001
  0    ssd    2.00000          osd.0              up   1.00000  1.00000
 10    ssd    6.98499          osd.10             up   1.00000  1.00000
 11    ssd    6.98599          osd.11             up   1.00000  1.00000
 12    ssd    2.29799          osd.12             up   1.00000  1.00000
 13    ssd    2.29799          osd.13             up   1.00000  1.00000
 14    ssd    3.49300          osd.14             up   1.00000  1.00000
 41    ssd    6.98499          osd.41             up   1.00000  1.00000
 42    ssd    6.98599          osd.42             up   1.00000  1.00000
 -3          38.03099      host server-2002
  1    ssd    2.00000          osd.1              up   1.00000  1.00000
 24    ssd    6.98499          osd.24             up   1.00000  1.00000
 25    ssd    6.98599          osd.25             up   1.00000  1.00000
 27    ssd    2.29799          osd.27             up   1.00000  1.00000
 28    ssd    2.29799          osd.28             up   1.00000  1.00000
 29    ssd    3.49300          osd.29             up   1.00000  1.00000
 43    ssd    6.98499          osd.43             up   1.00000  1.00000
 44    ssd    6.98599          osd.44             up   1.00000  1.00000
 -6          38.03000      host server-2003
  2    ssd    2.00000          osd.2              up   1.00000  1.00000
 26    ssd    6.98499          osd.26             up   1.00000  1.00000
 38    ssd    2.29999          osd.38             up   1.00000  1.00000
 39    ssd    2.29500          osd.39             up   1.00000  1.00000
 40    ssd    3.49300          osd.40             up   1.00000  1.00000
 45    ssd    6.98499          osd.45             up   1.00000  1.00000
 46    ssd    6.98599          osd.46             up   1.00000  1.00000
 47    ssd    6.98599          osd.47             up   1.00000  1.00000
-17         111.76465      host server-2004
  5   nvme    6.98529          osd.5            down         0  1.00000
  9   nvme    6.98529          osd.9            down         0  1.00000
 18   nvme    6.98529          osd.18           down         0  1.00000
 22   nvme    6.98529          osd.22           down         0  1.00000
 32   nvme    6.98529          osd.32           down         0  1.00000
 36   nvme    6.98529          osd.36           down         0  1.00000
 50   nvme    6.98529          osd.50           down         0  1.00000
 54   nvme    6.98529          osd.54           down         0  1.00000
 58   nvme    6.98529          osd.58           down         0  1.00000
 62   nvme    6.98529          osd.62           down         0  1.00000
 66   nvme    6.98529          osd.66           down         0  1.00000
 70   nvme    6.98529          osd.70           down         0  1.00000
 74   nvme    6.98529          osd.74           down         0  1.00000
 78   nvme    6.98529          osd.78           down         0  1.00000
 82   nvme    6.98529          osd.82           down         0  1.00000
 86   nvme    6.98529          osd.86           down         0  1.00000
-14         111.76465      host server-2005
  4   nvme    6.98529          osd.4            down         0  1.00000
  8   nvme    6.98529          osd.8            down         0  1.00000
 17   nvme    6.98529          osd.17           down         0  1.00000
 21   nvme    6.98529          osd.21           down         0  1.00000
 31   nvme    6.98529          osd.31           down         0  1.00000
 35   nvme    6.98529          osd.35           down         0  1.00000
 49   nvme    6.98529          osd.49           down         0  1.00000
 53   nvme    6.98529          osd.53           down         0  1.00000
 57   nvme    6.98529          osd.57           down         0  1.00000
 61   nvme    6.98529          osd.61           down         0  1.00000
 65   nvme    6.98529          osd.65           down         0  1.00000
 69   nvme    6.98529          osd.69           down         0  1.00000
 73   nvme    6.98529          osd.73           down         0  1.00000
 77   nvme    6.98529          osd.77           down         0  1.00000
 81   nvme    6.98529          osd.81           down         0  1.00000
 85   nvme    6.98529          osd.85           down         0  1.00000
-22         111.76465      host server-2006
  6   nvme    6.98529          osd.6            down         0  1.00000
 15   nvme    6.98529          osd.15           down         0  1.00000
 19   nvme    6.98529          osd.19           down         0  1.00000
 23   nvme    6.98529          osd.23           down         0  1.00000
 33   nvme    6.98529          osd.33           down         0  1.00000
 37   nvme    6.98529          osd.37           down         0  1.00000
 51   nvme    6.98529          osd.51           down         0  1.00000
 55   nvme    6.98529          osd.55           down         0  1.00000
 59   nvme    6.98529          osd.59           down         0  1.00000
 63   nvme    6.98529          osd.63             up         0  1.00000
 67   nvme    6.98529          osd.67           down         0  1.00000
 71   nvme    6.98529          osd.71             up         0  1.00000
 75   nvme    6.98529          osd.75           down         0  1.00000
 79   nvme    6.98529          osd.79           down         0  1.00000
 83   nvme    6.98529          osd.83           down         0  1.00000
 87   nvme    6.98529          osd.87           down         0  1.00000
-11         111.76465      host server-2007
  3   nvme    6.98529          osd.3            down         0  1.00000
  7   nvme    6.98529          osd.7            down         0  1.00000
 16   nvme    6.98529          osd.16           down         0  1.00000
 20   nvme    6.98529          osd.20           down         0  1.00000
 30   nvme    6.98529          osd.30           down         0  1.00000
 34   nvme    6.98529          osd.34           down         0  1.00000
 48   nvme    6.98529          osd.48           down         0  1.00000
 52   nvme    6.98529          osd.52           down         0  1.00000
 56   nvme    6.98529          osd.56           down         0  1.00000
 60   nvme    6.98529          osd.60           down         0  1.00000
 64   nvme    6.98529          osd.64           down         0  1.00000
 68   nvme    6.98529          osd.68           down         0  1.00000
 72   nvme    6.98529          osd.72           down         0  1.00000
 76   nvme    6.98529          osd.76           down         0  1.00000
 80   nvme    6.98529          osd.80           down         0  1.00000
 84   nvme    6.98529          osd.84           down         0  1.00000
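For anyone digging further, each down OSD can be inspected on its host,
e.g. for osd.5 (unit and log names assume a default systemd/package
install):

  ceph osd find 5                   # which host osd.5 lives on
  systemctl status ceph-osd@5       # run on that host
  journalctl -u ceph-osd@5 -n 200
  tail -n 200 /var/log/ceph/ceph-osd.5.log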
Pool info:
pool 21 'dbs-realtime-staging-client' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 27611 lfor 0/27512/27510 flags hashpspool,selfmanaged_snaps max_bytes 9999757606912 stripe_width 0 application rbd
pool 24 'dbs-realtime-staging-w-financedb' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 27613 flags hashpspool,selfmanaged_snaps max_bytes 19999515213824 stripe_width 0 application rbd
pool 25 'dbs-realtime-staging-w-dstest' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 27813 lfor 0/0/23856 flags hashpspool,selfmanaged_snaps max_bytes 99857989632 stripe_width 0 application rbd
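All three pools reference crush_rule 0 (replicated_ssd), which can be
double-checked per pool:

  ceph osd pool get dbs-realtime-staging-client crush_rule   # should report the ssd rule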
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@agoda.com
---------------------------------------------------