Marcel;
To answer your question, I don't see anything that would be keeping these
PGs on the same node. Someone with more knowledge of how the CRUSH rules
are applied, and of the code around these operations, would need to weigh in.
I am somewhat curious, though: you define racks, and even rooms, in your
tree, but your failure domain is set to host. Is that intentional?
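If rack- or room-level separation is actually what you want, the fix would be a rule whose failure domain matches. A rough sketch, assuming your root bucket is named "default" (the rule name "replicated_room" and the pool name are just placeholders on my part, and I haven't tested this against your map):

ceph osd crush rule create-replicated replicated_room default room
ceph osd pool set <poolname> crush_rule replicated_room

Note that switching the rule on a populated pool will of course trigger data movement.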
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos@PerformAir.com
-----Original Message-----
From: Marcel Kuiper [mailto:ceph@mknet.nl]
Sent: Tuesday, July 21, 2020 10:14 AM
To: ceph-users@ceph.io
Cc: Dominic Hilsbos
Subject: Re: [ceph-users] Re: osd out vs crush reweight]
Dominic
The crush rule dump and tree are attached (I hope that works). All pools
use crush_rule 1.
Marcel
Marcel;
Sorry, could you also send the output of:
ceph osd tree
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos@PerformAir.com
www.PerformAir.com
-----Original Message-----
From: DHilsbos@performair.com [mailto:DHilsbos@performair.com]
Sent: Tuesday, July 21, 2020 9:41 AM
To: ceph@mknet.nl; ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]
Marcel;
Thank you for the information.
Could you send the output of:
ceph osd crush rule dump
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos@PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Marcel Kuiper [mailto:ceph@mknet.nl]
Sent: Tuesday, July 21, 2020 9:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]
Hi Dominic,
This cluster is running 14.2.8 (Nautilus). There are 172 OSDs divided
over 19 nodes.
There are currently 10 pools.
All pools have 3 replicas of data.
There are 3968 PGs (the cluster is not yet fully in use; the number
of PGs is expected to grow).
Marcel
Marcel;
Short answer: yes, it might be expected behavior.
PG placement is highly dependent on the cluster layout and the CRUSH
rules.
So... Some clarifying questions.
What version of Ceph are you running?
How many nodes do you have?
How many pools do you have, and what are their failure domains?
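If it helps, the following commands should show most of that information:

ceph versions
ceph osd tree
ceph osd pool ls detail
ceph osd crush rule dump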
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos@PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Marcel Kuiper [mailto:ceph@mknet.nl]
Sent: Tuesday, July 21, 2020 6:52 AM
To: ceph-users@ceph.io
Subject: [ceph-users] osd out vs crush reweight
Hi list,
I ran a test with marking an OSD out versus setting its CRUSH weight
to 0, and compared which OSDs the PGs were sent to. The CRUSH map has
3 rooms. This is what happened.
On ceph osd out 111 (first room; this node has OSDs 108 - 116), PGs
were sent to the following OSDs:
# PGs  OSD
    2    1
    1    4
    1    5
    1    6
    1    7
    2    8
    1   31
    1   34
    1   35
    1   56
    2   57
    1   58
    1   61
    1   83
    1   84
    1   88
    1   99
    1  100
    2  107
    1  114
    2  117
    1  118
    1  119
    1  121
All PGs were sent to OSDs on other nodes in the same room, except
for 1 PG on OSD 114. I think this works as expected.
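(In case anyone wants to reproduce this: one way to get per-OSD counts like
the ones above is to snapshot the PG-to-OSD mapping before and after the
change and compare the up sets, roughly like this; the file names are just
examples:

ceph pg dump pgs_brief | sort > before.txt
ceph osd out 111
# wait for peering to settle, then:
ceph pg dump pgs_brief | sort > after.txt
diff before.txt after.txt

The diff shows every PG whose up/acting set changed.)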
Now I marked the OSD in again and waited until everything stabilized.
Then I set its CRUSH weight to 0: ceph osd crush reweight osd.111 0.
I thought this also lowers the CRUSH weight of the node, so there would
be even less chance that PGs end up on an OSD of the same node.
However, the results are:
# PGs  OSD
    1   61
    1   83
    1   86
    3  108
    4  109
    5  110
    2  112
    5  113
    7  114
    5  115
    2  116
Except for 3 PGs, all other PGs ended up on an OSD belonging to the
same node :-O. Is this expected behaviour? Can someone explain? This
is on Nautilus 14.2.8.
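In case someone wants to look at this without touching a live cluster, I
think the remapping can be reproduced offline from the osdmap. A rough,
unverified sketch (the file names and the pool id 1 are only placeholders):

ceph osd getmap -o osdmap.orig
cp osdmap.orig osdmap.zero
osdmaptool osdmap.orig --export-crush crushmap
crushtool -i crushmap --reweight-item osd.111 0 -o crushmap.zero
osdmaptool osdmap.zero --import-crush crushmap.zero
osdmaptool osdmap.orig --test-map-pgs-dump --pool 1 > map.before
osdmaptool osdmap.zero --test-map-pgs-dump --pool 1 > map.after
diff map.before map.after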
Thanks
Marcel
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io