I still have not switched to upmap; I am running crush-compat[1]. I just
set the crush reweight of ssd osd 33 to 0.0 (I am changing it from
ceph-disk to ceph-volume with dmcrypt). At first I saw only ssd pools
remapping, then a bit later 2 hdd pools. My first thought was that it
could be related to when the crush rule was adapted for hdd classes, but
now a pool fs_data.ec21 is remapping, and I know for sure the hdd ec21
rule existed when this pool was created.
[1]
[@ceph]# ceph balancer status
{
"last_optimize_duration": "0:00:00.647219",
"plans": [],
"mode": "crush-compat",
"active": true,
"optimize_result": "Unable to find further optimization, change balancer mode and retry might help",
"last_optimize_started": "Wed Sep 30 17:10:27 2020"
}
[@ceph]# ceph osd crush rule dump replicated_ruleset
{
"rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -10,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
[@ceph]# ceph osd crush rule dump replicated_ruleset_ssd
{
"rule_id": 5,
"rule_name": "replicated_ruleset_ssd",
"ruleset": 5,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -15,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
-----Original Message-----
To: Marc Roos; eblock
Cc: ceph-users; nico.schottelius
Subject: Re: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
Hi Nico and Marc,
your crush trees do indeed look like they have already been converted
properly to using device classes. Changing something within one device
class should not influence placement in another. Maybe I'm overlooking
something?
The only other place I know of where such a mix-up could occur is the
crush rules. Do your rules look like this:
{
"rule_id": 5,
"rule_name": "sr-rbd-data-one",
"ruleset": 5,
"type": 3,
"min_size": 3,
"max_size": 8,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 50
},
{
"op": "set_choose_tries",
"num": 1000
},
{
"op": "take",
"item": -185,
"item_name": "ServerRoom~rbd_data"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
Notice the "~rbd_data" qualifier. It is important that the device class
is specified at the root selection.
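One way to check this for a rule is to look at the item_name of its "take" steps. The following is only a minimal sketch: the helper name take_classes is made up for illustration and is not part of Ceph, and it assumes the JSON layout shown in the rule dump above, where a shadow root is named "<root>~<class>".

```python
import json


def take_classes(rule):
    """Return the device class of every 'take' step, or None if unqualified.

    Hypothetical helper: assumes the JSON layout of 'ceph osd crush rule dump'
    as pasted in this thread, with shadow roots named '<root>~<class>'.
    """
    classes = []
    for step in rule.get("steps", []):
        if step.get("op") != "take":
            continue
        name = step.get("item_name", "")
        # 'ServerRoom~rbd_data' -> class 'rbd_data'; plain 'default' -> None
        _root, sep, dev_class = name.partition("~")
        classes.append(dev_class if sep else None)
    return classes


# The rule from the message above, steps shortened to the relevant ones.
RULE_JSON = """
{
  "rule_id": 5,
  "rule_name": "sr-rbd-data-one",
  "steps": [
    {"op": "set_chooseleaf_tries", "num": 50},
    {"op": "take", "item": -185, "item_name": "ServerRoom~rbd_data"},
    {"op": "chooseleaf_indep", "num": 0, "type": "host"},
    {"op": "emit"}
  ]
}
"""

rule = json.loads(RULE_JSON)
print(rule["rule_name"], take_classes(rule))
```

A None in the result would mean that the rule takes from the plain root, so it would not be restricted to one device class.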
I'm really surprised that with your crush tree you observe changes in
SSD implying changes in HDD placements. I was really rough on our mimic
cluster with moving disks in and out and between servers and I have
never seen this problem. Could it be a regression in nautilus? Is the
auto-balancer interfering?
> we recently also noticed that rebuilding one pool ("ssd") influenced
> speed on other pools, which was unexpected.
Could this be something else? Was PG/object placement influenced or
performance only?
I'm asking, because during one of our service windows we observed
something very strange. We have a multi-location cluster with pools with
completely isolated storage devices in different locations. On one of
these sub-clusters we run a ceph fs. During maintenance we needed to
shut down the ceph-fs. When our admin issued the umount command (ca.
1500 clients), we noticed that RBD pools seemed to have problems even
though there is absolutely no overlap in disks (disjoint crush trees),
they are not even in the same physical location and sit on their own
switches. The fs and RBD only share the MONs/MGRs. I'm not entirely sure
if we observed something real or only a network blip. However, nagios
went crazy on our VM environment for a few minutes.
Maybe there is another issue that causes unexpected cross-dependencies
that affect performance?
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Marc Roos <M.Roos(a)f1-outsourcing.eu>
Sent: 30 September 2020 14:59:50
To: eblock; Frank Schilder
Cc: ceph-users; nico.schottelius
Subject: RE: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
Hi Frank, thanks. This 'root default' indeed looks different with these
0s there. I have also uploaded mine[1] because it looks very similar to
Nico's. I guess his hdd pg's can also start moving on some occasions.
Thanks for the 'crushtool reclassify' hint, I guess I missed this in
the release notes or so.
[1]
https://pastebin.com/PFx0V3S7
-----Original Message-----
To: Eugen Block
Cc: Marc Roos; ceph-users
Subject: Re: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
This is what my crush tree including shadow hierarchies looks like (a
mess :):
https://pastebin.com/iCLbi4Up
Every device class has its own tree. Starting with mimic, this is
automatic when creating new device classes.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock(a)nde.ag>
Sent: 30 September 2020 08:43:47
To: Frank Schilder
Cc: Marc Roos; ceph-users
Subject: Re: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
Interesting, I also did this test on an upgraded cluster (L to N).
I'll repeat the test on a native Nautilus to see it for myself.
Quoting Frank Schilder:
Somebody on this list posted a script that can convert pre-mimic crush
trees with buckets for different types of devices to crush trees with
device classes with minimal data movement (trying to maintain IDs as
much as possible). Don't have a thread name right now, but could try
to find it tomorrow.
I can check tomorrow how our crush tree unfolds. Basically, for every
device class there is a full copy of the tree (shadow hierarchy) with
its own weights etc.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Marc Roos
Sent: 29 September 2020 22:19:33
To: eblock; Frank Schilder
Cc: ceph-users
Subject: RE: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
Yes, correct, this is coming from Luminous or maybe even Kraken. What
does a default crush tree look like in mimic or octopus? Or is there
some guide on how to bring this to the new 'default'?
-----Original Message-----
Cc: ceph-users
Subject: Re: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
Are these crush maps inherited from pre-mimic versions? I have
re-balanced SSD and HDD pools in mimic (mimic deployed) where one
device class never influenced the placement of the other. I have mixed
hosts and went as far as introducing rbd_meta, rbd_data and such
classes to sub-divide even further (all these devices have different
perf specs).
This worked like a charm. When adding devices of one class, only pools
in this class were ever affected.
As far as I understand, starting with mimic, every shadow class
defines a separate tree (not just leafs/OSDs). Thus, device classes
are independent of each other.
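To illustrate the idea (not the actual CRUSH implementation), here is a toy model of per-class shadow trees. The host names c01/c02 and the classes are taken from this thread; the assignment of OSDs to hosts, the OSD id 34 and all weights are made up. It only shows that, in this model, zeroing an ssd OSD changes the host weights in default~ssd and leaves default~hdd untouched.

```python
def shadow_weights(hosts):
    """Build per-class shadow trees: {'default~<class>': {'<host>~<class>': weight}}.

    Toy model only; real CRUSH buckets carry more state than a summed weight.
    """
    trees = {}
    for host, osds in hosts.items():
        for _osd_id, (dev_class, weight) in osds.items():
            tree = trees.setdefault(f"default~{dev_class}", {})
            key = f"{host}~{dev_class}"
            # A shadow host bucket weighs as much as its OSDs of that class.
            tree[key] = tree.get(key, 0.0) + weight
    return trees


# Hypothetical layout: osd id -> (device class, crush weight)
hosts = {
    "c01": {3: ("hdd", 4.0), 33: ("ssd", 1.0)},
    "c02": {8: ("hdd", 4.0), 34: ("ssd", 1.0)},
}

before = shadow_weights(hosts)
hosts["c01"][33] = ("ssd", 0.0)  # models 'ceph osd crush reweight osd.33 0.0'
after = shadow_weights(hosts)

print("hdd tree unchanged:", before["default~hdd"] == after["default~hdd"])
print("ssd tree unchanged:", before["default~ssd"] == after["default~ssd"])
```

In this model only rules that take default~ssd see a weight change. It says nothing about other mechanisms (such as the balancer) that could still move hdd pg's.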
________________________________________
Sent: 29 September 2020 20:54:48
To: eblock
Cc: ceph-users
Subject: [ceph-users] Re: hdd pg's migrating when converting ssd class
osd's
Yes correct, the hosts do indeed have both ssd's and hdd's. Is this
not more of a bug then? I would assume the goal of using device
classes is that you separate these so that one does not affect the
other. The per-host weights of the ssd and hdd classes are even
available already; the algorithm should just use those instead of the
weight of the whole host.
Or is there some specific use case where combining these classes is
required?
-----Original Message-----
Cc: ceph-users
Subject: Re: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
They're still in the same root (default) and each host is a member of
both device classes. I guess you have a mixed setup (hosts c01/c02
have both HDDs and SSDs)? I don't think this separation is enough to
avoid remapping even if a different device class is affected (your
report confirms that).
Dividing the crush tree into different subtrees might help here but
I'm not sure if that's really something you need. You might also just
deal with the remapping as long as it doesn't happen too often, I
guess. On the other hand, if your setup won't change (except adding
more OSDs) you might as well think about a different crush tree. It
really depends on your actual requirements.
We created two different subtrees when we got new hardware, and it
helped us a lot: we moved the data to the new hardware only once and
avoided multiple remappings. Now the older hardware is our EC
environment, except for some SSDs on those old hosts that had to stay
in the main subtree. So our setup is also very individual, but it works
quite nicely.
:-)
Quoting:
> I have practically a default setup. If I do a 'ceph osd crush tree
> --show-shadow' I get a listing like this[1]. Since the hosts are
> listed separately under default~ssd and default~hdd, I would assume
> they are separate (enough)?
[1]
root default~ssd
host c01~ssd
..
..
host c02~ssd
..
root default~hdd
host c01~hdd
..
host c02~hdd
..
root default
-----Original Message-----
To: ceph-users(a)ceph.io
Subject: [ceph-users] Re: hdd pg's migrating when converting ssd
class osd's
> Are all the OSDs in the same crush root? I would think that since the
> crush weight of hosts changes as soon as OSDs are out, it impacts the
> whole crush tree. If you separate the SSDs from the HDDs logically
> (e.g. different bucket type in the crush tree), the remapping wouldn't
> affect the HDDs.
>
>> I have been converting ssd osd's to dmcrypt, and I have noticed that
>> pg's of pools that should be (and are?) on the hdd class are being
>> migrated.
>>
>> On a healthy cluster, when I set the crush reweight of an ssd osd to
>> 0.0, I get this:
>>
>> 17.35 10415 0 0 9907 0 36001743890 0 0 3045 3045
>> active+remapped+backfilling 2020-09-27 12:55:49.093054
>> 83758'20725398 83758:100379720 [8,14,23] 8 [3,14,23] 3
>> 83636'20718129 2020-09-27 00:58:07.098096 83300'20689151
>> 2020-09-24 21:42:07.385360 0
>>
>> However osds 3,14,23,8 are all hdd osd's
>>
>> Since this is a cluster from Kraken/Luminous, I am not sure if the
>> device class of the replicated_ruleset[1] was set when pool 17 was
>> created.
>> Weird thing is that all pg's of this pool seem to be on hdd osd's[2]
>>
>> Q. How can I display the definition of 'crush_rule 0' at the time of
>> the pool creation? (To be sure it already had the device class hdd
>> configured.)
[1]
[@~]# ceph osd pool ls detail | grep 'pool 17'
pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
[@~]# ceph osd crush rule dump replicated_ruleset
{
"rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -10,
"item_name": "default~hdd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
[2]
[@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' |
        grep -oE '[0-9]{1,2}' | sort -u -n`; do
        ceph osd crush get-device-class osd.$osd
      done | sort -u
dumped pgs
hdd
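The extraction step of the loop in [2] can also be sketched in Python. This is only an illustration: the function name osds_in_pg_line is made up, and it assumes that the up and acting sets are the only bracketed, comma-separated lists of numbers on a 'ceph pg dump pgs' line, as in the pg 17.35 line quoted above. Unlike the grep pattern '[0-9]{1,2}', it does not split OSD ids with more than two digits.

```python
import re


def osds_in_pg_line(line):
    """Return the sorted OSD ids found in the bracketed up/acting sets."""
    osds = set()
    # Matches e.g. '[8,14,23]'; an empty set '[]' is skipped.
    for group in re.findall(r"\[([0-9,]+)\]", line):
        osds.update(int(x) for x in group.split(",") if x)
    return sorted(osds)


# The pg 17.35 line from this thread, joined into one string.
PG_LINE = (
    "17.35 10415 0 0 9907 0 36001743890 0 0 3045 3045 "
    "active+remapped+backfilling 2020-09-27 12:55:49.093054 "
    "83758'20725398 83758:100379720 [8,14,23] 8 [3,14,23] 3 "
    "83636'20718129 2020-09-27 00:58:07.098096 83300'20689151 "
    "2020-09-24 21:42:07.385360 0"
)

print(osds_in_pg_line(PG_LINE))  # [3, 8, 14, 23]
```

Each id can then be passed to 'ceph osd crush get-device-class osd.<id>' as in the shell loop.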