Does seem like a bug, actually in more than just this command. `ceph
orch host ls` with the --label and/or --host-pattern flags just piggybacks
off of the existing filtering done for placements in service specs. I've
just taken a look, and you can actually reproduce the same behavior with
the placement of an actual service. For example, with
[ceph: root@vm-00 /]# ceph orch host ls
HOST   ADDR             LABELS  STATUS
vm-00  192.168.122.7    _admin
vm-01  192.168.122.171  foo
vm-02  192.168.122.147  foo
3 hosts in cluster
and spec
[ceph: root@vm-00 /]# cat ne.yaml
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: 'vm-0[0-1]'
you get the expected placement on vm-00 and vm-01
[ceph: root@vm-00 /]# ceph orch ps --daemon-type node-exporter
NAME                 HOST   PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
node-exporter.vm-00  vm-00  *:9100  running (23s)  17s ago    23s  3636k    -        1.5.0    0da6a335fe13  f83e88caa7e0
node-exporter.vm-01  vm-01  *:9100  running (21h)  2m ago     21h  16.1M    -        1.5.0    0da6a335fe13  a5153c378449
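(As an aside, host_pattern is, as far as I know, matched as an
fnmatch-style glob rather than a full regex, which is why 'vm-0[0-1]'
picks up vm-00 and vm-01. A quick sketch, with the host names hardcoded
from the cluster above:

# illustration only: assumes fnmatch-style glob semantics for
# host_pattern, host names hardcoded from the cluster above
from fnmatch import fnmatch

hosts = ['vm-00', 'vm-01', 'vm-02']
print([h for h in hosts if fnmatch(h, 'vm-0[0-1]')])
# prints ['vm-00', 'vm-01']
)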
but if I add a label to the placement, while still leaving in the host pattern:
[ceph: root@vm-00 /]# cat ne.yaml
service_type: node-exporter
service_name: node-exporter
placement:
  label: foo
  host_pattern: 'vm-0[0-1]'
you would expect to only get vm-01 at this point, as it's the only host
that matches both pieces of the placement, but instead you get both vm-01
and vm-02
[ceph: root@vm-00 /]# ceph orch ps --daemon-type node-exporter
NAME                 HOST   PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
node-exporter.vm-01  vm-01  *:9100  running (21h)  4m ago     21h  16.1M    -        1.5.0    0da6a335fe13  a5153c378449
node-exporter.vm-02  vm-02  *:9100  running (23s)  18s ago    23s  5410k    -        1.5.0    0da6a335fe13  ddd1e643e341
Looking at the scheduling implementation, it currently selects candidates
based on placement attributes in this order: explicit host list, then
label, then host pattern (with some additional handling for count that
happens in all cases). It takes the first of those attributes that is
present in the placement, the label in this case, uses it to select the
candidates, and then bails out without any additional filtering on the
host pattern. Since placement spec validation doesn't allow applying
specs that combine an explicit host list with a host_pattern or label,
label plus host pattern is the only combination where you can hit this,
and I'd guess it was just overlooked. I'll take a look at making a patch
to fix this.
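To make the shape of the problem concrete, here's a minimal sketch in
Python of roughly what the selection does and what a fix could look
like. This is not the actual cephadm schedule.py code; the function
names and the (hostname, labels) host representation are made up for
illustration, and the host data is hardcoded from the example above.

from fnmatch import fnmatch

def get_candidates_buggy(all_hosts, spec):
    # all_hosts: list of (hostname, set_of_labels) pairs
    if spec.get('hosts'):
        return [name for name, _ in all_hosts if name in spec['hosts']]
    elif spec.get('label'):
        # first attribute wins: selection bails out here and the
        # host_pattern branch below is never consulted
        return [name for name, labels in all_hosts
                if spec['label'] in labels]
    elif spec.get('host_pattern'):
        return [name for name, _ in all_hosts
                if fnmatch(name, spec['host_pattern'])]
    return [name for name, _ in all_hosts]

def get_candidates_fixed(all_hosts, spec):
    if spec.get('hosts'):
        candidates = [name for name, _ in all_hosts
                      if name in spec['hosts']]
    elif spec.get('label'):
        candidates = [name for name, labels in all_hosts
                      if spec['label'] in labels]
    else:
        candidates = [name for name, _ in all_hosts]
    # the likely fix: always apply host_pattern as a final filter
    if spec.get('host_pattern'):
        candidates = [name for name in candidates
                      if fnmatch(name, spec['host_pattern'])]
    return candidates

hosts = [('vm-00', {'_admin'}), ('vm-01', {'foo'}), ('vm-02', {'foo'})]
spec = {'label': 'foo', 'host_pattern': 'vm-0[0-1]'}
print(get_candidates_buggy(hosts, spec))  # ['vm-01', 'vm-02'], the bug
print(get_candidates_fixed(hosts, spec))  # ['vm-01'], label AND pattern

Applying the pattern as a final filter over whatever the earlier
attributes selected should be harmless for the combinations validation
already allows, and gives the intersection behavior you'd expect for
label plus host_pattern.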
On Tue, Feb 13, 2024 at 7:09 PM Alex <mr.alexey(a)gmail.com> wrote:
Hello Ceph Gurus!
I'm running Ceph Pacific.
If I run
ceph orch host ls --label osds
it shows all hosts with the label osds. Or
ceph orch host ls --host-pattern host1
shows just host1. Each flag works as expected on its own.
But combining the two, the label seems to "take over":
ceph orch host ls --label osds --host-pattern host1
6 hosts in cluster who had label osds whose hostname matched host1
shows all hosts with the label osds instead of only host1.
So at first the flags seem to act like an OR instead of an AND.
ceph orch host ls --label osds --host-pattern foo
6 hosts in cluster who had label osds whose hostname matched foo
even though "foo" doesn't even exist
ceph orch host ls --label bar --host-pattern host1
0 hosts in cluster who had label bar whose hostname matched host1
If the label and host-pattern combination were an OR, this should have
worked: there is no label bar, but host1 exists. Instead, the
host-pattern is simply disregarded.
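So it behaves as if the label, when present, short-circuits the pattern
check entirely. In Python terms, something like this (not Ceph's actual
code, just a model of the behavior we observed):

# when a label is given it is the only filter applied, and the
# host pattern is silently ignored
from fnmatch import fnmatch

def host_ls(hosts, label=None, pattern=None):
    # hosts: list of (hostname, set_of_labels) pairs
    if label:  # pattern is never consulted past this point
        return [name for name, labels in hosts if label in labels]
    if pattern:
        return [name for name, _ in hosts if fnmatch(name, pattern)]
    return [name for name, _ in hosts]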
This all started because our OSD deployment spec had both a label and a
host_pattern. The cluster was attempting to deploy OSDs on all the
servers with the given label instead of the one host we needed, which
caused it to go into a warning state.
If I ran
ceph orch ls --export --service_name host1
it also showed both the label and the host_pattern:
unmanaged: false
placement:
  host_pattern:
  label:
The issue persisted until I removed the label from the spec.
Thanks.