On Wed, Oct 23, 2019 at 9:56 AM Sage Weil <sweil(a)redhat.com> wrote:
I'm trying to implement MDS daemon management for mgr/ssh and am
confused by the intent of the orchestrator interface.
- The add_mds() method takes a 'spec' StatelessServiceSpec that has
a ctor like
def __init__(self, name, placement=None, count=None):
but it is constructed only with a name:
    @_write_cli('orchestrator mds add',
                "name=svc_arg,type=CephString",
                'Create an MDS service')
    def _mds_add(self, svc_arg):
        spec = orchestrator.StatelessServiceSpec(svc_arg)
That means count=1 and placement is unspecified. That's fine for Rook,
sort of, as long as you want exactly 1 MDS for each file system.
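(For illustration, here is a minimal sketch of that ctor and the two ways
of calling it; the class body is my own stand-in, only the signature is
from the code quoted above, and 'myfs' is just an example name.)

```python
class StatelessServiceSpec(object):
    # Sketch only: signature matches the ctor quoted above; the
    # attribute handling here is illustrative, not the real code.
    def __init__(self, name, placement=None, count=None):
        self.name = name
        self.placement = placement
        # Treating an unspecified count as 1 is what makes the
        # mds-add path deploy exactly one MDS per file system.
        self.count = count if count is not None else 1

spec = StatelessServiceSpec('myfs')             # what _mds_add does today
spec3 = StatelessServiceSpec('myfs', count=3)   # explicit count
```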
- Given that, can we rename the 'svc_arg' arg to 'name'?
- The 'name' here, IIUC, is the name of the grouping of daemons. I think
it was intended to be a file system, as per the docs:
    The ``name`` parameter is an identifier of the group of instances:
    * a CephFS file system for a group of MDS daemons,
    * a zone name for a group of RGWs
but IIRC the new CephFS behavior is that all standby daemons go into the
same pool and are doled out to file systems that need them arbitrarily.
In that case, I think the only thing we would want to specify (in the rook
case where we don't pick daemon location) is the count of MDSs... and
then have a single name grouping. Is that right for CephFS?
Yes. One issue we need to consider is that when we have the mgr
creating/deleting MDS daemons based on the needs of the file systems,
we will need to delete a specific standby and not just any daemon.
Otherwise, we have unnecessary failovers.
Perhaps the MDS name should just be a random short string of letters
and not identify a "group" of MDS daemons.
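(For illustration, generating a name like that could be as simple as the
following; this is a hypothetical helper, not part of the orchestrator
API, and the 'mds.' prefix is just an example.)

```python
import secrets
import string


def random_mds_name(length=6):
    # Random lowercase suffix, e.g. 'mds.kfzxqa'. Purely illustrative;
    # not the actual naming scheme used by the orchestrator.
    suffix = ''.join(secrets.choice(string.ascii_lowercase)
                     for _ in range(length))
    return 'mds.' + suffix
```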
I have a feeling it won't work for the other daemon types, though, like
NFS servers, which *do* care what they are serving up.
- For SSH, none of that works, since we need to pass a location when
adding daemons. It seems like we want something closer to nfs_add,
which is
    @_write_cli('orchestrator nfs add',
                "name=svc_arg,type=CephString "
                "name=pool,type=CephString "
                "name=namespace,type=CephString,req=false",
                'Create an NFS service')
i.e.,
  * 'add' takes a 'name' (the actual daemon name) and a location (if the
    orch needs it).
  * 'rm' takes the same name and removes it.
  * 'update' does the smarts of adding ($want - $have) daemons for a
    given group and generating names for them. Something else organizes
    these into groups (a common name prefix?). I.e., 'update' basically
    builds on 'add' and 'rm'.
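A rough sketch of that 'update' reconciliation, assuming hypothetical
add/rm callbacks and a common name prefix for grouping (all names here
are made up for illustration):

```python
import secrets
import string


def update_group(prefix, want, have, add_fn, rm_fn):
    """Reconcile a daemon group toward 'want' instances.

    prefix -- common name prefix identifying the group, e.g. 'mds.myfs'
    have   -- list of existing daemon names in the group
    add_fn -- callback to create one named daemon (stand-in for 'add')
    rm_fn  -- callback to remove one named daemon (stand-in for 'rm')
    """
    delta = want - len(have)
    if delta > 0:
        for _ in range(delta):
            suffix = ''.join(secrets.choice(string.ascii_lowercase)
                             for _ in range(6))
            add_fn('%s.%s' % (prefix, suffix))
    elif delta < 0:
        # As noted above, picking *which* standby to remove matters
        # (to avoid unnecessary failovers); trimming the tail here is
        # only for illustration.
        for name in have[delta:]:
            rm_fn(name)
```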
And/or, we introduce some basic scheduling into the ssh orchestrator (or
orchestrator_cli). I'm not sure this actually needs to be that smart,
since we can probably get away with something quite simple: round-robin
assignment of daemons to hosts, and the ability to label nodes for a
daemon type or daemon type + grouping. This would basically give the ssh
orch what ansible does as far as mapping out the deployment, and
gracefully degrade to something that "just works" (well enough) when you
don't know/care where things land. Obviously having a real scheduler
like the one in k8s do this is better, but for non-kube deployments,
there is still a need to place daemons on hosts in a way that is easy
for the human operator.
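The simple placement described above might look roughly like the
following (a hypothetical helper, not real orchestrator code: labels
filter candidate hosts, then daemons are assigned round-robin):

```python
def place_daemons(daemons, hosts, labels=None, required_label=None):
    """Round-robin daemons onto hosts, optionally filtered by label.

    daemons        -- list of daemon names to place
    hosts          -- list of candidate host names
    labels         -- optional dict of host -> set of labels
    required_label -- only place on hosts carrying this label, if set
    """
    if required_label and labels:
        hosts = [h for h in hosts
                 if required_label in labels.get(h, set())]
    if not hosts:
        raise RuntimeError('no candidate hosts')
    # Round-robin: daemon i lands on host i mod len(hosts).
    return {d: hosts[i % len(hosts)] for i, d in enumerate(daemons)}
```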
Agreed.
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D