On 08.02.20 at 17:25, Sage Weil wrote:
The serve() one is the most important, IMO: we need it
to (1) be parallel,
(2) gracefully handle errors for each host and raise appropriate health
alerts, and (3) update the cache as appropriate. For the CLI case,
whether it triggers the scrape synchronously or somehow kicks serve() and
waits is a probably-not-so-important detail.
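Those three requirements could be sketched roughly like this (the helper names scrape_host and set_health_alert are mine for illustration, not the actual orchestrator API):

```python
import concurrent.futures

def scrape_all_hosts(hosts, scrape_host, cache, set_health_alert):
    """Scrape all hosts in parallel; record per-host failures instead
    of letting one unreachable host abort the whole pass."""
    failed = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        # (1) parallel: one scrape task per host
        futures = {pool.submit(scrape_host, h): h for h in hosts}
        for fut in concurrent.futures.as_completed(futures):
            host = futures[fut]
            try:
                # (3) update the cache with the fresh result
                cache[host] = fut.result()
            except Exception as e:
                # (2) per-host error handling: remember who failed
                failed[host] = str(e)
    if failed:
        set_health_alert('HOST_SCRAPE_FAILED', failed)
    return failed
```

The point of the per-future try/except is that the health alert can name exactly which hosts failed, while the cache still gets updated for the hosts that succeeded.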
The only concern I have is that we have to prevent scraping in parallel
from serve() and from the CLI: we simply don't have enough connections to
spare. I've seen this for other calls as well: if serve() is busy doing
some background task, the CLI basically hangs.
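One simple way to get that mutual exclusion without the CLI hanging is a non-blocking gate: whoever loses the race falls back to the cached state instead of opening a second set of connections. A rough sketch (ScrapeGate is a made-up name, not existing orchestrator code):

```python
import threading

class ScrapeGate:
    """Allow at most one scrape at a time. serve() and the CLI both go
    through run_exclusive(); a caller that finds a scrape already in
    flight gets None back immediately and should use the cache, rather
    than blocking behind the running scrape."""
    def __init__(self):
        self._lock = threading.Lock()

    def run_exclusive(self, scrape):
        # Non-blocking acquire: never make the CLI wait on serve().
        if not self._lock.acquire(blocking=False):
            return None  # scrape already running; use cached state
        try:
            return scrape()
        finally:
            self._lock.release()
```

The non-blocking acquire is the design choice here: a blocking lock would serialize the connections correctly but reintroduce exactly the CLI hang described above.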
On the other hand, the remaining internal _get_services() callers should,
I think, all just use the latest cached state.
+1
Right now the way the code is
structured makes it very confusing which path is used for which, and the
use of the async_map_completion helper (currently, at least) makes it hard
to tell which host failed.
The exception (admittedly with very little detail) should be forwarded to
the completion.
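For example, the completion could annotate the forwarded exception with the host it came from, so callers can at least tell which host failed. A minimal sketch (this Completion class is hypothetical, not the real orchestrator completion type):

```python
class Completion:
    """Hypothetical completion that carries either a result or an
    exception tagged with the host that produced it."""
    def __init__(self):
        self.result = None
        self.exception = None

    def fail(self, host, exc):
        # Wrap the original error so the failing host is visible
        # to whoever inspects the completion later.
        self.exception = RuntimeError('host %s: %s' % (host, exc))

    def finalize(self, result):
        self.result = result
```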
As for additional services (monitoring, nfs, etc.), I think that can
proceed more quickly once we have the CLI and add/remove/update issues
sorted out. I may start with an RFC PR on that, but I would really
like some feedback on whether the proposal makes sense.
https://github.com/ceph/ceph/pull/33205/files should also help with new
services.
Thanks!
sage
--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer