rgw, grafana, prom, haproxy, etc. are all optional components.
Is this Prometheus stateful? Where is this data stored?
Early on the team building the container images opted
for a single
image that includes all of the daemons for simplicity. We could build
stripped down images for each daemon type, but that's an investment in
developer time and complexity and we haven't heard any complaints
about the container size. (Usually a few hundred MB on a large scale
storage server isn't a problem.)
To me it looks like you are not taking containerization seriously: a container development team that does not want to spend time on container images. You create something >10x slower to start and using >10x the disk space (times 2 when upgrading). Haproxy is 9MB; your osd image is 350MB.
5. I have written about this previously here on the mailing list. Does each rgw still require its own dedicated client id? Is it still true that if you want to spawn 3 rgw instances, they need to authorize as client.rgw1, client.rgw2 and client.rgw3?
This does not allow for auto scaling. The idea of using an OC is that you launch a task and can scale that task automatically when necessary, so you would get multiple instances of rgw1. If this is still an issue with rgw, mds, mgr, etc., why even bother doing something with an OC and containers?
The orchestrator automates the creation and cleanup of credentials for each rgw instance. (It also trivially scales them up/down, à la k8s.)
I do not understand this claim about automatic scaling. It sounds to me more like creating a new task than scaling a second instance of an existing task. Are you currently able to automatically scale instances of an rgw up/down, or is your statement hypothetical?

I can remember talk on the mesos mailing list/issue tracker about the difficulty of determining a task's 'number', because tasks are killed and started at random based on resource offers. Supplying them with the correct, distinct credentials is therefore not as trivial as it would seem.

So I wonder how you are scaling this? If there are already so many differences between OCs, I would reckon they differ in this area quite a lot as well. So the most plausible solution would be to fix this in the rgw daemon itself.
If you have an autoscaler, you just need to tell
cephadm how many you
want and it will add/remove daemons. If you are using cephadm's
ingress (haproxy) capability, the LB configuration will be adjusted
for you. If you are using an external LB, you can query cephadm for a
description of the current daemons and their endpoints and feed that
info into your own ingress solution.
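For the external-LB case, the query referred to here is presumably the orchestrator's service/daemon listing, e.g. (shown for recent releases; output fields may differ):

    ceph orch ls --format json    # services, including desired vs. running counts
    ceph orch ps --format json    # individual daemons with their hosts (and, in recent releases, ports)

That JSON could then be consumed by a script that regenerates the configuration of whatever external load balancer sits in front of the rgw instances.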
Forgive me for not looking at all the video links before writing this, but the videos I saw about cephadm were always more like a command reference. It would be nice to show the above in a Ceph Tech Talk or similar; I think a lot of people would be interested in seeing this.
6. As I wrote before, I do not want my rgw or haproxy running in an OC that has the ability to give tasks the SYSADMIN capability. So that would mean I have to run my osd daemons/containers separately.
Only the OSD containers get extra caps to deal with the storage
hardware.
I know; that is why I choose to run drivers that require such SYSADMIN rights outside of my OC environment. My OC environment does not allow any task to use SYSADMIN.
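If it helps, the claim that only the OSD containers get elevated privileges is something one can check directly on a host with the container runtime's inspect command (shown for podman/docker; container names are placeholders):

    podman inspect --format '{{.HostConfig.Privileged}} {{.HostConfig.CapAdd}}' <osd-container>
    podman inspect --format '{{.HostConfig.Privileged}} {{.HostConfig.CapAdd}}' <rgw-container>

If the statement above holds, the rgw/haproxy containers should show no added capabilities and no privileged flag.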
Memory limits are partially implemented; we
haven't gotten to CPU
limits yet. It's on the list!
To me it is sort of clear what the focus of the cephadm team is.
I humbly contend that most users,
Hmmmm, most, most, most... isn't 'most' mostly just the average? Most people drive a Toyota, fewer people drive a Porsche, and even fewer drive a Ferrari. It is your choice who your target audience is and what you are 'selling' them.
especially those with small
clusters, would rather issue a single command and have the cluster
upgrade itself--with all of the latest and often version-specific
safety checks and any special per-release steps implemented for
them--than to do it themselves.
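For reference, the single command in question is the cephadm-managed upgrade, roughly (the version string is a placeholder; check ceph orch upgrade --help for the exact options on a given release):

    ceph orch upgrade start --ceph-version <x.y.z>
    ceph orch upgrade status    # follow progress while cephadm rolls through the daemons

cephadm sequences the daemon upgrades itself and applies the version-specific checks the quoted text mentions.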
The flip side to this approach is that if you make a mistake in some script, lots of Ceph clusters could go down. Is this not a bit of a paradox: a team that has problems with its own software dependencies (ceph-ansible/ceph-deploy?) that I should blindly trust to script the update of my cluster?
I know I have been very critical/sceptical about cephadm. Please also note that I just love this Ceph storage, and I am advertising it whenever possible. So a big thanks to the whole team still!!!