You would need new TCP connections for kube-proxy to send traffic to the new hosts.
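To illustrate that point, here is a minimal local sketch (plain Python stdlib, no Kubernetes involved; the toy HTTP server just stands in for an rgw backend): a keep-alive client connection keeps reusing the socket it was opened on, and kube-proxy (iptables mode) only picks a backend at TCP connect time, so only new connections can land on newly scaled pods.

```python
import http.client
import http.server
import socketserver
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow keep-alive, like an S3 endpoint

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

class Srv(socketserver.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True

# Stand-in for one rgw pod behind a Service.
srv = Srv(("127.0.0.1", 0), Handler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]

# A keep-alive client connection: both requests ride the same TCP socket,
# i.e. they would keep hitting whichever pod the socket was opened to.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
conn.getresponse().read()
sock_first = conn.sock.getsockname()
conn.request("GET", "/")
conn.getresponse().read()
sock_second = conn.sock.getsockname()
assert sock_first == sock_second  # same socket reused across requests

# Only a brand-new connection goes through TCP connect again -- the point
# where kube-proxy could choose a newly scaled backend.
conn2 = http.client.HTTPConnection("127.0.0.1", port)
conn2.request("GET", "/")
conn2.getresponse().read()
conn2_local = conn2.sock.getsockname()
assert conn2_local != sock_first  # new connection, new socket

conn.close()
conn2.close()
srv.shutdown()
```

That matches what was observed below: the second recursive copy opens a fresh set of connections, and only those get balanced across all pods.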
On Thu, Feb 11, 2021 at 03:47 Jiffin Thottan <jthottan(a)redhat.com> wrote:
I was able to test the PR against HPA in minikube and it is working as expected.
# ceph status
  cluster:
    id:     c7a87662-dccb-4143-bf68-58ff676a0362
    health: HEALTH_WARN
            mon a is low on available space
            8 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 20m)
    mgr: a(active, since 19m)
    osd: 1 osds: 1 up (since 19m), 1 in (since 19m)
    rgw: 3 daemons active (my.store.a.my-store.my-store.4383,
         my.store.a.my-store.my-store.4715, my.store.a.my-store.my-store.4717)

  data:
    pools:   8 pools, 96 pgs
    objects: 2.57k objects, 8.5 MiB
    usage:   85 MiB used, 20 GiB / 20 GiB avail
    pgs:     96 active+clean

  io:
    client: 611 KiB/s rd, 386 KiB/s wr, 696 op/s rd, 1.27k op/s wr
The metrics are also shown separately per daemon by ceph mgr.
@Matt @Casey: I saw the following w.r.t. the s3 client.
I created an HPA for the rgw pod which scales pods based on the number of
requests, then triggered a recursive directory copy (4480 directories,
67705 files) from the s3 client using the following command:

aws s3 cp <directory> s3://$BUCKET_NAME --recursive --no-verify-ssl \
    --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT
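For reference, an HPA of the kind described can be sketched roughly like this. Note this is illustrative only: the object names and the request-rate custom metric are my assumptions, not taken from this thread, and a request-rate metric requires a custom-metrics adapter to be installed.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rgw-hpa                          # hypothetical name
  namespace: rook-ceph
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rook-ceph-rgw-my-store-a       # assumed rgw Deployment name
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # custom metric, needs an adapter
      target:
        type: AverageValue
        averageValue: "100"
```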
Even though the HPA scaled the rgw pods, requests were not sent to the newly
created rgw pods (daemons). But when I triggered another recursive copy, it
was sent to all the pods.
Is this behaviour expected?
--
Jiffin
----- Original Message -----
From: "Sebastien Han" <shan(a)redhat.com>
To: "Jiffin Thottan" <jthottan(a)redhat.com>
Cc: "Matt Benjamin" <mbenjami(a)redhat.com>, "ceph-rgw-eng"
<ceph-rgw-eng(a)redhat.com>, "ceph-tech-list" <ceph-tech-list(a)redhat.com>,
"dev" <dev(a)ceph.io>, "Matt Benjamin" <mbenjamin(a)redhat.com>,
"Kaleb Keithley" <kkeithle(a)redhat.com>, "Orit Wasserman" <owasserm(a)redhat.com>,
"Travis Nielsen" <tnielsen(a)redhat.com>
Sent: Wednesday, February 10, 2021 1:20:14 PM
Subject: Re: Running different rgw daemon with same cephxuser
Sounds good, thanks guys! It does compile so go for it :)
–––––––––
Sébastien Han
Senior Principal Software Engineer, Storage Architect
"Always give 100%. Unless you're giving blood."
On Wed, Feb 10, 2021 at 6:29 AM Jiffin Thottan <jthottan(a)redhat.com>
wrote:
Hey Seb,
I will test the PR against HPA and let you know the results (within one or
two days).
--
Jiffin
----- Original Message -----
From: "Sebastien Han" <shan(a)redhat.com>
To: "Matt Benjamin" <mbenjami(a)redhat.com>
Cc: "Jiffin Thottan" <jthottan(a)redhat.com>, "ceph-rgw-eng"
<ceph-rgw-eng(a)redhat.com>, "ceph-tech-list" <ceph-tech-list(a)redhat.com>,
"dev" <dev(a)ceph.io>, "Matt Benjamin" <mbenjamin(a)redhat.com>,
"Kaleb Keithley" <kkeithle(a)redhat.com>, "Orit Wasserman" <owasserm(a)redhat.com>,
"Travis Nielsen" <tnielsen(a)redhat.com>
Sent: Tuesday, February 9, 2021 10:11:47 PM
Subject: Re: Running different rgw daemon with same cephxuser
Thanks Matt, I just sent this to kick off the discussion:
https://github.com/ceph/ceph/pull/39380
If someone wants to take it over, that's preferable I guess; this is mainly
due to my limited C++ knowledge.
So feel free to assign someone from your team to take over so we can
move faster with this one.
Thanks!
On Mon, Feb 8, 2021 at 3:53 PM Matt Benjamin <mbenjami(a)redhat.com>
wrote:
>
> Hi Sebastien,
>
> That seems like a concise and reasonable solution to me. It seems
> like the metrics from a single instance should in fact be transient
> (leaving the problem of maintaining aggregate values to prometheus or
> even downstream of that?).
>
> Matt
>
> On Mon, Feb 8, 2021 at 9:47 AM Sebastien Han <shan(a)redhat.com> wrote:
> >
> > Hi Jiffin,
> >
> > From my perspective, one simple way to fix this (although we must be
> > careful with backward compatibility) would be for rgw to register to the
> > service map differently.
> > Today it uses the daemon name like rgw.foo, so it registers
> > as foo. Essentially, if you try to run that pod twice you would still
> > see a single instance in the service map as well as in the prometheus
> > metrics.
> >
> > It would be nice to register with the RADOS client session ID instead,
> > just like rbd-mirror does by using instance_id. Something like:
> >
> > std::string instance_id = stringify(rados->get_instance_id());
> > int ret = rados.service_daemon_register(daemon_type, instance_id, metadata);
> >
> > Here: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L1139
> > With that we can re-use the same cephx user and scale to any number;
> > all instances will use the same cephx user to authenticate to the cluster
> > but they will show up as N instances in the service map.
> >
> > I guess one downside is that as soon as the daemon restarts, we get a
> > new RADOS client session ID, and thus our name changes, which means we
> > are losing all the metrics...
> > Thoughts?
> >
> > Thanks!
> >
> > On Thu, Feb 4, 2021 at 3:39 PM Jiffin Thottan <jthottan(a)redhat.com> wrote:
> > >
> > > Hi all,
> > >
> > > In the OCS (Rook) env, the workflow for RGW daemons is as follows.
> > >
> > > Normally, to create a ceph object store, Rook first creates the
> > > pools for the rgw daemon with the specified configuration.
> > >
> > > Then, depending on the number of instances, Rook creates a cephx user
> > > and spawns the rgw daemon in a container (pod) using its id,
> > > with the following arguments for the radosgw binary:
> > > Args:
> > > --fsid=91501490-4b55-47db-b226-f9d9968774c1
> > > --keyring=/etc/ceph/keyring-store/keyring
> > > --log-to-stderr=true
> > > --err-to-stderr=true
> > > --mon-cluster-log-to-stderr=true
> > > --log-stderr-prefix=debug
> > > --default-log-to-file=false
> > > --default-mon-cluster-log-to-file=false
> > > --mon-host=$(ROOK_CEPH_MON_HOST)
> > > --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
> > > --id=rgw.my.store.a
> > > --setuser=ceph
> > > --setgroup=ceph
> > > --foreground
> > > --rgw-frontends=beast port=8080
> > > --host=$(POD_NAME)
> > > --rgw-mime-types-file=/etc/ceph/rgw/mime.types
> > > --rgw-realm=my-store
> > > --rgw-zonegroup=my-store
> > > --rgw-zone=my-store
> > >
> > > And here the cephx user will be "client.rgw.my.store.a" and all the
> > > pools for rgw will be created as my-store*. Normally, if there is
> > > a request for another instance in the ceph-object-store config file [1]
> > > for Rook, another user "client.rgw.my.store.b" will be created by Rook
> > > and will consume the same pools.
> > >
> > > There is a feature in Kubernetes known as autoscaling, in which pods
> > > can be automatically scaled based on specified metrics. If we apply
> > > that feature to rgw pods, Kubernetes will automatically scale the rgw
> > > pods (like clones of the existing pod) with the same "--id" argument
> > > based on the metrics, but ceph cannot distinguish those as different
> > > rgw daemons even though multiple rgw pods are running simultaneously.
> > > "ceph status" shows only one rgw daemon as well.
> > >
> > > In vstart or ceph-ansible (Ali helped me figure it out), I can see
> > > that for each rgw daemon a cephx user is getting created as well.
> > >
> > > Is this behaviour intended? Or am I hitting a corner case which
> > > was never tested before?
> > >
> > > There is no point in autoscaling the rgw pod if it is considered the
> > > same daemon; the s3 client will talk to only one of the pods, and the
> > > metrics provided by ceph mgr can give incorrect data as well, which
> > > can affect the autoscale feature.
> > >
> > > Also opened an issue in rook for the time being [2]
> > >
> > > [1]
https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/o…
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io