Hi folks,
Has anyone experienced the situation where "radosgw-admin period get" shows the latest zonegroup information but "radosgw-admin zonegroup get" shows the old one?
I am now seeing this with v17.2.6. I previously ran v17.2.5 and never saw this behaviour.
I have tried various "period update --commit" and "period pull" runs to see whether I could get the update to happen, to no avail.
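For reference, this is roughly the sequence I ran (from memory, with arguments trimmed):

radosgw-admin period get
radosgw-admin zonegroup get
radosgw-admin period update --commit
radosgw-admin period pull        # with the appropriate realm/url arguments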
Any ideas?
Thanks,
Yixin
Hi,
I added new OSDs on the Ceph servers (the orchestrator is cephadm).
They were recognized as
osd.12 and osd.13
"ceph pg dump" shows no PGs on osd.12 or osd.13; they are completely empty.
"ceph osd tree" shows them as up.
"ceph osd df" shows 0 for their reweight, size, etc.
"ceph orch device ls" shows the status as
"Insufficient space (<10 extents) on vgs, LVM detected, locked"
However, this is the same status shown for all the existing (working) OSDs as well.
I tried
ceph osd crush reweight osd.12 1
reweighted item id 12 name 'osd.12' to 1 in crush map
but the reweight is still 0, and these disks are not in the cluster yet.
How can I add these disks so that they become part of the cluster?
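For completeness, this is roughly what I plan to check next on the new OSDs (just a sketch, using osd.12 as the example):

cephadm shell -- ceph-volume lvm list    # see what, if anything, was prepared on the new disks
ceph osd df tree                         # check whether osd.12/osd.13 got a CRUSH weight and size
ceph osd metadata 12                     # confirm the OSD reports a bluestore device at all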
Thanks
Hi,
I want to move an existing pool that already contains data onto SSDs.
I've created crush rule:
ceph osd crush rule create-replicated replicated_ssd default host ssd
If I apply this rule to the existing pool default.rgw.buckets.index, which
holds 180G of data, with the command:
ceph osd pool set default.rgw.buckets.index crush_rule replicated_ssd
will RGW and the cluster remain available while the data is being moved?
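(For reference, I plan to sanity-check and watch the data movement with something like the following:)

ceph osd pool get default.rgw.buckets.index crush_rule    # confirm the new rule took effect
ceph -s                                                   # follow misplaced objects / backfill progress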
Kind regards,
Rok
I want to unsubscribe from the general mailing list for Ceph users.
Sincerely Yours,
------------------------------------------
Ivan
E_: chenhui0228(a)gmail.com
A_: Wuhan, Hubei, China
------------------------------------------
On Wed, 28 Jun 2023 at 01:54, <ceph-users-request(a)ceph.io> wrote:
> Today's Topics:
>
> 1. Re: ceph orch host label rm : does not update label removal
> (Adiga, Anantha)
> 2. Re: RBD with PWL cache shows poor performance compared to cache
> device
> (Josh Baergen)
> 3. Re: RBD with PWL cache shows poor performance compared to cache
> device
> (Matthew Booth)
>
>
> ----------------------------------------------------------------------
>
> Date: Tue, 27 Jun 2023 16:04:05 +0000
> From: "Adiga, Anantha" <anantha.adiga(a)intel.com>
> Subject: [ceph-users] Re: ceph orch host label rm : does not update
> label removal
> To: "Adiga, Anantha" <anantha.adiga(a)intel.com>, "ceph-users(a)ceph.io"
> <ceph-users(a)ceph.io>
>
> Hello,
>
> This issue is resolved.
>
> The syntax of providing the labels was not correct.
>
> -----Original Message-----
> From: Adiga, Anantha <anantha.adiga(a)intel.com>
> Sent: Thursday, June 22, 2023 1:08 PM
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] ceph orch host label rm : does not update label
> removal
>
> Hi ,
>
> Not sure if the labels are really removed or if the update is not working.
> This was taken as a single label: mgrs,ceph osd,rgws.ceph
>
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
>
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 mgrs,ceph osd,rgws.ceph
> Removed label mgrs,ceph osd,rgws.ceph from host fl31ca104ja0302
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin
> 4 hosts in cluster
>
> Thank you,
> Anantha
>
>
>
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph --force
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
>
> Regards,
> Anantha
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 11:20:01 -0600
> From: Josh Baergen <jbaergen(a)digitalocean.com>
> Subject: [ceph-users] Re: RBD with PWL cache shows poor performance
> compared to cache device
> To: Matthew Booth <mbooth(a)redhat.com>
> Cc: ceph-users(a)ceph.io
>
> Hi Matthew,
>
> We've done a limited amount of work on characterizing the pwl and I think
> it suffers the classic problem of some writeback caches in that, once the
> cache is saturated, it's actually worse than just being in writethrough.
> IIRC the pwl does try to preserve write ordering (unlike the other
> writeback/writearound modes) which limits it in the concurrency it can
> issue to the backend, which means that even an iodepth=1 test can saturate
> the pwl, assuming the backend latency is higher than the pwl latency.
>
> I _think_ that if you were able to devise a burst test with bursts smaller
> than the pwl capacity and gaps in between large enough for the cache to
> flush, or if you were to ratelimit I/Os to the pwl, that you should see
> closer to the lower latencies that you would expect.
>
> Josh
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 18:50:07 +0100
> From: Matthew Booth <mbooth(a)redhat.com>
> Subject: [ceph-users] Re: RBD with PWL cache shows poor performance
> compared to cache device
> To: Josh Baergen <jbaergen(a)digitalocean.com>
> Cc: ceph-users(a)ceph.io
>
> On Tue, 27 Jun 2023 at 18:20, Josh Baergen <jbaergen(a)digitalocean.com>
> wrote:
> >
> > Hi Matthew,
> >
> > We've done a limited amount of work on characterizing the pwl and I
> think it suffers the classic problem of some writeback caches in that, once
> the cache is saturated, it's actually worse than just being in
> writethrough. IIRC the pwl does try to preserve write ordering (unlike the
> other writeback/writearound modes) which limits it in the concurrency it
> can issue to the backend, which means that even an iodepth=1 test can
> saturate the pwl, assuming the backend latency is higher than the pwl
> latency.
>
> What do you mean by saturated here? FWIW I was using the default cache
> size of 1G and each test run only wrote ~100MB of data, so I don't
> think I ever filled the cache, even with multiple runs.
>
> > I _think_ that if you were able to devise a burst test with bursts
> smaller than the pwl capacity and gaps in between large enough for the
> cache to flush, or if you were to ratelimit I/Os to the pwl, that you
> should see closer to the lower latencies that you would expect.
>
> My goal is to characterise the requirements of etcd. Unfortunately I
> don't think changing the test would do that. Incidentally, note that
> the total bandwidth of an extremely busy etcd is usually very low.
> From memory, the etcd write rate for a system we were debugging whose
> etcd was occasionally falling over due to load was only about 5MiB/s.
> It's all about write latency of really small writes, not bandwidth.
>
> Matt
>
> --
> Matthew Booth
>
> ------------------------------
>
>
> ------------------------------
>
> End of ceph-users Digest, Vol 108, Issue 88
> *******************************************
>
** TL;DR
In testing, the write latency performance of a PWL-cache backed RBD
disk was 2 orders of magnitude worse than the disk holding the PWL
cache.
** Summary
I was hoping that PWL cache might be a good solution to the problem of
write latency requirements of etcd when running a kubernetes control
plane on ceph. Etcd is extremely write latency sensitive and becomes
unstable if write latency is too high. The etcd workload can be
characterised by very small (~4k) writes with a queue depth of 1.
Throughput, even on a busy system, is normally very low. As etcd is
distributed and can safely handle the loss of un-flushed data from a
single node, a local ssd PWL cache for etcd looked like an ideal
solution.
My expectation was that adding a PWL cache on a local SSD to an
RBD-backed VM would improve write latency to something approaching the
write latency performance of the local SSD. However, in my testing
adding a PWL cache to an rbd-backed VM increased write latency by
approximately 4x over not using a PWL cache. This was over 100x more
than the write latency performance of the underlying SSD.
My expectation was based on the documentation here:
https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
“The cache provides two different persistence modes. In
persistent-on-write mode, the writes are completed only when they are
persisted to the cache device and will be readable after a crash. In
persistent-on-flush mode, the writes are completed as soon as it no
longer needs the caller’s data buffer to complete the writes, but does
not guarantee that writes will be readable after a crash. The data is
persisted to the cache device when a flush request is received.”
** Method
2 systems, 1 running single-node Ceph Quincy (17.2.6), the other
running libvirt and mounting a VM’s disk with librbd (also 17.2.6)
from the first node.
All performance testing is from the libvirt system. I tested write
latency performance:
* Inside the VM without a PWL cache
* Of the PWL device directly from the host (direct to filesystem, no VM)
* Inside the VM with a PWL cache
I am testing with fio. Specifically I am running a containerised test,
executed with:
podman run --volume .:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
This container runs:
fio --rw=write --ioengine=sync --fdatasync=1
--directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
--output-format=json --runtime=60 --time_based=1
And extracts sync.lat_ns.percentile["99.000000"]
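(For reference, pulling that number out of the fio JSON can be done with jq along these lines; the output filename here is just an example:)

jq '.jobs[0].sync.lat_ns.percentile["99.000000"]' fio-output.json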
** Results
All results were stable across multiple runs within a small margin of error.
* rbd no cache: 1417216 ns
* pwl cache device: 44288 ns
* rbd with pwl cache: 5210112 ns
Note that by adding a PWL cache we increase write latency by
approximately 4x, which is more than 100x the latency of the underlying device.
** Hardware
2 x Dell R640s, each with Xeon Silver 4216 CPU @ 2.10GHz and 192G RAM
Storage under test: 2 x SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to PERC
H730P Mini (Embedded)
OS installed on rotational disks
N.B. Linux incorrectly detects these disks as rotational, which I
assume relates to weird behaviour by the PERC controller. I remembered
to manually correct this on the ‘client’ machine for the PWL cache,
but at OSD configuration time ceph would have detected them as
rotational. They are not rotational.
** Ceph Configuration
CentOS Stream 9
# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Single node installation with cephadm. 2 OSDs, one on each SSD.
1 pool with size 2
** Client Configuration
Fedora 38
librbd1-17.2.6-3.fc38.x86_64
PWL cache is XFS filesystem with 4k block size, matching the
underlying device. The filesystem uses the whole block device. There
is no other load on the system.
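(The cache filesystem was created roughly like this; the device path below is a placeholder, not the exact invocation:)

mkfs.xfs -b size=4096 /dev/sdb
mount /dev/sdb /var/lib/libvirt/images/pwl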
** RBD Configuration
# rbd config image list libvirt-pool/pwl-test | grep cache
rbd_cache true config
rbd_cache_block_writes_upfront false config
rbd_cache_max_dirty 25165824 config
rbd_cache_max_dirty_age 1.000000 config
rbd_cache_max_dirty_object 0 config
rbd_cache_policy writeback pool
rbd_cache_size 33554432 config
rbd_cache_target_dirty 16777216 config
rbd_cache_writethrough_until_flush true pool
rbd_parent_cache_enabled false config
rbd_persistent_cache_mode ssd pool
rbd_persistent_cache_path /var/lib/libvirt/images/pwl pool
rbd_persistent_cache_size 1073741824 config
rbd_plugins pwl_cache pool
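(For reference, the pool-level PWL settings above were applied with commands along these lines; this is a sketch, not the exact history:)

rbd config pool set libvirt-pool rbd_plugins pwl_cache
rbd config pool set libvirt-pool rbd_persistent_cache_mode ssd
rbd config pool set libvirt-pool rbd_persistent_cache_path /var/lib/libvirt/images/pwl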
# rbd status libvirt-pool/pwl-test
Watchers:
watcher=10.1.240.27:0/1406459716 client.14475 cookie=140282423200720
Persistent cache state:
host: dell-r640-050
path: /var/lib/libvirt/images/pwl/rbd-pwl.libvirt-pool.37e947fd216b.pool
size: 1 GiB
mode: ssd
stats_timestamp: Mon Jun 26 11:29:21 2023
present: true empty: false clean: true
allocated: 180 MiB
cached: 135 MiB
dirty: 0 B
free: 844 MiB
hits_full: 1 / 0%
hits_partial: 3 / 0%
misses: 21952
hit_bytes: 6 KiB / 0%
miss_bytes: 349 MiB
--
Matthew Booth
Hi all!
I was wondering if someone could check out https://tracker.ceph.com/issues/56650 about the incorrect calculation of the cluster size? It's a waste of storage if I can't use more than 50%, and it looks like someone got it fixed with the pull request mentioned there.
Best regards,
Sake
Hey ceph-users,
we've been using the default "snappy" to have Ceph compress data on
certain pools - namely backups / copies of volumes of a VM environment.
So it's write once, and no random access.
I am now wondering whether switching to another algorithm (the options are
snappy, zlib, lz4, and zstd) would improve the compression ratio significantly.
* Does anybody have any real world data on snappy vs. $anyother?
Using zstd is tempting as it's used in various other applications
(btrfs, MongoDB, ...) for inline-compression with great success.
For Ceph, though, the docs still carry a warning ([1]) that zstd is not
recommended. I am wondering whether this still stands with e.g. [2] merged.
There was also [3], which tried to improve the performance, but that reads
as if it only led to a dead end with no code changes?
In any case does anybody have any numbers to help with the decision on
the compression algo?
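(For context, my plan for a comparison would simply be to flip the settings per pool and compare the compression stats afterwards; "backup-pool" below is just a placeholder:)

ceph osd pool set backup-pool compression_algorithm zstd
ceph osd pool set backup-pool compression_mode aggressive
ceph df detail    # the USED COMPR / UNDER COMPR columns show the achieved ratio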
Regards
Christian
[1]
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#c…
[2] https://github.com/ceph/ceph/pull/33790
[3] https://github.com/facebook/zstd/issues/910
Hello,
We are working on setting up an nginx sidecar container running along a
RadosGW container inside the same kubernetes pod.
We are having trouble getting the right value for the HTTP_X_FORWARDED_FOR
header on client requests. We need this value to do source IP validation.
Currently, RGW sees all requests as coming from 127.0.0.1, i.e. it is still
using the nginx IP address rather than the address of the client that made
the request.
This happens even though we configured RGW with the following:
rgw_remote_addr_param = HTTP_X_FORWARDED_FOR
On the nginx side, we have the following configuration to set the headers:
location / {
    proxy_pass http://localhost:7480;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Port $server_port;
}
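To narrow this down, one thing we could do is send a request straight to RGW with the header already set (the IP below is just a placeholder):

curl -s -o /dev/null -H 'X-Forwarded-For: 203.0.113.7' http://localhost:7480/

If RGW then reports 203.0.113.7 as the remote address for that request, the rgw_remote_addr_param side works and the problem would be on the nginx side; if it still reports 127.0.0.1, the RGW option is not being picked up.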
Any idea what the issue is here?
Thank you in advance,
Yoke