Hey all,
I recently had a k8s node failure in my homelab, and even though I powered it off (and it's done for, so it won't come back up), it still shows up as a watcher in rbd status:
```
root@node0:~# rbd status kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
Watchers:
watcher=10.0.0.103:0/1520114202 client.1697844 cookie=140289402510784
watcher=10.0.0.103:0/39967552 client.1805496 cookie=140549449430704
root@node0:~# ceph osd blocklist ls
10.0.0.103:0/0 2023-04-15T13:15:39.061379+0200
listed 1 entries
```
Even though the node is down and I have blocklisted it multiple times for hours, the watcher won't disappear. As a result, ceph-csi-rbd claims the image is already mounted (mapping it manually works fine, and I can cleanly unmap it as well, but I can't unmap it from a node that doesn't exist anymore).
Is there any way to force-evict an rbd client/watcher from Ceph (e.g. by switching the mgr/mon), or to see why it is not timing out?
I found some historical mails and issues (related to Rook, which I don't use) regarding a parameter osd_client_watch_timeout, but I can't find how that relates to RBD images.
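For reference, these are the checks and eviction steps I would try next, as far as I understand them (a sketch rather than a verified recipe; the header object id below is a placeholder):
```
# List watchers directly on the image's header object; the id after
# rbd_header. is the block_name_prefix from "rbd info" minus "rbd_data.":
rbd info kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b | grep block_name_prefix
rados -p kubernetes listwatchers rbd_header.<image-id>   # <image-id> is a placeholder

# Blocklist the exact client instance (address:nonce), not just ip:0/0:
ceph osd blocklist add 10.0.0.103:0/1520114202

# Check the watch timeout; a dead client should normally be dropped
# after this many seconds (default 30):
ceph config get osd osd_client_watch_timeout
```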
Cheers,
Max.
Hi Team,
We are trying to integrate Ceph with oVirt.
We have deployed oVirt 4.4.
We want to create a storage domain of the POSIX-compliant type for mounting a Ceph-based infrastructure in oVirt.
We have set up SRV-based resolution in our DNS server for the ceph-mon nodes, but we are unable to create a storage domain using it.
We are able to mount the Ceph filesystem manually using the following command on the deployment hosts:
```
sudo mount -t ceph :/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9 \
  /rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9 \
  -o rw,name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA==

[root@deployment-host mnt]# df -kh
df: /run/user/0/gvfs: Transport endpoint is not connected
Filesystem                                                                                                                Size  Used Avail Use% Mounted on
[abcd:abcd:abcd::51]:6789,[abcd:abcd:abcd::52]:6789,[abcd:abcd:abcd::53]:6789:/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9   19G     0   19G   0% /rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9
```
Query:
1. Could anyone help us with storage domain creation in oVirt for SRV-resolved ceph-mon nodes?
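For context, these are the domain settings we are trying, mirroring the manual mount above (field names as we see them in the POSIX-compliant FS dialog; this is what we entered, not a confirmed-working recipe):
```
# Values entered in the oVirt "New Domain" dialog:
#   Storage Type:  POSIX compliant FS
#   Path:          :/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9
#   VFS Type:      ceph
#   Mount Options: rw,name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA==
```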
Hi,
I am learning Ceph and I am having a hard time understanding PGs and PG calculation.
I know that a PG is a collection of objects, and that PGs are replicated across hosts to respect the replication size, but...
In traditional storage we work with sizes in GB, TB and so on: we create a pool from a bunch of disks or RAID arrays of some size, then we create volumes of a certain size and use them. If the storage is full, we add disks and extend our pools/volumes.
The idea of size is simple to understand.
Ceph, although it supports the notion of pool size in GB, TB, etc., creates pools using PGs, and now there is also the notion of % of data.
When I use the PG calculator from Ceph or from Red Hat, the generated YAML file contains the % variable, but the commands file contains only the PGs, and pools configured with 15% and 18% end up with the same number of PGs!
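If I understand the calculator correctly (my assumption of what it does, not something I have confirmed), the rounding to a power of two would explain that:
```
# Per-pool pg_num = (target PGs per OSD * OSD count * %data) / replica size,
# rounded to the nearest power of two. E.g. with 100 OSDs, a target of
# 100 PGs per OSD and replica size 3:
#   15% of data: 100 * 100 * 0.15 / 3 = 500 -> rounds to 512
#   18% of data: 100 * 100 * 0.18 / 3 = 600 -> rounds to 512
```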
The PG calculator also encourages you to make the %data values add up to 100; in other words, it assumes that you know all your pools from the start. What if you won't consume all your raw disk space?
What happens when you need to add a new pool?
Also, when you create several pools and then run ceph osd df tree, all pools show the raw size as free space; it is as if all pools share the same raw space regardless of their PG count.
Could someone shed some light on this concept and how to manage it wisely? The documentation keeps saying that it's an important concept and that you have to pay attention when choosing the number of PGs for a pool from the start.
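While searching I also came across the PG autoscaler; assuming I read the docs right, it is supposed to take this decision away from you (is that the recommended approach now?):
```
# Sketch based on my reading of the docs: let Ceph size the PGs itself,
# optionally hinting what fraction of the cluster the pool will use.
ceph osd pool set mypool pg_autoscale_mode on
ceph osd pool set mypool target_size_ratio 0.15
ceph osd pool autoscale-status
```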
Regards.
Hi folks,
Within a zonegroup, once a bucket is created, its metadata is synced over to all zones. With bucket-level sync policy, however, its content may or may not be synced over. To simplify the sync process, sometimes I'd like to pick the bucket in one zone as the absolute truth and sync its content over to the bucket in another zone, which may have done some local deletions since the last time they were synced. I don't want those local deletions to interfere with the planned sync. Is it possible to reset the bucket in this zone so it is in a "pristine" state and will receive everything from the source?
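The closest I have found so far is re-initializing the bucket's sync state; whether that also restores locally deleted objects is exactly what I'm unsure about (bucket and zone names below are just examples):
```
# Reset the bucket sync state on the destination zone, then re-run the
# sync from the chosen source (whether this undoes local deletions is
# the open question):
radosgw-admin bucket sync init --bucket=test-bucket --source-zone=z0
radosgw-admin bucket sync run --bucket=test-bucket --source-zone=z0
```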
Thanks, Yixin
Hi,
I've noticed that when my Lua script runs, I get the following error in my radosgw container. It looks like the lib64 directory is not included in the search path for shared libraries.
Copying the contents of lib64 into the lib directory solves the issue on the running container.
Here are more details:
```
Apr 25 20:26:59 xxx-ceph-xxxx radosgw[60901]: req 2268223694354647302 0.000000000s Lua ERROR:
/tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/share/lua/5.3/socket.lua:12: module 'socket.core' not found:
        no field package.preload['socket.core']
        no file '/tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/share/lua/5.3/socket/core.lua'
        no file '/tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/lib/lua/5.3/socket/core.so'
        no file '/tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/lib/lua/5.3/socket.so'
```
As mentioned, the following command run inside the radosgw container works around the issue for that running container:
```
cp -ru /tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/lib64/* \
       /tmp/luarocks/client.rgw.xxxxxx.xxx-xxxx-xxxx.pcoulb/lib/
```
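To see where the bundled Lua actually looks, this diagnostic sketch may help (it assumes a lua5.3 binary is available inside the container, which may not be the case in every image):
```
# Print the C-module search path the interpreter uses:
lua5.3 -e 'print(package.cpath)'
# Confirm where luarocks actually installed the compiled module:
find /tmp/luarocks -name 'core.so'
```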
Cheers,
Tom
Hi *,
I've set up Grafana, Prometheus and node-exporter on an adopted cluster (currently running 16.2.10) and was trying to enable SSL for Grafana. As stated in the docs [1], there's a way to configure individual certs and keys per host:
```
ceph config-key set mgr/cephadm/{hostname}/grafana_key -i $PWD/key.pem
ceph config-key set mgr/cephadm/{hostname}/grafana_crt -i $PWD/certificate.pem
```
So I did that and then ran 'ceph orch reconfig grafana', but I still get a bad-certificate error message:
```
Apr 20 10:21:19 ceph01 conmon[3772491]: server.go:3160: http: TLS handshake error from <IP>:46084: remote error: tls: bad certificate
```
It seems the cephadm-generated cert/key pair (mgr/cephadm/grafana_key, mgr/cephadm/grafana_crt) supersedes the per-host certs, and even after removing the generated cert/key (and then reconfiguring), cephadm regenerates them and leaves me with the same problem. Is this a known issue, and what would be the fix? I didn't find anything on the tracker, but I might have missed it.
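For completeness, the removal attempt looked like this (config-key paths as they appear in 'ceph config-key ls'); cephadm recreated the pair on the next reconfig:
```
ceph config-key rm mgr/cephadm/grafana_key
ceph config-key rm mgr/cephadm/grafana_crt
ceph orch reconfig grafana
```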
To confirm that my custom certs actually work, I replaced the general cert with my custom cert: the error doesn't appear and I can see the Grafana graphs in the dashboard. I could leave it like this, but if Grafana were to fail over to another host it wouldn't work anymore, of course.
Any hints are greatly appreciated.
Thanks,
Eugen
[1]
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#configuring-ss…
Actually, "bucket sync run" somehow made it worse since now the destination zone shows "bucket is caught up with source" from "bucket sync status" even though it clearly missed an object.
On Monday, April 24, 2023 at 02:37:46 p.m. EDT, Yixin Jin <yjin77(a)yahoo.ca> wrote:
An update:
After creating and enabling the bucket sync policy, I ran "bucket sync markers" and saw that each shard had the status "init". Running "bucket sync run" eventually marked the status as "incremental-sync", which suggests it went through the full-sync stage. However, the lone object in the source zone wasn't synced over to the destination zone.
I actually used gdb to step through radosgw-admin while running "bucket sync run". It doesn't seem to do anything for full sync: it printed a log line saying "finished iterating over all available prefixes:...", which breaks out of the do-while loop after the call to prefix_handler.revalidate_marker(&list_marker). That call returned false because it couldn't find rules from the sync pipe. I haven't drilled deeper to see why it didn't get the rules, whatever they mean. Nevertheless, the workaround with "bucket sync run" doesn't seem to work, at least not with Quincy.
Regards, Yixin
On Monday, April 24, 2023 at 12:37:24 p.m. EDT, Soumya Koduri <skoduri(a)redhat.com> wrote:
On 4/24/23 21:52, Yixin Jin wrote:
> Hello ceph gurus,
>
> We are trying bucket-specific sync policy feature with Quincy release and we encounter something strange. Our test setup is very simple. I use mstart.sh to spin up 3 clusters, configure them with a single realm, a single zonegroup and 3 zones – z0, z1, z2, with z0 being the master. I created a zonegroup-level sync policy with “allowed”, a symmetrical flow among all 3 zones and a pipe allowing all zones to all zones. I created a single bucket “test-bucket” at z0 and uploaded a single object to it. By now, there should be no sync since the policy is “allowed” only and I can see the single file only exist in z0 and “bucket sync status” shows the sync is actually disabled. Finally, I created a bucket-specific sync policy being “enabled” and a pipe between z0 and z1 only. I expected that sync should be kicked off between z0 and z1 and I did see from “sync info” that there are sources/dests being z0/z1. “bucket sync status” also shows the source zone and source bucket. At z0, it shows everything is caught up but at z1 it shows one shard is behind, which is expected since that only object exists in z0 but not in z1.
>
>
>
> Now, here comes the strange part. Although z1 shows there is one shard behind, it doesn’t seem to make any progress on syncing it. It doesn’t seem to do any full sync at all since “bucket sync status” shows “full sync: 0/11 shards”. There hasn’t been any full sync since otherwise, z1 should have that only object. It is stuck in this condition forever until I make another upload on the same object. I suspect the update of the object triggers a new data log, which triggers the sync. Why wasn’t there a full sync and how can one force a full sync?
Yes, this is a known issue, yet to be addressed, with bucket-level sync
policy ( - https://tracker.ceph.com/issues/57489 ). The interim
workaround to sync existing objects is to either
* create new objects, or
* execute "bucket sync run"
after creating/enabling the bucket policy.
Please note that this issue is specific to bucket policy; it doesn't
exist for a sync policy set at the zonegroup level.
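In case it helps, spelled out as commands (bucket and zone names taken from this thread; to be run on the zone that is behind):
```
# Run a manual sync pass for the bucket, pulling from the master zone:
radosgw-admin bucket sync run --bucket=test-bucket --source-zone=z0
# Then check whether the shards have caught up:
radosgw-admin bucket sync status --bucket=test-bucket
```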
Thanks,
Soumya
Hello ceph gurus,
We are trying the bucket-specific sync policy feature with the Quincy release and we encountered something strange. Our test setup is very simple. I use mstart.sh to spin up 3 clusters and configure them with a single realm, a single zonegroup and 3 zones – z0, z1, z2, with z0 being the master. I created a zonegroup-level sync policy with status "allowed", a symmetrical flow among all 3 zones and a pipe allowing all zones to all zones. I created a single bucket "test-bucket" at z0 and uploaded a single object to it. By now there should be no sync, since the policy is "allowed" only; I can see the single object exists only in z0, and "bucket sync status" shows the sync is actually disabled. Finally, I created a bucket-specific sync policy with status "enabled" and a pipe between z0 and z1 only. I expected sync to kick off between z0 and z1, and I did see from "sync info" that the sources/dests are z0/z1. "bucket sync status" also shows the source zone and source bucket. At z0 it shows everything is caught up, but at z1 it shows one shard is behind, which is expected since that only object exists in z0 but not in z1.
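For concreteness, the policy setup was along these lines (a reconstructed sketch; command shapes follow the multisite sync-policy docs, and the group/flow/pipe ids are mine):
```
# Zonegroup-level policy: allowed, symmetrical flow among all 3 zones,
# pipe from all zones to all zones:
radosgw-admin sync group create --group-id=group1 --status=allowed
radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-all \
    --flow-type=symmetrical --zones=z0,z1,z2
radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe-all \
    --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
radosgw-admin period update --commit

# Bucket-specific policy: enabled, z0 <-> z1 only:
radosgw-admin sync group create --bucket=test-bucket --group-id=bgroup1 \
    --status=enabled
radosgw-admin sync group pipe create --bucket=test-bucket --group-id=bgroup1 \
    --pipe-id=bpipe1 --source-zones=z0,z1 --dest-zones=z0,z1
```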
Now here comes the strange part. Although z1 shows there is one shard behind, it doesn't seem to make any progress on syncing it. It doesn't seem to do any full sync at all, since "bucket sync status" shows "full sync: 0/11 shards"; there can't have been any full sync, since otherwise z1 would have that only object. It is stuck in this condition forever until I upload the same object again. I suspect the update of the object triggers a new data log entry, which triggers the sync. Why wasn't there a full sync, and how can one force a full sync?
I also tried “sync error list” and they are all empty. I also tried to apply the fix in https://tracker.ceph.com/issues/57853, although I am not sure if it is relevant. The fix didn’t change the behavior that we observed. I also tried "bucket sync init" and "bucket sync run" via radosgw-admin. They don't seem to do what I expected. They simply mark z1 as not behind anymore but the single object still lives in z0 only.
I wonder how mature this sync policy feature is for production use.
Thanks, Yixin
We want to do the next urgent point release for Pacific, 16.2.13, ASAP. The tip of the current pacific branch will be used as the base for this release, and we will build it later today.
Dev leads - if you have any outstanding PRs that must be included, please merge them now.
Thx
YuriW