June 2020 - ceph-users - lists.ceph.io

by Marco Pizzolo

Hello Everyone, We're working on a new cluster and seeing some oddities. The crush map viewer is not showing all hosts or OSDs. Cluster is NVMe w/4 hosts, each having 8 NVMe. Using 2 OSDs per NVMe and Encryption. Using Max size of 3, Min size of 2: [image: image.png] All OSDs appear to exist in: ceph osd tree root@prdhcistonode01:~# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 372.60156 root default -3 93.15039 host prdhcistonode01 0 ssd 5.82190 osd.0 up 1.00000 1.00000 1 ssd 5.82190 osd.1 up 1.00000 1.00000 2 ssd 5.82190 osd.2 up 1.00000 1.00000 3 ssd 5.82190 osd.3 up 1.00000 1.00000 4 ssd 5.82190 osd.4 up 1.00000 1.00000 5 ssd 5.82190 osd.5 up 1.00000 1.00000 6 ssd 5.82190 osd.6 up 1.00000 1.00000 7 ssd 5.82190 osd.7 up 1.00000 1.00000 8 ssd 5.82190 osd.8 up 1.00000 1.00000 9 ssd 5.82190 osd.9 up 1.00000 1.00000 10 ssd 5.82190 osd.10 up 1.00000 1.00000 11 ssd 5.82190 osd.11 up 1.00000 1.00000 12 ssd 5.82190 osd.12 up 1.00000 1.00000 13 ssd 5.82190 osd.13 up 1.00000 1.00000 14 ssd 5.82190 osd.14 up 1.00000 1.00000 15 ssd 5.82190 osd.15 up 1.00000 1.00000 -5 93.15039 host prdhcistonode02 17 ssd 5.82190 osd.17 up 1.00000 1.00000 18 ssd 5.82190 osd.18 up 1.00000 1.00000 19 ssd 5.82190 osd.19 up 1.00000 1.00000 20 ssd 5.82190 osd.20 up 1.00000 1.00000 21 ssd 5.82190 osd.21 up 1.00000 1.00000 22 ssd 5.82190 osd.22 up 1.00000 1.00000 23 ssd 5.82190 osd.23 up 1.00000 1.00000 24 ssd 5.82190 osd.24 up 1.00000 1.00000 25 ssd 5.82190 osd.25 up 1.00000 1.00000 26 ssd 5.82190 osd.26 up 1.00000 1.00000 27 ssd 5.82190 osd.27 up 1.00000 1.00000 28 ssd 5.82190 osd.28 up 1.00000 1.00000 29 ssd 5.82190 osd.29 up 1.00000 1.00000 30 ssd 5.82190 osd.30 up 1.00000 1.00000 48 ssd 5.82190 osd.48 up 1.00000 1.00000 49 ssd 5.82190 osd.49 up 1.00000 1.00000 -7 93.15039 host prdhcistonode03 16 ssd 5.82190 osd.16 up 1.00000 1.00000 31 ssd 5.82190 osd.31 up 1.00000 1.00000 32 ssd 5.82190 osd.32 up 1.00000 1.00000 33 ssd 5.82190 osd.33 up 1.00000 1.00000 34 ssd 5.82190 osd.34 up 1.00000 1.00000 35 ssd 5.82190 osd.35 up 1.00000 1.00000 36 ssd 5.82190 osd.36 up 1.00000 1.00000 37 ssd 5.82190 osd.37 up 1.00000 1.00000 38 ssd 5.82190 osd.38 up 1.00000 1.00000 39 ssd 5.82190 osd.39 up 1.00000 1.00000 40 ssd 5.82190 osd.40 up 1.00000 1.00000 41 ssd 5.82190 osd.41 up 1.00000 1.00000 42 ssd 5.82190 osd.42 up 1.00000 1.00000 43 ssd 5.82190 osd.43 up 1.00000 1.00000 44 ssd 5.82190 osd.44 up 1.00000 1.00000 45 ssd 5.82190 osd.45 up 1.00000 1.00000 -9 93.15039 host prdhcistonode04 46 ssd 5.82190 osd.46 up 1.00000 1.00000 47 ssd 5.82190 osd.47 up 1.00000 1.00000 50 ssd 5.82190 osd.50 up 1.00000 1.00000 51 ssd 5.82190 osd.51 up 1.00000 1.00000 52 ssd 5.82190 osd.52 up 1.00000 1.00000 53 ssd 5.82190 osd.53 up 1.00000 1.00000 54 ssd 5.82190 osd.54 up 1.00000 1.00000 55 ssd 5.82190 osd.55 up 1.00000 1.00000 56 ssd 5.82190 osd.56 up 1.00000 1.00000 57 ssd 5.82190 osd.57 up 1.00000 1.00000 58 ssd 5.82190 osd.58 up 1.00000 1.00000 59 ssd 5.82190 osd.59 up 1.00000 1.00000 60 ssd 5.82190 osd.60 up 1.00000 1.00000 61 ssd 5.82190 osd.61 up 1.00000 1.00000 62 ssd 5.82190 osd.62 up 1.00000 1.00000 63 ssd 5.82190 osd.63 up 1.00000 1.00000 Any suggestions would be greatly appreciated. Thanks, Marco

3 years, 10 months

2
3
0 0

RBD logs

by 陈旭

Hi guys, I deploy an efk cluster and use ceph as block storage in kubernetes, but RBD write iops sometimes becomes zero and last for a few minutes. I want to check logs about RBD so I add some config to ceph.conf and restart ceph. Here is my ceph.conf: [global] fsid = 53f4e1d5-32ce-4e9c-bf36-f6b54b009962 mon_initial_members = db-16-4-hzxs, db-16-5-hzxs, db-16-6-hzxs mon_host = 10.25.16.4,10.25.16.5,10.25.16.6 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx osd pool default size = 3 [client] debug rbd = 20 debug rbd mirror = 20 debug rbd replay = 20 log file = /var/log/ceph/client_rbd.log I can not get any logs in /var/log/ceph/client_rbd.log. I also try to execute 'ceph daemon osd.* config set debug_rbd 20’ and there is also no related logs in ceph-osd.log. How can I get useful logs about this question or How can I analyze this problem? Look forward to your reply. Thanks ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// 声明：此邮件可能包含依图公司保密或特权信息，并且仅应发送至有权接收该邮件的收件人。如果您无权收取该邮件，您应当立即删除该邮件并通知发件人，您并被禁止传播、分发或复制此邮件以及附件。对于此邮件可能携带的病毒引起的任何损害，本公司不承担任何责任。此外，本公司不保证已正确和完整地传输此信息，也不接受任何延迟收件的赔偿责任。 ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// Notice: This email may contain confidential or privileged information of Yitu and was sent solely to the intended recipients. If you are unauthorized to receive this email, you should delete the email and contact the sender immediately. Any unauthorized disclosing, distribution, or copying of this email and attachment thereto is prohibited. Yitu does not accept any liability for any loss caused by possibly viruses in this email. E-mail transmission cannot be guaranteed to be secure or error-free and Yitu is not responsible for any delayed transmission.

3 years, 10 months

2
1
0 0

Help! ceph-mon is blocked after shutting down and ip address changed

by occj＠qq.com

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable) os :CentOS Linux release 7.7.1908 (Core) single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used. ceph -s is blocked after shutting down the machine (192.168.0.104), then ip address changed to 192.168.1.6 I created the monmap with monmap tool and update the ceph.conf , hosts file and then start ceph-mon. and the ceph-mon log: ... 2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s 2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s 2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s 2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s 2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s 2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s 2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s 2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s 2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s ... I changed IP back to 192.168.0.104 yeasterday, but all the same. # cat /etc/ceph/ceph.conf [client.libvirt] admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor [client.rgw.ceph-node1.rgw0] host = ceph-node1 keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-node1.rgw0/keyring log file = /var/log/ceph/ceph-rgw-ceph-node1.rgw0.log rgw frontends = beast endpoint=192.168.1.6:8080 rgw thread pool size = 512 # Please do not change this file directly since it is managed by Ansible and will be overwritten [global] cluster network = 192.168.1.0/24 fsid = e384e8e6-94d5-4812-bfbb-d1b0468bdef5 mon host = [v2:192.168.1.6:3300,v1:192.168.1.6:6789] mon initial members = ceph-node1 osd crush chooseleaf type = 0 osd pool default crush rule = -1 public network = 192.168.1.0/24 [osd] osd memory target = 7870655146

3 years, 10 months

2
1
0 0

Multiple outages when disabling scrubbing

by Bryan Stillwell

The last two days we've experienced a couple short outages shortly after setting both 'noscrub' and 'nodeep-scrub' on one of our largest Ceph clusters (~2,200 OSDs). This cluster is running Nautilus (14.2.6) and setting/unsetting these flags has been done many times in the past without a problem. One thing I've noticed is that on both days right after setting 'noscrub' or 'nodeep-scrub' that a do_prune message shows up in the monitor logs followed by a timeout. About 30 seconds later we start seeing OSDs getting marked down: 2020-06-03 08:06:53.914 7fcc3ed57700 0 mon.p3cephmon004@0(leader) e11 handle_command mon_command({"prefix": "osd set", "key": "noscrub"} v 0) v1 2020-06-03 08:06:53.914 7fcc3ed57700 0 log_channel(audit) log [INF] : from='client.5773023471 10.2.128.8:0/523139029' entity='client.admin' cmd=[{"prefix": "osd set", "key": "noscrub"}]: dispatch 2020-06-03 08:06:54.231 7fcc4155c700 1 mon.p3lcephmon004(a)0(leader).osd e1535232 do_prune osdmap full prune enabled 2020-06-03 08:06:54.318 7fcc3f558700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7fcc3f558700' had timed out after 0 2020-06-03 08:06:54.319 7fcc4055a700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7fcc4055a700' had timed out after 0 2020-06-03 08:06:54.319 7fcc40d5b700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7fcc40d5b700' had timed out after 0 2020-06-03 08:06:54.319 7fcc3fd59700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7fcc3fd59700' had timed out after 0 ... 2020-06-03 08:07:16.049 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.736 [v2:10.6.170.130:6816/1294580,v1:10.6.170.130:6817/1294580] from osd.1165 is reporting failure:1 2020-06-03 08:07:16.049 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.736 reported failed by osd.1165 2020-06-03 08:07:16.304 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.736 [v2:10.6.170.130:6816/1294580,v1:10.6.170.130:6817/1294580] from osd.127 is reporting failure:1 2020-06-03 08:07:16.304 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.736 reported failed by osd.127 2020-06-03 08:07:16.693 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.736 [v2:10.6.170.130:6816/1294580,v1:10.6.170.130:6817/1294580] from osd.1455 is reporting failure:1 2020-06-03 08:07:16.693 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.736 reported failed by osd.1455 2020-06-03 08:07:16.695 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 we have enough reporters to mark osd.736 down 2020-06-03 08:07:16.696 7fcc3ed57700 0 log_channel(cluster) log [INF] : osd.736 failed (root=default,rack=S06-06,chassis=S06-06-17,host=p3cephosd386) (3 reporters from different host after 20.389591 >= grace 20.025280) 2020-06-03 08:07:16.696 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.1463 [v2:10.7.208.30:6824/3947672,v1:10.7.208.30:6825/3947672] from osd.1455 is reporting failure:1 2020-06-03 08:07:16.696 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.1463 reported failed by osd.1455 2020-06-03 08:07:16.758 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.1463 [v2:10.7.208.30:6824/3947672,v1:10.7.208.30:6825/3947672] from osd.2108 is reporting failure:1 2020-06-03 08:07:16.758 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.1463 reported failed by osd.2108 2020-06-03 08:07:16.800 7fcc3ed57700 1 mon.p3cephmon004(a)0(leader).osd e1535234 prepare_failure osd.1463 [v2:10.7.208.30:6824/3947672,v1:10.7.208.30:6825/3947672] from osd.1166 is reporting failure:1 2020-06-03 08:07:16.800 7fcc3ed57700 0 log_channel(cluster) log [DBG] : osd.1463 reported failed by osd.1166 2020-06-03 08:07:16.835 7fcc4155c700 1 mon.p3cephmon004(a)0(leader).osd e1535234 do_prune osdmap full prune enabled ... Does any one know why setting the no scrubbing flags would cause such an issue? Or if this is a known issue with a fix in 14.2.9 or 14.2.10 (when it comes out)? Thanks, Bryan

3 years, 10 months

1
0
0 0

pg-upmap-items

by Thomas Bennett

Hi, I've been using pg-upmap items both in the ceph balancer and by hand running osdmaptool for a while now (on Ceph 12.2.13). But I've noticed a side effect of up-map-items which can sometimes lead to some unnecessary data movement. My understanding is that the ceph osdmap keeps track of upmap-items that I undo (in my case using the CERN scrip upmap-remapped.py). These can be seen in the osdmap (or osd dump) json output. It looks, for example, like this: "pg_upmap_items": [ { "pgid": "9.10", "mappings": [ { "from": 1761, "to": 6 } ] }, When upmapping pg 9.10 I first need to clear this pg_upmap_item by executing an rm-upmap-item command: ceph rm-pg-upmap-items 9.10 All this does is unmap the from/to osds (here from osd.6 to osd.1761) which is sometimes not useful. I would prefer to rather "forget" this upmap. I.e remove it permanently from pg_upmap_items. Is there any way to do this? Cheers, Toms

3 years, 10 months

1
0
0 0

Deploy nfs from cephadm

by Simon Sutter

Hello Ceph users, I'm trying to deploy nfs-ganesha with cephadm on octopus. According to the docs, it's as easy as running the command in the docs: https://docs.ceph.com/docs/master/cephadm/install/#deploying-nfs-ganesha

3 years, 10 months

4
5
0 0

Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130

by Gencer W. Genç

Hi All, I am trying to upgrade ceph 15.2.1 to 15.2.3. I've two node setup on small environment for test only. I ran the following commands: $ ceph mon ok-to-stop mon.vx-rg23-rk65-u43-130 >> quorum should be preserved (vx-rg23-rk65-u43-130,vx-rg23-rk65-u43-130-1) after stopping [mon.vx-rg23-rk65-u43-130] $ ceph orch upgrade start --ceph-version 15.2.3 However, Ceph says it is NOT safe to stop mon.vx-rg23-rk65-u43-130. Debug log: 2020-06-01T22:09:21.967310+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [INF] Upgrade: Checking mon daemons... 2020-06-01T22:09:21.967426+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] daemon mon.vx-rg23-rk65-u43-130 not correct (docker.io/ceph/ceph:v15, bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2, 15.2.1) 2020-06-01T22:09:21.967563+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] Have connection to vx-rg23-rk65-u43-130 2020-06-01T22:09:21.967668+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] None container image docker.io/ceph/ceph:v15.2.3 2020-06-01T22:09:21.967778+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] args: --image docker.io/ceph/ceph:v15.2.3 inspect-image 2020-06-01T22:09:23.400842+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] code: 0 2020-06-01T22:09:23.401062+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] out: { "ceph_version": "ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)", "image_id": "d72755c420bcbdae08d063de6035d060ea0487f8a43f777c75bdbfcd9fd907fa" } 2020-06-01T22:09:23.404700+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] mon_command: 'mon ok-to-stop' -> -16 in 0.002s 2020-06-01T22:09:23.405002+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [INF] Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130 2020-06-01T22:09:38.416475+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] mon_command: 'mon ok-to-stop' -> -16 in 0.003s 2020-06-01T22:09:38.417296+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [INF] Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130 2020-06-01T22:09:53.421473+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] mon_command: 'mon ok-to-stop' -> -16 in 0.003s 2020-06-01T22:09:53.422350+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [INF] Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130 2020-06-01T22:10:08.440422+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [DBG] mon_command: 'mon ok-to-stop' -> -16 in 0.003s 2020-06-01T22:10:08.441122+0000 mgr.vx-rg23-rk65-u43-130-1.pxmyie [INF] Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130 How can I solve this and upgrade? Thanks, Gencer.

3 years, 10 months

1
1
0 0

Re: professional services and support for newest Ceph

by Patrick Calhoun

Kevin, Ignazio, Marc, Thanks for the information. I now consider myself well-advised. -Patrick On Tue, Jun 2, 2020 at 1:21 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote: > > Ceph is from redhat and redhat is owned by IBM. I think the best > training you could get would be from RedHat. > > I would not advise to learn how to use a mouse with a web interface nor > this ansible or some other deploy tool. Do it from scratch manually so > you know the basics. If you know those, go for some tools that make your > life easier. (and never install the newest stable release ;)) > > > > > -----Original Message----- > Subject: [ceph-users] Re: professional services and support for newest > Ceph > > Hello, I am testing ceph from croit and it works fine: very easy web > interface for installing and managing ceph and very clear support > pricing. > Ignazio > > Il Mar 2 Giu 2020, 19:36 <response(a)ifastnet.com> ha scritto: > > > and theres > > > > https://croit.io/consulting > > > > best regards > > Kevin M > > > > ----- Original Message ----- > > From: "Patrick Calhoun" <phineas(a)ou.edu> > > To: ceph-users(a)ceph.io > > Sent: Tuesday, June 2, 2020 5:29:11 PM > > Subject: [ceph-users] professional services and support for newest > > Ceph > > > > Are there reputable training/support options for Ceph that are not > > geared toward a specific commercial product (e.g. "Red Hat Ceph > > Storage,") but instead would cover the newest open source stable > release? > > > > Thanks, > > Patrick > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > > email to ceph-users-leave(a)ceph.io > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > > email to ceph-users-leave(a)ceph.io > > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > email to ceph-users-leave(a)ceph.io > > > -- Patrick Calhoun, RHCE Petascale Storage Administrator OU Supercomputing Center for Education and Research Department of Information Technology University of Oklahoma (405) 325-4210

3 years, 10 months

1
0
0 0

professional services and support for newest Ceph

by Patrick Calhoun

Are there reputable training/support options for Ceph that are not geared toward a specific commercial product (e.g. "Red Hat Ceph Storage,") but instead would cover the newest open source stable release? Thanks, Patrick

3 years, 10 months

5
5
0 0

OSD upgrades

by Brent Kennedy

We are rebuilding servers and before luminous our process was: 1. Reweight the OSD to 0 2. Wait for rebalance to complete 3. Out the osd 4. Crush remove osd 5. Auth del osd 6. Ceph osd rm # Seems the luminous documentation says that you should: 1. Out the osd 2. Wait for the cluster rebalance to finish 3. Stop the osd 4. Osd purge # Is reweighting to 0 no longer suggested? Side note: I tried our existing process and even after reweight, the entire cluster restarted the balance again after step 4 ( crush remove osd ) of the old process. I should also note, by reweighting to 0, when I tried to run "ceph osd out #", it said it was already marked out. I assume the docs are correct, but just want to make sure since reweighting had been previously recommended. Regards, -Brent Existing Clusters: Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi gateways ( all virtual on nvme ) US Production(HDD): Nautilus 14.2.2 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(HDD): Nautilus 14.2.2 with 12 osd servers, 3 mons, 4 gateways US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons, 3 gateways, 2 iscsi gateways

3 years, 10 months

4
4
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2020