Hello,
We are running Mimic 13.2.8 on our cluster, and since upgrading to 13.2.8
the Prometheus plugin seems to hang a lot. It used to respond in under 10 s,
but now it often hangs. Restarting the mgr processes helps temporarily, but
within minutes it gets stuck again.
The active mgr doesn't exit when running `systemctl stop ceph-mgr.target` and
needs to be kill -9'ed.
Is there anything I can do to address this, or at least get better
visibility into the issue?
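Things I'm planning to try for more visibility (a rough sketch only; 9283 is the prometheus module's default port and woodenbox2 is our current active mgr, adjust as needed):
  # time the exporter directly to see when/where it stalls
  time curl -s http://woodenbox2:9283/metrics > /dev/null
  # temporarily raise mgr verbosity via the admin socket on the active mgr
  ceph daemon mgr.woodenbox2 config set debug_mgr 10
  # dump the mgr's perf counters while it is stuck
  ceph daemon mgr.woodenbox2 perf dump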
We only have a few plugins enabled:
$ ceph mgr module ls
{
    "enabled_modules": [
        "balancer",
        "prometheus",
        "zabbix"
    ],
We have 3 mgr processes, but it's a pretty large cluster (nearly 4000 OSDs) and
a busy one, with lots of rebalancing. (I don't know whether a busy cluster would
seriously affect the mgr's performance, but I'm throwing it out there.)
  services:
    mon: 5 daemons, quorum woodenbox0,woodenbox2,woodenbox4,woodenbox3,woodenbox1
    mgr: woodenbox2(active), standbys: woodenbox0, woodenbox1
    mds: cephfs-1/1/1 up {0=woodenbox6=up:active}, 1 up:standby-replay
    osd: 3964 osds: 3928 up, 3928 in; 831 remapped pgs
    rgw: 4 daemons active
Thanks in advance for your help,
-Paul Choi
Hi guys,
Creating a second filesystem is documented as an experimental feature, but the documentation doesn't explain how to ensure that MDS affinity sticks to the second filesystem you create. Has anyone had success implementing a second CephFS? In my case it will be based on a completely different pool from my first one.
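For context, here is roughly what I am attempting (the pool names, PG counts and the mds.b daemon name below are placeholders of mine, and the affinity knobs seem to differ by release):
  ceph fs flag set enable_multiple true --yes-i-really-mean-it
  ceph osd pool create cephfs2_metadata 32
  ceph osd pool create cephfs2_data 128
  ceph fs new cephfs2 cephfs2_metadata cephfs2_data
  # affinity: pre-Octopus via mds_standby_for_fscid / mds_standby_for_name in the
  # daemon's config section; Octopus and later via e.g.
  #   ceph config set mds.b mds_join_fs cephfs2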
Thanks.
J
It works well for me; I've been running a couple of clusters for 1-2 years where all OSD hosts (~200) have no system disks and instead netboot via PXE.
No NFS server is involved: each host loads the same system image (a Debian Live squashfs) into memory on boot and runs independently from there on out. It takes some trickery to configure and bring the OSDs up on boot (using Puppet in my case), though that might get easier with the containerized approach in Ceph 15+.
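Roughly, the OSD bring-up part boils down to something like this on each boot (a sketch only, assuming LVM-based OSDs and that ceph.conf plus the keyrings have already been dropped in place by the config management):
  ceph-volume lvm activate --all
wired into a systemd unit or Puppet-triggered step that runs once per boot.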
Best,
Eric
> On 21 Mar 2020, at 14:18, huxiaoyu(a)horebdata.cn wrote:
>
> Hi, Marc,
>
> Indeed, PXE boot makes a lot of sense in a large cluster, cutting down the OS deployment and management burden, but only if the absence of a single point of failure is guaranteed...
>
> best regards,
>
> samuel
>
>
>
> huxiaoyu(a)horebdata.cn
>
> From: Marc Roos
> Date: 2020-03-21 14:13
> To: ceph-users; huxiaoyu; martin.verges
> Subject: RE: [ceph-users] Questions on Ceph cluster without OS disks
>
> I would say it is not a 'proven technology', otherwise you would see
> widespread implementation and adoption of this method. However, if you
> really need the physical disk space, it is a solution. Although I also
> would have questions about creating an extra redundant environment to
> serve remote booting, just to spare an OS disk slot. Maybe this
> makes more sense in really big environments.
>
>
>
>
>
> -----Original Message-----
> From: huxiaoyu(a)horebdata.cn [mailto:huxiaoyu@horebdata.cn]
> Sent: 21 March 2020 13:54
> To: Martin Verges; ceph-users
> Subject: [ceph-users] Questions on Ceph cluster without OS disks
>
> Hello, Martin,
>
> I notice that Croit advocates running Ceph clusters without OS disks,
> booting via PXE instead.
>
> Do you use an NFS server to serve the root file system for each node,
> e.g. hosting configuration files, users and passwords, log files, etc.?
> My question is: would the NFS server be a single point of failure? If the
> NFS server goes down or the network experiences an outage, Ceph nodes may
> not be able to write to their local file systems, possibly leading to a
> service outage.
>
> How do you deal with the above potential issues in production? I am a
> bit worried...
>
> best regards,
>
> samuel
>
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hi,
I'm using Intel Optane disks to provide WAL/DB capacity for my Ceph cluster (which is part of Proxmox, for VM hosting).
I've read that WAL/DB partitions only use either 3 GB, 30 GB, or 300 GB, due to the way RocksDB works.
Is this true?
My current WAL/DB partition is 145 GB - does this mean that 115 GB of it will be permanently wasted?
Is this behaviour documented somewhere, or is there some background, so I can understand a bit more about how it works?
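For what it's worth, the rough arithmetic behind those numbers, assuming the RocksDB defaults BlueStore ships with (max_bytes_for_level_base around 256 MB, level multiplier 10):
  L1 ~ 0.25 GB
  L2 ~ 2.5 GB
  L3 ~ 25 GB
  L4 ~ 250 GB
A level only stays on the fast device if the DB partition can hold it together with all smaller levels, so the useful break points land near 0.25 + 2.5 ≈ 3 GB, + 25 ≈ 30 GB, + 250 ≈ 300 GB (plus some WAL and overhead). Treat these as rough figures rather than exact thresholds.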
Thanks,
Victor
How do I get rid of this logging?
Mar 31 13:40:03 c01 ceph-mgr: 2020-03-31 13:40:03.521 7f554edc8700 0 log_channel(cluster) log [DBG] : pgmap v672067: 384 pgs: 384 active+clean;
Hello,
I have configured a multisite Ceph setup.
The master zone has not changed, but on the destination zone I had some
problems.
On the destination zone I cleaned up and reinstalled the radosgw, but trying
to assign the same zone name it had before the reinstallation does not work
(radosgw does not start).
I changed its zone name and radosgw now starts, but the master is still using
the old name when I try to execute:
radosgw-admin period update commit
Please, how can I solve this problem?
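In case it helps, this is the direction I was thinking of going on the master side (zone and zonegroup names are placeholders; please double-check before running, as zone delete is destructive):
  radosgw-admin zonegroup remove --rgw-zonegroup=<zonegroup> --rgw-zone=<old-zone-name>
  radosgw-admin zone delete --rgw-zone=<old-zone-name>
  radosgw-admin period update --commit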
Thanks
Ignazi
Hi all,
I'm running a 3-node Ceph cluster at home with collocated mons and MDS daemons, currently serving 3 filesystems, and have been since Mimic. I'm planning to go down to one FS and use RBD in the future, but that is another story. I'm using the cluster as cold storage on spindles with EC pools for archival purposes, and it usually does not run 24/7. I actually managed to upgrade to Octopus without problems yesterday. So first of all: great job with the release.
Now I have a little problem and a general question to address.
I have tried to share the CephFS via Samba and the vfs_ceph module, but I could not manage to get write access to the share (read access is not a problem), even with the admin key. When I instead share the mounted path (kernel or FUSE mount) as usual, there are no problems at all. Is vfs_ceph generally read-only and I missed that point? Furthermore, I suppose there is no way to choose between the different MDS namespaces (filesystems), right?
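For reference, a sketch of the kind of share definition I mean (the share name, the 'samba' cephx user and the paths are only examples):
  [archive]
      path = /
      vfs objects = ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      kernel share modes = no
      read only = no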
Now the general question. Since the cluster does not run 24/7 as stated and is turned on perhaps once a week for a couple of hours on demand, what are reasonable settings for the scrubbing intervals? As I said, the storage is cold and there is mostly read I/O. The archiving process adds approximately 0.5% of the cluster's total storage capacity in new data.
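To make the question concrete, these are the knobs I am looking at (the values below are purely illustrative, not what I currently use):
  ceph config set osd osd_scrub_min_interval 86400      # 1 day
  ceph config set osd osd_scrub_max_interval 604800     # 1 week
  ceph config set osd osd_deep_scrub_interval 2419200   # 4 weeks
Or would it make more sense to leave the defaults and trigger scrubs manually (e.g. ceph osd deep-scrub <osd-id>) while the cluster happens to be powered on?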
Stay healthy and regards,
Marco Savoca
Hi all,
I am installing Ceph Nautilus and constantly getting errors while adding iSCSI gateways.
It was working using the http scheme, but after moving to https with wildcard certs it gives API errors.
Below are some of my configurations.
Thanks for your help.
Command:
ceph --cluster ceph dashboard iscsi-gateway-add https://myadmin:admin.01@1.2.3.4:5050
Error:
Error EINVAL: iscsi REST API cannot be reached. Please check your configuration and that the API endpoint is accessible
Tried also disabling ssl verify
# ceph dashboard set-rgw-api-ssl-verify False
Option RGW_API_SSL_VERIFY updated
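Note that set-rgw-api-ssl-verify only affects the RGW API; the iSCSI API has its own toggle in the Nautilus dashboard (worth confirming against your exact minor release):
  ceph dashboard set-iscsi-api-ssl-verification false
Also, with api_secure = True the gateway has to be reachable over https with a certificate that matches the address used; a wildcard cert will not match a bare IP like 1.2.3.4, which may be the sticking point here.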
"/etc/ceph/iscsi-gateway.cfg" 23L, 977C
# Ansible managed
[config]
api_password = admin.01
api_port = 5050
# API settings.
# The API supports a number of options that allow you to tailor it to your
# local environment. If you want to run the API under https, you will need to
# create cert/key files that are compatible for each iSCSI gateway node, that is
# not locked to a specific node. SSL cert and key files *must* be called
# 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
# on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
# to switch to https mode.
# To support the API, the bare minimum settings are:
api_secure = True
# Optional settings related to the CLI/API service
api_user = myadmin
cluster_name = ceph
loop_delay = 1
trusted_ip_list = 1.2.3.3,1.2.3.4
Log file
======
ceph-rgw-cnode04.rgw0.log
2020-03-30 10:24:20.392 7f6a2dc1b700 1 ====== req done req=0x561d9ce465f0 op status=0 http_status=200 latency=0.0119993s ======
2020-03-30 10:24:20.394 7f6a2cc19700 1 ====== starting new request req=0x561d9ce465f0 =====
2020-03-30 10:24:20.396 7f6a2cc19700 1 ====== req done req=0x561d9ce465f0 op status=0 http_status=404 latency=0.00199988s ======
2020-03-30 10:24:20.397 7f6a2bc17700 1 ====== starting new request req=0x561d9ce465f0 =====
2020-03-30 10:24:20.410 7f6a2bc17700 1 ====== req done req=0x561d9ce465f0 op status=0 http_status=200 latency=0.0129992s ======
2020-03-30 10:24:20.499 7f6a27c0f700 1 ====== starting new request req=0x561d9cec25f0 =====
2020-03-30 10:24:20.502 7f6a27c0f700 1 ====== req done req=0x561d9cec25f0 op status=0 http_status=200 latency=0.00299982s ======
2020-03-30 10:24:20.504 7f6a2740e700 1 ====== starting new request req=0x561d9cec25f0 =====
2020-03-30 10:24:20.506 7f6a2740e700 1 ====== req done req=0x561d9cec25f0 op status=0 http_status=200 latency=0.00199988s ======
2020-03-30 10:24:30.516 7f6a22404700 1 ====== starting new request req=0x561d9cf825f0 =====
2020-03-30 10:24:30.518 7f6a22404700 1 ====== req done req=0x561d9cf825f0 op status=0 http_status=200 latency=0.00199988s ======
2020-03-30 10:24:30.620 7f6a1ebfd700 1 ====== starting new request req=0x561d9cf925f0 =====
2020-03-30 10:24:30.622 7f6a1ebfd700 1 ====== req done req=0x561d9cf925f0 op status=0 http_status=200 latency=0.00199988s ======
2020-03-30 10:24:30.708 7f6a19bf3700 1 ====== starting new request req=0x561d9cfd45f0 =====
2020-03-30 10:24:30.708 7f6a193f2700 1 ====== starting new request req=0x561d9cfaa5f0 =====
2020-03-30 10:24:30.710 7f6a19bf3700 1 ====== req done req=0x561d9cfd45f0 op status=0 http_status=200 latency=0.00199988s ======
2020-03-30 10:24:30.711 7f6a193f2700 1 ====== req done req=0x561d9cfaa5f0 op status=0 http_status=200 latency=0.00299982s ======
/ceph-rgw-cnode05.rgw0.log
2020-03-30 10:07:41.309 7fb79d31c700 1 ====== req done http_status=400 ======
2020-03-30 10:07:41.505 7fb798312700 1 ====== starting new request req=0x5565d88b45f0 =====
2020-03-30 10:07:41.508 7fb798312700 1 ====== req done req=0x5565d88b45f0 op status=0 http_status=200 latency=0.00299982s ======
2020-03-30 10:07:41.531 7fb79430a700 1 failed to read header: bad method
2020-03-30 10:07:41.531 7fb79430a700 1 ====== req done http_status=400 ======
2020-03-30 10:07:41.552 7fb791304700 1 failed to read header: bad method
2020-03-30 10:07:41.552 7fb791304700 1 ====== req done http_status=400 ======
Hello List,
is this a bug?
root@ceph02:~# ceph cephadm generate-key
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1413, in _generate_key
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4ejhr7wh/key'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1418, in _generate_key
    os.unlink(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp4ejhr7wh/key'
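One thing I still need to check (an assumption on my part, not something I have verified in the module source): if _generate_key creates the key in that temp directory by shelling out to ssh-keygen, a missing openssh-client on the mgr host would leave the file absent and produce exactly this FileNotFoundError. A quick check:
  # assumption: the mgr module relies on the ssh-keygen binary being present
  which ssh-keygen || apt install openssh-client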
root@ceph02:~# dpkg -l |grep ceph
ii  ceph-base                      15.2.0-1~bpo10+1  amd64  common ceph daemon libraries and management tools
ii  ceph-common                    15.2.0-1~bpo10+1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-deploy                    2.0.1             all    Ceph-deploy is an easy to use configuration tool
ii  ceph-mds                       15.2.0-1~bpo10+1  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                       15.2.0-1~bpo10+1  amd64  manager for the ceph distributed storage system
ii  ceph-mgr-cephadm               15.2.0-1~bpo10+1  all    cephadm orchestrator module for ceph-mgr
ii  ceph-mgr-dashboard             15.2.0-1~bpo10+1  all    dashboard module for ceph-mgr
ii  ceph-mgr-diskprediction-cloud  15.2.0-1~bpo10+1  all    diskprediction-cloud module for ceph-mgr
ii  ceph-mgr-diskprediction-local  15.2.0-1~bpo10+1  all    diskprediction-local module for ceph-mgr
ii  ceph-mgr-k8sevents             15.2.0-1~bpo10+1  all    kubernetes events module for ceph-mgr
ii  ceph-mgr-modules-core          15.2.0-1~bpo10+1  all    ceph manager modules which are always enabled
ii  ceph-mgr-rook                  15.2.0-1~bpo10+1  all    rook module for ceph-mgr
ii  ceph-mon                       15.2.0-1~bpo10+1  amd64  monitor server for the ceph storage system
ii  ceph-osd                       15.2.0-1~bpo10+1  amd64  OSD server for the ceph storage system
ii  cephadm                        15.2.0-1~bpo10+1  amd64  cephadm utility to bootstrap ceph daemons with systemd and containers
ii  libcephfs1                     10.2.11-2         amd64  Ceph distributed file system client library
ii  libcephfs2                     15.2.0-1~bpo10+1  amd64  Ceph distributed file system client library
ii  python-ceph-argparse           14.2.8-1          all    Python 2 utility libraries for Ceph CLI
ii  python3-ceph-argparse          15.2.0-1~bpo10+1  all    Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common            15.2.0-1~bpo10+1  all    Python 3 utility libraries for Ceph
ii  python3-cephfs                 15.2.0-1~bpo10+1  amd64  Python 3 libraries for the Ceph libcephfs library
root@ceph02:~# cat /etc/debian_version
10.3
Thanks,
Michael