January 2023 - ceph-users

BlueFS spillover warning gone after upgrade to Quincy

by Peter van Heusden

Hello everyone I have a Ceph installation where some of the OSDs were misconfigured to use 1GB SSD partitions for rocksdb. This caused a spillover ("BlueFS *spillover* detected"). I recently upgraded to quincy using cephadm (17.2.5) the spillover warning vanished. This is despite bluestore_warn_on_bluefs_spillover still being set to true. Is there a way to investigate the current state of the DB to see if spillover is, indeed, still happening? Thank you, Peter

1 year, 4 months

5
5
0 0

heavy rotation in store.db folder alongside with traces and exceptions in the .log

by Jürgen Stawska

Hi everyone, I'm facing a weird issue with one of my pacific clusters. Brief into: - 5 Nodes Ubuntu 20.04. on 16.2.7 ( ceph01…05 ) - bootstrapped with cephadm recent image from quay.io (around 1 year ago) - approx. 200TB capacity 5% used - 5 OSD (2 HDD / 2 SSD / 1 NVMe) on each node - each node has a MON, yeah 5 MONs in charge - 3 RGW - 2 MGR - 3 MDS (2 active and 1 stby) The cluster is serving S3 files and cephFS for k8s PVCs and is doing very well. But: During a regular maintenance I found a heavy rotating store.db on EVERY node. Taking a further look, I found weird stuff going on in the #####.log The log is growing with a rate of approx. 400k/s and is rotating when reaching a certain size. store.db -rw-r--r-- 1 ceph ceph 11445745 Jan 13 09:53 1546576.log -rw-r--r-- 1 ceph ceph 67352998 Jan 13 09:53 1546578.sst -rw-r--r-- 1 ceph ceph 67349926 Jan 13 09:53 1546579.sst -rw-r--r-- 1 ceph ceph 67363989 Jan 13 09:53 1546580.sst -rw-r--r-- 1 ceph ceph 41063487 Jan 13 09:53 1546581.sst executing refresh((['ceph01', 'ceph02', 'ceph03', 'ceph04', 'ceph05'],)) failed. Traceback (most recent call last): File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec s = io.read(1) File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf))) EOFError: expected 1 bytes, got 0 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/serve.py", line 1357, in _remote_connection conn, connr = self.mgr._get_connection(addr) File "/usr/share/ceph/mgr/cephadm/module.py", line 1340, in _get_connection sudo=True if self.ssh_user != 'root' else False) File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in __init__ self.gateway = self._make_gateway(hostname) File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway self._make_connection_string(hostname) File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway gw = gateway_bootstrap.bootstrap(io, spec) File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap bootstrap_exec(io, spec) File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec raise HostNotFound(io.remoteaddress) execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-6p_ae5op -i /tmp/cephadm-identity-hc1rt28x ubuntuadmin@<< IP_OF_CEPH-01 REPLACED >> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work return f(*arg) File "/usr/share/ceph/mgr/cephadm/serve.py", line 312, in refresh with self._remote_connection(host) as tpl: File "/lib64/python3.6/contextlib.py", line 81, in __enter__ return next(self.gen) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1391, in _remote_connection raise OrchestratorError(msg) from e orchestrator._interface.OrchestratorError: Failed to connect to ceph01 << IP_OF_CEPH-01 REPLACED >>). Please make sure that the host is reachable and accepts connections using the cephadm SSH key ... ... [some binary stuff here] … ... ceph01.sjtrntß$Skd???>ö#?c????Z+Removing orphan daemon mds.cephfs.ceph02…cephadm ceph01.sjtrntß$Skd???>ö#?cXx??Z-Removing daemon mds.cephfs.ceph02 from ceph01cephadm ceph01.sjtrntß$Skd???>_#?cԕ?0?Z"Removing key for mds.cephfs.ceph02cephadm ceph01.sjtrntß$Skd???>_#?cUƾ0?Z=Reconfiguring mds.cephfs.ceph02 (unknown last config time)...cephadm ceph01.sjtrntß$Skd???>_#?cE?"2?Z0Reconfiguring daemon mds.cephfs.ceph02 on ceph01cephadm ceph01.sjtrntß$Skd???>`#?c??&?Zcephadm exited with an error code: 1, stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format ää.State.Status¨¨ ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02 /usr/bin/docker: stdout /usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02 Non-zero exit code 1 from /usr/bin/docker container inspect --format ää.State.Status¨¨ ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02 /usr/bin/docker: stdout /usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02 Reconfig daemon mds.cephfs.ceph02 ... ERROR: cannot reconfig, data path /var/lib/ceph/<<cluster-ID REPLACED>>/mds.cephfs.ceph02 does not exist Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection yield (conn, connr) File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm code, 'ön'.join(err))) orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format ää.State.Status¨¨ ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02 /usr/bin/docker: stdout /usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02 Non-zero exit code 1 from /usr/bin/docker container inspect --format ää.State.Status¨¨ ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02 /usr/bin/docker: stdout /usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02 Reconfig daemon mds.cephfs.ceph02 ... ERROR: cannot reconfig, data path /var/lib/ceph/<<cluster-ID REPLACED>>/mds.cephfs.ceph02 does not existcephadm Unable to add a Daemon without Service. ?t Please use `ceph orch apply ...` to create a Service. Note, you might want to create the service with "unmanaged=true" Traceback (most recent call last): File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper return OrchResult(f(*args, **kwargs)) File "/usr/share/ceph/mgr/cephadm/module.py", line 2440, in add_daemon ret.extend(self._add_daemon(d_type, spec)) File "/usr/share/ceph/mgr/cephadm/module.py", line 2378, in _add_daemon raise OrchestratorError('Unable to add a Daemon without Service.\n' orchestrator._interface.OrchestratorError: Unable to add a Daemon without Service. Please use `ceph orch apply ...` to create a Service. I’m confused about the attempt of cephadm to do „things“ to a ceph02 daemon which is obviously not residing on node ceph01. Almost the same log lines are appearing on each MON host in its store.db. All in all it looks fare from healthy and I’m really concerned about that. Any help is highly appreciated! Thanks a lot. Cheers, Jürgen

1 year, 4 months

1
0
0 0

Re: RGW error Coundn't init storage provider (RADOS)

by Alexander Y. Fomichev

Hi I facing similar error a couple of days ago: radosgw-admin --cluster=cl00 realm create --rgw-realm=data00 --default ... (0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded) ... obviously radosgw-admin unable to create pool .rgw.root (at the same time "ceph pool create" works as expected) Crowling on a mon logs with debug=20 leads to record: "... prepare_new_pool got -34 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1" As for me pg_num=1 looks strange because default value of osd_pool_default_pg_num=32. On the other side default osd_pool_default_pgp_num=0 so I tried to set osd_pool_default_pgp_num=1 and it worked: pool .rgw.root was built. What really looks strange, after first success I can't reproduce it any more. After that "radosgw-admin ... realm create" successfully builds .rgw.root even with osd_pool_default_pgp_num=0.Nevertheless I suspect a record "pgp_num must be greater than 0 and lower or equal than 'pg_num', which in this case is 1" points to existing bug. It looks like default values of osd_pool_default_pg[p]_num somway ignored/omitted. On Tue, Jul 19, 2022 at 9:11 AM Robert Reihs <robert.reihs(a)gmail.com> wrote: > Yes, I checked pg_num, pgp_num and mon_max_pg_per_osd. I also setup a > single node cluster with the same ansible script we have. Using cephadm for > setting um and managing the cluster. I had the same problem on the new > single node cluster without setup of any other services. When I created the > pools manually the service started and also the dashboard connection > directly worked. > > On Mon, Jul 18, 2022 at 10:20 AM Janne Johansson <icepic.dz(a)gmail.com> > wrote: > > > No, rgw should have the ability to create its own pools. Check the caps > on > > tve keys used by the rgw daemon. > > > > Den mån 18 juli 2022 09:59Robert Reihs <robert.reihs(a)gmail.com> skrev: > > > >> Hi, > >> I had to manually create the pools, than the service automatically > started > >> and is now available. > >> pools: > >> .rgw.root > >> default.rgw.log > >> default.rgw.control > >> default.rgw.meta > >> default.rgw.buckets.index > >> default.rgw.buckets.data > >> default.rgw.buckets.non-ec > >> > >> Is this normal behavior? Should then the error message be changed? Or is > >> this a bug? > >> Best > >> Robert Reihs > >> > >> > >> On Fri, Jul 15, 2022 at 3:47 PM Robert Reihs <robert.reihs(a)gmail.com> > >> wrote: > >> > >> > Hi, > >> > When I have no luck yet solving the issue, but I can add some > >> > more information. The system pools ".rgw.root" and "default.rgw.log" > are > >> > not created. I have created them manually, Now there is more log > >> activity, > >> > but still getting the same error message in the log: > >> > rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned > >> (34) > >> > Numerical result out of range (this can be due to a pool or placement > >> group > >> > misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd > exceeded) > >> > I can't find the correct pool to create manually. > >> > Thanks for any help > >> > Best > >> > Robert > >> > > >> > On Tue, Jul 12, 2022 at 5:22 PM Robert Reihs <robert.reihs(a)gmail.com> > >> > wrote: > >> > > >> >> Hi, > >> >> > >> >> We have a problem with deloing radosgw vi cephadm. We have a Ceph > >> cluster > >> >> with 3 nodes deployed via cephadm. Pool creation, cephfs and block > >> storage > >> >> are working. > >> >> > >> >> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy > >> >> (stable) > >> >> > >> >> The service specs is like this for the rgw: > >> >> > >> >> --- > >> >> > >> >> service_type: rgw > >> >> > >> >> service_id: rgw > >> >> > >> >> placement: > >> >> > >> >> count: 3 > >> >> > >> >> label: "rgw" > >> >> > >> >> --- > >> >> > >> >> service_type: ingress > >> >> > >> >> service_id: rgw.rgw > >> >> > >> >> placement: > >> >> > >> >> count: 3 > >> >> > >> >> label: "ingress" > >> >> > >> >> spec: > >> >> > >> >> backend_service: rgw.rgw > >> >> > >> >> virtual_ip: [IPV6] > >> >> > >> >> virtual_interface_networks: [IPV6 CIDR] > >> >> > >> >> frontend_port: 8080 > >> >> > >> >> monitor_port: 1967 > >> >> > >> >> The error I get in the logfiles: > >> >> > >> >> 0 deferred set uid:gid to 167:167 (ceph:ceph) > >> >> > >> >> 0 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) > quincy > >> >> (stable), process radosgw, pid 2 > >> >> > >> >> 0 framework: beast > >> >> > >> >> 0 framework conf key: port, val: 80 > >> >> > >> >> 1 radosgw_Main not setting numa affinity > >> >> > >> >> 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0 > >> >> > >> >> 1 D3N datacache enabled: 0 > >> >> > >> >> 0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create > returned > >> >> (34) Numerical result out of range (this can be due to a pool or > >> placement > >> >> group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd > >> >> exceeded) > >> >> > >> >> 0 rgw main: failed reading realm info: ret -34 (34) Numerical result > >> out > >> >> of range > >> >> > >> >> 0 rgw main: ERROR: failed to start notify service ((34) Numerical > >> result > >> >> out of range > >> >> > >> >> 0 rgw main: ERROR: failed to init services (ret=(34) Numerical result > >> out > >> >> of range) > >> >> > >> >> -1 Couldn't init storage provider (RADOS) > >> >> > >> >> I have for testing set the pg_num and pgp_num to 16 and the > >> >> mon_max_pg_per_osd to 1000 and still getting the same error. I have > >> also > >> >> tried creating the rgw with ceph command, same error. Pool creation > is > >> >> working, I created multiple other pools and there was no problem. > >> >> > >> >> Thanks for any help. > >> >> > >> >> Best > >> >> > >> >> Robert > >> >> > >> >> The 5 fails services are 3 from the rgw and 2 haproxy for the rgw, > >> there > >> >> is only one running: > >> >> > >> >> ceph -s > >> >> > >> >> cluster: > >> >> > >> >> id: 40ddf > >> >> > >> >> health: HEALTH_WARN > >> >> > >> >> 5 failed cephadm daemon(s) > >> >> > >> >> > >> >> > >> >> services: > >> >> > >> >> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 4d) > >> >> > >> >> mgr: ceph-01.hbvyqi(active, since 4d), standbys: ceph-02.pqtxbv > >> >> > >> >> mds: 1/1 daemons up, 3 standby > >> >> > >> >> osd: 6 osds: 6 up (since 4d), 6 in (since 4d) > >> >> > >> >> > >> >> > >> >> data: > >> >> > >> >> volumes: 1/1 healthy > >> >> > >> >> pools: 5 pools, 65 pgs > >> >> > >> >> objects: 87 objects, 170 MiB > >> >> > >> >> usage: 1.4 GiB used, 19 TiB / 19 TiB avail > >> >> > >> >> pgs: 65 active+clean > >> >> > >> >> > >> > > >> > -- > >> > Robert Reihs > >> > Jakobsweg 22 > >> > 8046 Stattegg > >> > AUSTRIA > >> > > >> > mobile: +43 (664) 51 035 90 > >> > robert.reihs(a)gmail.com > >> > > >> > >> > >> -- > >> Robert Reihs > >> Jakobsweg 22 > >> 8046 Stattegg > >> AUSTRIA > >> > >> mobile: +43 (664) 51 035 90 > >> robert.reihs(a)gmail.com > >> _______________________________________________ > >> ceph-users mailing list -- ceph-users(a)ceph.io > >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > >> > > > > -- > Robert Reihs > Jakobsweg 22 > 8046 Stattegg > AUSTRIA > > mobile: +43 (664) 51 035 90 > robert.reihs(a)gmail.com > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > -- Best regards. Alexander Y. Fomichev <git.user(a)gmail.com>

1 year, 4 months

1
0
0 0

OSDs failed to start after host reboot | Cephadm

by Ben Meinhart

Hello all! Linked stackoverflow post: https://stackoverflow.com/questions/75101087/cephadm-ceph-osd-fails-to-star… <https://stackoverflow.com/questions/75101087/cephadm-ceph-osd-fails-to-star…> A couple of weeks ago I deployed a new Ceph cluster using Cephadm. It is a three node cluster (node1, node2, & node3) with 6 OSD’s each; 6x18TB Seagate hard drives with a 2TB NVMe drive set as a DB device. Everything has been running smoothly until today when I went to perform maintenance on one of the nodes. I first moved all of the services off the host and put it into maintenance mode. I then made some changes to once of the NIC’s and ran updates. After the updates were done, I rebooted the machine. This is when the issue occurred. When the node (node1) finished rebooting, it was still showing as offline in the Ceph Dashboard so from one of the host I ran `ceph orch host rescan node1` and it came back online in the Ceph dashboard. I’ve seen this before when I’ve had to reboot host so NBD so far. However, after a couple of minutes passed the OSD’s on that host still haven’t come online. I then checked the status of the services `systemctl | grep ceph` and saw that all of the OSD’s had failed. # systemctl status ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6(a)osd.0.service × ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6(a)osd.0.service - Ceph osd.0 for 0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6 Loaded: loaded (/etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service; enabled; vendor preset: enabled) Active: failed (Result: exit-code) since Thu 2023-01-12 18:14:27 UTC; 1h 42min ago Main PID: 385982 (code=exited, status=1/FAILURE) CPU: 292ms Jan 12 19:48:30 node1 systemd[1]: /etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer Kill It was at the reset counter max so I had to run `systemctl reset-failed` and I tried restarting the OSD’s by running `systemctl restart ceph.target`. I watched the service try to load but it kept failing. This was the output of /var/log/ceph/<fsid>/ceph-osd.0.log: 2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 set uid:gid to 167:167 (ceph:ceph) 2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process ceph-osd, pid 7 2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 pidfile_write: ignore empty --pid-file 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open size 333396836352 (0x4da0000000, 310 GiB) block_size 4096 (4 KiB) non-rotational discard supported 2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 310 GiB 2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported 2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 18 TiB 2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) close 2023-01-12T18:12:06.817+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) close 2023-01-12T18:12:07.085+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) close 2023-01-12T18:12:07.305+0000 7fb5d3b1e3c0 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 load: jerasure load: lrc 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:0.OSDShard using op scheduler mClockScheduler 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:1.OSDShard using op scheduler mClockScheduler 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:2.OSDShard using op scheduler mClockScheduler 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:3.OSDShard using op scheduler mClockScheduler 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:4.OSDShard using op scheduler mClockScheduler 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 osd.0 0 OSD:init: unable to mount object store 2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 [0;31m ** ERROR: osd init failed: (13) Permission denied[0m Judging by the final error, it looked like some sort of permissions issue with mounting the volume to the container. I did notice on the other two host, node2 & node3, that I have not yet reboot since deploying Ceph with cephadm that it was more docker overlays mounted when I ran the `mount` command. My theory is that the LVM volume stored on the OSD’s is not being mounted at boot. Otherwise it might also be the case that the user that Ceph is passing to the containers is not allowed to mount the volumes for some reason. I’ve looked through most of the docs and forums I could find and haven’t found any solutions. I would like to say I’m fairly experienced with Linux 5+ years, but I am new to Ceph (~6 months) and I haven’t emailed this list before. Sorry in advance if I’ve mistakenly broken any roles and thanks for the help! - Ben M

1 year, 4 months

1
0
0 0

Laggy PGs on a fairly high performance cluster

by Matthew Stroud

We have a 14 osd node all ssd cluster and for some reason we are continually getting laggy PGs and those seem to correlate to slow requests on Quincy (doesn't seem to happen on our Pacific clusters). These laggy pgs seem to shift between osds. The network seems solid, as in I'm not seeing errors or slowness. OSD hosts are heavily underutilized, normally sub 1 load and the cpus are 98% idle. I have been looking through the logs and nothing is really standing out in the OSD or ceph logs. Some things we have tried: 1. Updating our cluster to 17.2.5 2. Manually setting our mClock profile to high_client_ops. 3. Increasing our total number of PGs (this something that should've happened anyways.) 4. Verified that jumbo frames, lacp, and throughput were functioning as intended. 5. Took some of our newer nodes out to see if that was an issue. Also rebooted the cluster just to be sure. I'm curious if someone in the community has experience with this kind of issue and maybe could point to something I have overlooked. Some example logs: 2023-01-10T22:50:23.245823+0000 mgr.openstack-mon01.b.pc.ostk.com.flbudm (mgr.120371640) 231175 : cluster [DBG] pgmap v235204: 2625 pgs: 1 active+clean+laggy, 2624 active+clean; 6.0 TiB data, 18 TiB used, 84 TiB / 102 TiB avail; 19 MiB/s rd, 67 MiB/s wr, 4.76k op/s 2023-01-10T22:50:23.762562+0000 osd.83 (osd.83) 906 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ]) 2023-01-10T22:50:24.771260+0000 osd.83 (osd.83) 907 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ]) ________________________________ CONFIDENTIALITY NOTICE: This message is intended only for the use and review of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message solely to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately by telephone or return email. Thank you.

1 year, 4 months

1
0
0 0

CephFS: Questions regarding Namespaces, Subvolumes and Mirroring

by Jonas Schwab

Dear everyone, I have several questions regarding CephFS connected to Namespaces, Subvolumes and snapshot Mirroring: *1. How to display/create namespaces used for isolating subvolumes?* I have created multiple subvolumes with the option --namespace-isolated, so I was expecting to see the namespaces returned from ceph fs subvolume info <volume_name> <subvolume_name> also returned by rbd namespace ls <cephfs_data_pool> --format=json But the latter command just returns an empty list. Are the namespaces used for rdb and CephFS different ones? *2. Can CephFS Snapshot mirroring also be applied to subvolumes?* I tried this, but without success. Is there something to take into account rather than just mirroring the directory, or is it just not possible right now? *3. Can xattr for namespaces and pools also be mirrored?* Or more specifically, is there a way to preserve the namespace and pool layout of mirrored directories? Thank you for your help! Best regrads, Jonas PS: You could receive this mail twice, sine this email address somehow got removed from the ceph-users list.

1 year, 4 months

1
0
0 0

CephFS: Questions regarding Namespaces, Subvolumes and Mirroring

by Jonas Schwab

Dear everyone, I have several questions regarding CephFS connected to Namespaces, Subvolumes and snapshot Mirroring: *1. How to display/create namespaces used for isolating subvolumes?* I have created multiple subvolumes with the option --namespace-isolated, so I was expecting to see the namespaces returned from ceph fs subvolume info <volume_name> <subvolume_name> also returned by rbd namespace ls <cephfs_data_pool> --format=json But the latter command just returns an empty list. Are the namespaces used for rdb and CephFS different ones? *2. Can CephFS Snapshot mirroring also be applied to subvolumes?* I tried this, but without success. Is there something to take into account rather than just mirroring the directory, or is it just not possible right now? *3. Can xattr for namespaces and pools also be mirrored?* Or more specifically, is there a way to preserve the namespace and pool layout of mirrored directories? Thank you for your help! Best regrads, Jonas

1 year, 4 months

2
1
0 0

Creating nfs RGW export makes nfs-gnaesha server in crash loop

by Ruidong Gao

Hi, This is running Quincy 17.2.5 deployed by rook on k8s. RGW nfs export will crash Ganesha server pod. CephFS export works just fine. Here are steps of it: 1, create export: bash-4.4$ ceph nfs export create rgw --cluster-id nfs4rgw --pseudo-path /bucketexport --bucket testbk { "bind": "/bucketexport", "path": "testbk", "cluster": "nfs4rgw", "mode": "RW", "squash": "none" } 2, check pods status afterwards: rook-ceph-nfs-nfs1-a-679fdb795-82tcx 2/2 Running 0 4h3m rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 1/2 Error 2 4h6m 3, check failing pod’s logs: 11/01/2023 08:11:53 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90 11/01/2023 08:11:54 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend 11/01/2023 08:11:54 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0) 11/01/2023 08:11:57 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE 11/01/2023 08:11:57 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] export_defaults_commit :CONFIG :INFO :Export Defaults now (options=03303002/00080000 , , , , , , , , expire= 0) 2023-01-11T08:11:57.853+0000 7f59dac7c200 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-admin/keyring: (2) No such file or directory 2023-01-11T08:11:57.853+0000 7f59dac7c200 -1 AuthRegistry(0x56476817a480) no keyring found at /var/lib/ceph/radosgw/ceph-admin/keyring, disabling cephx 2023-01-11T08:11:57.855+0000 7f59dac7c200 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-admin/keyring: (2) No such file or directory 2023-01-11T08:11:57.855+0000 7f59dac7c200 -1 AuthRegistry(0x7ffe4d092c90) no keyring found at /var/lib/ceph/radosgw/ceph-admin/keyring, disabling cephx 2023-01-11T08:11:57.856+0000 7f5987537700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] 2023-01-11T08:11:57.856+0000 7f5986535700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] 2023-01-11T08:12:00.861+0000 7f5986d36700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] 2023-01-11T08:12:00.861+0000 7f59dac7c200 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication failed to fetch mon config (--no-mon-config to skip) 4, delete the export: ceph nfs export delete nfs4rgw /bucketexport Ganesha servers go back normal: rook-ceph-nfs-nfs1-a-679fdb795-82tcx 2/2 Running 0 4h30m rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 2/2 Running 10 4h33m Any ideas to make it work? Thanks Ben

1 year, 4 months

2
2
0 0

Mysterious Disk-Space Eater

by duluxoz

Hi All, Got a funny one, which I'm hoping someone can help us with. We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux 8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI G/W services running on them (we're only a small shop). Each Node has a separate 16 GiB partition mounted as /var. Everything is running well and the Ceph Cluster is handling things very well). However, one of the Nodes (not the one currently acting as the Active Manager) is running out of space on /var. Normally, all of the Nodes have around 10% space used (via a df -H command), but the problem Node only takes 1 to 3 days to run out of space, hence taking it out of Quorum. Its currently at 85% and growing. At first we thought this was caused by an overly large log file, but investigations showed that all the logs on all 3 Nodes were of comparable size. Also, searching for the 20 largest files on the problem Node's /var didn't produce any significant results. Coincidentally, unrelated to this issue, the problem Node (but not the other 2 Nodes) was re-booted a couple of days ago and, when the Cluster had re-balanced itself and everything was back online and reporting as Healthy, the problem Node's /var was back down to around 10%, the same as the other two Nodes. This lead us to suspect that there was some sort of "run-away" process or journaling/logging/temporary file(s) or whatever that the re-boot has "cleaned up". So we've been keeping an eye on things but we can't see anything causing the issue and now, as I said above, the problem Node's /var is back up to 85% and growing. I've been looking at the log files, tying to determine the issue, but as I don't really know what I'm looking for I don't even know if I'm looking in the *correct* log files... Obviously rebooting the problem Node every couple of days is not a viable option, and increasing the size of the /var partition is only going to postpone the issue, not resolve it. So if anyone has any ideas we'd love to hear about it - thanks Cheers Dulux-Oz

1 year, 4 months

4
3
0 0

Ceph Octopus rbd images stuck in trash

by Jeff Welling

Hello there, I'm running Ceph 15.2.17 (Octopus) on Debian Buster and I'm starting an upgrade but I'm seeing a problem and I wanted to ask how best to proceed in case I make things worse by mucking with it without asking experts. I've moved an rbd image to the trash without clearing the snapshots first, and then tried to 'trash purge'. This resulted in an error because the image still has snapshots, but I'm unable to remove the image from the pool to clear the snapshots either. At least one of these images is from a clone of a snapshot from another trashed image, which I'm already kicking myself for. The contents of my trash: # rbd trash ls 07afadac0ed69c nfsroot_pi08 240ae5a5eb3214 bigdisk 7fd5138848231e nfsroot_pi01 f33e1f5bad0952 bigdisk2 fcdeb1f96a6124 raspios-64bit-lite-manuallysetup-p1 fcdebd2237697a raspios-64bit-lite-manuallysetup-p2 fd51418d5c43da nfsroot_pi02 fd514a6b4d3441 nfsroot_pi03 fd515061816c70 nfsroot_pi04 fd51566859250b nfsroot_pi05 fd5162c5885d9c nfsroot_pi07 fd5171c27c36c2 nfsroot_pi09 fd51743cb8813c nfsroot_pi10 fd517ad3bc3c9d nfsroot_pi11 fd5183bfb1e588 nfsroot_pi12 This is the error I get trying to purge the trash: # rbd trash purge Removing images: 0% complete...failed. rbd: some expired images could not be removed Ensure that they are closed/unmapped, do not have snapshots (including trashed snapshots with linked clones), are not in a group and were moved to the trash successfully. This is the error when I try and restore one of the trashed images: # rbd trash restore nfsroot_pi08 rbd: error: image does not exist in trash 2023-01-11T12:28:52.982-0800 7f4b69a7c3c0 -1 librbd::api::Trash: restore: error getting image id nfsroot_pi08 info from trash: (2) No such file or directory Trying to restore other images gives the same error. These trash images are now taking up a significant portion of the cluster space. One thought was to upgrade and see if that resolves the problem, but I've shot myself in the foot doing that in the past without confirming it would solve the problem, so I'm looking for a second opinion on how best to clear these? These are all Debian Buster systems, the kernel version of the host I'm running these commands on is: Linux zim 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux I'm going to be upgrading that too but one step at a time. The exact ceph version is: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable) This was installed from the ceph repos, not the debian repos, using cephadm. If there's any additional details I can share please let me know, any and all thoughts welcome! I've been googling and have found folks with similar issues but nothing similar enough to feel helpful. Thanks in advance, and thank you to any and everyone who contributes to Ceph, it's awesome!

1 year, 4 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users January 2023