Hello everyone
I have a Ceph installation where some of the OSDs were misconfigured to use
1 GB SSD partitions for RocksDB. This caused a spillover ("BlueFS *spillover*
detected"). I recently upgraded to Quincy (17.2.5) using cephadm, and the
spillover warning vanished. This is
despite bluestore_warn_on_bluefs_spillover still being set to true.
Is there a way to investigate the current state of the DB to see if
spillover is, indeed, still happening?
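So far the only thing I could think of is querying the BlueFS counters
directly; assuming the tell / admin-socket interface behaves as on my other
clusters, something like:

  ceph tell osd.<id> bluefs stats
  ceph daemon osd.<id> perf dump bluefs    # check slow_used_bytes

My understanding is that a non-zero slow_used_bytes would mean data is still
spilling onto the slow device, but I'd appreciate confirmation.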
Thank you,
Peter
Hi everyone,
I'm facing a weird issue with one of my pacific clusters.
Brief intro:
- 5 nodes, Ubuntu 20.04, on 16.2.7 (ceph01…05)
- bootstrapped with a recent cephadm image from quay.io (around 1 year ago)
- approx. 200 TB capacity, 5% used
- 5 OSD (2 HDD / 2 SSD / 1 NVMe) on each node
- each node has a MON, yeah 5 MONs in charge
- 3 RGW
- 2 MGR
- 3 MDS (2 active and 1 stby)
The cluster is serving S3 files and CephFS for k8s PVCs and is doing very well.
But:
During regular maintenance I found a heavily rotating store.db on EVERY node. Taking a closer look, I found weird stuff going on in the #####.log
The log is growing at a rate of approx. 400k/s and rotates when it reaches a certain size.
store.db
-rw-r--r-- 1 ceph ceph 11445745 Jan 13 09:53 1546576.log
-rw-r--r-- 1 ceph ceph 67352998 Jan 13 09:53 1546578.sst
-rw-r--r-- 1 ceph ceph 67349926 Jan 13 09:53 1546579.sst
-rw-r--r-- 1 ceph ceph 67363989 Jan 13 09:53 1546580.sst
-rw-r--r-- 1 ceph ceph 41063487 Jan 13 09:53 1546581.sst
executing refresh((['ceph01', 'ceph02', 'ceph03', 'ceph04', 'ceph05'],)) failed.
Traceback (most recent call last):
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in bootstrap_exec
s = io.read(1)
File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 1 bytes, got 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1357, in _remote_connection
conn, connr = self.mgr._get_connection(addr)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1340, in _get_connection
sudo=True if self.ssh_user != 'root' else False)
File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in __init__
self.gateway = self._make_gateway(hostname)
File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
self._make_connection_string(hostname)
File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
gw = gateway_bootstrap.bootstrap(io, spec)
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
bootstrap_exec(io, spec)
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in bootstrap_exec
raise HostNotFound(io.remoteaddress)
execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-6p_ae5op -i /tmp/cephadm-identity-hc1rt28x ubuntuadmin@<< IP_OF_CEPH-01 REPLACED >>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 312, in refresh
with self._remote_connection(host) as tpl:
File "/lib64/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1391, in _remote_connection
raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to ceph01 (<< IP_OF_CEPH-01 REPLACED >>).
Please make sure that the host is reachable and accepts connections using the cephadm SSH key
...
... [some binary stuff here] …
...
ceph01.sjtrnt [binary garbage] Removing orphan daemon mds.cephfs.ceph02… cephadm
ceph01.sjtrnt [binary garbage] Removing daemon mds.cephfs.ceph02 from ceph01 cephadm
ceph01.sjtrnt [binary garbage] Removing key for mds.cephfs.ceph02 cephadm
ceph01.sjtrnt [binary garbage] Reconfiguring mds.cephfs.ceph02 (unknown last config time)... cephadm
ceph01.sjtrnt [binary garbage] Reconfiguring daemon mds.cephfs.ceph02 on ceph01 cephadm
ceph01.sjtrnt [binary garbage] cephadm exited with an error code: 1, stderr: Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02
Reconfig daemon mds.cephfs.ceph02 ...
ERROR: cannot reconfig, data path /var/lib/ceph/<<cluster-ID REPLACED>>/mds.cephfs.ceph02 does not exist
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
yield (conn, connr)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr: Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds-cephfs-ceph02
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error: No such container: ceph-<<cluster-ID REPLACED>>-mds.cephfs.ceph02
Reconfig daemon mds.cephfs.ceph02 ...
ERROR: cannot reconfig, data path /var/lib/ceph/<<cluster-ID REPLACED>>/mds.cephfs.ceph02 does not existcephadm
Unable to add a Daemon without Service.
Please use `ceph orch apply ...` to create a Service.
Note, you might want to create the service with "unmanaged=true"
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 2440, in add_daemon
ret.extend(self._add_daemon(d_type, spec))
File "/usr/share/ceph/mgr/cephadm/module.py", line 2378, in _add_daemon
raise OrchestratorError('Unable to add a Daemon without Service.\n'
orchestrator._interface.OrchestratorError: Unable to add a Daemon without Service.
Please use `ceph orch apply ...` to create a Service.
I'm confused by cephadm's attempts to do "things" to a ceph02 daemon which obviously does not reside on node ceph01. Almost the same log lines appear on each MON host in its store.db.
All in all it looks far from healthy and I'm really concerned about it.
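In case it matters: would it be safe to manually compact the mon stores here, e.g. something like

  ceph tell mon.ceph01 compact

for each mon? I assume that would only shrink the stores temporarily and not stop whatever keeps writing these entries.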
Any help is highly appreciated! Thanks a lot.
Cheers,
Jürgen
Hi
I faced a similar error a couple of days ago:
radosgw-admin --cluster=cl00 realm create --rgw-realm=data00 --default
...
(0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
(34) Numerical result out of range (this can be due to a pool or placement
group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
exceeded)
...
obviously radosgw-admin was unable to create the pool .rgw.root (while at
the same time "ceph osd pool create" worked as expected)
Crawling through the mon logs with debug=20 led to this record:
"... prepare_new_pool got -34 'pgp_num' must be greater than 0 and lower or
equal than 'pg_num', which in this case is 1"
To me pg_num=1 looks strange, because the default value of
osd_pool_default_pg_num=32.
On the other hand the default osd_pool_default_pgp_num=0, so I tried setting
osd_pool_default_pgp_num=1 and it worked:
pool .rgw.root was built.
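For the record, what I ran was along these lines:

  ceph config set global osd_pool_default_pgp_num 1
  radosgw-admin --cluster=cl00 realm create --rgw-realm=data00 --default

though given that I could not reproduce the failure afterwards, I am not
certain the config change is what actually fixed it.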
What really looks strange: after the first success I can't reproduce it any
more.
Since then, "radosgw-admin ... realm create" successfully builds .rgw.root
even with osd_pool_default_pgp_num=0. Nevertheless, I suspect the record
"pgp_num must be greater than 0 and lower or equal than 'pg_num', which in
this case is 1"
points to an existing bug. It looks like the default values of
osd_pool_default_pg[p]_num are somehow ignored/omitted.
On Tue, Jul 19, 2022 at 9:11 AM Robert Reihs <robert.reihs(a)gmail.com> wrote:
> Yes, I checked pg_num, pgp_num and mon_max_pg_per_osd. I also set up a
> single-node cluster with the same Ansible script we have, using cephadm for
> setting up and managing the cluster. I had the same problem on the new
> single-node cluster without setting up any other services. When I created
> the pools manually, the service started and the dashboard connection also
> worked right away.
>
> On Mon, Jul 18, 2022 at 10:20 AM Janne Johansson <icepic.dz(a)gmail.com>
> wrote:
>
> > No, rgw should have the ability to create its own pools. Check the caps
> > on the keys used by the rgw daemon.
> >
> > On Mon, 18 Jul 2022 at 09:59, Robert Reihs <robert.reihs(a)gmail.com> wrote:
> >
> >> Hi,
> >> I had to manually create the pools, then the service automatically
> >> started and is now available. The commands I used are sketched after
> >> the pool list below.
> >> pools:
> >> .rgw.root
> >> default.rgw.log
> >> default.rgw.control
> >> default.rgw.meta
> >> default.rgw.buckets.index
> >> default.rgw.buckets.data
> >> default.rgw.buckets.non-ec
> >>
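> >> (For reference, I created them roughly like this; the application-enable
> >> step is my assumption of what rgw expects, not something the error
> >> message told me:)
> >>
> >> for p in .rgw.root default.rgw.log default.rgw.control default.rgw.meta \
> >>          default.rgw.buckets.index default.rgw.buckets.data \
> >>          default.rgw.buckets.non-ec; do
> >>   ceph osd pool create "$p"
> >>   ceph osd pool application enable "$p" rgw
> >> done
> >>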
> >> Is this normal behavior? If so, should the error message be changed? Or
> >> is this a bug?
> >> Best
> >> Robert Reihs
> >>
> >>
> >> On Fri, Jul 15, 2022 at 3:47 PM Robert Reihs <robert.reihs(a)gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> > I have had no luck yet solving the issue, but I can add some more
> >> > information. The system pools ".rgw.root" and "default.rgw.log" are
> >> > not created. I have created them manually; now there is more log
> >> > activity, but I am still getting the same error message in the log:
> >> > rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
> >> > (34) Numerical result out of range (this can be due to a pool or
> >> > placement group misconfiguration, e.g. pg_num < pgp_num or
> >> > mon_max_pg_per_osd exceeded)
> >> > I can't find the correct pool to create manually.
> >> > Thanks for any help
> >> > Best
> >> > Robert
> >> >
> >> > On Tue, Jul 12, 2022 at 5:22 PM Robert Reihs <robert.reihs(a)gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We have a problem with deploying radosgw via cephadm. We have a Ceph
> >> >> cluster with 3 nodes deployed via cephadm. Pool creation, cephfs and
> >> >> block storage are working.
> >> >>
> >> >> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
> >> >> (stable)
> >> >>
> >> >> The service specs is like this for the rgw:
> >> >>
> >> >> ---
> >> >> service_type: rgw
> >> >> service_id: rgw
> >> >> placement:
> >> >>   count: 3
> >> >>   label: "rgw"
> >> >> ---
> >> >> service_type: ingress
> >> >> service_id: rgw.rgw
> >> >> placement:
> >> >>   count: 3
> >> >>   label: "ingress"
> >> >> spec:
> >> >>   backend_service: rgw.rgw
> >> >>   virtual_ip: [IPV6]
> >> >>   virtual_interface_networks: [IPV6 CIDR]
> >> >>   frontend_port: 8080
> >> >>   monitor_port: 1967
> >> >>
> >> >> The error I get in the logfiles:
> >> >>
> >> >> 0 deferred set uid:gid to 167:167 (ceph:ceph)
> >> >>
> >> >> 0 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1)
> >> >> quincy (stable), process radosgw, pid 2
> >> >>
> >> >> 0 framework: beast
> >> >>
> >> >> 0 framework conf key: port, val: 80
> >> >>
> >> >> 1 radosgw_Main not setting numa affinity
> >> >>
> >> >> 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> >> >>
> >> >> 1 D3N datacache enabled: 0
> >> >>
> >> >> 0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create
> >> >> returned (34) Numerical result out of range (this can be due to a
> >> >> pool or placement group misconfiguration, e.g. pg_num < pgp_num or
> >> >> mon_max_pg_per_osd exceeded)
> >> >>
> >> >> 0 rgw main: failed reading realm info: ret -34 (34) Numerical result
> >> >> out of range
> >> >>
> >> >> 0 rgw main: ERROR: failed to start notify service ((34) Numerical
> >> >> result out of range)
> >> >>
> >> >> 0 rgw main: ERROR: failed to init services (ret=(34) Numerical
> >> >> result out of range)
> >> >>
> >> >> -1 Couldn't init storage provider (RADOS)
> >> >>
> >> >> For testing I have set the pg_num and pgp_num to 16 and
> >> >> mon_max_pg_per_osd to 1000 and am still getting the same error. I
> >> >> have also tried creating the rgw with the ceph command, same error.
> >> >> Pool creation is working; I created multiple other pools and there
> >> >> was no problem.
> >> >>
> >> >> Thanks for any help.
> >> >>
> >> >> Best
> >> >>
> >> >> Robert
> >> >>
> >> >> The 5 failed services are the 3 rgw daemons and 2 of the haproxy
> >> >> daemons for the rgw; only one haproxy is running:
> >> >>
> >> >> ceph -s
> >> >>
> >> >> cluster:
> >> >>
> >> >> id: 40ddf
> >> >>
> >> >> health: HEALTH_WARN
> >> >>
> >> >> 5 failed cephadm daemon(s)
> >> >>
> >> >>
> >> >>
> >> >> services:
> >> >>
> >> >> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 (age 4d)
> >> >>
> >> >> mgr: ceph-01.hbvyqi(active, since 4d), standbys: ceph-02.pqtxbv
> >> >>
> >> >> mds: 1/1 daemons up, 3 standby
> >> >>
> >> >> osd: 6 osds: 6 up (since 4d), 6 in (since 4d)
> >> >>
> >> >>
> >> >>
> >> >> data:
> >> >>
> >> >> volumes: 1/1 healthy
> >> >>
> >> >> pools: 5 pools, 65 pgs
> >> >>
> >> >> objects: 87 objects, 170 MiB
> >> >>
> >> >> usage: 1.4 GiB used, 19 TiB / 19 TiB avail
> >> >>
> >> >> pgs: 65 active+clean
> >> >>
> >> >>
> >> >
--
Best regards.
Alexander Y. Fomichev <git.user(a)gmail.com>
Hello all!
Linked Stack Overflow post: https://stackoverflow.com/questions/75101087/cephadm-ceph-osd-fails-to-star…
A couple of weeks ago I deployed a new Ceph cluster using cephadm. It is a three-node cluster (node1, node2, & node3) with 6 OSDs each: 6x 18 TB Seagate hard drives with a 2 TB NVMe drive set as a DB device. Everything had been running smoothly until today, when I went to perform maintenance on one of the nodes. I first moved all of the services off the host and put it into maintenance mode. I then made some changes to one of the NICs and ran updates. After the updates were done, I rebooted the machine. This is when the issue occurred.
When the node (node1) finished rebooting, it was still showing as offline in the Ceph Dashboard, so from one of the other hosts I ran `ceph orch host rescan node1` and it came back online in the Ceph dashboard. I've seen this before when I've had to reboot hosts, so NBD so far.
However, after a couple of minutes had passed, the OSDs on that host still hadn't come online. I then checked the status of the services (`systemctl | grep ceph`) and saw that all of the OSDs had failed.
# systemctl status ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6(a)osd.0.service
× ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6(a)osd.0.service - Ceph osd.0 for 0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6
Loaded: loaded (/etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2023-01-12 18:14:27 UTC; 1h 42min ago
Main PID: 385982 (code=exited, status=1/FAILURE)
CPU: 292ms
Jan 12 19:48:30 node1 systemd[1]: /etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer Kill
The unit was at its restart counter max, so I had to run `systemctl reset-failed`, and then I tried restarting the OSDs by running `systemctl restart ceph.target`. I watched the service try to start, but it kept failing.
This was the output of /var/log/ceph/<fsid>/ceph-osd.0.log:
2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 set uid:gid to 167:167 (ceph:ceph)
2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process ceph-osd, pid 7
2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 pidfile_write: ignore empty --pid-file
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open size 333396836352 (0x4da0000000, 310 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 310 GiB
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 18 TiB
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) close
2023-01-12T18:12:06.817+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) close
2023-01-12T18:12:07.085+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) close
2023-01-12T18:12:07.305+0000 7fb5d3b1e3c0 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 load: jerasure load: lrc
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:0.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:1.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:2.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:3.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:4.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 osd.0 0 OSD:init: unable to mount object store
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 ** ERROR: osd init failed: (13) Permission denied
Judging by the final error, it looked like some sort of permissions issue with mounting the volume into the container. I did notice that on the other two hosts, node2 & node3, which I have not yet rebooted since deploying Ceph with cephadm, there were more docker overlays mounted when I ran the `mount` command. My theory is that the LVM volume stored on the OSDs is not being mounted at boot. Otherwise it might be that the user Ceph passes to the containers is not allowed to mount the volumes for some reason. A sketch of what I plan to check is below.
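(Untested plan, and the device paths are just examples from my setup:

  # check that the OSD devices are owned by ceph:ceph (uid/gid 167)
  ls -l /dev/mapper/ceph--*

  # re-activate the LVM OSDs so ceph-volume recreates the tmpfs mounts
  # and re-applies ownership
  cephadm shell -- ceph-volume lvm activate --all

I'd appreciate confirmation before I run this, in case it makes things worse.)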
I've looked through most of the docs and forums I could find and haven't found any solutions. I would like to say I'm fairly experienced with Linux (5+ years), but I am new to Ceph (~6 months) and I haven't emailed this list before. Sorry in advance if I've mistakenly broken any rules, and thanks for the help!
- Ben M
We have an all-SSD cluster with 14 OSD nodes, and for some reason we are continually getting laggy PGs, which seem to correlate with slow requests on Quincy (this doesn't seem to happen on our Pacific clusters). These laggy PGs seem to shift between OSDs. The network seems solid, as in I'm not seeing errors or slowness. The OSD hosts are heavily underutilized, normally below a load of 1 with the CPUs 98% idle. I have been looking through the logs and nothing really stands out in the OSD or ceph logs.
Some things we have tried:
1. Updating our cluster to 17.2.5
2. Manually setting our mClock profile to high_client_ops (command sketch after this list).
3. Increasing our total number of PGs (this is something that should've happened anyway).
4. Verified that jumbo frames, lacp, and throughput were functioning as intended.
5. Took some of our newer nodes out to see if that was an issue. Also rebooted the cluster just to be sure.
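For item 2, what we set was along these lines (assuming the config-based way
of selecting the profile; osd.83 is just an example daemon):

  ceph config set osd osd_mclock_profile high_client_ops
  ceph config get osd.83 osd_mclock_profile   # verify it took effect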
I'm curious if someone in the community has experience with this kind of issue and maybe could point to something I have overlooked.
Some example logs:
2023-01-10T22:50:23.245823+0000 mgr.openstack-mon01.b.pc.ostk.com.flbudm (mgr.120371640) 231175 : cluster [DBG] pgmap v235204: 2625 pgs: 1 active+clean+laggy, 2624 active+clean; 6.0 TiB data, 18 TiB used, 84 TiB
/ 102 TiB avail; 19 MiB/s rd, 67 MiB/s wr, 4.76k op/s
2023-01-10T22:50:23.762562+0000 osd.83 (osd.83) 906 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
2023-01-10T22:50:24.771260+0000 osd.83 (osd.83) 907 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
Dear everyone,
I have several questions regarding CephFS connected to Namespaces,
Subvolumes and snapshot Mirroring:
*1. How to display/create namespaces used for isolating subvolumes?*
I have created multiple subvolumes with the option
--namespace-isolated, so I was expecting to see the namespaces returned from
ceph fs subvolume info <volume_name> <subvolume_name>
also returned by
rbd namespace ls <cephfs_data_pool> --format=json
But the latter command just returns an empty list. Are the
namespaces used for rbd and CephFS different ones?
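(My assumption is that --namespace-isolated uses plain RADOS namespaces
rather than RBD namespaces, in which case something like

  rados -p <cephfs_data_pool> ls --all

should list the objects together with their namespace; I would appreciate
confirmation that this is the right way to inspect them.)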
*2. Can CephFS Snapshot mirroring also be applied to subvolumes?*
I tried this, but without success. Is there something to take into
account rather than just mirroring the directory, or is it just not
possible right now?
*3. Can xattr for namespaces and pools also be mirrored?*
Or more specifically, is there a way to preserve the namespace and
pool layout of mirrored directories?
Thank you for your help!
Best regards,
Jonas
PS: You may receive this mail twice, since this email address somehow
got removed from the ceph-users list.
Hi,
This is running Quincy 17.2.5 deployed by Rook on k8s. An RGW NFS export crashes the Ganesha server pod, while a CephFS export works just fine. Here are the steps:
1, create export:
bash-4.4$ ceph nfs export create rgw --cluster-id nfs4rgw --pseudo-path /bucketexport --bucket testbk
{
"bind": "/bucketexport",
"path": "testbk",
"cluster": "nfs4rgw",
"mode": "RW",
"squash": "none"
}
2, check pods status afterwards:
rook-ceph-nfs-nfs1-a-679fdb795-82tcx 2/2 Running 0 4h3m
rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 1/2 Error 2 4h6m
3, check failing pod’s logs:
11/01/2023 08:11:53 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
11/01/2023 08:11:54 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
11/01/2023 08:11:54 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
11/01/2023 08:11:57 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
11/01/2023 08:11:57 : epoch 63be6f49 : rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 : nfs-ganesha-1[main] export_defaults_commit :CONFIG :INFO :Export Defaults now (options=03303002/00080000 , , , , , , , , expire= 0)
2023-01-11T08:11:57.853+0000 7f59dac7c200 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-admin/keyring: (2) No such file or directory
2023-01-11T08:11:57.853+0000 7f59dac7c200 -1 AuthRegistry(0x56476817a480) no keyring found at /var/lib/ceph/radosgw/ceph-admin/keyring, disabling cephx
2023-01-11T08:11:57.855+0000 7f59dac7c200 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-admin/keyring: (2) No such file or directory
2023-01-11T08:11:57.855+0000 7f59dac7c200 -1 AuthRegistry(0x7ffe4d092c90) no keyring found at /var/lib/ceph/radosgw/ceph-admin/keyring, disabling cephx
2023-01-11T08:11:57.856+0000 7f5987537700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2023-01-11T08:11:57.856+0000 7f5986535700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2023-01-11T08:12:00.861+0000 7f5986d36700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2023-01-11T08:12:00.861+0000 7f59dac7c200 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
failed to fetch mon config (--no-mon-config to skip)
4, delete the export:
ceph nfs export delete nfs4rgw /bucketexport
Ganesha servers go back normal:
rook-ceph-nfs-nfs1-a-679fdb795-82tcx 2/2 Running 0 4h30m
rook-ceph-nfs-nfs4rgw-a-5c594d67dc-nlr42 2/2 Running 10 4h33m
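For what it's worth, judging from the log the RGW backend is looking for a
keyring at /var/lib/ceph/radosgw/ceph-admin/keyring, finds none, disables
cephx, and then fails to authenticate against the mons. Before recreating
the export I was going to dump its stored config with something like

  ceph nfs export info nfs4rgw /bucketexport

to see which RGW user it references, but I am not sure where Rook expects
that keyring to live.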
Any ideas on how to make it work?
Thanks
Ben
Hi All,
Got a funny one, which I'm hoping someone can help us with.
We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux
8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI G/W services
running on them (we're only a small shop). Each Node has a separate 16
GiB partition mounted as /var. Everything is running well and the Ceph
Cluster is handling things very well.
However, one of the Nodes (not the one currently acting as the Active
Manager) keeps running out of space on /var. Normally, all of the Nodes
have around 10% space used (via a df -H command), but the problem Node
only takes 1 to 3 days to run out of space, hence taking it out of
Quorum. It's currently at 85% and growing.
At first we thought this was caused by an overly large log file, but
investigations showed that all the logs on all 3 Nodes were of
comparable size. Also, searching for the 20 largest files on the problem
Node's /var didn't produce any significant results.
Coincidentally, unrelated to this issue, the problem Node (but not the
other 2 Nodes) was re-booted a couple of days ago and, when the Cluster
had re-balanced itself and everything was back online and reporting as
Healthy, the problem Node's /var was back down to around 10%, the same
as the other two Nodes.
This led us to suspect that there was some sort of "run-away" process
or journaling/logging/temporary file(s) or whatever that the re-boot had
"cleaned up". So we've been keeping an eye on things, but we can't see
anything causing the issue and now, as I said above, the problem Node's
/var is back up to 85% and growing.
I've been looking at the log files, trying to determine the issue, but as
I don't really know what I'm looking for I don't even know if I'm
looking in the *correct* log files...
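(One thing we plan to try the next time it fills up, on the assumption that
the space is being held by deleted-but-still-open files, which would explain
why a reboot frees it while a search for large files finds nothing:

  du -xh --max-depth=2 /var | sort -h | tail -n 20
  lsof +L1 | grep /var

The second command should list files that have been unlinked but are still
held open by a process.)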
Obviously rebooting the problem Node every couple of days is not a
viable option, and increasing the size of the /var partition is only
going to postpone the issue, not resolve it. So if anyone has any ideas
we'd love to hear about it - thanks
Cheers
Dulux-Oz
Hello there,
I'm running Ceph 15.2.17 (Octopus) on Debian Buster and I'm starting an
upgrade but I'm seeing a problem and I wanted to ask how best to proceed
in case I make things worse by mucking with it without asking experts.
I've moved an rbd image to the trash without clearing its snapshots
first, and then tried to 'trash purge'. This resulted in an error
because the image still has snapshots, but I'm unable to restore the
image to the pool to clear the snapshots either. At least one of these
images is a clone of a snapshot from another trashed image, which
I'm already kicking myself for.
The contents of my trash:
# rbd trash ls
07afadac0ed69c nfsroot_pi08
240ae5a5eb3214 bigdisk
7fd5138848231e nfsroot_pi01
f33e1f5bad0952 bigdisk2
fcdeb1f96a6124 raspios-64bit-lite-manuallysetup-p1
fcdebd2237697a raspios-64bit-lite-manuallysetup-p2
fd51418d5c43da nfsroot_pi02
fd514a6b4d3441 nfsroot_pi03
fd515061816c70 nfsroot_pi04
fd51566859250b nfsroot_pi05
fd5162c5885d9c nfsroot_pi07
fd5171c27c36c2 nfsroot_pi09
fd51743cb8813c nfsroot_pi10
fd517ad3bc3c9d nfsroot_pi11
fd5183bfb1e588 nfsroot_pi12
This is the error I get trying to purge the trash:
# rbd trash purge
Removing images: 0% complete...failed.
rbd: some expired images could not be removed
Ensure that they are closed/unmapped, do not have snapshots (including
trashed snapshots with linked clones), are not in a group and were moved
to the trash successfully.
This is the error when I try and restore one of the trashed images:
# rbd trash restore nfsroot_pi08
rbd: error: image does not exist in trash
2023-01-11T12:28:52.982-0800 7f4b69a7c3c0 -1 librbd::api::Trash:
restore: error getting image id nfsroot_pi08 info from trash: (2) No
such file or directory
Trying to restore other images gives the same error.
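(One thing I notice only while writing this up: if I read the help text
right, `rbd trash restore` takes the image ID rather than the name, so
perhaps

  rbd trash restore 07afadac0ed69c

is what I should be running instead. I haven't tried that yet.)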
These trash images are now taking up a significant portion of the
cluster space. One thought was to upgrade and see if that resolves the
problem, but I've shot myself in the foot doing that in the past without
confirming it would solve the problem, so I'm looking for a second
opinion on how best to clear these.
These are all Debian Buster systems, the kernel version of the host I'm
running these commands on is:
Linux zim 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27)
x86_64 GNU/Linux
I'm going to be upgrading that too but one step at a time.
The exact ceph version is:
ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
(stable)
This was installed from the ceph repos, not the debian repos, using
cephadm. If there are any additional details I can share, please let me
know; any and all thoughts are welcome! I've been googling and have found
folks with similar issues but nothing similar enough to feel helpful.
Thanks in advance, and thank you to any and everyone who contributes to
Ceph, it's awesome!