Hi all,
we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid". I looked at the client and there was nothing going on, so I rebooted it. After the client was back, the message was still there. To clean this up I failed the MDS. Unfortunately, the MDS that took over remained stuck in rejoin without doing anything. All that happened in the log was:
[root@ceph-10 ceph]# tail -f ceph-mds.ceph-10.log
2023-07-20T15:54:29.147+0200 7fedb9c9f700 1 mds.2.896604 rejoin_start
2023-07-20T15:54:29.161+0200 7fedb9c9f700 1 mds.2.896604 rejoin_joint_start
2023-07-20T15:55:28.005+0200 7fedb9c9f700 1 mds.ceph-10 Updating MDS map to version 896614 from mon.4
2023-07-20T15:56:00.278+0200 7fedb9c9f700 1 mds.ceph-10 Updating MDS map to version 896615 from mon.4
[...]
2023-07-20T16:02:54.935+0200 7fedb9c9f700 1 mds.ceph-10 Updating MDS map to version 896653 from mon.4
2023-07-20T16:03:07.276+0200 7fedb9c9f700 1 mds.ceph-10 Updating MDS map to version 896654 from mon.4
After some time I decided to give another fail a try and, this time, the replacement daemon went to the active state really fast.
If I get a message like the above, what is the clean way of bringing the client back to a healthy state (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable))?
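For reference, "failing the MDS" above means the usual rank fail; this
is roughly the sequence I used (rank 2 taken from the log above, the
session inspection is just how I looked at the client):

ceph health detail            # names the session behind the oldest-tid warning
ceph tell mds.2 session ls    # inspect that client's session on the active MDS
ceph mds fail 2               # force a standby to take over the rank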
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
Since the 6.5 kernel addressed the regression in the readahead handling
code, we went ahead and installed this kernel for a couple of mail/web
clusters (Ubuntu 6.5.1-060501-generic #202309020842 SMP PREEMPT_DYNAMIC
Sat Sep 2 08:48:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux). Since then
we occasionally see the following being logged by the kernel:
[Sun Sep 10 07:19:00 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 08:41:24 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[Sun Sep 10 11:05:55 2023] workqueue: delayed_work [ceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 12:54:38 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[Sun Sep 10 19:06:37 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[Mon Sep 11 10:53:33 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND
[Tue Sep 12 10:14:03 2023] workqueue: ceph_con_workfn [libceph] hogged CPU for >10000us 64 times, consider switching to WQ_UNBOUND
[Tue Sep 12 11:14:33 2023] workqueue: ceph_cap_reclaim_work [ceph] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
We wonder whether this is a new phenomenon, or whether it simply gets
logged by the new kernel and was not logged before.
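If it is just new detection/reporting, the threshold at least looks
tunable; a sketch, assuming this is governed by the
workqueue.cpu_intensive_thresh_us parameter that came with the 6.5
workqueue changes:

# kernel command line: raise the detection threshold from the 10000us default
workqueue.cpu_intensive_thresh_us=20000
# or at runtime, if the module parameter is writable on this kernel:
echo 20000 > /sys/module/workqueue/parameters/cpu_intensive_thresh_us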
However, we have hit a few OOM situations since we switched to the new
kernel, because of ceph_cap_reclaim_work events (the OOM happens because
Apache threads keep piling up while they cannot access CephFS). We then
also see MDS slow ops reported. This might be related to a backup job
that is running on a backup server. We did not observe this behavior
with the 5.12.19 kernel. The Ceph cluster is currently on 16.2.11.
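For context, when this happens we check the kernel client's capability
counts via debugfs (assuming debugfs is mounted at the usual place):

# cap usage per CephFS superblock on the client
cat /sys/kernel/debug/ceph/*/caps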
Does anyone have some insight on this?
Thanks,
Stefan
Bringing up that topic again:
is it possible to log the bucket name in the rgw client logs?
Currently I am only able to see the bucket name when someone accesses
the bucket via https://TLD/bucket/object instead of https://bucket.TLD/object.
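In case the ops log turns out to be the way to go after all, this is the
direction I would try (untested on my side, so consider it a sketch):

# enable the ops log and keep it in RADOS, then read it back:
ceph config set client.rgw rgw_enable_ops_log true
ceph config set client.rgw rgw_ops_log_rados true
radosgw-admin log list                    # list the ops log objects
radosgw-admin log show --object=<object>  # JSON records incl. the bucket name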
On Tue, 3 Jan 2023 at 10:25, Boris Behrens <bb(a)kervyn.de> wrote:
> Hi,
> I am looking to move our logs from
> /var/log/ceph/ceph-client...log to our log aggregator.
>
> Is there a way to have the bucket name in the log file?
>
> Or can I write the rgw_enable_ops_log output into a file? Maybe I could
> work with that.
>
> Cheers and happy new year
> Boris
>
--
The self-help group "UTF-8 problems" will exceptionally meet in the big
hall this time.
Hey everyone,
On 20/10/2022 10:12, Christian Rohmann wrote:
> 1) May I bring up again my remarks about the timing:
>
> On 19/10/2022 11:46, Christian Rohmann wrote:
>
>> I believe the upload of a new release to the repo prior to the
>> announcement happens quite regularly - it might just be due to the
>> technical process of releasing.
>> But I agree it would be nice to have a more "bit flip" approach to
>> new releases in the repo and not have the packages appear as updates
>> prior to the announcement and final release and update notes.
> By my observations sometimes there are packages available on the
> download servers via the "last stable" folders such as
> https://download.ceph.com/debian-quincy/ quite some time before the
> announcement of a release is out.
> I know it's hard to time this right with mirrors requiring some time
> to sync files, but it would be nice not to see the packages, or have
> people install them, before the release notes and potential pointers
> to changes are out.
Today's 16.2.11 release shows the exact issue I described above:
1) 16.2.11 packages are already available via e.g.
https://download.ceph.com/debian-pacific
2) release notes not yet merged:
(https://github.com/ceph/ceph/pull/49839), thus
https://ceph.io/en/news/blog/2022/v16-2-11-pacific-released/ shows a 404 :-)
3) No announcement like
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/QOCU563UD3…
has been sent to the ML yet.
Regards
Christian
Hi everyone
I'm new to Ceph; just a four-day French training session with Octopus on
VMs convinced me to build my first cluster.
At this time I have 4 old identical nodes for testing, each with 3 HDDs
and 2 network interfaces, running AlmaLinux 8 (el8). I tried to replay
the training session but it failed, breaking the web interface because
podman 4.2 is not compatible with Octopus.
So I tried to deploy Pacific with the cephadm tool on my first node
(mostha1), to also enable testing an upgrade later.
dnf -y install
https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noar…
monip=$(getent ahostsv4 mostha1 |head -n 1| awk '{ print $1 }')
cephadm bootstrap --mon-ip $monip --initial-dashboard-password xxxxx \
--initial-dashboard-user admceph \
--allow-fqdn-hostname --cluster-network 10.1.0.0/16
This was successful.
But running "*c**eph orch device ls*" do not show any HDD even if I have
/dev/sda (used by the OS), /dev/sdb and /dev/sdc
The web interface shows a raw capacity which is an aggregate of the
sizes of the 3 HDDs for the node.
I've also tried to reset /dev/sdb, but cephadm does not see it:
[ceph: root@mostha1 /]# ceph orch device zap
mostha1.legi.grenoble-inp.fr /dev/sdb --force
Error EINVAL: Device path '/dev/sdb' not found on host
'mostha1.legi.grenoble-inp.fr'
On my first attempt with Octopus, I was able to list the available HDDs
with this command line. Before moving to Pacific, the OS on this node
was reinstalled from scratch.
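For what it's worth, I plan to double-check the disks locally on the
node like this (wipefs/inventory are my own guesses, not something from
the training session):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT    # confirm the OS still sees the three HDDs
cephadm ceph-volume inventory         # what cephadm itself can discover
wipefs -a /dev/sdb                    # clear leftover signatures before retrying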
Any advice for a Ceph beginner?
Thanks
Patrick
Hello,
we removed an SSD cache tier and its pool.
The PGs for the pool do still exist.
The cluster is healthy.
The PGs are empty and they reside on the cache tier pool's SSDs.
We would like to take the disks out, but that is not possible: the
cluster still sees the PGs and answers with a HEALTH_WARN. Because of
the replication factor of three, there are still 128 PGs on three of
the 24 OSDs. We were able to remove the other OSDs.
Summary:
- pool removed
- 3 x 128 empty PGs still exist
- 3 of 24 OSDs still exist
How is it possible to remove these empty and healthy PGs?
The only way I found was something like:
ceph pg {pg-id} mark_unfound_lost delete
Is that the right way?
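For completeness, this is how I enumerate the leftover PGs on one of the
remaining OSDs (the jq part is my assumption):

ceph pg ls-by-osd 23 -f json | jq -r '.pg_stats[].pgid'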
Some output of:
ceph pg ls-by-osd 23
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED         UP            ACTING        SCRUB_STAMP                      DEEP_SCRUB_STAMP
3.0  0        0         0          0        0      0            0           0    active+clean  27h    0'0      2627265:196316   [15,6,23]p15  [15,6,23]p15  2023-09-28T12:41:52.982955+0200  2023-09-27T06:48:23.265838+0200
3.1  0        0         0          0        0      0            0           0    active+clean  9h     0'0      2627266:19330    [6,23,15]p6   [6,23,15]p6   2023-09-29T06:30:57.630016+0200  2023-09-27T22:58:21.992451+0200
3.2  0        0         0          0        0      0            0           0    active+clean  2h     0'0      2627265:1135185  [23,15,6]p23  [23,15,6]p23  2023-09-29T13:42:07.346658+0200  2023-09-24T14:31:52.844427+0200
3.3  0        0         0          0        0      0            0           0    active+clean  13h    0'0      2627266:193170   [6,15,23]p6   [6,15,23]p6   2023-09-29T01:56:54.517337+0200  2023-09-27T17:47:24.961279+0200
3.4  0        0         0          0        0      0            0           0    active+clean  14h    0'0      2627265:2343551  [23,6,15]p23  [23,6,15]p23  2023-09-29T00:47:47.548860+0200  2023-09-25T09:39:51.259304+0200
3.5  0        0         0          0        0      0            0           0    active+clean  2h     0'0      2627265:194111   [15,6,23]p15  [15,6,23]p15  2023-09-29T13:28:48.879959+0200  2023-09-26T15:35:44.217302+0200
3.6  0        0         0          0        0      0            0           0    active+clean  6h     0'0      2627265:2345717  [23,15,6]p23  [23,15,6]p23  2023-09-29T09:26:02.534825+0200  2023-09-27T21:56:57.500126+0200
Best regards,
Malte
Hi,
while writing a response to [1] I tried to convert an existing
directory within a single CephFS into a subvolume. According to [2]
that should be possible; I'm just wondering how to confirm that it
actually worked, because setting the xattr works fine, but the
directory just doesn't show up in the subvolume ls output. This is what
I tried (in Reef and Pacific):
# one "regular" subvolume already exists
$ ceph fs subvolume ls cephfs
[
{
"name": "subvol1"
}
]
# mounted / and created new subdir
$ mkdir /mnt/volumes/subvol2
$ setfattr -n ceph.dir.subvolume -v 1 /mnt/volumes/subvol2
# still only one subvolume
$ ceph fs subvolume ls cephfs
[
{
"name": "subvol1"
}
]
I also tried it directly underneath /mnt:
$ mkdir /mnt/subvol2
$ setfattr -n ceph.dir.subvolume -v 1 /mnt/subvol2
But still no subvol2 available. What am I missing here?
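Reading the flag back at least confirms that the xattr was applied
(getfattr from the attr package; that this check is sufficient is my
assumption):

# verify the xattr on the directory
$ getfattr -n ceph.dir.subvolume /mnt/subvol2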
Thanks
Eugen
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/G4ZWGGUPPFQ…
[2] https://www.spinics.net/lists/ceph-users/msg72341.html
Hi,
I've seen this issue mentioned in the past, but with older releases. So
I'm wondering if anybody has any pointers.
The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
clients are working fine, with the exception of our backup server. This
is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
(so I suspect it corresponds to a newer Ceph client version?).
The backup server has multiple (12) CephFS mount points. One of them,
the busiest, regularly causes this error on the cluster:
HEALTH_WARN 1 clients failing to respond to capability release
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing to respond to capability release client_id: 521306112
And occasionally, possibly unrelated but occurring at the same time:
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
The second one clears itself, but the first sticks until I can unmount
the filesystem on the client after the backup completes.
It appears that whilst it's in this stuck state there may be one or more
directory trees that are inaccessible to all clients. The backup server
is walking the whole tree but never gets stuck itself, so either the
directory entry becomes inaccessible after the backup has gone past it,
or the backup server is simply not affected. Maybe the backup server is
holding a directory when it shouldn't?
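In case it helps, I can at least pin down the stuck session while the
warning is active (the jq filter is my assumption; I have not dared the
evict yet):

ceph tell mds.0 session ls | jq '.[] | select(.id == 521306112)'
# last resort instead of waiting for the backup to finish and unmounting:
ceph tell mds.0 client evict id=521306112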
It may be that an upgrade to Quincy resolves this, since it's more
likely to be in line, version-wise, with the kernel client, but I don't
want to knee-jerk upgrade just to try and fix this problem.
Thanks for any advice.
Tim.
[1] The reason for the newer kernel is that the backup performance from
CephFS was terrible with older kernels. This newer kernel does at least
resolve that issue.
Hi,
on Debian 12, ceph-dashboard is throwing the warning
"Module 'dashboard' has failed dependency: PyO3 modules may only be
initialized once per interpreter process"
This seems to be related to the PyO3 0.17 change
https://github.com/PyO3/pyo3/blob/7bdc504252a2f972ba3490c44249b202a4ce6180/…
"
Each #[pymodule] can now only be initialized once per process
To make PyO3 modules sound in the presence of Python sub-interpreters,
for now it has been necessary to explicitly disable the ability to
initialize a #[pymodule] more than once in the same process. Attempting
to do this will now raise an ImportError.
"