Hi,
We are running a ceph cluster on Ubuntu 18.04 machines with ceph 14.2.4.
Our cephfs clients are using the kernel module and we have noticed that
some of them are sometimes (at least once) hanging after an MDS restart.
The only way to resolve this is to unmount and remount the mountpoint,
or reboot the machine if unmounting is not possible.
After some investigation, the problem seems to be that the MDS denies
reconnect attempts from some clients during restart even though the
reconnect interval is not yet reached. In particular, I see the following
log entries. Note that there are supposedly 9 sessions. 9 clients
reconnect (one client has two mountpoints) and then two more clients
reconnect after the MDS already logged "reconnect_done". These two
clients were hanging after the event. The kernel log of one of them is
shown below too.
Running `ceph tell mds.0 client ls` after the clients have been
rebooted/remounted also shows 11 clients instead of 9.
Do you have any ideas what is wrong here and how it could be fixed? I'm
guessing that the issue is that the MDS apparently has an incorrect
session count and stops the reconnect process to soon. Is this indeed a
bug and if so, do you know what is broken?
Regardless, I also think that the kernel should be able to deal with a
denied reconnect and that it should try again later. Yet, even after
10 minutes, the kernel does not attempt to reconnect. Is this a known
issue or maybe fixed in newer kernels? If not, is there a chance to get
this fixed?
Thanks,
Florian
MDS log:
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.server reconnect_clients -- 9 sessions
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24197043 v1:10.1.4.203:0/990008521 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.30487144 v1:10.1.4.146:0/483747473 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21019865 v1:10.1.7.22:0/3752632657 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21020717 v1:10.1.7.115:0/2841046616 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24171153 v1:10.1.7.243:0/1127767158 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.23978093 v1:10.1.4.71:0/824226283 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24209569 v1:10.1.4.157:0/1271865906 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190930 v1:10.1.4.240:0/3195698606 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190912 v1:10.1.4.146:0/852604154 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.59 reconnect_done
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.24167394 v1:10.1.67.49:0/1483641729 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1087700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.49:0/1483641729 conn(0x55af50053f80 0x55af50140800 :6801 s=OPENED pgs=21 cs=1 l=0).fault server, going to standby
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.30586072 v1:10.1.67.140:0/3664284158 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1888700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.140:0/3664284158 conn(0x55af50055600 0x55af50143000 :6801 s=OPENED pgs=8 cs=1 l=0).fault server, going to standby
Hanging client (10.1.67.49) kernel log:
> 2019-09-26T16:08:27.481676+02:00 hostnamefoo kernel: [708596.227148] ceph: mds0 reconnect start
> 2019-09-26T16:08:27.488943+02:00 hostnamefoo kernel: [708596.233145] ceph: mds0 reconnect denied
> 2019-09-26T16:16:17.541041+02:00 hostnamefoo kernel: [709066.287601] libceph: mds0 10.1.4.203:6801 socket closed (con state NEGOTIATING)
> 2019-09-26T16:16:18.068934+02:00 hostnamefoo kernel: [709066.813064] ceph: mds0 rejected session
> 2019-09-26T16:16:18.068955+02:00 hostnamefoo kernel: [709066.814843] ceph: get_quota_realm: ino (10000000008.fffffffffffffffe) null i_snap_realm
Dear Cephers,
we are currently mounting CephFS with relatime, using the FUSE client (version 13.2.6):
ceph-fuse on /cephfs type fuse.ceph-fuse (rw,relatime,user_id=0,group_id=0,allow_other)
For the first time, I wanted to use atime to identify old unused data. My expectation with "relatime" was that the access time stamp would be updated less often, for example,
only if the last file access was >24 hours ago. However, that does not seem to be the case:
----------------------------------------------
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
$ cat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root > /dev/null
$ sync
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
----------------------------------------------
I also tried this via an nfs-ganesha mount, and via a ceph-fuse mount with admin caps,
but atime never changes.
Is atime really never updated with CephFS, or is this configurable?
Something as coarse as "update at maximum once per day only" would be perfectly fine for the use case.
Cheers,
Oliver
Hi,
I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on
3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
interconnected via Infiniband 40.
Problem is that the ceph performance is quite bad (approx. 30MiB/s
reading, 3-4 MiB/s writing ), so I thought about plugging into each node
a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
have a faster ceph storage and also some storage extension.
The question is now which SSDs I should use. If I understand it right,
not every SSD is suitable for ceph, as is denoted at the links below:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-i…
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for ceph. As the 950 is not available anymore, I ordered a
Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
Before equipping all nodes with these SSDs, I did some tests with "fio"
as recommended, e.g. like this:
fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
The results are as the following:
-----------------------
1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
-----------------------
So the read speed is impressive, but the write speed is really bad.
Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
chips (MLC instead of TLC). The results are, however even worse for writing:
-----------------------
Samsung 970 PRO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
-----------------------
I did some research and found out, that the "--sync" flag sets the flag
"O_DSYNC" which seems to disable the SSD cache which leads to these
horrid write speeds.
It seems that this relates to the fact that the write cache is only not
disabled for SSDs which implement some kind of battery buffer that
guarantees a data flush to the flash in case of a powerloss.
However, It seems impossible to find out which SSDs do have this
powerloss protection, moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above - moreover it's unclear if
powerloss protection is even available in the NVMe form factor. So
building a 1 or 2 TB cluster seems not really affordable/viable.
So, can please anyone give me hints what to do? Is it possible to ensure
that the write cache is not disabled in some way (my server is situated
in a data center, so there will probably never be loss of power).
Or is the link above already outdated as newer ceph releases somehow
deal with this problem? Or maybe a later Debian release (10) will handle
the O_DSYNC flag differently?
Perhaps I should simply invest in faster (and bigger) harddisks and
forget the SSD-cluster idea?
Thank you in advance for any help,
Best Regards,
Hermann
--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
Hello everybody,
Can somebody add support for Debian buster and ceph-deploy:
https://tracker.ceph.com/issues/42870
Highly appreciated,
Regards,
Jelle de Jong
Hi guys, I deploy an efk cluster and use ceph as block storage in kubernetes, but RBD write iops sometimes becomes zero and last for a few minutes. I want to check logs about RBD so I add some config to ceph.conf and restart ceph.
Here is my ceph.conf:
[global]
fsid = 53f4e1d5-32ce-4e9c-bf36-f6b54b009962
mon_initial_members = db-16-4-hzxs, db-16-5-hzxs, db-16-6-hzxs
mon_host = 10.25.16.4,10.25.16.5,10.25.16.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
[client]
debug rbd = 20
debug rbd mirror = 20
debug rbd replay = 20
log file = /var/log/ceph/client_rbd.log
I can not get any logs in /var/log/ceph/client_rbd.log. I also try to execute 'ceph daemon osd.* config set debug_rbd 20’ and there is also no related logs in ceph-osd.log.
How can I get useful logs about this question or How can I analyze this problem? Look forward to your reply.
Thanks
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// 声明:此邮件可能包含依图公司保密或特权信息,并且仅应发送至有权接收该邮件的收件人。如果您无权收取该邮件,您应当立即删除该邮件并通知发件人,您并被禁止传播、分发或复制此邮件以及附件。对于此邮件可能携带的病毒引起的任何损害,本公司不承担任何责任。此外,本公司不保证已正确和完整地传输此信息,也不接受任何延迟收件的赔偿责任。 ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// Notice: This email may contain confidential or privileged information of Yitu and was sent solely to the intended recipients. If you are unauthorized to receive this email, you should delete the email and contact the sender immediately. Any unauthorized disclosing, distribution, or copying of this email and attachment thereto is prohibited. Yitu does not accept any liability for any loss caused by possibly viruses in this email. E-mail transmission cannot be guaranteed to be secure or error-free and Yitu is not responsible for any delayed transmission.
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
os :CentOS Linux release 7.7.1908 (Core)
single node ceph cluster with 1 mon,1mgr,1 mds,1rgw and 12osds , but only cephfs is used.
ceph -s is blocked after shutting down the machine (192.168.0.104), then ip address changed to 192.168.1.6
I created the monmap with monmap tool and update the ceph.conf , hosts file and then start ceph-mon.
and the ceph-mon log:
...
2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s
2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s
2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s
2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s
2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s
2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s
2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s
2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s
2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1 at 0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s
...
I changed IP back to 192.168.0.104 yeasterday, but all the same.
# cat /etc/ceph/ceph.conf
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
[client.rgw.ceph-node1.rgw0]
host = ceph-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-node1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-ceph-node1.rgw0.log
rgw frontends = beast endpoint=192.168.1.6:8080
rgw thread pool size = 512
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 192.168.1.0/24
fsid = e384e8e6-94d5-4812-bfbb-d1b0468bdef5
mon host = [v2:192.168.1.6:3300,v1:192.168.1.6:6789]
mon initial members = ceph-node1
osd crush chooseleaf type = 0
osd pool default crush rule = -1
public network = 192.168.1.0/24
[osd]
osd memory target = 7870655146
Hello, Ceph users,
does anybody use Ceph on recently released CentOS 8? Apparently there are
no el8 packages neither at download.ceph.com, nor in the native CentOS package
tree. I am thinking about upgrading my cluster to C8 (because of other
software running on it apart from Ceph). Do el7 packages simply work?
Can they be rebuilt using rpmbuild --rebuild? Or is running Ceph on
C8 more complicated than that?
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
laryross> As far as stealing... we call it sharing here. --from rcgroups
Hello Team,
We've integrated Ceph cluster storage with Kubernetes and provisioning
volumes through rbd-provisioner. When we're creating volumes from yaml
files in Kubernetes, pv > pvc > mounting to pod, In kubernetes end pvc are
showing as meaningful naming convention as per yaml file defined. But in
ceph cluster, rbd image name is creating with dynamic uid.
During troubleshooting time, this will be tedious to find exact rbd image.
Please find the provisioning logs in below pasted snippet.
kubectl get pods,pv,pvc
NAME READY STATUS RESTARTS AGE
pod/sleepypod 1/1 Running 0 4m9s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON
AGE
persistentvolume/pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 1Gi RWO Delete
Bound default/test-dyn-pvc ceph-rbd 4m9s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/test-dyn-pvc Bound
pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 1Gi RWO ceph-rbd 4m11s
*rbd-provisioner logs*
I1121 10:59:15.009012 1 provision.go:132] successfully created rbd image
"kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2" I1121
10:59:15.009092 1 controller.go:1087] provision "default/test-dyn-pvc"
class "ceph-rbd": volume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5"
provisioned I1121 10:59:15.009138 1 controller.go:1101] provision
"default/test-dyn-pvc" class "ceph-rbd": trying to save persistentvvolume
"pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" I1121 10:59:15.020418 1
controller.go:1108] provision "default/test-dyn-pvc" class "ceph-rbd":
persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" saved I1121
10:59:15.020476 1 controller.go:1149] provision "default/test-dyn-pvc"
class "ceph-rbd": succeeded I1121 10:59:15.020802 1 event.go:221]
Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default",
Name:"test-dyn-pvc", UID:"cd37d2d6-cecc-4a05-9736-c8d80abde7f5",
APIVersion:"v1", ResourceVersion:"24545639", FieldPath:""}): type: 'Normal'
reason: 'ProvisioningSucceeded' Successfully provisioned volume
pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5
*rbd image details in Ceph cluster end*
rbd -p kube ls --long
NAME SIZE PARENT FMT PROT LOCK
kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2 1 GiB 2
is there way to setup proper naming convention for rbd image as well during
kubernetes deployment itself.
Kubernetes version: v1.15.5
Ceph cluster version: 14.2.2 nautilus (stable)
*Best Regards,*
*Palanisamy*
hi,every one,
my ceph version 12.2.12,I want to set require min compat client
luminous,I use command
#ceph osd set-require-min-compat-client luminous
but ceph report:Error EPERM: cannot set require_min_compat_client to
luminous: 4 connected client(s) look like jewel (missing
0xa00000000200000); add --yes-i-really-mean-it to do it anyway
[root@node-1 ~]# ceph features
{
"mon": {
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 3
}
},
"osd": {
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 15
}
},
"client": {
"group": {
"features": "0x40106b84a842a52",
"release": "jewel",
"num": 4
},
"group": {
"features": "0x3ffddff8eeacfffb",
"release": "luminous",
"num": 168
}
}
}
so,I run command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client luminous
--yes-i-really-mean-it
set require_min_compat_client to luminous
but now,I want to set require min compat client jewel,I use command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client jewel
Error EPERM: osdmap current utilizes features that require luminous;
cannot set require_min_compat_client below that to jewel
what‘s the way we are set luminous chang to jewel?