ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
OS: CentOS Linux release 7.7.1908 (Core)
Single-node Ceph cluster with 1 mon, 1 mgr, 1 MDS, 1 RGW, and 12 OSDs, but only CephFS is used.
ceph -s blocks after I shut down the machine (192.168.0.104) and its IP address changed to 192.168.1.6.
I recreated the monmap with monmaptool, updated ceph.conf and the hosts file, and then started ceph-mon.
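The standard monmaptool procedure for this kind of address change looks roughly like the sketch below (the mon id ceph-node1 and the v1 port are assumptions from my setup; on Nautilus, monmaptool --addv can record both the v2 and v1 addresses at once):

systemctl stop ceph-mon@ceph-node1
ceph-mon -i ceph-node1 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap                      # verify the old entry
monmaptool --rm ceph-node1 /tmp/monmap
monmaptool --add ceph-node1 192.168.1.6:6789 /tmp/monmap
ceph-mon -i ceph-node1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@ceph-node1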
The ceph-mon log shows:
...
2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s
2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s
2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s
2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s
2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s
2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s
2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s
2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s
2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s
...
I changed the IP back to 192.168.0.104 yesterday, but the problem is the same.
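For what it's worth, these checks work through the local admin socket even while ceph -s hangs (a sketch; the daemon name mon.ceph-node1 is taken from the log above):

ss -tlnp | grep ceph-mon                     # is the mon actually listening on 3300/6789 on the current address?
ceph daemon mon.ceph-node1 mon_status        # uses the admin socket, bypasses the network
ceph daemon mon.ceph-node1 quorum_status

I also wonder whether the MDS and OSDs are still registered with the old 192.168.0.104 address (as the beacon messages above suggest) and need a restart to re-register, e.g. systemctl restart ceph-mds.target ceph-osd.target; that is an assumption, not a confirmed fix.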
# cat /etc/ceph/ceph.conf
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
[client.rgw.ceph-node1.rgw0]
host = ceph-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-node1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-ceph-node1.rgw0.log
rgw frontends = beast endpoint=192.168.1.6:8080
rgw thread pool size = 512
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 192.168.1.0/24
fsid = e384e8e6-94d5-4812-bfbb-d1b0468bdef5
mon host = [v2:192.168.1.6:3300,v1:192.168.1.6:6789]
mon initial members = ceph-node1
osd crush chooseleaf type = 0
osd pool default crush rule = -1
public network = 192.168.1.0/24
[osd]
osd memory target = 7870655146
Hey all,
We’ve been running some benchmarks against Ceph which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly until a point where I see a single OSD receiving much higher CPU load than the other OSDs (nearly 100% saturation). After some investigation we noticed a ton of pubsub traffic in the strace coming from the RGW pods like so:
[pid 22561] sendmsg(77, {msg_name(0)=NULL, msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"..., 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, {"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
I’ve checked the other OSDs and only a single OSD receives these messages; I suspect it's creating a bottleneck. Does anyone have an idea why these are being generated or how to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is doing simple gets/puts/deletes.
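A way to narrow down where that traffic is going (a diagnostic sketch; osd.12 stands in for the hot OSD id):

ceph osd df tree                        # confirm which OSD is the outlier
ceph pg ls-by-primary osd.12            # PGs whose primary is that OSD; the number before the dot is the pool id
ceph osd pool ls detail                 # map pool ids to names (look for the rgw log/pubsub pools)
radosgw-admin zonegroup get             # check whether a pubsub tier/sync module is actually configured
radosgw-admin period get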
We’re running Ceph 14.2.5 nautilus
Thank you!
Hi all,
what is the best approach for OSD backups and recovery?
We use only Radosgw with S3 API and I need to backup the content of S3 buckets.
Currently I sync s3 buckets to local filesystem and backup the content using Amanda.
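(That sync step is nothing fancy; roughly the following, with the remote and bucket names as placeholders:)

rclone sync ceph:my-bucket /backup/s3/my-bucket
# or
s3cmd sync s3://my-bucket/ /backup/s3/my-bucket/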
I believe there must be a better way to do this, but I couldn't find it in the docs.
I know that one option is to setup an archive zone, but it requires an additional ceph cluster that needs to be maintained and looked after. I would rather avoid that.
How can I backup an entire Ceph cluster? Or individual OSDs in the way that will allow me to recover the data correctly?
Many thanks,
Ludek
Hello All,
I have a HW RAID based 240 TB data pool with about 200 million files for
users in a scientific institution. Data sizes range from tiny parameter
files for scientific calculations and experiments to huge images of
brain scans. There are group directories, home directories, Windows
roaming profile directories organized in ZFS pools on Solaris operating
systems, exported via NFS and Samba to Linux, macOS, and Windows clients.
I would like to switch to CephFS because of the flexibility and
expandability but I cannot find any recommendations for which storage
backend would be suitable for all the functionality we have.
Since I like ZFS features such as immediate snapshots of very large
data pools, quotas for each file system within hierarchical data trees,
and dynamic expandability by simply adding new disks or disk images
without manual resizing, would it be a good idea to create RBD images,
map them onto the file servers, and create zpools on the mapped images? I
know that ZFS works best with raw disks, but maybe an RBD image is close
enough to a raw disk?
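Concretely, what I have in mind is roughly the following sketch (pool, image name, size and ashift are placeholders):

rbd create --size 10T rbd/zfs-backing-01      # thin-provisioned image in the 'rbd' pool
rbd map rbd/zfs-backing-01                    # shows up as e.g. /dev/rbd0 on the file server
zpool create -o ashift=12 tank /dev/rbd0      # zpool on top of the mapped RBD device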
Or would CephFS be the way to go? Can there be multiple CephFS pools, for
example one for the group data folders and one for the users' home directory
folders, or do I have to have everything in one single file space?
Maybe someone can share his or her field experience?
Thank you very much.
Best regards
Willi
Hi all,
I am new to Ceph, but I have a good understanding of the iSCSI protocol. I
will dive into Ceph because it looks promising, and I am particularly
interested in Ceph-RBD. I have a request: can you please tell me what, if any,
are the common similarities between iSCSI and Ceph? If someone had to
work on a common model for iSCSI and Ceph, what significant points would you
suggest to someone who already understands iSCSI?
Looking forward to answers. Thanks in advance :-)
BR
I have a 3-host Ceph setup with 10 x 4 TB HDDs per host. I defined a 3-replica RBD pool and some images and presented them to a VMware host via iSCSI, but the write performance is so bad that I managed to freeze a VM doing a big rsync to a datastore inside Ceph and had to reboot its host (it seems I filled up VMware's iSCSI queue).
Right now I'm getting write latencies from 20ms to 80 ms (per OSD) and sometimes peaking at 600 ms (per OSD).
Client throughput is around 4 MB/s.
Using a 4 MB stripe-1 image I got 1.955.359 B/s inside the VM.
On a 1 MB stripe-1 image I got 2.323.206 B/s inside the same VM.
I think the performance is much slower than it should be, and that I can fix it by correcting some configuration.
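A baseline that takes the VMware/iSCSI path out of the picture would be something like this (a sketch; 'rbd' is a placeholder pool name):

rados bench -p rbd 60 write             # defaults to 4 MB writes
rados bench -p rbd 60 write -b 4096     # 4 KB writes, closer to the rsync workload
ceph osd perf                           # per-OSD commit/apply latency while the bench runs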
Any advice?
--
Salsa
Hi there!
I got the task to connect a Windows client to our existing ceph cluster.
I'm looking for experiences or suggestions from the community.
Two possibilities came to mind:
1. iSCSI Target on RBD exported to Windows
2. NFS-Ganesha on CephFS exported to Windows
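For option 2, I imagine the export block would look roughly like this (a sketch; the cephx user "ganesha", the pseudo path and the exported CephFS path are assumptions):

EXPORT {
    Export_Id = 1;
    Path = "/";                    # CephFS path to export
    Pseudo = "/cephfs";            # NFSv4 pseudo path the client mounts
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;
        User_Id = "ganesha";       # cephx user with caps for CephFS
    }
}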
Is there a third way of exporting a Ceph cluster to a Windows machine?
I have some experience with CephFS. We have a small cluster running successfully for Linux clients. I don't have experience with RBD or iSCSI.
The Windows machine will use the space for backups. The kind of data is unknown; I expect an MS-SQL dump and user data from a SharePoint system. The Windows admin does not care whether NFS or iSCSI is used.
I'd be happy if some of you could share experiences.
Thanks
Lars
Hello, Ceph users,
does anybody use Ceph on the recently released CentOS 8? Apparently there are
no el8 packages either at download.ceph.com or in the native CentOS package
tree. I am thinking about upgrading my cluster to C8 (because of other
software running on it apart from Ceph). Do el7 packages simply work?
Can they be rebuilt using rpmbuild --rebuild? Or is running Ceph on
C8 more complicated than that?
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
laryross> As far as stealing... we call it sharing here. --from rcgroups
Hello Team,
We've integrated Ceph storage with Kubernetes and are provisioning
volumes through rbd-provisioner. When we create volumes from YAML files in
Kubernetes (PV > PVC > mounted into a pod), the PVCs on the Kubernetes side
get the meaningful names defined in the YAML files, but on the Ceph side the
RBD images are created with a dynamic UID in their name.
During troubleshooting this makes it tedious to find the exact RBD image.
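As far as I can tell, the PV object itself does record the image name, so the lookup can at least be scripted (a sketch, assuming PVs created by the external rbd provisioner):

kubectl get pv pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 -o jsonpath='{.spec.rbd.image}'
# or for all PVs at once:
kubectl get pv -o custom-columns=PV:.metadata.name,CLAIM:.spec.claimRef.name,IMAGE:.spec.rbd.image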
The provisioning logs are pasted in the snippet below.
kubectl get pods,pv,pvc
NAME            READY   STATUS    RESTARTS   AGE
pod/sleepypod   1/1     Running   0          4m9s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
persistentvolume/pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            Delete           Bound    default/test-dyn-pvc   ceph-rbd                4m9s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/test-dyn-pvc   Bound    pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            ceph-rbd       4m11s
*rbd-provisioner logs*
I1121 10:59:15.009012 1 provision.go:132] successfully created rbd image "kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2"
I1121 10:59:15.009092 1 controller.go:1087] provision "default/test-dyn-pvc" class "ceph-rbd": volume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" provisioned
I1121 10:59:15.009138 1 controller.go:1101] provision "default/test-dyn-pvc" class "ceph-rbd": trying to save persistentvvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5"
I1121 10:59:15.020418 1 controller.go:1108] provision "default/test-dyn-pvc" class "ceph-rbd": persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" saved
I1121 10:59:15.020476 1 controller.go:1149] provision "default/test-dyn-pvc" class "ceph-rbd": succeeded
I1121 10:59:15.020802 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-dyn-pvc", UID:"cd37d2d6-cecc-4a05-9736-c8d80abde7f5", APIVersion:"v1", ResourceVersion:"24545639", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5
*rbd image details on the Ceph cluster side*
rbd -p kube ls --long
NAME SIZE PARENT FMT PROT LOCK
kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2 1 GiB 2
Is there a way to set up a proper naming convention for the RBD images as well
during the Kubernetes deployment itself?
Kubernetes version: v1.15.5
Ceph cluster version: 14.2.2 nautilus (stable)
*Best Regards,*
*Palanisamy*