ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
OS: CentOS Linux release 7.7.1908 (Core)
Single-node Ceph cluster with 1 mon, 1 mgr, 1 MDS, 1 RGW, and 12 OSDs, but only CephFS is used.
ceph -s blocks after I shut down the machine (192.168.0.104) and its IP address changed to 192.168.1.6.
I recreated the monmap with monmaptool, updated ceph.conf and the hosts file, and then started ceph-mon.
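The standard monmaptool procedure for this kind of address change looks roughly like the sketch below (the mon id ceph-node1 and the v1 port are assumptions from my setup; on Nautilus, monmaptool --addv can record both the v2 and v1 addresses at once):

systemctl stop ceph-mon@ceph-node1
ceph-mon -i ceph-node1 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap                      # verify the old entry
monmaptool --rm ceph-node1 /tmp/monmap
monmaptool --add ceph-node1 192.168.1.6:6789 /tmp/monmap
ceph-mon -i ceph-node1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@ceph-node1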
The ceph-mon log shows:
...
2019-12-11 08:57:45.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1285.14s
2019-12-11 08:57:50.170 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1290.14s
2019-12-11 08:57:55.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1295.14s
2019-12-11 08:58:00.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1300.14s
2019-12-11 08:58:05.172 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1305.14s
2019-12-11 08:58:10.171 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1310.14s
2019-12-11 08:58:15.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1315.14s
2019-12-11 08:58:20.173 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1320.14s
2019-12-11 08:58:25.174 7f952cdac700 1 mon.ceph-node1@0(leader).mds e34 no beacon from mds.0.10 (gid: 4384 addr: [v2:192.168.0.104:6898/4084823750,v1:192.168.0.104:6899/4084823750] state: up:active) since 1325.14s
...
I changed the IP back to 192.168.0.104 yesterday, but the problem is the same.
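For what it's worth, these checks work through the local admin socket even while ceph -s hangs (a sketch; the daemon name mon.ceph-node1 is taken from the log above):

ss -tlnp | grep ceph-mon                     # is the mon actually listening on 3300/6789 on the current address?
ceph daemon mon.ceph-node1 mon_status        # uses the admin socket, bypasses the network
ceph daemon mon.ceph-node1 quorum_status

I also wonder whether the MDS and OSDs are still registered with the old 192.168.0.104 address (as the beacon messages above suggest) and need a restart to re-register, e.g. systemctl restart ceph-mds.target ceph-osd.target; that is an assumption, not a confirmed fix.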
# cat /etc/ceph/ceph.conf
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by QEMU and allowed by SELinux or AppArmor
[client.rgw.ceph-node1.rgw0]
host = ceph-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-node1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-ceph-node1.rgw0.log
rgw frontends = beast endpoint=192.168.1.6:8080
rgw thread pool size = 512
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 192.168.1.0/24
fsid = e384e8e6-94d5-4812-bfbb-d1b0468bdef5
mon host = [v2:192.168.1.6:3300,v1:192.168.1.6:6789]
mon initial members = ceph-node1
osd crush chooseleaf type = 0
osd pool default crush rule = -1
public network = 192.168.1.0/24
[osd]
osd memory target = 7870655146
Hey all,
We’ve been running some benchmarks against Ceph which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly until a point where I see a single OSD receiving much higher CPU load than the other OSDs (nearly 100% saturation). After some investigation we noticed a ton of pubsub traffic in the strace coming from the RGW pods like so:
[pid 22561] sendmsg(77, {msg_name(0)=NULL, msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"..., 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, {"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
I’ve checked the other OSDs and only a single OSD receives these messages; I suspect it's creating a bottleneck. Does anyone have an idea why these are being generated or how to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is doing simple gets/puts/deletes.
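A way to narrow down where that traffic is going (a diagnostic sketch; osd.12 stands in for the hot OSD id):

ceph osd df tree                        # confirm which OSD is the outlier
ceph pg ls-by-primary osd.12            # PGs whose primary is that OSD; the number before the dot is the pool id
ceph osd pool ls detail                 # map pool ids to names (look for the rgw log/pubsub pools)
radosgw-admin zonegroup get             # check whether a pubsub tier/sync module is actually configured
radosgw-admin period get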
We’re running Ceph 14.2.5 nautilus
Thank you!
Hi all,
what is the best approach for OSD backups and recovery?
We use only Radosgw with S3 API and I need to backup the content of S3 buckets.
Currently I sync s3 buckets to local filesystem and backup the content using Amanda.
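(That sync step is nothing fancy; roughly the following, with the remote and bucket names as placeholders:)

rclone sync ceph:my-bucket /backup/s3/my-bucket
# or
s3cmd sync s3://my-bucket/ /backup/s3/my-bucket/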
I believe there must be a better way to do this, but I couldn't find it in the docs.
I know that one option is to setup an archive zone, but it requires an additional ceph cluster that needs to be maintained and looked after. I would rather avoid that.
How can I backup an entire Ceph cluster? Or individual OSDs in the way that will allow me to recover the data correctly?
Many thanks,
Ludek
Hello All,
I have a HW RAID based 240 TB data pool with about 200 million files for
users in a scientific institution. Data sizes range from tiny parameter
files for scientific calculations and experiments to huge images of
brain scans. There are group directories, home directories, Windows
roaming profile directories organized in ZFS pools on Solaris operating
systems, exported via NFS and Samba to Linux, macOS, and Windows clients.
I would like to switch to CephFS because of the flexibility and
expandability but I cannot find any recommendations for which storage
backend would be suitable for all the functionality we have.
Since I like ZFS features such as immediate snapshots of very large
data pools, quotas for each file system within hierarchical data trees,
and dynamic expandability by simply adding new disks or disk images
without manual resizing, would it be a good idea to create RBD images,
map them onto the file servers, and create zpools on the mapped images? I
know that ZFS works best with raw disks, but maybe an RBD image is close
enough to a raw disk?
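Concretely, what I have in mind is roughly the following sketch (pool, image name, size and ashift are placeholders):

rbd create --size 10T rbd/zfs-backing-01      # thin-provisioned image in the 'rbd' pool
rbd map rbd/zfs-backing-01                    # shows up as e.g. /dev/rbd0 on the file server
zpool create -o ashift=12 tank /dev/rbd0      # zpool on top of the mapped RBD device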
Or would CephFS be the way to go? Can there be multiple CephFS pools, for
example one for the group data folders and one for the users' home directory
folders, or do I have to have everything in one single file space?
Maybe someone can share his or her field experience?
Thank you very much.
Best regards
Willi
Hi all,
I am new to Ceph, but I have a good understanding of the iSCSI protocol. I
will dive into Ceph because it looks promising, and I am particularly
interested in Ceph-RBD. I have a request: can you please tell me what, if any,
are the common similarities between iSCSI and Ceph? If someone had to
work on a common model for iSCSI and Ceph, what significant points would you
suggest to someone who already understands iSCSI?
Looking forward to answers. Thanks in advance :-)
BR
I have a 3-host Ceph setup with 10 x 4 TB HDDs per host. I defined a 3-replica RBD pool and some images and presented them to a VMware host via iSCSI, but the write performance is so bad that I managed to freeze a VM doing a big rsync to a datastore inside Ceph and had to reboot its host (it seems I filled up VMware's iSCSI queue).
Right now I'm getting write latencies from 20ms to 80 ms (per OSD) and sometimes peaking at 600 ms (per OSD).
Client throughput is around 4 MB/s.
Using a 4 MB stripe-1 image I got 1.955.359 B/s inside the VM.
On a 1 MB stripe-1 image I got 2.323.206 B/s inside the same VM.
I think the performance is much slower than it should be, and that I can fix it by correcting some configuration.
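A baseline that takes the VMware/iSCSI path out of the picture would be something like this (a sketch; 'rbd' is a placeholder pool name):

rados bench -p rbd 60 write             # defaults to 4 MB writes
rados bench -p rbd 60 write -b 4096     # 4 KB writes, closer to the rsync workload
ceph osd perf                           # per-OSD commit/apply latency while the bench runs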
Any advice?
--
Salsa
Hi there!
I got the task to connect a Windows client to our existing ceph cluster.
I'm looking for experiences or suggestions from the community.
Two possibilities came to mind:
1. iSCSI Target on RBD exported to Windows
2. NFS-Ganesha on CephFS exported to Windows
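For option 2, I imagine the export block would look roughly like this (a sketch; the cephx user "ganesha", the pseudo path and the exported CephFS path are assumptions):

EXPORT {
    Export_Id = 1;
    Path = "/";                    # CephFS path to export
    Pseudo = "/cephfs";            # NFSv4 pseudo path the client mounts
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;
        User_Id = "ganesha";       # cephx user with caps for CephFS
    }
}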
Is there a third way of exporting a Ceph cluster to a Windows machine?
I have some experience with CephFS. We have a small cluster running successfully for Linux clients. I don't have experience with RBD or iSCSI.
The Windows machine will use the space for backups. The kind of data is unknown; I expect an MS-SQL dump and user data from a SharePoint system. The Windows admin does not care whether NFS or iSCSI is used.
I'd be happy if some of you could share experiences.
Thanks
Lars
Hello, Ceph users,
does anybody use Ceph on the recently released CentOS 8? Apparently there are
no el8 packages either at download.ceph.com or in the native CentOS package
tree. I am thinking about upgrading my cluster to C8 (because of other
software running on it apart from Ceph). Do el7 packages simply work?
Can they be rebuilt using rpmbuild --rebuild? Or is running Ceph on
C8 more complicated than that?
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
laryross> As far as stealing... we call it sharing here. --from rcgroups
Hello Team,
We've integrated Ceph storage with Kubernetes and are provisioning
volumes through rbd-provisioner. When we create volumes from YAML files in
Kubernetes (PV > PVC > mounted into a pod), the PVCs on the Kubernetes side
get the meaningful names defined in the YAML files, but on the Ceph side the
RBD images are created with a dynamic UID in their name.
During troubleshooting this makes it tedious to find the exact RBD image.
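As far as I can tell, the PV object itself does record the image name, so the lookup can at least be scripted (a sketch, assuming PVs created by the external rbd provisioner):

kubectl get pv pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 -o jsonpath='{.spec.rbd.image}'
# or for all PVs at once:
kubectl get pv -o custom-columns=PV:.metadata.name,CLAIM:.spec.claimRef.name,IMAGE:.spec.rbd.image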
The provisioning logs are pasted in the snippet below.
kubectl get pods,pv,pvc
NAME            READY   STATUS    RESTARTS   AGE
pod/sleepypod   1/1     Running   0          4m9s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
persistentvolume/pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            Delete           Bound    default/test-dyn-pvc   ceph-rbd                4m9s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/test-dyn-pvc   Bound    pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            ceph-rbd       4m11s
*rbd-provisioner logs*
I1121 10:59:15.009012 1 provision.go:132] successfully created rbd image "kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2"
I1121 10:59:15.009092 1 controller.go:1087] provision "default/test-dyn-pvc" class "ceph-rbd": volume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" provisioned
I1121 10:59:15.009138 1 controller.go:1101] provision "default/test-dyn-pvc" class "ceph-rbd": trying to save persistentvvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5"
I1121 10:59:15.020418 1 controller.go:1108] provision "default/test-dyn-pvc" class "ceph-rbd": persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" saved
I1121 10:59:15.020476 1 controller.go:1149] provision "default/test-dyn-pvc" class "ceph-rbd": succeeded
I1121 10:59:15.020802 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-dyn-pvc", UID:"cd37d2d6-cecc-4a05-9736-c8d80abde7f5", APIVersion:"v1", ResourceVersion:"24545639", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5
*rbd image details on the Ceph cluster side*
rbd -p kube ls --long
NAME SIZE PARENT FMT PROT LOCK
kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2 1 GiB 2
Is there a way to set up a proper naming convention for the RBD images as well
during the Kubernetes deployment itself?
Kubernetes version: v1.15.5
Ceph cluster version: 14.2.2 nautilus (stable)
*Best Regards,*
*Palanisamy*