Hello folks!
I'm designing a new Ceph cluster from scratch and I want to increase CephFS
speed and decrease latency.
Usually I build with WAL+DB on NVMe in front of SAS/SATA SSDs, and I deploy
the MDS and MONs on the same servers.
This time an unusual idea came to mind, and on paper, with my limited
knowledge, I think it has great potential and will perform better.
I have 5 racks, and the 3rd "middle" rack is my storage and management rack.
- In RACK-3 I'm going to locate 8x 1U OSD servers (Spec: 2x E5-2690V4, 256GB,
4x 25G, 2x 1.6TB PCIe NVMe "MZ-PLK3T20", 8x 4TB SATA SSD)
- My CephFS kernel clients are 40x GPU nodes located in RACK-1, 2, 4, 5
With my current workflow, all the clients:
1- visit the rack data switch,
2- jump to the main VPC switch via 2x 100G,
3- talk to the MDS servers,
4- get the answer back over the same path,
5- to access data, follow the same hops and visit the OSDs every time.
If I deploy a separate metadata pool using 4x MDS servers at the top of
RACK-1, 2, 4, 5 (Spec: 2x E5-2690V4, 128GB, 2x 10G (public), 2x 25G (cluster),
2x 960GB U.2 NVMe "MZ-PLK3T20"),
then all the clients will send their requests directly to an in-rack MDS
server that is 1 hop away, and if a request is metadata-only, the MDS node
doesn't need to redirect it to the OSD nodes.
Also, locating MDS servers with a separate metadata pool across all the
racks will reduce the high load on the main VPC switch in RACK-3.
If I'm not missing anything, then only the recovery workload will suffer with
this topology.
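For reference, this is roughly how I'd steer the metadata pool onto the NVMe drives in those per-rack nodes; just a sketch, and the rule and pool names are placeholders:
# CRUSH rule that keeps metadata replicas on the nvme device class
ceph osd crush rule create-replicated meta-nvme default host nvme
# point the CephFS metadata pool at that rule
ceph osd pool set cephfs.myfs.meta crush_rule meta-nvme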
What do you think?
I deployed a Ceph Reef cluster using cephadm. When it comes to the ceph.conf file, which one should I be editing to make changes to the cluster - the one inside the Docker container or the local one on the Ceph monitors?
-- Michael
Team,
We were facing a CephFS volume mount issue, and ceph status was showing:
- MDS slow requests
- MDS behind on trimming
After restarting the MDS pods it was resolved,
but we wanted to know the root cause of this.
It started about two hours after one of the active MDS daemons crashed.
So, can an active MDS crash cause this issue?
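In case it's relevant, these are the kinds of checks we have in mind for next time; just a sketch, and the daemon name is only an example:
# journal segment count vs. the trimming target
ceph daemon mds.myfs-a perf dump mds_log
ceph config get mds mds_log_max_segments
# overall MDS state
ceph fs status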
Please provide your inputs, anyone.
What exactly does the osd pool repair function do?
Documentation is not clear.
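For reference, I mean the pool-level command, as opposed to the per-PG repair; the pool name below is only an example:
ceph osd pool repair mypool
# versus the per-PG form:
ceph pg repair <pgid>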
Kind regards,
AP
Hi ceph-users,
I currently use Ceph Octopus to provide CephFS & S3 storage for our app servers, deployed in containers by ceph-ansible. I'm planning an upgrade to get off Octopus, as it's EOL.
I'd love to go straight to Reef, but I vaguely remember reading a statement that an upgrade can only span two major versions. I've failed to find that statement again.
Is it possible to go directly from Octopus straight to Reef?
I think a sensible approach here is to first migrate our existing deployment to cephadm, and then use cephadm to upgrade. Any advice on this is very welcome.
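For what it's worth, the cephadm flow I have in mind after the migration is roughly the following, done in two hops rather than one jump (version numbers are only examples):
# first hop, e.g. to Quincy
ceph orch upgrade start --ceph-version 17.2.7
ceph orch upgrade status
# once healthy, the second hop to Reef
ceph orch upgrade start --ceph-version 18.2.1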
Many thanks,
Alex
Hello everybody,
I've suddenly hit a problem that is (probably) authorization-related while playing with cephx.
So, long story short:
1) Rolled out a completely new test cluster with cephadm, with only one node
2) According to the docs, I set this in /etc/ceph/ceph.conf:
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
3) restarted ceph.target
4) now even "ceph -s" cannot connect to RADOS, saying:
root@ceph1:/etc/ceph# ceph -s
2024-02-24T18:15:59.219+0000 7f7c10d65700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2024-02-24T18:15:59.219+0000 7f7c11d67700 0 librados: client.admin authentication error (13) Permission denied
[errno 13] RADOS permission denied (error connecting to the cluster)
5) I have ceph.client.admin.keyring in both /etc/ceph and /var/lib/ceph/$fsid/config
6) The monitor logs don't show any errors. It looks like the monitor keeps running normally and doesn't even know that something is wrong.
7) Tried to set /etc/ceph/ceph.conf back to
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
with no success
8) I have noticed that some process (I guess one of the processes in the containers?) always rewrites /etc/ceph/ceph.conf and /var/lib/ceph/$fsid/config/ceph.conf, whatever I write there. What process is it? And how do I set options if I want to keep them in the file?
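For reference, my understanding is that with cephadm such settings are meant to live in the cluster's central config rather than in the local file; a sketch, assuming the admin keyring works again:
# store auth settings centrally instead of in /etc/ceph/ceph.conf
ceph config set global auth_cluster_required cephx
ceph config set global auth_service_required cephx
ceph config set global auth_client_required cephx
# or import an existing file into the central config
ceph config assimilate-conf -i /etc/ceph/ceph.conf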
Ubuntu 20.04, Reef, 18.0.2
Thanks in advance.
I have just upgraded a cluster from 17.2.7 to 18.2.1.
Everything is working as expected, apart from the number of scrubs & deep scrubs bouncing all over the place every second.
I have the value set to 1 per OSD, but the cluster reckons one minute it's doing 60+ scrubs, and a second later this drops to 40, then back up to 70.
If I check the live Ceph logs I can see that every second multiple PGs are reported as starting either a scrub or a deep scrub. It doesn't look like these are actually running, as there is no negative effect on the cluster's performance.
Is this something to be expected off the back of the upgrade, and should it sort itself out?
A sample of the logs:
2024-02-24T00:41:20.055401+0000 osd.54 (osd.54) 3160 : cluster 0 12.9a deep-scrub starts
2024-02-24T00:41:19.658144+0000 osd.41 (osd.41) 4103 : cluster 0 12.cd deep-scrub starts
2024-02-24T00:41:19.823910+0000 osd.33 (osd.33) 5625 : cluster 0 12.ae deep-scrub starts
2024-02-24T00:41:19.846736+0000 osd.65 (osd.65) 3947 : cluster 0 12.53 deep-scrub starts
2024-02-24T00:41:20.007331+0000 osd.20 (osd.20) 7214 : cluster 0 12.142 scrub starts
2024-02-24T00:41:20.114748+0000 osd.10 (osd.10) 6538 : cluster 0 12.2c deep-scrub starts
2024-02-24T00:41:20.247205+0000 osd.36 (osd.36) 4789 : cluster 0 12.16f deep-scrub starts
2024-02-24T00:41:20.908051+0000 osd.68 (osd.68) 3869 : cluster 0 12.d7 deep-scrub starts
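For reference, a rough way to compare the log noise against what is actually running (just a sketch):
# per-OSD scrub concurrency limit
ceph config get osd osd_max_scrubs
# count PGs that are really in a scrubbing state right now
ceph pg dump pgs 2>/dev/null | grep -c scrubbing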
Hi,
I have one CephFS with one volume and subvolumes using erasure coding.
If I don't set any quota, when I run df on the client I get:
0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 5,8P 78T 5,8P 2% /vo
The 78T seems to be the size used by Ceph on disk (on the hardware, I mean), and I find that very good.
But if I set a quota:
setfattr -n ceph.quota.max_bytes -v 109951162777600 vo
then on the same client I get:
0ccbc438-d109-4c5f-b47b-70f8df707c2c/vo 100T 51T 50T 51% /vo
and those are the sizes of the data (I'm using erasure coding 4+2, so 51 * 1.5 ≈ 77 TB).
Is there any way to keep the first answer?
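For reference, the quota on the directory can be checked and removed like this (/vo as above); setting it to 0 removes it and brings back the first kind of output, but of course also drops the limit:
# show the current quota on the directory
getfattr -n ceph.quota.max_bytes /vo
# value 0 removes the quota, so df reports pool-wide space again
setfattr -n ceph.quota.max_bytes -v 0 /vo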
Regards
--
Albert SHIH 🦫 🐸
France
Local time:
Thu, 22 Feb 2024 08:44:17 CET