Hi everyone,
I'm a total newbie with Ceph, so sorry if I'm asking a stupid question.
I'm trying to understand how the CRUSH map & rules work. My goal is to have
two groups of 3 servers, so I'm using the "row" bucket:
ID   CLASS  WEIGHT    TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
 -1         59.38367  root default
-15         59.38367      zone City
-17         29.69183          row primary
 -3          9.89728              host server1
  0    ssd   3.49309                  osd.0          up   1.00000  1.00000
  1    ssd   1.74660                  osd.1          up   1.00000  1.00000
  2    ssd   1.74660                  osd.2          up   1.00000  1.00000
  3    ssd   2.91100                  osd.3          up   1.00000  1.00000
 -5          9.89728              host server2
  4    ssd   1.74660                  osd.4          up   1.00000  1.00000
  5    ssd   1.74660                  osd.5          up   1.00000  1.00000
  6    ssd   2.91100                  osd.6          up   1.00000  1.00000
  7    ssd   3.49309                  osd.7          up   1.00000  1.00000
 -7          9.89728              host server3
  8    ssd   3.49309                  osd.8          up   1.00000  1.00000
  9    ssd   1.74660                  osd.9          up   1.00000  1.00000
 10    ssd   2.91100                  osd.10         up   1.00000  1.00000
 11    ssd   1.74660                  osd.11         up   1.00000  1.00000
-19         29.69183          row secondary
 -9          9.89728              host server4
 12    ssd   1.74660                  osd.12         up   1.00000  1.00000
 13    ssd   1.74660                  osd.13         up   1.00000  1.00000
 14    ssd   3.49309                  osd.14         up   1.00000  1.00000
 15    ssd   2.91100                  osd.15         up   1.00000  1.00000
-11          9.89728              host server5
 16    ssd   1.74660                  osd.16         up   1.00000  1.00000
 17    ssd   1.74660                  osd.17         up   1.00000  1.00000
 18    ssd   3.49309                  osd.18         up   1.00000  1.00000
 19    ssd   2.91100                  osd.19         up   1.00000  1.00000
-13          9.89728              host server6
 20    ssd   1.74660                  osd.20         up   1.00000  1.00000
 21    ssd   1.74660                  osd.21         up   1.00000  1.00000
 22    ssd   2.91100                  osd.22         up   1.00000  1.00000
and I want to create some rules. First, I'd like to have:
a "replica" rule (over host) inside the "primary" row,
an "erasure" rule (over host) inside the "primary" row,
but also two CRUSH rules between primary/secondary, meaning I'd like to have a
replica (with only 1 copy of course) of a pool from the "primary" row to the
"secondary" row.
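For the first two, I imagine something like this is what I need (just my guess
from reading the docs; the bucket names match my tree above, and k/m are only
an example):
  ceph osd crush rule create-replicated rep-primary primary host ssd
  ceph osd erasure-code-profile set ec-profile-primary k=2 m=1 crush-root=primary crush-failure-domain=host crush-device-class=ssd
  ceph osd crush rule create-erasure ec-primary ec-profile-primary
For the primary-to-secondary copy I suppose the rule has to be written by hand
(decompile the CRUSH map and add a rule with two take/emit steps?), but I'm not
sure.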
How can I achieve that?
Regards
--
Albert SHIH 🦫 🐸
Wed 08 Nov 2023 18:37:54 CET
Hello,
We are running a Ceph Pacific (16.2.10) cluster with the balancer module enabled, but the usage of some OSDs keeps growing and has reached mon_osd_nearfull_ratio (we use the default of 85%), even though we think the balancer module should be doing some balancing work.
So I checked our balancer configuration and found that "crush_compat_metrics" is set to "pgs,objects,bytes", and these three values are used in src.pybind.mgr.balancer.module.Module.calc_eval. However, when doing the actual balance task, only the first key is used for the automatic balancing, in src.pybind.mgr.balancer.module.Module.do_crush_compat:
metrics = self.get_module_option('crush_compat_metrics').split(',')
key = metrics[0] # balancing using the first score metric
My concern is: is there any reason why we calculate the score using all three metrics but only balance using the first one?
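As a side note, would simply reordering the option, e.g.
  ceph config set mgr mgr/balancer/crush_compat_metrics bytes,objects,pgs
make the balancer work on bytes in the meantime, or am I misreading do_crush_compat?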
Thanks.
Hello all,
Here are the minutes from today's meeting.
- New time for CDM APAC to increase participation
- 9.30 - 11.30 pm PT seems like the most popular based on
https://doodle.com/meeting/participate/id/aM9XGZ3a/vote
- One more week for more feedback; please ask more APAC folks to suggest
their preferred times.
- [Ernesto] Revamp Ansible/Ceph-Ansible for non-containerized users?
- OpenNebula / Proxmox
- solicit maintainers for ceph-ansible on the ML
- 18.2.1
- Yuri: approval email sent out a few days ago; waiting on some approvals
- Blocker:
- https://tracker.ceph.com/issues/63391
- lab upgrades (Laura will help Yuri coordinate)
- Next Pacific release being worked on in background by Yuri.
- https://pad.ceph.com/p/pacific_16.2.15
- Try v16.2.15 milestone to help prune PRs
- https://github.com/ceph/ceph/milestone/17
- [Nizam] Ceph News Ticker - Ceph Dashboard
- Notify when new release is available (display changelogs)
- Display important ceph events
- CVEs, critical bug fixes
- Maybe newly added blog posts or information regarding the upcoming
group meetings?
- User + Dev meeting next week
- Topics include migration between EC profiles and challenges related to
RGW zone replication
- Casey can attend end of meeting
- OpenNebula folks planning to do a webinar; looking for speakers
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
We have 3 Ceph clusters running; the 3rd cluster gave an error and is currently offline. I want to get all the remaining data from the 2 working clusters. Instead of fixing Ceph, I just want to save the data. How can I access this data and connect to the pools? Can you help me? Clusters 1 and 2 are working; I want to view my data on them and then transfer it somewhere else. How can I do this? I have never used Ceph before.
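From what I have read, something like this should let me see what is in the working clusters and copy RBD images out (pool and image names are just placeholders, I am only guessing here):
  ceph -s
  ceph osd pool ls
  rbd ls <pool-name>
  rbd export <pool-name>/<image-name> /backup/<image-name>.img
Is that the right direction, or is there a better way to copy everything off?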
We have a production setup of 36 OSDs (SAS disks) totalling 180 TB, allocated to a single Ceph cluster with 3 monitors and 3 managers. There were 830 volumes and VMs created in OpenStack with Ceph as the backend. On Sep 21, users reported slowness in accessing the VMs.
Analysing the logs led us to suspect the SAS disks, network congestion and the Ceph configuration (all default values were in use). We upgraded the network from 1 Gbps to 10 Gbps for both the public and cluster networks. There was no change.
Ceph benchmarks showed that 28 of the 36 OSDs reported very low IOPS (30 to 50), while the remaining OSDs showed 300+ IOPS.
We gradually started reducing the load on the cluster and the volume count is now 650. The slow operations have gradually reduced, but I am aware that this is not the solution.
The Ceph configuration was updated as follows:
osd_journal_size = 10 GB
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
bluestore_cache_trim_max_skip_pinned = 10000
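For anyone who wants to reproduce the settings, the runtime options above can be applied roughly like this (assuming the centralized config database is in use; values as listed):
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  ceph config set osd osd_recovery_op_priority 1
  ceph config set osd bluestore_cache_trim_max_skip_pinned 10000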
After one month, we are now facing another issue: the mgr daemon stopped on all 3 nodes of the quorum and 16 OSDs went down. From the ceph-mon and ceph-mgr logs we could not determine the reason. Please guide me, as this is a production setup.
Hi,
I used this, but it always returns "directory inode not in cache":
ceph tell mds.* dirfrag ls path
I would like to pin some subdirectories to a rank after dynamic subtree
partitioning. Before that, I need to know exactly where they are.
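For context, once I know the right directories I plan to pin them with the usual xattr, something like:
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir
(mount point, rank and path are just examples). Or is "ceph tell mds.<rank> get subtrees" a better way to see where directories currently live?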
Thank you,
Ben
I configured a password for Grafana because I want to use Loki. I used the spec parameter initial_admin_password and this works fine in a staging environment, where I had never tried to use Grafana with a password for Loki before.
Using the username admin with the configured password gives a credentials error on the environment where I tried to use Grafana with Loki in the past (with Ceph/cephadm 17.2.6). I changed the password within Grafana back then, but how can I overwrite this now? Or is there a way to clean up all the Grafana files?
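Would resetting it from inside the container be the right approach, for example:
  cephadm enter --name grafana.<host>
  grafana-cli admin reset-admin-password <new-password>
(host and password are placeholders, I haven't tried this yet), or is it cleaner to remove the Grafana service and its data and let cephadm redeploy it?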
Best regards,
Sake
I have a bucket which got injected with a bucket policy that locks the
bucket even to the bucket owner. The bucket now cannot be accessed (even
getting its info or deleting the bucket policy does not work). I have looked in
the radosgw-admin command for a way to delete a bucket policy but do not see
anything. I presume I will need to somehow remove the bucket policy from
wherever it is stored in the bucket metadata / omap etc. If anyone can point
me in the right direction on that I would appreciate it. Thanks
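For reference, even the policy operations themselves are rejected for the
bucket owner, e.g. (bucket name and endpoint are placeholders):
  aws s3api get-bucket-policy --bucket <bucket> --endpoint-url http://<rgw-host>
  aws s3api delete-bucket-policy --bucket <bucket> --endpoint-url http://<rgw-host>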
Respectfully,
*Wes Dillingham*
wes(a)wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
We have been seeing some odd behavior with scrubbing (very slow) and OSD
warnings on a couple of new clusters. A bit of research turned up this:
https://www.reddit.com/r/truenas/comments/p1ebnf/seagate_exos_load_cyclingi…
We've installed the tool from https://github.com/Seagate/openSeaChest and
disabled EPC power features similar to:
openSeaChest_PowerControl --scan | grep ST | awk '{print $2}' | \
  xargs -I {} openSeaChest_PowerControl -d {} --EPCfeature disable
Things seem to be better now on those two clusters. Has anyone seen
anything similar? This would seem to be a huge issue if all defaults on
Exos are wrong (stop-and-go on all Ceph/ZFS workloads).
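For anyone wanting to verify whether their drives are still load-cycling after
the change, the SMART attribute can be watched, e.g. (device name is a
placeholder):
  smartctl -A /dev/sdX | grep -i load_cycle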
--
Best regards,
Alex Gorbachev
--
Intelligent Systems Services Inc.
http://www.iss-integration.com
https://www.linkedin.com/in/alex-gorbachev-iss/