Hi Team,
We have a ceph cluster with 3 storage nodes:
1. storagenode1 - abcd:abcd:abcd::21
2. storagenode2 - abcd:abcd:abcd::22
3. storagenode3 - abcd:abcd:abcd::23
The requirement is to mount Ceph using the domain name of the MON node.
Note: the domain name resolves correctly via our DNS server.
For this we are using the command:
```
mount -t ceph [storagenode.storage.com]:6789:/ /backup -o name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
```
We are getting the following logs in /var/log/messages:
```
Jan 24 17:23:17 localhost kernel: libceph: resolve 'storagenode.storage.com' (ret=-3): failed
Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip 'storagenode.storage.com:6789'
```
We also tried mounting the Ceph storage using the IP of the MON, which works fine.
Query:
Could you please help us out with how we can mount Ceph using the FQDN?
My /etc/ceph/ceph.conf is as follows:
[global]
ms bind ipv6 = true
ms bind ipv4 = false
mon initial members = storagenode1,storagenode2,storagenode3
osd pool default crush rule = -1
fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
mon host = [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
public network = abcd:abcd:abcd::/64
cluster network = eff0:eff0:eff0::/64
[osd]
osd memory target = 4294967296
[client.rgw.storagenode1.rgw0]
host = storagenode1
keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
rgw thread pool size = 512
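For reference, a minimal sketch of two checks that might help narrow down why the kernel client fails to resolve the name. As far as I understand, the in-kernel libceph resolver does not use the normal userspace resolver but the keyutils dns_resolver upcall, so it is worth verifying both; the paths below are typical but distro-dependent, so treat them as assumptions:
```
# 1) confirm the FQDN resolves to the IPv6 MON address in userspace
getent ahosts storagenode.storage.com

# 2) check whether a handler for the dns_resolver key type is configured,
#    since the in-kernel resolver used by libceph goes through this keyutils upcall
grep -r dns_resolver /etc/request-key.conf /etc/request-key.d/ 2>/dev/null
```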
--
~ Lokendra
skype: lokendrarathour
Hello,
What's the status with the *-stable-* tags?
https://quay.io/repository/ceph/daemon?tab=tags
Are they no longer built/supported?
What should we use until we migrate from ceph-ansible to cephadm?
Thanks.
--
Jonas
Details of this release are summarized here:
https://tracker.ceph.com/issues/59070#note-1
Release Notes - TBD
The reruns were in the queue for 4 days because of some slowness issues.
The core team (Neha, Radek, Laura, and others) are trying to narrow
down the root cause.
Seeking approvals/reviews for:
rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
the core)
rgw - Casey
fs - Venky (the fs suite has an unusually high number of failed jobs; any reason to suspect a connection to the observed slowness?)
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/octopus-x - Laura is looking into failures
upgrade/pacific-x - Laura is looking into failures
upgrade/quincy-p2p - Laura is looking into failures
client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
is looking into it
powercycle - Brad
ceph-volume - needs a rerun with the merged https://github.com/ceph/ceph-ansible/pull/7409
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
Also, share any findings or hypotheses about the slowness in the
execution of the suite.
Josh, Neha - gibba and LRC upgrades pending major suites approvals.
RC release - pending major suites approvals.
Thx
YuriW
Hello,
After a successful upgrade of a Ceph cluster from 16.2.7 to 16.2.11, I needed to downgrade it back to 16.2.7 as I found an issue with the new version.
I expected that running the downgrade with `ceph orch upgrade start --ceph-version 16.2.7` would work fine. However, it blocked right after the downgrade of the first MGR daemon: the downgraded daemon is no longer able to use the cephadm module, and any `ceph orch` command fails with the following error:
```
$ ceph orch ps
Error ENOENT: Module not found
```
And the downgrade process is therefore blocked.
These are the logs of the MGR when issuing the command:
```
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.557+0000 7f828fe8c700 0 log_channel(audit) log [DBG] : from='client.3136173 -' entity='client.admin' cmd=[{"prefix": "orch ps", "target": ["mon-mgr", ""]}]: dispatch
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.558+0000 7f829068d700 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.list_daemons(*(None, None), **{'daemon_id': None, 'host': None, 'refresh': False})
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.558+0000 7f829068d700 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.get_feature_set(*(), **{})
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
Mar 28 12:13:15 astano03 ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]: debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 mgr.server reply reply (2) No such file or directory Module not found
```
Other interesting MGR logs are:
```
2023-03-28T11:05:59.519+0000 7fcd16314700 4 mgr get_store get_store key: mgr/cephadm/upgrade_state
2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Failed to construct class in 'cephadm'
2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Traceback (most recent call last):
e "/usr/share/ceph/mgr/cephadm/module.py", line 450, in __init__
elf.upgrade = CephadmUpgrade(self)
e "/usr/share/ceph/mgr/cephadm/upgrade.py", line 111, in __init__
elf.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
e "/usr/share/ceph/mgr/cephadm/upgrade.py", line 92, in from_json
eturn cls(**c)
rror: __init__() got an unexpected keyword argument 'daemon_types'
2023-03-28T11:05:59.521+0000 7fcd16314700 -1 mgr operator() Failed to run module in active mode ('cephadm')
```
These seem to relate to the new staggered-upgrade feature.
Please note that everything was working fine on version 16.2.7 before the upgrade.
I am currently stuck in this situation: only one MGR daemon is still on version 16.2.11, and it is the only one still working correctly:
```
[root@astano01 ~]# ceph orch ps | grep mgr
mgr.astano02.mzmewn astano02 *:8443,9283 running (5d) 43s ago 2y 455M - 16.2.11 7a63bce27215 e2d7806acf16
mgr.astano03.qtzccn astano03 *:8443,9283 running (3m) 22s ago 95m 383M - 16.2.7 463ec4b1fdc0 cc0d88864fa1
```
Has anyone already faced this issue, or does anyone know how I can make the 16.2.7 MGR load the cephadm module correctly?
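If it is of any use, here is a minimal, hedged sketch of how the stored upgrade state that the traceback points at could be inspected and, after taking a backup, cleared. The key name mgr/cephadm/upgrade_state comes from the MGR logs above; removing it is only an assumption to verify, not a confirmed fix:
```
# inspect the stored upgrade state that the 16.2.7 module fails to parse
ceph config-key get mgr/cephadm/upgrade_state

# keep a copy before changing anything
ceph config-key get mgr/cephadm/upgrade_state > upgrade_state.backup.json

# assumption: the 'daemon_types' field written by 16.2.11 is what the older
# parser rejects; remove the key and restart the active MGR to reinitialize
ceph config-key rm mgr/cephadm/upgrade_state
ceph mgr fail
```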
Thanks in advance for any help!
Dear all;
Up until a few hours ago, I had a seemingly normally-behaving cluster
(Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6
nodes. The cluster is only used for CephFS and the only non-standard
configuration I can think of is that I had 2 active MDSs, but only 1
standby. I had also doubled mds_cache_memory limit to 8 GB (all OSD
hosts have 256 G of RAM) at some point in the past.
Then I rebooted one of the OSD nodes. The rebooted node held one of the
active MDSs. Now the node is back up: ceph -s says the cluster is
healthy, but all PGs are in an active+clean+remapped state and 166.67% of
the objects are misplaced (dashboard: -66.66% healthy).
The data pool is a threefold replica with 5.4M objects; the number of
misplaced objects is reported as 27087410/16252446. The denominator in
the ratio makes sense to me (16.2M / 3 = 5.4M), but the numerator does
not. I also note that the ratio is *exactly* 5 / 3. The filesystem is
still mounted and appears to be usable, but df reports it as 100% full;
I suspect it would say 167% but that is capped somewhere.
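In case it is useful for diagnosis, a short sketch of the standard inspection commands that should show where the inflated misplaced count comes from (stock Ceph CLI only; which output is relevant is of course the open question):
```
# object and capacity accounting, to compare with the 27087410/16252446 ratio
ceph df detail
ceph osd pool ls detail

# list remapped PGs and compare their up vs. acting sets
ceph pg ls remapped

# per-OSD utilization after the reboot, to spot weight/CRUSH oddities
ceph osd df tree
```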
Any ideas about what is going on? Any suggestions for recovery?
// Best wishes; Johan
We have an internal use case where we back the storage of a proprietary
database by a shared file system. We noticed something very odd when
testing some workload with a local block device backed file system vs
cephfs. We noticed that the amount of network IO done by cephfs is almost
double compared to the IO done in case of a local file system backed by an
attached block device.
We also noticed that CephFS thrashes through the page cache very quickly
compared to the amount of data being read and think that the two issues
might be related. So, I wrote a simple test.
1. I wrote 10k files 400KB each using dd (approx 4 GB data).
2. I dropped the page cache completely.
3. I then read these files serially, again using dd. The page cache usage
shot up to 39 GB for reading such a small amount of data.
Following is the code used to reproduce this in bash:
```
for i in $(seq 1 10000); do
    dd if=/dev/zero of=test_${i} bs=4k count=100
done

sync; echo 1 > /proc/sys/vm/drop_caches

for i in $(seq 1 10000); do
    dd if=test_${i} of=/dev/null bs=4k count=100
done
```
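A small, hedged addition that could quantify the effect: snapshot the page cache from /proc/meminfo around the read pass (standard Linux interfaces; the snippet is only an illustration, not part of the original test):
```
# snapshot page cache (kB) before the read pass
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)

for i in $(seq 1 10000); do
    dd if=test_${i} of=/dev/null bs=4k count=100 status=none
done

# snapshot again and report the growth in MiB
after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "page cache grew by $(( (after - before) / 1024 )) MiB for ~4 GB of file data"
```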
The ceph version being used is:
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus
(stable)
The ceph configs being overridden:
WHO     MASK  LEVEL     OPTION                                  VALUE        RO
mon           advanced  auth_allow_insecure_global_id_reclaim   false
mgr           advanced  mgr/balancer/mode                       upmap
mgr           advanced  mgr/dashboard/server_addr               127.0.0.1    *
mgr           advanced  mgr/dashboard/server_port               8443         *
mgr           advanced  mgr/dashboard/ssl                       false        *
mgr           advanced  mgr/prometheus/server_addr              0.0.0.0      *
mgr           advanced  mgr/prometheus/server_port              9283         *
osd           advanced  bluestore_compression_algorithm         lz4
osd           advanced  bluestore_compression_mode              aggressive
osd           advanced  bluestore_throttle_bytes                536870912
osd           advanced  osd_max_backfills                       3
osd           advanced  osd_op_num_threads_per_shard_ssd        8            *
osd           advanced  osd_scrub_auto_repair                   true
mds           advanced  client_oc                               false
mds           advanced  client_readahead_max_bytes              4096
mds           advanced  client_readahead_max_periods            1
mds           advanced  client_readahead_min                    0
mds           basic     mds_cache_memory_limit                  21474836480
client        advanced  client_oc                               false
client        advanced  client_readahead_max_bytes              4096
client        advanced  client_readahead_max_periods            1
client        advanced  client_readahead_min                    0
client        advanced  fuse_disable_pagecache                  false
The cephfs mount options (note that readahead was disabled for this test):
/mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)
Any help or pointers are appreciated; this is a major performance issue for
us.
Thanks and Regards,
Ashu Pachauri
Hi,
One of my customers has an RGW cluster with two zones in one zonegroup that had been working correctly. Since a few days ago, users are not able to create buckets and always get Access Denied. Working with existing buckets works (listing/putting objects into an existing bucket); the only operation that fails is bucket creation. We also tried creating a new user, but the behavior is the same and that user cannot create a bucket either. We tried s3cmd, a Python script with the boto library, and also the Dashboard as the admin user. We always get Access Denied. Zones are in sync.
Has anyone experienced such behavior?
Thanks in advance, here are some outputs:
$ s3cmd -c .s3cfg_python_client mb s3://test
ERROR: Access to bucket 'test' was denied
ERROR: S3 error: 403 (AccessDenied)
Zones are in-sync:
Primary cluster:
# radosgw-admin sync status
realm 5429b434-6d43-4a18-8f19-a5720a89c621 (solargis-prod)
zonegroup 00e4b3ff-1da8-4a86-9f52-4300c6d0f149 (solargis-prod-ba)
zone 6067eec6-a930-45c7-af7d-a7ef2785a2d7 (solargis-prod-ba-dc)
metadata sync no sync (zone is master)
data sync source: e84fd242-dbae-466c-b4d9-545990590995 (solargis-prod-ba-hq)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
Secondary cluster:
# radosgw-admin sync status
realm 5429b434-6d43-4a18-8f19-a5720a89c621 (solargis-prod)
zonegroup 00e4b3ff-1da8-4a86-9f52-4300c6d0f149 (solargis-prod-ba)
zone e84fd242-dbae-466c-b4d9-545990590995 (solargis-prod-ba-hq)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 6067eec6-a930-45c7-af7d-a7ef2785a2d7 (solargis-prod-ba-dc)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
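For reference, a hedged sketch of checks that might narrow this down. In a multisite setup bucket creation is forwarded to the metadata master zone, so confirming the master zone/endpoints in the current period and raising RGW verbosity while reproducing the 403 seems like a reasonable next step (the interpretation is an assumption to verify; <uid> is a placeholder):
```
# confirm which zone is the metadata master and which endpoints the period advertises
radosgw-admin period get
radosgw-admin zonegroup get

# check the keys/caps of the user that receives AccessDenied (<uid> is a placeholder)
radosgw-admin user info --uid=<uid>

# raise RGW log verbosity while reproducing the failed bucket creation
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1
```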
--
Kamil Madac
Hi,
I am reading some documentation about mClock and have two questions.
First, about the IOPS: are those disk IOPS or some other kind of IOPS? And what assumptions do they carry (block size, sequential or random reads/writes)?
And the second question:
How does mClock calculate its profiles? My lab cluster is running Quincy, and I have these parameters for mClock:
"osd_mclock_max_capacity_iops_hdd": "450.000000",
"osd_mclock_profile": "balanced",
According to the documentation: https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#bala… I am expecting to have:
"osd_mclock_scheduler_background_best_effort_lim": "999999",
"osd_mclock_scheduler_background_best_effort_res": "90",
"osd_mclock_scheduler_background_best_effort_wgt": "2",
"osd_mclock_scheduler_background_recovery_lim": "675",
"osd_mclock_scheduler_background_recovery_res": "180",
"osd_mclock_scheduler_background_recovery_wgt": "1",
"osd_mclock_scheduler_client_lim": "450",
"osd_mclock_scheduler_client_res": "180", "osd_mclock_scheduler_client_wgt": "1",
But what I get is:
"osd_mclock_scheduler_background_best_effort_lim": "999999",
"osd_mclock_scheduler_background_best_effort_res": "18",
"osd_mclock_scheduler_background_best_effort_wgt": "2",
"osd_mclock_scheduler_background_recovery_lim": "135",
"osd_mclock_scheduler_background_recovery_res": "36",
"osd_mclock_scheduler_background_recovery_wgt": "1",
"osd_mclock_scheduler_client_lim": "90",
"osd_mclock_scheduler_client_res": "36",
"osd_mclock_scheduler_client_wgt": "1",
These values seem very low compared to what my disk appears to be able to handle.
Is this calculation the expected one, or did I miss something about how those profiles are populated?
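One possible explanation, offered as an assumption to verify against the Quincy documentation rather than a confirmed answer: the allocations appear to be computed per OSD op shard, and with the default of 5 HDD op shards the per-shard capacity would be 450 / 5 = 90 IOPS, which reproduces the observed values exactly when the balanced-profile percentages implied by the expected numbers above (40% / 100% / 150% / 20%) are applied:
```
# hedged arithmetic sketch: observed values vs. a per-shard split of the capacity
capacity=450
shards=5                                  # assumed default number of HDD op shards
per_shard=$(( capacity / shards ))        # 90

echo "client res       (40%):  $(( per_shard * 40 / 100 ))"   # 36
echo "client lim      (100%):  ${per_shard}"                  # 90
echo "recovery res     (40%):  $(( per_shard * 40 / 100 ))"   # 36
echo "recovery lim    (150%):  $(( per_shard * 150 / 100 ))"  # 135
echo "best-effort res  (20%):  $(( per_shard * 20 / 100 ))"   # 18
```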
Luis Domingues
Proton AG
Dear Ceph users,
my cluster is made up of 10 old machines with an uneven number of disks and uneven disk sizes. Essentially I have just one big data pool (6+2 erasure code, with host failure domain), for which I am currently seeing very poor available space (88 TB, of which 40 TB is occupied, as reported by df -h on hosts mounting the CephFS) compared to the raw capacity (196.5 TB). I have a total of 104 OSDs and 512 PGs for the pool; I cannot increase the PG count since the machines are old with very little RAM, and some of them are already overloaded.
In this situation I'm seeing high utilization on the small OSDs (500 GB) compared to the bigger ones (2 and 4 TB), even though the weight is set equal to disk capacity (see ceph osd tree below). For example, OSD 9 is at 62% occupancy even with weight 0.5 and reweight 0.75, while the highest occupancy for the 2 TB OSDs is 41% (OSD 18) and for the 4 TB OSDs is 23% (OSD 79). I guess this high occupancy on the 500 GB OSDs, combined with the erasure-code size and the host failure domain, might be the cause of the poor available space; could this be true? The upmap balancer is currently running, but I don't know if and how much it will improve the situation.
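For reference, a few standard commands that might help confirm this; as far as I understand, a pool's reported MAX AVAIL is driven by its fullest OSDs, so a handful of 500 GB OSDs at ~60% can cap the usable space of the whole 6+2 pool (treat that reading as an assumption):
```
# per-OSD utilization and variance; the most-full OSDs bound the pool's MAX AVAIL
ceph osd df tree

# pool-level STORED / USED / MAX AVAIL, to compare against the 196.5 TB raw capacity
ceph df detail

# check that the balancer is active and how much work it still has queued
ceph balancer status
```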
Any hint is greatly appreciated, thanks.
Nicola
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 196.47754 root default
-7 14.55518 host aka
4 hdd 1.81940 osd.4 up 1.00000 1.00000
11 hdd 1.81940 osd.11 up 1.00000 1.00000
18 hdd 1.81940 osd.18 up 1.00000 1.00000
26 hdd 1.81940 osd.26 up 1.00000 1.00000
32 hdd 1.81940 osd.32 up 1.00000 1.00000
41 hdd 1.81940 osd.41 up 1.00000 1.00000
48 hdd 1.81940 osd.48 up 1.00000 1.00000
55 hdd 1.81940 osd.55 up 1.00000 1.00000
-3 14.55518 host balin
0 hdd 1.81940 osd.0 up 1.00000 1.00000
8 hdd 1.81940 osd.8 up 1.00000 1.00000
15 hdd 1.81940 osd.15 up 1.00000 1.00000
22 hdd 1.81940 osd.22 up 1.00000 1.00000
29 hdd 1.81940 osd.29 up 1.00000 1.00000
34 hdd 1.81940 osd.34 up 1.00000 1.00000
43 hdd 1.81940 osd.43 up 1.00000 1.00000
49 hdd 1.81940 osd.49 up 1.00000 1.00000
-13 29.10950 host bifur
3 hdd 3.63869 osd.3 up 1.00000 1.00000
14 hdd 3.63869 osd.14 up 1.00000 1.00000
27 hdd 3.63869 osd.27 up 1.00000 1.00000
37 hdd 3.63869 osd.37 up 1.00000 1.00000
50 hdd 3.63869 osd.50 up 1.00000 1.00000
59 hdd 3.63869 osd.59 up 1.00000 1.00000
64 hdd 3.63869 osd.64 up 1.00000 1.00000
69 hdd 3.63869 osd.69 up 1.00000 1.00000
-17 29.10950 host bofur
2 hdd 3.63869 osd.2 up 1.00000 1.00000
21 hdd 3.63869 osd.21 up 1.00000 1.00000
39 hdd 3.63869 osd.39 up 1.00000 1.00000
57 hdd 3.63869 osd.57 up 1.00000 1.00000
66 hdd 3.63869 osd.66 up 1.00000 1.00000
72 hdd 3.63869 osd.72 up 1.00000 1.00000
76 hdd 3.63869 osd.76 up 1.00000 1.00000
79 hdd 3.63869 osd.79 up 1.00000 1.00000
-21 29.10376 host dwalin
88 hdd 1.81898 osd.88 up 1.00000 1.00000
89 hdd 1.81898 osd.89 up 1.00000 1.00000
90 hdd 1.81898 osd.90 up 1.00000 1.00000
91 hdd 1.81898 osd.91 up 1.00000 1.00000
92 hdd 1.81898 osd.92 up 1.00000 1.00000
93 hdd 1.81898 osd.93 up 1.00000 1.00000
94 hdd 1.81898 osd.94 up 1.00000 1.00000
95 hdd 1.81898 osd.95 up 1.00000 1.00000
96 hdd 1.81898 osd.96 up 1.00000 1.00000
97 hdd 1.81898 osd.97 up 1.00000 1.00000
98 hdd 1.81898 osd.98 up 1.00000 1.00000
99 hdd 1.81898 osd.99 up 1.00000 1.00000
100 hdd 1.81898 osd.100 up 1.00000 1.00000
101 hdd 1.81898 osd.101 up 1.00000 1.00000
102 hdd 1.81898 osd.102 up 1.00000 1.00000
103 hdd 1.81898 osd.103 up 1.00000 1.00000
-9 14.55518 host ogion
7 hdd 1.81940 osd.7 up 1.00000 1.00000
16 hdd 1.81940 osd.16 up 1.00000 1.00000
23 hdd 1.81940 osd.23 up 1.00000 1.00000
33 hdd 1.81940 osd.33 up 1.00000 1.00000
40 hdd 1.81940 osd.40 up 1.00000 1.00000
47 hdd 1.81940 osd.47 up 1.00000 1.00000
54 hdd 1.81940 osd.54 up 1.00000 1.00000
61 hdd 1.81940 osd.61 up 1.00000 1.00000
-19 14.55518 host prestno
81 hdd 1.81940 osd.81 up 1.00000 1.00000
82 hdd 1.81940 osd.82 up 1.00000 1.00000
83 hdd 1.81940 osd.83 up 1.00000 1.00000
84 hdd 1.81940 osd.84 up 1.00000 1.00000
85 hdd 1.81940 osd.85 up 1.00000 1.00000
86 hdd 1.81940 osd.86 up 1.00000 1.00000
87 hdd 1.81940 osd.87 up 1.00000 1.00000
104 hdd 1.81940 osd.104 up 1.00000 1.00000
-15 29.10376 host remolo
6 hdd 1.81897 osd.6 up 1.00000 1.00000
12 hdd 1.81897 osd.12 up 1.00000 1.00000
19 hdd 1.81897 osd.19 up 1.00000 1.00000
28 hdd 1.81897 osd.28 up 1.00000 1.00000
35 hdd 1.81897 osd.35 up 1.00000 1.00000
44 hdd 1.81897 osd.44 up 1.00000 1.00000
52 hdd 1.81897 osd.52 up 1.00000 1.00000
58 hdd 1.81897 osd.58 up 1.00000 1.00000
63 hdd 1.81897 osd.63 up 1.00000 1.00000
67 hdd 1.81897 osd.67 up 1.00000 1.00000
71 hdd 1.81897 osd.71 up 1.00000 1.00000
73 hdd 1.81897 osd.73 up 1.00000 1.00000
74 hdd 1.81897 osd.74 up 1.00000 1.00000
75 hdd 1.81897 osd.75 up 1.00000 1.00000
77 hdd 1.81897 osd.77 up 1.00000 1.00000
78 hdd 1.81897 osd.78 up 1.00000 1.00000
-5 14.55518 host rokanan
1 hdd 1.81940 osd.1 up 1.00000 1.00000
10 hdd 1.81940 osd.10 up 1.00000 1.00000
17 hdd 1.81940 osd.17 up 1.00000 1.00000
24 hdd 1.81940 osd.24 up 1.00000 1.00000
31 hdd 1.81940 osd.31 up 1.00000 1.00000
38 hdd 1.81940 osd.38 up 1.00000 1.00000
46 hdd 1.81940 osd.46 up 1.00000 1.00000
53 hdd 1.81940 osd.53 up 1.00000 1.00000
-11 7.27515 host romolo
5 hdd 0.45470 osd.5 up 1.00000 1.00000
9 hdd 0.45470 osd.9 up 0.75000 1.00000
13 hdd 0.45470 osd.13 up 1.00000 1.00000
20 hdd 0.45470 osd.20 up 0.95000 1.00000
25 hdd 0.45470 osd.25 up 0.75000 1.00000
30 hdd 0.45470 osd.30 up 1.00000 1.00000
36 hdd 0.45470 osd.36 up 1.00000 1.00000
42 hdd 0.45470 osd.42 up 1.00000 1.00000
45 hdd 0.45470 osd.45 up 0.85004 1.00000
51 hdd 0.45470 osd.51 up 0.89999 1.00000
56 hdd 0.45470 osd.56 up 1.00000 1.00000
60 hdd 0.45470 osd.60 up 1.00000 1.00000
62 hdd 0.45470 osd.62 up 1.00000 1.00000
65 hdd 0.45470 osd.65 up 0.85004 1.00000
68 hdd 0.45470 osd.68 up 1.00000 1.00000
70 hdd 0.45470 osd.70 up 1.00000 1.00000