Hi,
I am installing Ceph Octopus using cephadm. I managed to install ceph-common
with cephadm, but when trying to add new hosts with "ceph orch host add
ceph2" I get the error:
"Error EINVAL: Failed to connect to ceph2 (ceph2). Check that the host is
reachable and accepts connection using the cephadm SSH key".
I verified that I am able to log in to the ceph2 server over SSH with the
cephadm private key, as described in the error message. But since adding
new hosts still wasn't working, I tried generating a new key and pushing
the public key to the remote servers with:
# ceph cephadm clear-key
# ceph cephadm generate-key
# ceph cephadm get-pub-key > ceph.pub
# ceph config-key get mgr/cephadm/ssh_identity_key > ceph.priv
# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph1
# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
And then I tested that the private key really works:
# chmod 600 ceph.priv
# ssh -i ceph.priv root@ceph2
At this point passwordless SSH login works. But "ceph orch host add ceph2"
still fails with exactly the same error.
I also tried restarting the manager with "ceph mgr fail", which was
suggested somewhere -> no effect. I also tried rebooting the machines -> no
effect.
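One thing I am not sure about is whether cephadm connects exactly the way I
tested by hand. I assume its connection could be reproduced with the mgr's
stored SSH config, something like:
# ceph cephadm get-ssh-config > ssh_config
# ceph config-key get mgr/cephadm/ssh_identity_key > ceph.priv
# chmod 600 ceph.priv
# ssh -F ssh_config -i ceph.priv root@ceph2
but I may be missing a detail there.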
Any tips on what I could still try?
Thank you very much!
Hi,
I have to use the *rbd-nbd* tool from Ceph. It is part of the Ceph source
code: https://github.com/ceph/ceph/tree/master/src/tools/rbd_nbd
My question is: can we use this *rbd-nbd* tool in the Ceph cluster? By Ceph
cluster I mean the development cluster we build through the *vstart.sh*
script. I am fairly sure we can. I have the script running and can *start*
and *stop* the cluster, but I am struggling to actually use rbd-nbd with
that development cluster.
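For context, this is roughly what I have been attempting from the build
directory (the pool and image names are just my test values, and I assume
rbd-nbd was built along with the other binaries):
$ ../src/vstart.sh -n -d
$ ./bin/ceph osd pool create rbd 8
$ ./bin/rbd create test-img --size 1G
$ sudo ./bin/rbd-nbd -c ./ceph.conf map test-img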
Looking for help.
Thanks.
Hi all,
one of our MONs was down for maintenance for ca. 45 minutes. After this time I started it up again and it joined the cluster.
Unfortunately, things did not go as expected. The MON sub-cluster became unresponsive for a bit more than 10 minutes. Admin commands would hang, even if issued directly to a specific monitor via "ceph tell mon.xxx". In addition, our MDS lost connection to the MONs and reported a laggy connection. Consequently, all ceph fs access was frozen for a bit more than 10 minutes as well.
From the little I could get out with "ceph daemon mon.xxx mon_status" I could see that the restarted MON was in state "synchronizing" (or similar; I'm quoting from memory) while the other MONs were in quorum.
Our cluster is mimic-12.2.8. Somehow, this observation does not fit together with the intended HA of the MON cluster; there should not be any stall at all.
My questions: Why do the MONs become unresponsive for such a long time? What are the MONs doing during this time frame? Are there any config options I should look at? Are there any log messages I should hunt for?
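For reference, I assume the sync-related settings could be inspected on the MONs with something like:
# ceph daemon mon.xxx config show | grep mon_sync
in case any of those (e.g. mon_sync_max_payload_size) are relevant here.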
Any hint is appreciated.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I'm investigating an issue where 4 to 5 OSDs in a rack aren't marked as
down when the network is cut to that rack.
Situation:
- Nautilus cluster
- 3 racks
- 120 OSDs, 40 per rack
We performed a test where we turned off the Top-of-Rack network switch for
each rack. This worked as expected with two racks, but with the third
something weird happened.
Of the 40 OSDs that were supposed to be marked as down, only 36 were.
In the end it took 15 minutes for all 40 OSDs to be marked as down.
$ ceph config set mon mon_osd_reporter_subtree_level rack
That setting is set to make sure that we only accept reports from other
racks.
What we saw in the logs for example:
2020-10-29T03:49:44.409-0400 7fbda185e700 10
mon.CEPH2-MON1-206-U39@0(leader).osd e107102 osd.51 has 54 reporters,
239.856038 grace (20.000000 + 219.856 + 7.43801e-23), max_failed_since
2020-10-29T03:47:22.374857-0400
But osd.51 was still not marked as down after 54 reporters had reported
that it was actually down.
I checked, no ping or other traffic possible to osd.51. Host is unreachable.
Another OSD was marked as down, but that took a couple of minutes as well:
2020-10-29T03:50:54.455-0400 7fbda185e700 10
mon.CEPH2-MON1-206-U39@0(leader).osd e107102 osd.37 has 48 reporters,
221.378970 grace (20.000000 + 201.379 + 6.34437e-23), max_failed_since
2020-10-29T03:47:12.761584-0400
2020-10-29T03:50:54.455-0400 7fbda185e700 1
mon.CEPH2-MON1-206-U39@0(leader).osd e107102 we have enough reporters
to mark osd.37 down
In the end osd.51 was marked down, but only because the MON noticed it had
stopped sending beacons:
2020-10-29T03:53:44.631-0400 7fbda185e700 0 log_channel(cluster) log
[INF] : osd.51 marked down after no beacon for 903.943390 seconds
2020-10-29T03:53:44.631-0400 7fbda185e700 -1
mon.CEPH2-MON1-206-U39@0(leader).osd e107104 no beacon from osd.51 since
2020-10-29T03:38:40.689062-0400, 903.943390 seconds ago. marking down
I haven't seen this happen before in any cluster. It's also strange that
this only happens in this rack; the other two racks work fine.
ID CLASS WEIGHT TYPE NAME
-1 1545.35999 root default
-206 515.12000 rack 206
-7 27.94499 host CEPH2-206-U16
...
-207 515.12000 rack 207
-17 27.94499 host CEPH2-207-U16
...
-208 515.12000 rack 208
-31 27.94499 host CEPH2-208-U16
...
That's what the CRUSH map looks like: straightforward, with 3x replication
over 3 racks.
This issue only occurs in rack *207*.
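For completeness, the failure-detection settings involved can be checked
like this (I assume the ~903s in the log corresponds to the default 900s
beacon grace):
$ ceph config get mon mon_osd_reporter_subtree_level
$ ceph config get mon mon_osd_min_down_reporters
$ ceph config get mon mon_osd_beacon_grace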
Has anybody seen this before or knows where to start?
Wido
Hello all.
I'm trying to deploy the dashboard (Nautilus 14.2.8). After I ran "ceph
dashboard create-self-signed-cert", the cluster started showing this error:
# ceph health detail
HEALTH_ERR Module 'dashboard' has failed: '_cffi_backend.CDataGCP' object
has no attribute 'type'
MGR_MODULE_ERROR Module 'dashboard' has failed: '_cffi_backend.CDataGCP'
object has no attribute 'type'
Module 'dashboard' has failed: '_cffi_backend.CDataGCP' object has no
attribute 'type'
If I run "ceph config set mgr mgr/dashboard/ssl false", the error goes
away. I tried to manually upload the certificates, but I'm still hitting
the error.
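In case the method matters, the upload I attempted was along these lines
(based on the dashboard docs; the file names are mine):
# ceph dashboard set-ssl-certificate -i dashboard.crt
# ceph dashboard set-ssl-certificate-key -i dashboard.key
# ceph mgr fail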
Has anyone experienced something similar?
Thanks, Marcelo.
I am getting an error in log.smbd from the Samba gateway that I don't
understand, and I am looking for help from anyone who has gotten vfs_ceph
working.
Background:
I am trying to get a Samba gateway with CephFS working with the
vfs_ceph module. I observed that the default Samba package on CentOS
7.7 did not come with the ceph.so vfs_ceph module, so I tried to
compile a working Samba version with vfs_ceph.
Newer Samba versions have a requirement for GnuTLS >= 3.4.7, which is
not an available package on CentOS 7.7 without a custom repository. I
opted to build an earlier version of Samba.
On CentOS 7.7, I built Samba 4.11.16 with this smb.conf:
[global]
security = user
map to guest = Bad User
username map = /etc/samba/smbusers
log level = 4
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
[cryofs_upload]
public = yes
read only = yes
guest ok = yes
vfs objects = ceph
path = /upload
kernel share modes = no
ceph:user_id = samba.upload
ceph:config_file = /etc/ceph/ceph.conf
I have a file at /etc/ceph/ceph.conf including:
fsid = redacted
mon_host = redacted
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
I have an /etc/ceph/client.samba.upload.keyring with the key for the user
`samba.upload`.
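The keyring file itself looks like this (key redacted; I assume this is
the expected format):
[client.samba.upload]
        key = <redacted>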
However, connecting fails:
smbclient \\\\localhost\\cryofs_upload -U guest
Enter guest's password:
tree connect failed: NT_STATUS_UNSUCCESSFUL
The log.smbd gives these errors:
Initialising custom vfs hooks from [ceph]
[2020/11/11 17:24:37.388460, 3]
../../lib/util/modules.c:167(load_module_absolute_path)
load_module_absolute_path: Module '/usr/local/samba/lib/vfs/ceph.so' loaded
[2020/11/11 17:24:37.402026, 1]
../../source3/smbd/service.c:668(make_connection_snum)
make_connection_snum: SMB_VFS_CONNECT for service 'cryofs_upload' at
'/upload' failed: No such file or directory
There is an /upload directory in the CephFS to which the samba.upload user
has read access.
What does this 'No such file or directory' error mean? Is it that
vfs_ceph isn't finding `/upload`, or is some other file that vfs_ceph
depends on not being found? I have also tried specifying a local path
rather than a CephFS path and get the same error.
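To rule out the cephx side, I assume the same credentials could be tested
outside Samba with something like:
# ceph-fuse --id samba.upload -k /etc/ceph/client.samba.upload.keyring -r /upload /mnt/test
though I have not fully verified that yet.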
Is there any good guide that describes not just the Samba smb.conf, but
also what should be in /etc/ceph/ceph.conf and how to provide the key for
the ceph:user_id? I am really struggling to find good first-hand
documentation for this.
Thanks,
Matt
--
Matt Larson, PhD
Madison, WI 53705 U.S.A.
Hi,
I have 36 OSDs and get this error:
Error ERANGE: pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 10500 (mon_max_pg_per_osd 250 * num_in_osds 42)
If I want to calculate the maximum number of PGs in my cluster, how does it work if I have an EC pool?
I have a 4:2 data EC pool, and the others are replicated.
These are the pools:
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 597 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 598 flags hashpspool stripe_width 0 application rgw
pool 6 'sin.rgw.log' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 599 flags hashpspool stripe_width 0 application rgw
pool 7 'sin.rgw.control' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 600 flags hashpspool stripe_width 0 application rgw
pool 8 'sin.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 601 lfor 0/393/391 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 10 'sin.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 602 lfor 0/529/527 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 11 'sin.rgw.buckets.data.old' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 603 flags hashpspool stripe_width 0 application rgw
pool 12 'sin.rgw.buckets.data' erasure profile data-ec size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 604 flags hashpspool,ec_overwrites stripe_width 16384 application rgw
So how can I calculate the PGs?
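My attempt at reproducing the number from the error message, assuming the check counts each PG once per replica/chunk:
budget = mon_max_pg_per_osd * num_in_osds = 250 * 42 = 10500
replicated pools: (1 + 32 + 32 + 32 + 8 + 8 + 32) PGs * size 3 = 435
EC pool at the requested pg_num: 4096 * size 6 = 24576
total = 435 + 24576 = 25011, which exceeds 10500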
This is my osd tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 534.38354 root default
-5 89.06392 host cephosd-6s01
36 nvme 1.74660 osd.36 up 1.00000 1.00000
0 ssd 14.55289 osd.0 up 1.00000 1.00000
8 ssd 14.55289 osd.8 up 1.00000 1.00000
15 ssd 14.55289 osd.15 up 1.00000 1.00000
18 ssd 14.55289 osd.18 up 1.00000 1.00000
24 ssd 14.55289 osd.24 up 1.00000 1.00000
30 ssd 14.55289 osd.30 up 1.00000 1.00000
-3 89.06392 host cephosd-6s02
37 nvme 1.74660 osd.37 up 1.00000 1.00000
1 ssd 14.55289 osd.1 up 1.00000 1.00000
11 ssd 14.55289 osd.11 up 1.00000 1.00000
17 ssd 14.55289 osd.17 up 1.00000 1.00000
23 ssd 14.55289 osd.23 up 1.00000 1.00000
28 ssd 14.55289 osd.28 up 1.00000 1.00000
35 ssd 14.55289 osd.35 up 1.00000 1.00000
-11 89.06392 host cephosd-6s03
41 nvme 1.74660 osd.41 up 1.00000 1.00000
2 ssd 14.55289 osd.2 up 1.00000 1.00000
6 ssd 14.55289 osd.6 up 1.00000 1.00000
13 ssd 14.55289 osd.13 up 1.00000 1.00000
19 ssd 14.55289 osd.19 up 1.00000 1.00000
26 ssd 14.55289 osd.26 up 1.00000 1.00000
32 ssd 14.55289 osd.32 up 1.00000 1.00000
-13 89.06392 host cephosd-6s04
38 nvme 1.74660 osd.38 up 1.00000 1.00000
5 ssd 14.55289 osd.5 up 1.00000 1.00000
7 ssd 14.55289 osd.7 up 1.00000 1.00000
14 ssd 14.55289 osd.14 up 1.00000 1.00000
20 ssd 14.55289 osd.20 up 1.00000 1.00000
25 ssd 14.55289 osd.25 up 1.00000 1.00000
31 ssd 14.55289 osd.31 up 1.00000 1.00000
-9 89.06392 host cephosd-6s05
40 nvme 1.74660 osd.40 up 1.00000 1.00000
3 ssd 14.55289 osd.3 up 1.00000 1.00000
10 ssd 14.55289 osd.10 up 1.00000 1.00000
12 ssd 14.55289 osd.12 up 1.00000 1.00000
21 ssd 14.55289 osd.21 up 1.00000 1.00000
29 ssd 14.55289 osd.29 up 1.00000 1.00000
33 ssd 14.55289 osd.33 up 1.00000 1.00000
-7 89.06392 host cephosd-6s06
39 nvme 1.74660 osd.39 up 1.00000 1.00000
4 ssd 14.55289 osd.4 up 1.00000 1.00000
9 ssd 14.55289 osd.9 up 1.00000 1.00000
16 ssd 14.55289 osd.16 up 1.00000 1.00000
22 ssd 14.55289 osd.22 up 1.00000 1.00000
27 ssd 14.55289 osd.27 up 1.00000 1.00000
34 ssd 14.55289 osd.34 up 1.00000 1.00000
These are the CRUSH rules:
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 1,
"rule_name": "replicated_nvme",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -21,
"item_name": "default~nvme"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 2,
"rule_name": "replicated_ssd",
"ruleset": 2,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -2,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
},
{
"rule_id": 3,
"rule_name": "sin.rgw.buckets.data.new",
"ruleset": 3,
"type": 3,
"min_size": 3,
"max_size": 6,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -2,
"item_name": "default~ssd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
]
So everything other than the data pool is on SSD and NVMe with replica 3.
If I calculate the PGs in the EC pool like 36 OSDs * 100 / 6 = 600, does that mean the max pg_num for the EC pool is 512 (the nearest power of two below 600)?
But how does this affect the SSD replicated pools then?
This is the EC pool definition:
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
Thank you in advance.
Hi All,
I'm not sure if this is the correct place to ask this question; I have tried other channels, but received very little help there.
I am currently very new to Ceph and am investigating it as a possible replacement for a legacy application which used to provide us with replication.
At the moment my company has three servers: two primary servers running Ubuntu and a backup server, also running Ubuntu. The two primary servers each host a virtual machine, and it is these virtual machines that the office workers use for shared folder access, email and as a domain server; the office workers are not aware of the underlying Linux servers. In the past, the legacy software would replicate the running VM files on both primary servers to the backup server. The replication is done at the underlying Linux host level and not from within the guest VMs. I was hoping that I could get Ceph to do this as well.
From what I have read, and I speak under correction, the best Ceph client type for this would be block access (RBD), whereby I would mount the block device and start up the VMs; see my rough sketch below. As I would be running the VMs as per normal routine, would Ceph then have to retrieve the large VM files from the storage nodes across the LAN and bring the data back to the client to run the VM? Is there an option to cache certain parts of the data on certain clients?
Also, neither of the primary servers as they currently stand has the capacity to run both VMs together, so each primary runs a dedicated VM; the backup server currently keeps replicated copies of both VM images from each primary, with the replication provided by the legacy application. I'm also wondering if I need to get a fourth server, so that I have 2 clients and 2 storage nodes.
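The rough sketch I had in mind, based only on my reading of the documentation (the pool and image names are just examples I made up):
# ceph osd pool create vms 64
# rbd pool init vms
# rbd create vms/office-vm1 --size 500G
# rbd map vms/office-vm1    # exposes a /dev/rbdX device to run the VM from
Please correct me if this is the wrong approach.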
Any suggestions or help would be greatly appreciated.
Yours sincerely
Vaughan Beckwith
Bluesphere Technologies
BSC I.T. (Honours)
vaughan.beckwith(a)bluesphere.co.za
Telephone: 011 675 6354
Fax: (011) 675 6423
I've inherited a Ceph Octopus cluster that seems like it needs urgent maintenance before data loss begins to happen. I'm the guy with the most Ceph experience on hand and that's not saying much. I'm experiencing most of the ops and repair tasks for the first time here.
Ceph health output looks like this:
HEALTH_WARN Degraded data redundancy: 3640401/8801868 objects degraded (41.359%),
128 pgs degraded, 128 pgs undersized; 128 pgs not deep-scrubbed in time;
128 pgs not scrubbed in time
Ceph -s output: https://termbin.com/i06u
The crush rule 'cephfs.media' is here: https://termbin.com/2klmq
So, it seems like all PGs are in a 'warning' state for the main pool, which is erasure coded and 11 TiB across 4 OSDs, of which around 6.4 TiB is used. The Ceph services themselves seem happy: they're stable and have quorum. I'm also able to access the web panel fine. The block devices are of different sizes and types (2 large spinners of different sizes, and 2 identical SSDs).
I would welcome any pointers on what my steps to bring this back to full health might be. If it's undersized, can I simply add another block device/OSD? Or will adjusting config somewhere get it to rebalance successfully? (The rebalance jobs have been stuck at 0% for weeks.)
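If it helps, I can post output from commands like these (my guesses at what would be diagnostic; the pool name is guessed from the crush rule above):
$ ceph osd df tree
$ ceph pg ls undersized | head -20
$ ceph osd pool get cephfs.media all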
Thank you for your time reading this message.
Hi,
We have recently deployed a Ceph cluster with:
12 OSD nodes (16 cores + 200 GB RAM + 30 disks of 14 TB each), running CentOS 8
3 monitor nodes (8 cores + 16 GB RAM), running CentOS 8
We are running Ceph Octopus and using RBD block devices.
We have three Ceph client nodes (16 cores + 30 GB RAM, running CentOS 8) across which the RBDs are mapped and mounted, 25 RBDs per client node. Each RBD is 10 TB in size and formatted with an EXT4 file system.
On the network side, we have a 10 Gbps active/passive bond on all the Ceph cluster nodes, including the clients. Jumbo frames are enabled and the MTU is 9000.
This is a new cluster and cluster health reports OK, but we see high I/O wait during writes.
From one of the clients,
15:14:30 CPU %user %nice %system %iowait %steal %idle
15:14:31 all 0.06 0.00 1.00 45.03 0.00 53.91
15:14:32 all 0.06 0.00 0.94 41.28 0.00 57.72
15:14:33 all 0.06 0.00 1.25 45.78 0.00 52.91
15:14:34 all 0.00 0.00 1.06 40.07 0.00 58.86
15:14:35 all 0.19 0.00 1.38 41.04 0.00 57.39
Average: all 0.08 0.00 1.13 42.64 0.00 56.16
and the system load is very high:
top - 15:19:15 up 34 days, 41 min, 2 users, load average: 13.49, 13.62, 13.83
From 'atop', one of the CPUs shows this:
CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%
On the OSD nodes, we don't see much %utilization on the disks.
RBD caching values are default.
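In case it is useful, I assume the raw cluster write path could be checked independently of the file systems with something like (pool/image names are placeholders):
$ rados bench -p <pool> 30 write
$ rbd bench --io-type write <pool>/<image> --io-size 4K --io-threads 16
We have not drawn conclusions from such a test yet.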
Are we overlooking some configuration item?
Thanks and Regards,
At