Hi,
I regularly see the following error message in the MGR log:
2019-11-18 14:25:48.847 7fd9e6a3a700 0 mgr[dashboard] [18/Nov/2019:14:25:48] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2021, in start
    self.tick()
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py", line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
error: [Errno 0] Error
2019-11-18 14:25:49.027 7fd9e6a3a700 0 mgr[dashboard] [18/Nov/2019:14:25:49] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2021, in start
    self.tick()
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py", line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate unknown (_ssl.c:727)
In many cases this error triggers a failover of the active MGR node. It also affects the Ceph Dashboard directly: the dashboard hangs completely when this error is logged.
Any advice on how to fix this issue is appreciated.
THX
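For reference: SSLV3_ALERT_CERTIFICATE_UNKNOWN means the connecting client (a browser or a monitoring probe hitting the dashboard port) aborted the TLS handshake because it did not accept the dashboard's certificate. Assuming the dashboard is still running its default self-signed certificate, one possible direction is to regenerate it or install a certificate the clients trust (Nautilus-style commands; the file names are illustrative):

ceph dashboard create-self-signed-cert          # regenerate the built-in self-signed cert
# or install a CA-signed certificate and key:
ceph dashboard set-ssl-certificate -i dashboard.crt
ceph dashboard set-ssl-certificate-key -i dashboard.key
# restart the dashboard module so it picks up the change:
ceph mgr module disable dashboard
ceph mgr module enable dashboard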
Hi Daniel,
I am able to mount the buckets with your config; however, when I try to write something, my logs fill up with errors like this:
svc_732] nfs4_Errno_verbose :NFS4 :CRIT :Error I/O error in nfs4_write_cb converted to NFS4ERR_IO but was set non-retryable
Any chance you know how to resolve this?
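Not an authoritative answer, but NFS4ERR_IO from nfs4_write_cb means the write failed in the underlying FSAL. Since these are bucket exports (FSAL_RGW), one thing worth ruling out is that the RGW user Ganesha is configured with lacks write permission; a quick check (the uid is illustrative):

radosgw-admin user info --uid=nfsuser       # confirm the user exists and has full (rw) perms
# librgw runs inside the Ganesha process, so the underlying write error
# should also surface in the Ganesha log while you retry, e.g.:
tail -f /var/log/ganesha/ganesha.log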
Hello,
I'm going to deploy a new cluster soon based on 6.4TB NVMe PCIe cards; I will have only 1 NVMe card per node, across 38 nodes.
The use case is to offer CephFS volumes for a k8s platform. I plan to use an EC pool (8+3) for the cephfs_data pool.
Do you have recommendations for the setup, or mistakes to avoid? I use ceph-ansible to deploy all my clusters.
Best regards,
--
Yoann Moulin
EPFL IC-IT
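For reference, a minimal sketch of creating such an 8+3 data pool (profile name, pool name, and PG counts are illustrative). Note that CephFS requires overwrites to be enabled on an erasure-coded pool, and with crush-failure-domain=host an 8+3 profile needs at least 11 OSD hosts, which 38 nodes easily satisfies:

ceph osd erasure-code-profile set ec-8-3 k=8 m=3 crush-failure-domain=host
ceph osd pool create cephfs_data 1024 1024 erasure ec-8-3
ceph osd pool set cephfs_data allow_ec_overwrites true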
Hello Nathan,
>>>> I'm going to deploy a new cluster soon based on 6.4TB NVMe PCIe cards; I will have only 1 NVMe card per node, across 38 nodes.
>>>>
>>>> The use case is to offer CephFS volumes for a k8s platform. I plan to use an EC pool (8+3) for the cephfs_data pool.
>>>>
>>>> Do you have recommendations for the setup, or mistakes to avoid? I use ceph-ansible to deploy all my clusters.
>>>
>>> In order to get optimal performance out of NVMe, you will want very
>>> fast cores, and you will probably have to split each NVMe card into
>>> 2-4 OSD partitions in order to throw enough cores at it.
That's a good idea! If I have enough time, I'll try to do some benchmarks with 2 and 4 OSD partitions.
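If it helps, ceph-ansible already exposes this via ceph-volume's batch mode; a sketch of the relevant group_vars entries, assuming a ceph-volume-based deployment (device path illustrative):

# group_vars/osds.yml
osds_per_device: 4        # carve each NVMe card into 4 OSDs
devices:
  - /dev/nvme0n1

The manual equivalent is: ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1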
>> I’ve been trying unsuccessfully to convince some folks of the need for fast cores; there’s the idea that the effect would be slight. Do
>> you have any numbers? I’ve also read a claim that each BlueStore OSD will use 3-4 cores. They’re listening to me, though, about splitting the
>> card into multiple OSDs.
>
> Bluestore will use about 4 cores, but in my experience, the maximum
> utilization I've seen has been something like: 100%, 100%, 50%, 50%
>
> So those first 2 cores are the bottleneck for pure OSD IOPS. This sort
> of pattern isn't uncommon in multithreaded programs. This was on HDD
> OSDs with DB/WAL on NVMe, as well as some small metadata OSDs on pure
> NVMe. SSD OSDs default to 2 threads per shard, and HDD to 1, but we
> had to set HDD to 2 as well when we enabled NVMe WAL/DB. Otherwise the
> OSDs ran out of CPU and failed to heartbeat when under load. I believe
> that if we had 50% faster cores, we might not have needed to do this.
>
> On SSDs/NVMe you can compensate for slower cores with more OSDs, but
> of course only for parallel operations. Anything that is
> serial+synchronous, not so much. I would expect something like 4 OSDs
> per NVMe, 4 cores per OSD. That's already 16 cores per node just for
> OSDs.
>
> Our bottleneck in practice is the Ceph MDS, which seems to use exactly
> 2 cores and has no setting to change this. As far as I can tell, if we
> had 50% faster cores just for the MDS, I would expect roughly +50%
> performance in terms of metadata ops/second. Each filesystem has its
> own rank-0 MDS, so this load will be split across daemons. The MDS can
> also use a ton of RAM (32GB) if the clients have a working set of 1
> million+ files. Multi-mds exists to further split the load, but is
> quite new and I would not trust it. CephFS in general is likely where
> you will have the most issues, as it is both new and complex compared to
> a simple object store. Having an MDS in standby-replay mode keeps its
> RAM cache synced with the active, so you get far faster failover (
> O(seconds) rather than O(minutes) with a few million file caps) but
> you use the same RAM again.
>
> So, IMHO, you will want at least:
> CPU:
> 16 cores per 1-card NVMe OSD node. 2 cores per filesystem (maybe 1 if
> you don't expect a lot of simultaneous load?)
>
> RAM:
> The Bluestore default is 4GB per OSD, so 16GB per node.
> ~32GB of RAM per active and standby-replay MDS if you expect file
> counts in the millions, so 64GB per filesystem.
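For what it's worth, the knobs behind those RAM numbers are tunable; a sketch assuming Nautilus-style central config (the fs name and sizes are illustrative, and note that mds_cache_memory_limit caps only the cache — actual MDS RSS runs noticeably higher):

ceph config set osd osd_memory_target 4294967296        # 4 GiB per OSD (the default)
ceph config set mds mds_cache_memory_limit 17179869184  # 16 GiB MDS cache
ceph fs set cephfs allow_standby_replay true            # fast failover via standby-replay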
The context is:
3 Intel Server 1U for MONs/MDSs/MGRs services + K8s daemons
CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24c/48t)
Memory : 64GB
Disk OS : 2x Intel SSD DC S3520 240GB
38 Dell C4140 1U for OSD nodes :
CPU : 2 x Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (28c/56t)
Memory : 384GB
GPU : 4 Nvidia V100 32GB NVLink
Disk OS : M.2 240G
NVME : Dell 6.4TB NVME PCI-E Drive (Samsung PM1725b), only 1 slot available
Each server is used in a k8s cluster to give access to GPUs and CPUs for X-learning labs.
Ceph has to share CPU and memory with the K8s compute cluster.
> 128GB of RAM per node ought to do, if you have less than 14 filesystems?
I plan to have only 1 filesystem.
Thanks for all this useful information.
Best regards,
--
Yoann Moulin
EPFL IC-IT
Hi,
ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
root@ld3955:~# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio; mons ld5505,ld5506 are low on available space
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsld4465(mds.0): 8 slow metadata IOs are blocked > 30 secs, oldest blocked for 120721 secs
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
pg 59.1c is creating+down, acting [426,438]
MON_DISK_LOW mons ld5505,ld5506 are low on available space
mon.ld5505 has 22% avail
mon.ld5506 has 29% avail
root@ld3955:~# ceph pg dump_stuck inactive
ok
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
59.1c creating+down [426,438] 426 [426,438] 426
How can I fix this?
THX
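A PG stuck in creating+down usually means peering cannot complete on the acting set. A first-pass triage sketch, before anything destructive:

ceph pg 59.1c query      # inspect recovery_state for what peering is waiting on
ceph osd find 426        # check both acting OSDs (426, 438) are up and reachable
# Restarting the acting OSDs sometimes kicks peering. As an absolute last
# resort, and only if the PG is known to hold no data (destructive!):
# ceph osd force-create-pg 59.1c --yes-i-really-mean-it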
Hi, cool guys,
Recently we encountered a problem: the journal of the MDS daemon couldn't be trimmed, resulting in a large amount of space occupied by the metadata pool. The only thing we could think of was to flush the journal via the admin socket command. That made things worse: the admin thread of the MDS got stuck as well, and after that we couldn't even change the log level. After analyzing the code, we found that some segments never get out of the expiring queue, but we don't know why, or where execution gets stuck inside void LogSegment::try_to_expire(MDSRank *mds, MDSGatherBuilder &gather_bld, int op_prio). Any ideas or advice? Thanks a lot. Here is some cluster information:
Version:
luminous(v12.2.12)
MDS debug log:
5 mds.0.log trim already expiring segment 3658103659/11516554553473, 980 events
5 mds.0.log trim already expiring segment 3658104639/11516556356904, 1024 events
5 mds.0.log trim already expiring segment 3658105663/11516558241475, 1024 events
cephfs-journal-tool:
{
  "magic": "ceph fs volume v011",
  "write_pos": 11836049063598,
  "expire_pos": 11516554553473,
  "trimmed_pos": 11516552151040,
  "stream_format": 1,
  "layout": {
    "stripe_unit": 4194304,
    "stripe_count": 1,
    "object_size": 4194304,
    "pool_id": 2,
    "pool_ns": ""
  }
}
locallocal
locallocal(a)163.com
radosgw-admin4j is an admin client in Java that allows provisioning and control of a Ceph object store. Version 2.0.2 adds support for Java 11 and Ceph Nautilus. See https://github.com/twonote/radosgw-admin4j for more details.