Dear all
We have an HDD Ceph cluster that could do with some more IOPS. One
solution we are considering is installing NVMe SSDs in the storage
nodes and using them as WAL and/or DB devices for the BlueStore OSDs.
However, we have some questions about this and are looking for some
guidance and advice.
The first question is about the expected benefits. Before we undertake
the effort involved in the transition, we are wondering whether it is
even worth it. How much of a performance boost can one expect when
adding NVMe SSDs as WAL devices to an HDD cluster? And how much faster
than that does it get with the DB also being on SSD? Are there
rule-of-thumb numbers for that? Or maybe someone has done benchmarks
in the past?
The second question is of a more practical nature. Are there any
best practices on how to implement this? I was thinking we won't do one
SSD per HDD - surely an NVMe SSD is fast enough to handle the traffic
from multiple OSDs. But what is a good ratio? Do I put one NVMe SSD per
4 HDDs? Per 6, or even 8? Also, how should I carve up the SSD: using
partitions or using LVM? Last but not least, if I have one SSD handle
the WAL and DB for multiple OSDs, losing that SSD means losing multiple
OSDs. How do people deal with this risk? Is it generally deemed
acceptable, or is this something people tend to mitigate, and if so, how?
Do I run multiple SSDs in RAID?
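For what it's worth, the kind of layout I had in mind is one NVMe SSD
backing the DB/WAL of several HDD OSDs, e.g. roughly the following
(device names are placeholders and the 4:1 ratio is just my assumption):

  ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1

My understanding is that ceph-volume would then carve the NVMe device
into one LVM logical volume per OSD for the DB (with the WAL co-located
on it), so no manual partitioning would be needed - but please correct
me if that understanding is wrong.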
I do realize that for some of these, there might not be one perfect
answer that fits all use cases. I am looking for best practices and, in
general, just trying to avoid any obvious mistakes.
Any advice is much appreciated.
Sincerely
Niklaus Hofer
--
stepping stone AG
Wasserwerkgasse 7
CH-3011 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
niklaus.hofer(a)stepping-stone.ch
Hi,
as the documentation sends mixed signals in
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv…
"Note
Binding to IPv4 is enabled by default, so if you just add the option to
bind to IPv6 you’ll actually put yourself into dual stack mode."
and
https://docs.ceph.com/en/latest/rados/configuration/msgr2/#address-formats
"Note
The ability to bind to multiple ports has paved the way for dual-stack
IPv4 and IPv6 support. That said, dual-stack operation is not yet
supported as of Quincy v17.2.0."
here are just a few quick questions:
Is dual-stack networking with IPv4 and IPv6 now supported or not?
From which version on is it considered stable?
Are OSDs now able to register themselves with two IP addresses in the
cluster map? MONs too?
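Just for context, what we would be aiming for is roughly the following
in the global section (a sketch of my understanding; the subnets are
placeholders):

  [global]
  ms_bind_ipv4 = true
  ms_bind_ipv6 = true
  public_network = 192.0.2.0/24, 2001:db8::/64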
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hi,
I am having trouble answering this question:
Why is Ceph better than other storage solutions?
I know the usual high-level points about
- scalability,
- flexibility,
- distributed architecture,
- cost-effectiveness.
What convinces me (though it could also be turned into an argument against) is that Ceph as a product has everything I need, namely:
block storage (RBD),
file storage (CephFS),
object storage (S3, Swift),
and "plugins" to run NFS, NVMe over Fabrics, and NFS on top of object storage.
It also offers many other features that are usually sold as paid options (mirroring, geo-replication, etc.) in commercial solutions.
I am having trouble writing this down piece by piece.
I want to convince my managers that we are going in the right direction.
Why not something from Robin.io, Pure Storage, NetApp, or Dell/EMC? Or, from open source, Longhorn or OpenEBS?
If you have ideas, please share them.
Thanks,
S.
Hi All,
We have a somewhat serious situation with a CephFS filesystem (18.2.1)
that has 2 active MDSs and one standby. I tried to restart one of
the active daemons to unstick a bunch of blocked requests. The
standby then went into 'replay' for a very long time, until RAM on that
MDS server filled up; it stayed there for a while, then eventually
appeared to give up and switched over to the standby, but the cycle
started again. So I restarted that MDS, and now I'm in a situation
where I see this:
# ceph fs status
slugfs - 29 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay slugfs.pr-md-01.xdtppo 3958k 57.1k 12.2k 0
1 resolve slugfs.pr-md-02.sbblqq 0 3 1 0
POOL TYPE USED AVAIL
cephfs_metadata metadata 997G 2948G
cephfs_md_and_data data 0 87.6T
cephfs_data data 773T 175T
STANDBY MDS
slugfs.pr-md-03.mclckv
MDS version: ceph version 18.2.1
(7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
It just stays there indefinitely. All my clients are hung. I tried
restarting all MDS daemons and they just went back to this state after
coming back up.
Is there any way I can somehow escape this state of indefinite
replay/resolve?
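Also, is watching the journal counters a sensible way to tell whether
replay is making any progress at all? I.e. something like the following
on the MDS host, hoping to see rdpos/expos advance (I'm not sure these
are the right counters to look at):

  ceph daemon mds.slugfs.pr-md-01.xdtppo perf dump mds_log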
Thanks so much! I'm kinda nervous since none of my clients have
filesystem access at the moment...
cheers,
erich
Hello,
We are tracking PR #56805:
https://github.com/ceph/ceph/pull/56805
And the resolution of this item would potentially fix a pervasive and
ongoing issue that needs daily attention in our CephFS cluster. I was
wondering whether it will be included in 18.2.3, which I *think* should
be released soon? Is there any way of knowing if that is true?
Thanks again,
erich
Hi,
We recently upgraded one of our clusters from Quincy 17.2.6 to Reef 18.2.1, and since then we have had 3 instances of our RGWs stopping processing requests. We have 3 hosts that each run a single RGW instance, and all 3 seem to stop processing requests at the same time, causing our storage to become unavailable. A restart or redeploy of the RGW service brings them back OK. The cluster was deployed using ceph-ansible, but we have since adopted it into cephadm, which is how the upgrade was performed.
We have enabled debug logging, as there was nothing out of the ordinary in the normal logs, and we are currently sifting through the logs from the last crash.
We are just wondering whether it is possible to run Quincy RGWs instead of Reef ones, as we didn't have this issue prior to the upgrade?
We have 3 clusters in a multisite setup; we are holding off on upgrading the other 2 clusters due to this issue.
Thanks
Iain
Iain Stott
OpenStack Engineer
Iain.Stott(a)thg.com
www.thg.com
Hello! I've installed my 5-node Ceph cluster and then created an NFS service with the command:
ceph nfs cluster create nfshacluster 5 --ingress --virtual_ip 192.168.171.48/26 --ingress-mode haproxy-protocol
I don't fully understand how this is supposed to work, but when I stop the NFS daemon on even one of these nodes, I see that writing to the NFS shares stops (testing via vdbench).
As I understand it, that is wrong: I/O from the stopped daemon should switch over to another NFS daemon without any impact.
Can someone help me troubleshoot this issue, or explain how to build a full-fledged active-active HA NFS cluster for production use?
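Also, should the clients be mounting through the virtual IP for
failover to work? I.e. something like the following, where the export
path is just a placeholder (as I understand it, "ceph nfs cluster info"
shows the virtual IP and backend hosts):

  ceph nfs cluster info nfshacluster
  mount -t nfs -o vers=4.1 192.168.171.48:/myexport /mnt/test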
Thanks!
Ruslan Nurabayev
Senior Engineer
IT Platforms Section
Backbone Network Development Division
Network Development Department
+77012119272
Ruslan.Nurabayev(a)kcell.kz
Hi All,
We have a Slurm cluster with 25 clients, each with 256 cores, each
mounting a CephFS filesystem as their main storage target. The workload
can be heavy at times.
We have two active MDS daemons and one standby. A lot of the time
everything is healthy, but we sometimes get warnings about MDS daemons
being slow on requests, behind on trimming, etc. I realize there may be
a bug in play, but I was also wondering whether we simply don't have
enough MDS daemons to handle the load. Is there a way to know if adding
an MDS daemon would help? We could add a third active MDS if needed,
but I don't want to start adding a bunch of MDSs if that won't help.
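For what it's worth, my understanding is that actually adding one would
just be something like the following (assuming we also add a daemon so
a standby remains); the real question is how to tell beforehand whether
it would help:

  ceph orch apply mds slugfs --placement=4
  ceph fs set slugfs max_mds 3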
The OSD servers seem fine. It's mainly the MDS instances that are
complaining.
We are running reef 18.2.1.
For reference, when things look healthy:
# ceph fs status slugfs
slugfs - 34 clients
======
RANK  STATE   MDS                      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  slugfs.pr-md-03.mclckv   Reqs: 273 /s   2759k  2636k  362k   1079k
 1    active  slugfs.pr-md-01.xdtppo   Reqs: 194 /s   868k   674k   67.3k  351k
POOL TYPE USED AVAIL
cephfs_metadata metadata 127G 3281G
cephfs_md_and_data data 0 98.3T
cephfs_data data 740T 196T
STANDBY MDS
slugfs.pr-md-02.sbblqq
MDS version: ceph version 18.2.1
(7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
# ceph -s
cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_OK
services:
mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
mgr: pr-md-01.jemmdf(active, since 5w), standbys: pr-md-02.emffhz
mds: 2/2 daemons up, 1 standby
osd: 46 osds: 46 up (since 8d), 46 in (since 4w)
data:
volumes: 1/1 healthy
pools: 4 pools, 1313 pgs
objects: 271.17M objects, 493 TiB
usage: 744 TiB used, 384 TiB / 1.1 PiB avail
pgs: 1307 active+clean
4 active+clean+scrubbing
2 active+clean+scrubbing+deep
io:
client: 39 MiB/s rd, 108 MiB/s wr, 1.96k op/s rd, 54 op/s wr
But when things are in "warning" mode, it looks like this:
# ceph -s
cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_WARN
1 filesystem is degraded
1 clients failing to advance oldest client/flush tid
1 MDSs report slow requests
1 MDSs behind on trimming
services:
mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
mgr: pr-md-01.jemmdf(active, since 5w), standbys: pr-md-02.emffhz
mds: 2/2 daemons up, 1 standby
osd: 46 osds: 46 up (since 8d), 46 in (since 4w)
data:
volumes: 1/1 healthy
pools: 4 pools, 1313 pgs
objects: 271.28M objects, 494 TiB
usage: 746 TiB used, 382 TiB / 1.1 PiB avail
pgs: 1307 active+clean
5 active+clean+scrubbing
1 active+clean+scrubbing+deep
io:
client: 55 MiB/s rd, 2.6 MiB/s wr, 15 op/s rd, 46 op/s wr
And this:
# ceph health detail
HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 2 MDSs
report slow requests; 1 MDSs behind on trimming
[WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest
client/flush tid
mds.slugfs.pr-md-01.xdtppo(mds.0): Client phoenix-06.prism failing
to advance its oldest client/flush tid. client_id: 125780
mds.slugfs.pr-md-02.sbblqq(mds.1): Client phoenix-00.prism failing
to advance its oldest client/flush tid. client_id: 99385
[WRN] MDS_SLOW_REQUEST: 2 MDSs report slow requests
mds.slugfs.pr-md-01.xdtppo(mds.0): 4 slow requests are blocked > 30
secs
mds.slugfs.pr-md-02.sbblqq(mds.1): 67 slow requests are blocked >
30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-02.sbblqq(mds.1): Behind on trimming (109410/250)
max_segments: 250, num_segments: 109410
The "cure" is the restart the active MDS daemons, one at a time. Then
everything becomes healthy again, for a time.
We also have the following MDS config items in play:
mds_cache_memory_limit = 8589934592
mds_cache_trim_decay_rate = .6
mds_log_max_segments = 250
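In other words, these are the equivalent of:

  ceph config set mds mds_cache_memory_limit 8589934592
  ceph config set mds mds_cache_trim_decay_rate 0.6
  ceph config set mds mds_log_max_segments 250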
Thanks for any pointers!
cheers,
erich