Hi,
I want to create an optimizer plan for each pool.
My cluster has multiple CRUSH roots and multiple pools, each
representing a specific drive type (HDD, SSD, NVMe).
Some pools are balanced, some are not.
Therefore I want to run the optimizer to create a new plan for a specific pool.
However, this fails for every pool with the same error message:
root@ld3955:~# ceph balancer optimize hdd-plan hdd
Error EALREADY: Unable to find further optimization, or pool(s)' pg_num
is decreasing, or distribution is already perfect
root@ld3955:~# ceph balancer optimize ssd-plan ssd
Error EALREADY: Unable to find further optimization, or pool(s)' pg_num
is decreasing, or distribution is already perfect
root@ld3955:~# ceph balancer optimize hdb_backup-plan hdb_backup
Error EALREADY: Unable to find further optimization, or pool(s)' pg_num
is decreasing, or distribution is already perfect
root@ld3955:~# ceph osd pool ls
hdb_backup
hdd
ssd
nvme
cephfs_data
cephfs_metadata
What is causing this error?
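For reference, I assume the next step is to check the score the balancer computes per pool, e.g.:
root@ld3955:~# ceph balancer eval hdd
root@ld3955:~# ceph balancer eval ssd
and, if the distribution is already considered "good enough" at the default deviation, to lower the threshold so the optimizer tries harder; I believe the knob in Nautilus is:
root@ld3955:~# ceph config set mgr mgr/balancer/upmap_max_deviation 1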
THX
Hi all,
The only recommendation I can find in the documentation about DB device selection concerns capacity (4% of the data disk). Are there any suggestions about technical specs such as throughput, IOPS, and the number of data disks per DB device?
When designing an infrastructure with filestore, we chose the journal device's specs to meet the requirements of all the disks behind the journal. With bluestore, however, data is written directly to the data device, while metadata is written to the RocksDB (DB) device via BlueFS.
I know that it depends on the workload, but is there any best practice or recommendation for selecting the DB device?
IMO, reusing the NVMe disks that served as filestore journals as DB devices is not meaningful: NVMe disks have minimal latency and extraordinary throughput and IOPS, and I am not sure the DB device needs that kind of performance. So I want to use those NVMe disks for an all-flash pool and choose other disks for the DB device.
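For context, the layout I am describing would be provisioned with something like this (ceph-volume syntax; the device paths are examples):
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
i.e. the question is what specs the device behind --block.db really needs.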
Any suggestion or recommendation would be appreciated.
Best regards,
Huseyin Cotuk
hcotuk(a)gmail.com
Hi,
I activated the balancer in order to balance the data distribution:
root@ld3955:~# ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
However, the data stored on the 1.6TB HDDs in the pool "hdb_backup" is
not balanced; the range starts with
osd.265 size: 1.6 usage: 52.83 reweight: 1.00000
and ends with
osd.145 size: 1.6 usage: 80.19 reweight: 1.00000
The affected drives are located on 4 nodes.
The result is that not all of the available disk space is usable.
I have attached a pastebin <https://pastebin.com/dNyEwNR0> with
- ceph osd df sorted by usage
- ceph osd df tree
Please advise how to start the balancer so that it corrects the data distribution.
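For reference, this is roughly what I expect a manual run to look like (the plan name is mine), in case the automatic mode is simply not kicking in:
root@ld3955:~# ceph balancer eval hdb_backup
root@ld3955:~# ceph balancer optimize myplan hdb_backup
root@ld3955:~# ceph balancer show myplan
root@ld3955:~# ceph balancer execute myplan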
THX
I'm in the process of testing the iSCSI target feature of Ceph. The cluster
is running ceph 14.2.4 and ceph-iscsi 3.3. It consists of 5 hosts with 12
SSD OSDs per host. Basic testing, moving VMs onto a Ceph-backed datastore,
shows only 60MB/s transfers. However, moving them back off the
datastore is fast at 200-300MB/s.
What should I be looking at to track down the write performance issue? For
comparison, the Nimble Storage arrays give me 200-300MB/s in both
directions.
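To rule the gateways in or out, I suppose I could benchmark the backing pool directly; a sketch (the pool name is a placeholder, and bench writes test objects into the pool):
rados bench -p <pool> 30 write --no-cleanup
rados bench -p <pool> 30 seq
rados -p <pool> cleanup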
Thanks,
Ryan
We had an OSD host with 13 OSDs fail today, and we have a weird blocked-op
message that I can't understand. There are no OSDs with blocked
ops, just `mon` (multiple times) and some of the rgw instances.
cluster:
id: 570bcdbb-9fdf-406f-9079-b0181025f8d0
health: HEALTH_WARN
1 large omap objects
Degraded data redundancy: 2083023/195702437 objects
degraded (1.064%), 880 pgs degraded, 880 pgs undersized
1609 pgs not deep-scrubbed in time
4 slow ops, oldest one blocked for 506699 sec, daemons
[mon,sun-gcs02-rgw01,mon,sun-gcs02-rgw02,mon,sun-gcs02-rgw03] have
slow ops.
services:
mon: 3 daemons, quorum
sun-gcs02-rgw01,sun-gcs02-rgw02,sun-gcs02-rgw03 (age 6m)
mgr: sun-gcs02-rgw02(active, since 5d), standbys: sun-gcs02-rgw03,
sun-gcs02-rgw04
osd: 767 osds: 754 up (since 10m), 754 in (since 104m); 880 remapped pgs
rgw: 16 daemons active (sun-gcs02-rgw01.rgw0, sun-gcs02-rgw01.rgw1,
sun-gcs02-rgw01.rgw2, sun-gcs02-rgw01.rgw3, sun-gcs02-rgw02.rgw0,
sun-gcs02-rgw02.rgw1, sun-gcs02-rgw02.rgw2, sun-gcs02-rgw02.rgw3,
sun-gcs02-rgw03.rgw0, sun-gcs02-rgw03.rgw1, sun-gcs02-rgw03.rgw2,
sun-gcs02-rgw03.rgw3, sun-gcs02-rgw04.rgw0, sun-gcs02-rgw04.rgw1,
sun-gcs02-rgw04.rgw2, sun-gcs02-rgw04.rgw3)
data:
pools: 7 pools, 8240 pgs
objects: 19.57M objects, 52 TiB
usage: 88 TiB used, 6.1 PiB / 6.2 PiB avail
pgs: 2083023/195702437 objects degraded (1.064%)
43492/195702437 objects misplaced (0.022%)
7360 active+clean
868 active+undersized+degraded+remapped+backfill_wait
12 active+undersized+degraded+remapped+backfilling
io:
client: 150 MiB/s rd, 642 op/s rd, 0 op/s wr
recovery: 626 MiB/s, 223 objects/s
$ ceph versions
{
"mon": {
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)": 3
},
"osd": {
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 754
},
"mds": {},
"rgw": {
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)": 16
},
"overall": {
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 754,
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)": 22
}
}
I restarted one of the monitors and it dropped out of the list, leaving
only 2 blocked ops showing, but then it showed up again a little while later.
Any ideas on where to look?
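The only other thing I know to try is dumping the in-flight ops on the daemons named in the warning via the admin socket, e.g. on the first mon host:
ceph daemon mon.sun-gcs02-rgw01 ops
but I'm not sure what to make of the output for mon-side slow ops.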
Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hi,
after enabling the Ceph balancer (with the command "ceph balancer on"), the
health status changed to error.
This is the current output of ceph health detail:
root@ld3955:~# ceph health detail
HEALTH_ERR 1438 slow requests are blocked > 32 sec; 861 stuck requests
are blocked > 4096 sec; mon ld5505 is low on available space
REQUEST_SLOW 1438 slow requests are blocked > 32 sec
683 ops are blocked > 2097.15 sec
436 ops are blocked > 1048.58 sec
191 ops are blocked > 524.288 sec
78 ops are blocked > 262.144 sec
35 ops are blocked > 131.072 sec
11 ops are blocked > 65.536 sec
4 ops are blocked > 32.768 sec
osd.62 has blocked requests > 65.536 sec
osds 39,72 have blocked requests > 262.144 sec
osds 6,19,67,173,174,187,188,269,434 have blocked requests > 524.288 sec
osds
8,16,35,36,37,61,63,64,68,73,75,178,186,271,369,420,429,431,433,436 have
blocked requests > 1048.58 sec
osds 3,5,7,24,34,38,40,41,59,66,69,74,180,270,370,421,432,435 have
blocked requests > 2097.15 sec
REQUEST_STUCK 861 stuck requests are blocked > 4096 sec
25 ops are blocked > 8388.61 sec
836 ops are blocked > 4194.3 sec
osds 2,28,29,32,60,65,181,185,268,368,423,424,426 have stuck
requests > 4194.3 sec
osds 0,30,70,71,184 have stuck requests > 8388.61 sec
I understand that when the balancer starts shifting PGs to other OSDs,
this causes IO load on the cluster.
However, I don't understand why this affects the OSDs so heavily.
And I don't understand why OSDs of a specific type (SSD, NVMe) suffer
although no balancing is occurring on them.
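For the record, these are the knobs I would try in order to soften the impact while the balancer runs (runtime settings; I believe the last option is named target_max_misplaced_ratio in Nautilus, while earlier releases used mgr/balancer/max_misplaced):
root@ld3955:~# ceph config set osd osd_max_backfills 1
root@ld3955:~# ceph config set osd osd_recovery_max_active 1
root@ld3955:~# ceph config set mgr target_max_misplaced_ratio 0.01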
Regards
Thomas
Hello everybody!
What does this mean?
health: HEALTH_WARN
1 subtrees have overcommitted pool target_size_bytes
1 subtrees have overcommitted pool target_size_ratio
and what does it have to do with the autoscaler?
When I deactivate the autoscaler the warning goes away.
$ ceph osd pool autoscale-status
POOL             SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_metadata  15106M               3.0   2454G         0.0180  0.3000        4.0      256              on
cephfs_data      113.6T               1.5   165.4T        1.0306  0.9000        1.0      512              on
$ ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_ratio 0.900 on pools ['cephfs_data']
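If I read this correctly, the expected sizes add up to more than the raw capacity of the subtree (cephfs_data alone reports RATIO 1.0306, i.e. data times replication already exceeds it), so either the targets need lowering or the pool is genuinely outgrowing its capacity. Lowering the targets would at least clear the warning; the values below are examples:
$ ceph osd pool set cephfs_data target_size_ratio 0.8
$ ceph osd pool set cephfs_data target_size_bytes 0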
Thanks
Lars
Hi,
Is anyone using librados AIO APIs? I seem to have a problem with that where
the rados_aio_wait_for_complete() call just waits for a long period of time
before it finishes without error.
More info on my setup:
I am using Ceph 14.2.4 and write 8MB objects.
I run my AIO program on 24 nodes at the same time, each node writing
different data (split into 8MB objects), about 2GB per node.
Normally it takes about 10 minutes for all of them to complete, but often
one or more nodes take considerably longer to finish. When looking at one
of those, I mostly see that the IO requests have been submitted and the
program waits at:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00002aaaaad0c8fa in rados_aio_wait_for_complete () from
/cgv/geovation/2/test/ceph/lib/librados.so.2
Then it eventually completes with no errors from
rados_aio_wait_for_complete() call.
The (pseudo) code looks like:
while (data remains to be written) {
    size_t aio_ops_count = 0;
    rados_completion_t aio_comp[12];
    // submit a batch of up to 12 writes without waiting
    for (size_t j = 0; j < 12; ++j) {
        int err = rados_aio_create_completion(NULL, NULL, NULL, &aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_create_completion: " << strerror(-err) << endl;
            return 1;
        }
        string obj_ = getobjectid();
        err = rados_aio_write_full(io, obj_.c_str(), aio_comp[j], read_buf[j], bytes);
        if (err < 0) {
            cerr << "rados_aio_write_full: " << strerror(-err) << endl;
            return 1;
        }
        ++aio_ops_count;
    }
    // reap the batch in submission order
    for (size_t j = 0; j < aio_ops_count; ++j) {
        rados_aio_wait_for_complete(aio_comp[j]); // considerably longer delay here at times
        int err = rados_aio_get_return_value(aio_comp[j]);
        if (err < 0) {
            cerr << "rados_aio_get_return_value: " << strerror(-err) << endl;
            return 1;
        }
        rados_aio_release(aio_comp[j]);
    }
}
I ran it under Valgrind and saw no issues, and I also read the data back and
checksummed it to verify there is no corruption. So everything appears to
"work" as expected, except for the occasional long delays.
I'm wondering if anyone else is using the AIO APIs to write objects and has
experienced similar problems.
Please let me know if you need further information.
(Originally posted this to dev(a)ceph.io and on Daniel's suggestion, I am
posting here).
Regards,
Ponnuvel P
hi ceph-users,
I have a cluster running Ceph object storage, version 14.2.1. I want to
create 2 bucket-data pools, for security purposes:
+ one bucket-data pool for public client access from the internet (name
*zone1.rgw.buckets.data-pub*)
+ one bucket-data pool for private client access from the local network (name
*zone1.rgw.buckets.data-priv*)
Each bucket-data pool should have its own access key: a public access key
(for the public pool) and a private access key (for the private pool).
Can you give me a recommendation or a best practice that you've used? What
needs to be done?
Or give me your best solution for securing a Ceph object cluster with
public client access and private client access?
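For context, the closest mechanism I have found is RGW placement targets, which map buckets to different data pools, roughly like this (the placement-id and pool names are mine; flags may vary by release):
radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id private-placement
radosgw-admin zone placement add --rgw-zone zone1 --placement-id private-placement --data-pool zone1.rgw.buckets.data-priv --index-pool zone1.rgw.buckets.index
then one RGW user per pool, with each user's default_placement pointing at the matching target. Is that the right direction?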
Thank you very much
Br,
----------------------------------------------
Dương Tuấn Dũng
Email: dungdt.aicgroup(a)gmail.com
Tel: 0986153686