Hello
I have a cluster (Nautilus 14.2.4) where I'd like to keep one pool on
dedicated OSDs. So I set up a rule that covers *3* dedicated OSDs (using
device classes) and assigned it to a pool with replication factor *3*. Only
10% of the PGs were assigned and rebalanced, while the rest of them are stuck
in the *undersized* state.
What mechanism prevents the CRUSH algorithm from assigning the same set of
OSDs to all PGs in the pool? How can I control it?
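To illustrate, a minimal sketch of this kind of setup (the device class, rule
and pool names here are placeholders, not the real ones):

# ceph osd crush rule create-replicated dedicated_rule default osd dedicated
# ceph osd pool set mypool crush_rule dedicated_rule
# ceph osd pool set mypool size 3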
Jacek
--
Jacek Suchenia
jacek.suchenia(a)gmail.com
Hi
We have some oldish servers with SSDs - all on 25 Gbit NICs, R815 AMD, 2.4 GHz+.
Are there significant performance benefits in moving to new NVMe-based servers with new CPUs?
+20% IOPS? +50% IOPS?
Jesper
Hi Ceph Community,
Wondering what experiences, good or bad, you have with EC pools for IOPS-intensive workloads (i.e. 4K-ish random IO from things like VMware ESXi). I realize that EC pools are a tradeoff between more usable capacity and higher latency/lower IOPS, but in my testing the tradeoff for small IO seems to be much worse than I had anticipated.
On an all-flash 3x replicated pool we're seeing 45k random read and 35k random write IOPS, testing with fio on a client living on an iSCSI LUN presented to an ESXi host. Average latencies for these ops are 4.2ms and 5.5ms, which is respectable at an IO depth of 32.
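For reference, the 4k random-write runs are of roughly this shape (only a sketch; the device path and exact options are illustrative, not the real job file):

$ fio --name=randwrite-4k --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
      --iodepth=32 --runtime=60 --time_based --filename=/dev/sdX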
Take this same setup with an EC pool (k=2, m=1, tested with both ISA and jerasure; ISA does give better performance for our use case) and we see 30k random read and 16k random write IOPS. Random reads average 6.5ms, while random writes suffer with a 12ms average.
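A minimal sketch of creating such a pool (profile name, pool name and pg count are placeholders):

# ceph osd erasure-code-profile set ec-2-1 k=2 m=1 plugin=isa crush-failure-domain=host
# ceph osd pool create ecpool 128 128 erasure ec-2-1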
Are others using EC pools seeing similar hits to random writes with small IOs? Any way to improve this?
Thanks,
Anthony
CephFS is considered to be stable. I have been using it with only one MDS for
2-3 years in a low-load environment without any serious issues.
-----Original Message-----
Sent: 17 February 2020 16:07
To: ceph-users
Subject: [ceph-users] Fwd: Casual survey on the successful usage of CephFS in production
Dear Folks,
I am planning a file systems project and CephFS is under serious
consideration. But browsing the web I found some negative comments
on CephFS about stability and losing data. So I would like to learn more
about the latest developments in CephFS stability, either a successful
story or a failure.
Could you please speak up if you have recently successfully deployed
CephFS in production, and which version? Any stability issues with
CephFS?
thanks a lot,
samuel
huxiaoyu(a)horebdata.cn
Hi all,
Is there a way to estimate how much storage space is required for
CephFS metadata given an expected number of files in the filesystem?
thx
Frank
On Fri, Feb 14, 2020 at 3:19 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
>
> I have a default CentOS 7 setup with Nautilus. I have been asked to install
> kernel 5.5 to check a 'bug'. Where should I get this from? I read that the
> elrepo kernel is not compiled like RHEL.
Hi Marc,
I'm not sure what you mean by "not compiled like RHEL".
Follow [1] to enable the elrepo repository and then:
$ sudo yum --enablerepo=elrepo-kernel install kernel-ml
"ml" stands for "mainline". This is the most recent mainline stable
kernel from kernel.org, it should get you 5.5.3.
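Enabling the repository is roughly the following (the exact release package
name may have changed, so check [1] for the current instructions):

$ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ sudo yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm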
[1] http://elrepo.org/tiki/tiki-index.php
Thanks,
Ilya
Hey, I've been running a Ceph cluster of arm64 SoCs on Luminous for the
past year or so, with no major problems. I recently upgraded to 14.2.7, and
the stability of the cluster immediately suffered. Seemed like any mon
activity was subject to long pauses, and the cluster would hang frequently.
Looking at ceph -s, it appeared the cluster was holding mon elections very
frequently - leaders didn't seem to last longer than about 1-2 minutes.
Looking further at the mons, two out of three of which are running on
relatively slow-performing SD card storage on these SoCs, I saw them
absolutely maxing out the root device's IO with writes. Logs show rocksdb
constantly running compactions. I temporarily moved these mons to machines
with better-performing IO (machines which shouldn't really host mons, as
they're also CephFS clients) and saw a sustained write rate of ~50MB/s. This seems pretty
excessive, and is at least an order of magnitude higher than anything I saw
when running Luminous. Not to mention this isn't really nice for SSD
lifespan.
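In case it helps someone reproduce the observation, the write load and rocksdb
stats can be watched with something like this (mon id and data path are
illustrative):

$ sudo iotop -o -p "$(pgrep ceph-mon)"
$ du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
$ sudo ceph daemon mon.$(hostname -s) perf dump rocksdb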
As downgrading is not an option here, is there anything I can look at to
figure out what exactly the mons are doing, and how to prevent such heavy
load? I seem to remember some bug related to telemetry, but can't find it
on this list...
Hello Everyone,
I've run into problems with placement groups.
We have a 12-host Ceph cluster with 408 OSDs (HDD and SSD).
If I create a replicated pool with a large pg_num (16384), there are no
problems; everything works.
If I do this with an erasure pool, I get a warning, which is fixable by
raising mon_max_pg_per_osd, and afterwards I get a HEALTH_WARN in the
ceph status due to too few PGs.
Checking the pg_num after pool creation, it is capped at 1024.
I'm stuck at this point. Maybe I did something fundamentally wrong?
To illustrate my steps I tried to summarize everything in a small example:
# ceph -v
ceph version 14.2.7 (fb8e34a687d76cd3bd45c2a0fb445432ab69b4ff) nautilus
(stable)
# ceph osd erasure-code-profile get myerasurehdd
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=7
m=5
plugin=jerasure
technique=reed_sol_van
w=8
# ceph osd crush rule dump sas_rule
{
    "rule_id": 0,
    "rule_name": "sas_rule",
    "ruleset": 0,
    "type": 3,
    "min_size": 1,
    "max_size": 12,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "rack"
        },
        {
            "op": "emit"
        }
    ]
}
# ceph osd pool create sas-pool 16384 16384 erasure myerasurehdd sas_rule
Error ERANGE: pg_num 16384 size 12 would mean 196704 total pgs, which
exceeds max 102000 (mon_max_pg_per_osd 250 * num_in_osds 408)
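For context, the arithmetic behind that error (the extra 96 PGs presumably
come from pools that already exist in the cluster):

16384 PGs * 12 shards (k=7 + m=5) = 196608
196608 + 96 existing PGs          = 196704
250 per OSD * 408 in OSDs         = 102000 (the limit)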
# ceph tell mon.\* injectargs '--mon-max-pg-per-osd=500'
mon.ceph-fs01: injectargs:mon_max_pg_per_osd = '500' (not observed,
change may require restart)
mon.ceph-fs05: injectargs:mon_max_pg_per_osd = '500' (not observed,
change may require restart)
mon.ceph-fs09: injectargs:mon_max_pg_per_osd = '500' (not observed,
change may require restart)
ceph-fs01:/opt/ceph-setup# ceph osd pool create sas-pool 16384 16384
erasure myerasurehdd sas_rule
pool 'sas-pool' created
# ceph -s
  cluster:
    id:     b9471b57-95a2-4e58-8f69-b5e6048bea7c
    health: HEALTH_WARN
            Reduced data availability: 1024 pgs incomplete
            too few PGs per OSD (7 < min 30)
# ceph osd pool get sas-pool pg_num
pg_num: 1024
Best regards,
Gunnar