September 2019 - ceph-users

how many monitor should to deploy in a 1000+ osd cluster

by 展荣臻（信泰）

hi all: I have a production cluster, and it had 24 hosts (528 osds,3mons) at a former. Now we want to add 36 hosts so the osd increase to 1320 . does the monitor need to increase?how many numbers of monitor node is recommended? Another question is which monitor does monclient commnuicate with? And how it decide? Any suggestions are welcome!

4 years, 6 months

7
7
0 0

Re: download.ceph.com repository changes

by Alfredo Deza

Reviving this old thread. I still think this is something we should consider as users still experience problems: * Impossible to 'pin' to a version. User installs 14.2.0 and 4 months later they add other nodes but version moved to 14.2.2 * Impossible to use a version that is not what the latest is (e.g. if someone doesn't need the release from Monday, but wants the one from 6 months ago), similar to the above * When a release is underway, the repository breaks because syncing packages takes hours. The operation is not atomic. * It is not currently possible to "remove" a bad release, in the past, this means cutting a new release as soon as possible, which can take days The latest issue (my fault!) was to cut a release and get the packages out without communicating with the release manager, which caused users to note there is a new version *as soon as it was up* vs, a process that could've not touched the 'latest' url until the announcement goes out. If you have been affected by any of these issues (or others I didn't come up with), please let us know in this thread so that we can find some common ground and try to improve the process. Thanks! On Tue, Jul 24, 2018 at 10:38 AM Alfredo Deza <adeza(a)redhat.com> wrote: > > Hi all, > > After the 12.2.6 release went out, we've been thinking on better ways > to remove a version from our repositories to prevent users from > upgrading/installing a known bad release. > > The way our repos are structured today means every single version of > the release is included in the repository. That is, for Luminous, > every 12.x.x version of the binaries is in the same repo. This is true > for both RPM and DEB repositories. > > However, the DEB repos don't allow pinning to a given version because > our tooling (namely reprepro) doesn't construct the repositories in a > way that this is allowed. For RPM repos this is fine, and version > pinning works. > > To remove a bad version we have to proposals (and would like to hear > ideas on other possibilities), one that would involve symlinks and the > other one which purges the known bad version from our repos. > > *Symlinking* > When releasing we would have a "previous" and "latest" symlink that > would get updated as versions move forward. It would require > separation of versions at the URL level (all versions would no longer > be available in one repo). > > The URL structure would then look like: > > debian/luminous/12.2.3/ > debian/luminous/previous/ (points to 12.2.5) > debian/luminous/latest/ (points to 12.2.7) > > Caveats: the url structure would change from debian-luminous/ to > prevent breakage, and the versions would be split. For RPMs it would > mean a regression if someone is used to pinning, for example pinning > to 12.2.2 wouldn't be possible using the same url. > > Pros: Faster release times, less need to move packages around, and > easier to remove a bad version > > > *Single version removal* > Our tooling would need to go and remove the known bad version from the > repository, which would require to rebuild the repository again, so > that the metadata is updated with the difference in the binaries. > > Caveats: time intensive process, almost like cutting a new release > which takes about a day (and sometimes longer). Error prone since the > process wouldn't be the same (one off, just when a version needs to be > removed) > > Pros: all urls for download.ceph.com and its structure are kept the same.

4 years, 6 months

9
11
0 0

Ceph NIC partitioning (NPAR)

by Adrien Georget

Hi, I need your advice about the following setup. Currently, we have a Ceph nautilus cluster used by Openstack Cinder with single NIC in 10Gbps on OSD hosts. We will upgrade the cluster by adding 7 new hosts dedicated to Nova/Glance and we would like to add a cluster network to isolate replication and recovery traffic. For now, it's not possible to add a second NIC and FC so we are thinking about enabling DELL NPAR [1] which allows splitting a single physical NIC in 2 logical NICs (1 for public network and 1 for Cluster network). We can set max and min bandwidth and implement dynamic bandwidth balancing for NPAR to get the appropriate bandwidth when Ceph need it (default alloc is 66% for cluster network and 34% for public network). Any experiences with this kind of configuration? Do you see any disadvantages doing this? And one question, if we put this in production, adding cluster network value in ceph.conf and restarting each OSD is enough for Ceph? Best, Adrien [1] https://www.dell.com/support/article/fr/fr/frbsdt1/how12596/how-npar-works?…

4 years, 6 months

2
1
0 0

Announcing Ceph Buenos Aires 2019 on Oct 16th at Museo de Informatica

by Victoria Martinez de la Cruz

Hi all, I'm happy to announce that next Oct 16th we will have the Ceph Day Argentina in Buenos Aires. The event will be held in the Museo de Informatica de Argentina, so apart from hearing the latest features from core developers, real use cases from our users and usage experiences from customers and partners, you will be able to enjoy and contribute to this fantastic museum that holds a great collection of vintage hardware. If you are a local and/or you are in the area, we really hope you can join us! CFP is open in https://forms.zohopublic.com/thingee/form/CephDayArgentina2019/formperma/yf… Ceph Day Buenos Aires site is available in https://ceph.io/cephdays/ceph-day-argentina-2019/ Cheers, Victoria

4 years, 6 months

1
0
0 0

Wrong %USED and MAX AVAIL stats for pool

by nalexandrov＠innologica.com

Hi everyone, We are running Nautilus 14.2.2 with 6 nodes and a total of 44 OSDs, all are 2TB spinning disks. # ceph osd count-metadata osd_objectstore "bluestore": 44 # ceph osd pool get one size size: 3 # ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 80 TiB 33 TiB 47 TiB 47 TiB 58.26 TOTAL 80 TiB 33 TiB 47 TiB 47 TiB 58.26 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL one 2 15 TiB 4.06M 47 TiB 68.48 7.1 TiB bench 5 250 MiB 67 250 MiB 0 21 TiB Why pool's stats are showing incorrect values for %USED and MAX AVAIL? They should be much bigger. The first 24 OSDs was created on jewell release and the osd_objectstore was 'filestore'. While we were with mimic release, we added 20 more 'bluestore' OSDs. The first 24 was destroyed and recreated as 'bluestore'. After the upgrade from mimic release, all the OSD's was updated with ceph-bluestore-tool repair. The incorrect values appeared after the upgrade from 14.2.1 to 14.2.2. Any help will be appreciated :) BR, NAlexandrov

4 years, 6 months

2
1
0 0

OSD rebalancing issue - should drives be distributed equally over all nodes

by Thomas

Hi, I'm facing several issues with my ceph cluster (2x MDS, 6x ODS). Here I would like to focus on the issue with pgs backfill_toofull. I assume this is related to the fact that the data distribution on my OSDs is not balanced. This is the current ceph status: root@ld3955:~# ceph -s cluster: id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae health: HEALTH_ERR 1 MDSs report slow metadata IOs 78 nearfull osd(s) 1 pool(s) nearfull Reduced data availability: 2 pgs inactive, 2 pgs peering Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized Degraded data redundancy (low space): 265 pgs backfill_toofull 3 pools have too many placement groups 74 slow requests are blocked > 32 sec 80 stuck requests are blocked > 4096 sec services: mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m) mgr: ld5505(active, since 3d), standbys: ld5506, ld5507 mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby osd: 368 osds: 368 up, 367 in; 302 remapped pgs data: pools: 5 pools, 8868 pgs objects: 51.08M objects, 195 TiB usage: 590 TiB used, 563 TiB / 1.1 PiB avail pgs: 0.023% pgs not active 304136/153251211 objects degraded (0.198%) 1672190/153251211 objects misplaced (1.091%) 8564 active+clean 196 active+remapped+backfill_toofull 57 active+undersized+degraded+remapped+backfill_toofull 35 active+remapped+backfill_wait 12 active+remapped+backfill_wait+backfill_toofull 2 active+remapped+backfilling 2 peering io: recovery: 18 MiB/s, 4 objects/s Currently I'm using 6 OSD nodes. Node A 48x 1.6TB HDD Node B 48x 1.6TB HDD Node C 48x 1.6TB HDD Node D 48x 1.6TB HDD Node E 48x 7.2TB HDD Node F 48x 7.2TB HDD Question: Is it advisable to distribute the drives equally over all nodes? If yes, how should this be executed w/o ceph disruption? Regards Thomas

4 years, 6 months

3
5
0 0

verify_upmap number of buckets 5 exceeds desired 4

by Eric Dold

Hello, I'm running ceph 14.2.3 on six hosts with each four osds. I did recently upgrade this from four hosts. The cluster is running fine. But i get this in my logs: Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700 -1 verify_upmap number of buckets 5 exceeds desired 4 Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700 -1 verify_upmap number of buckets 5 exceeds desired 4 Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700 -1 verify_upmap number of buckets 5 exceeds desired 4 It looks like the balancer is not doing any work. Here are some infos about the cluster: ceph1 ~ # ceph osd crush rule ls replicated_rule cephfs_ec ceph1 ~ # ceph osd crush rule dump replicated_rule { "rule_id": 0, "rule_name": "replicated_rule", "ruleset": 0, "type": 1, "min_size": 1, "max_size": 10, "steps": [ { "op": "take", "item": -1, "item_name": "default" }, { "op": "chooseleaf_firstn", "num": 0, "type": "host" }, { "op": "emit" } ] } ceph1 ~ # ceph osd crush rule dump cephfs_ec { "rule_id": 1, "rule_name": "cephfs_ec", "ruleset": 1, "type": 3, "min_size": 8, "max_size": 8, "steps": [ { "op": "set_chooseleaf_tries", "num": 5 }, { "op": "set_choose_tries", "num": 100 }, { "op": "take", "item": -1, "item_name": "default" }, { "op": "choose_indep", "num": 4, "type": "host" }, { "op": "choose_indep", "num": 2, "type": "osd" }, { "op": "emit" } ] } ceph1 ~ # ceph osd erasure-code-profile ls default isa_62 ceph1 ~ # ceph osd erasure-code-profile get default k=2 m=1 plugin=jerasure technique=reed_sol_van ceph1 ~ # ceph osd erasure-code-profile get isa_62 crush-device-class= crush-failure-domain=osd crush-root=default k=6 m=2 plugin=isa technique=reed_sol_van The idea with four hosts was that the ec profile should take two osds on each host for the eight buckets. Now with six hosts i guess two hosts will have tow buckets on two osds and four hosts will have each one bucket for a piece of data. Any idea how to resolve this? Regards Eric

4 years, 6 months

1
2
0 0

configuration of Ceph-ISCSI gateway

by Gesiel Galvão Bernardes

Hi everyone, I'm configurating ISCSI gateway in Ceph Mimic (13.2.6) using ceph manual: https://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/ But i stopped in this problem: In manual says: "Set the client’s CHAP username to myiscsiusername and password to myiscsipassword: > /iscsi-target...at:rh7-client> auth chap=myiscsiusername/myiscsipassword" But I receive this response: /iscsi-target...at:rh7-client> auth chap=myiscsitest/myiscsitestpasswd Unexpected keyword parameter 'chap'. The options disponibles are: /iscsi-target...at:rh7-client> auth ? To set authentication, specify username=<user> password=<password> [mutual_username]=<user> [mutual_password]=<password> But if configure as asks: auth username=myiscsitest password=myiscsitestpasswd Failed to update the client's auth: Invalid password I tried with high password complexibility, but the problem persists. My questions: - How is the correct mode for configure authentication? - How contribute for update of documentation? A bug report has opened* for broken information of instalation of ceph-iscsi-gw, but was closed without update of documentation:https://github.com/ceph/ceph-ansible/issues/2707 Regards Gesiel Bernardeds

4 years, 7 months

2
1
0 0

RGW orphaned shadow objects

by P. O.

Hi All, I have a question about "orphaned" objects in default.rgw.buckets.data pool. Few days ago i ran "radosgw-admin orphans find ..." [dc-1 root@mon-1 tmp]$ radosgw-admin orphans list-jobs [ "orphans-find-1" ] Today I checked the result. I listed orphaned objects by command: $# for i in `rados -p default.rgw.log ls |grep orphan.scan.orphans-find-1.rados`; do rados -p default.rgw.log listomapkeys $i; done > orphaned_objects.txt There are a lot: xxx.__shadow_.yyy objects. Is it possible to check if these __shadow_ are orphaned (can be removed) or belongs to any good object? How to check if a shadow object is still in use and what object it belongs to? Some of them are very old. For example: default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_1 mtime 2017-08-16 04:01:49.000000, size 4194304 default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_2 mtime 2017-08-16 04:01:49.000000, size 4194304 default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_4 mtime 2017-08-16 04:01:49.000000, size 4194304 default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_5 mtime 2017-08-16 04:01:49.000000, size 4194304 default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_6 mtime 2017-08-16 04:01:49.000000, size 4194304 Best regards, PO

4 years, 7 months

2
1
0 0

Health error: 1 MDSs report slow metadata IOs, 1 MDSs report slow requests

by Thomas

Hi, ceph health reports 1 MDSs report slow metadata IOs 1 MDSs report slow requests This is the complete output of ceph -s: root@ld3955:~# ceph -s cluster: id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae health: HEALTH_ERR 1 MDSs report slow metadata IOs 1 MDSs report slow requests 72 nearfull osd(s) 1 pool(s) nearfull Reduced data availability: 33 pgs inactive, 32 pgs peering Degraded data redundancy: 123285/153918525 objects degraded (0.080%), 27 pgs degraded, 27 pgs undersized Degraded data redundancy (low space): 116 pgs backfill_toofull 3 pools have too many placement groups 54 slow requests are blocked > 32 sec 179 stuck requests are blocked > 4096 sec services: mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 21h) mgr: ld5507(active, since 21h), standbys: ld5506, ld5505 mds: pve_cephfs:1 {0=ld3955=up:active} 1 up:standby osd: 368 osds: 368 up, 368 in; 140 remapped pgs data: pools: 6 pools, 8872 pgs objects: 51.31M objects, 196 TiB usage: 591 TiB used, 561 TiB / 1.1 PiB avail pgs: 0.372% pgs not active 123285/153918525 objects degraded (0.080%) 621911/153918525 objects misplaced (0.404%) 8714 active+clean 90 active+remapped+backfill_toofull 26 active+undersized+degraded+remapped+backfill_toofull 16 peering 16 remapped+peering 7 active+remapped+backfill_wait 1 activating 1 active+recovery_wait+degraded 1 active+recovery_wait+undersized+remapped In the log I find these relevant entries: 2019-09-24 13:24:37.073695 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18618.873983 secs 2019-09-24 13:24:42.073757 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18623.874055 secs 2019-09-24 13:24:47.073852 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18628.874149 secs 2019-09-24 13:24:52.073941 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18633.874237 secs 2019-09-24 13:24:57.074073 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18638.874354 secs 2019-09-24 13:25:02.074118 mds.ld3955 [WRN] 2 slow requests, 0 included below; oldest blocked for > 18643.874415 secs Cephfs is residing on a pool "hdd" with dedicated HDDs (4x 17 1.6TB). This pool is used for RBDs, too. Question: How can I identify the 2 slow requests? And how can I kill these requests? Regards Thomas

4 years, 7 months

4
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users September 2019