Hello,
I have created a small EC pool (16 PGs) with k=4, m=2.
Then I applied the following crush rule to it:
rule test_ec {
        id 99
        type erasure
        min_size 5
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
}
The OSD tree looks as follows:
 -1       43.38448 root default
 -9       43.38448     region lab1
 -7       43.38448         room dc1.lab1
 -5       43.38448             rack r1.dc1.lab1
 -3       14.44896                 host host1.r1.dc1.lab1
  6   hdd  3.63689                     osd.6                  up  1.00000 1.00000
  8   hdd  3.63689                     osd.8                  up  1.00000 1.00000
  7   hdd  3.63689                     osd.7                  up  1.00000 1.00000
 11   hdd  3.53830                     osd.11                 up  1.00000 1.00000
-11       14.44896                 host host2.r1.dc1.lab1
  4   hdd  3.63689                     osd.4                  up  1.00000 1.00000
  9   hdd  3.63689                     osd.9                  up  1.00000 1.00000
  5   hdd  3.63689                     osd.5                  up  1.00000 1.00000
 10   hdd  3.53830                     osd.10                 up  1.00000 1.00000
-13       14.48656                 host host3.r1.dc1.lab1
  0   hdd  3.57590                     osd.0                  up  1.00000 1.00000
  1   hdd  3.63689                     osd.1                  up  1.00000 1.00000
  2   hdd  3.63689                     osd.2                  up  1.00000 1.00000
  3   hdd  3.63689                     osd.3                  up  1.00000 1.00000
My expectation was that each host would hold 2 shards of any PG of the pool.
When I dumped the PGs this was mostly true, but one PG has three of its
shards on host3 (OSDs 0, 2 and 3), which would cause downtime if host3 fails.
root@host1:~/mkw # ceph pg dump|grep "^66\."|awk '{print $17}'
dumped all
[4,5,7,6,1,2]
[8,11,9,3,0,2] <<< - this one is problematic
[6,7,10,9,2,0]
[2,3,7,6,5,9]
[7,8,10,5,3,1]
[4,5,8,6,0,2]
[7,11,9,4,1,2]
[5,9,0,2,7,11]
[9,5,3,1,7,8]
[8,11,2,0,5,9]
[2,0,8,6,10,9]
[3,2,5,9,7,11]
[6,7,9,5,1,2]
[10,5,1,3,11,8]
[4,5,7,8,2,0]
[7,8,3,2,9,10]
Is there a way to ensure that a host failure is not disruptive to the cluster?
During the experiment I used info from this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030227.html
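For anyone who wants to reproduce this without touching the cluster, the
mappings should also be checkable offline with crushtool (a sketch, using
rule id 99 and the 6 shards from above):

# export the compiled crush map from the cluster
ceph osd getcrushmap -o crushmap.bin
# print the OSD sets the rule computes for each input
crushtool -i crushmap.bin --test --rule 99 --num-rep 6 --show-mappings
# print only mappings where fewer than 6 OSDs could be chosen
crushtool -i crushmap.bin --test --rule 99 --num-rep 6 --show-bad-mappings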
Kind regards,
Maks Kowalik
I have a small cluster with a single crush map. I use 3 pools: "one" (OpenNebula VMs on RBD), plus cephfs_data and cephfs_metadata for CephFS. Here is my ceph df output:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 94 TiB 78 TiB 17 TiB 17 TiB 17.75
TOTAL 94 TiB 78 TiB 17 TiB 17 TiB 17.75
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
cephfs_data 1 3.3 TiB 6.62M 10 TiB 12.36 24 TiB
cephfs_metadata 2 2.1 GiB 447.63k 2.5 GiB 0 24 TiB
one 5 2.2 TiB 598.12k 6.6 TiB 8.42 24 TiB
What confuses me is that MAX AVAIL shows the same value for all of those pools. When I mount cephfs on a client host, df -h shows me the pool utilization:
28T 3.4T 24T 13%
I also have an old Hammer cluster where I see a similar picture in ceph df for a single crush map (covering rbd, cephfs-data, cephfs-metadata):
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
87053G 31306G 55747G 64.04
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 12907G 69.37 5700G 3312474
cephfs-data 12 2873G 33.52 5700G 5859947
cephfs-meta 13 90035k 0 5700G 443961
cloud12g 14 2857G 43.41 3726G 623737
However, df -h on clients shows the total cluster utilization:
86T 55T 31T 65%
It seems that Hammer dynamically shares the available space between the pools on the same crush map as needed. Does Nautilus do the same? In that case, does 24 TiB actually mean the available raw space divided by 3 (all my pools are set with 3/2 replication)?
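My own back-of-the-envelope check, assuming MAX AVAIL is simply the remaining
raw space divided by the replication factor (my guess, not something I have
confirmed):

    78 TiB raw AVAIL / 3 replicas ≈ 26 TiB

That is close to the 24 TiB shown, so the difference would have to come from
headroom for the full ratio and/or OSD imbalance.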
Thank you and sorry for the confusion
Since the spread of the corona virus is taking such drastic proportions
that flights between Europe and the US are being halted, I would suggest
we show some support and temporarily use only :) and not :D on the
mailing list.
I have a default CentOS 7 setup with Nautilus. I have been asked to install
kernel 5.5 to check a 'bug'. Where should I get this from? I read that the
elrepo kernel is not compiled like the RHEL one.
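What I have found so far is the usual ELRepo mainline route (a sketch; I
have not verified that kernel-ml currently packages 5.5):

# add the ELRepo repository on CentOS 7
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
# install the mainline kernel from the elrepo-kernel repo
yum --enablerepo=elrepo-kernel install kernel-ml
# make the new kernel the default boot entry (assumes it is entry 0)
grub2-set-default 0
reboot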
Ok, I think that answers my question then, thanks! Too risky to be playing with patterns that will get increasingly difficult to support over time.
> On Mar 12, 2020, at 12:48 PM, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>
> They won’t be AFAIK. Few people ever did this.
>
>> On Mar 12, 2020, at 11:08 AM, Brian Topping <brian.topping(a)gmail.com> wrote:
>>
>> If the ceph roadmap is getting rid of named clusters, how will multiple clusters be supported? How (for instance) would `/var/lib/ceph/mon/{name}` directories be resolved?
> On Mar 11, 2020, at 8:29 PM, Brian Topping <brian.topping(a)gmail.com> wrote:
>
>> On Mar 11, 2020, at 7:59 PM, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>>
>>> This is all possible with a single cluster, but this limited node also needs storage.
>>
>> Are you saying that the limited node needs to access Ceph-based storage? Is this some sort of converged architecture?
>
> It is a converged architecture in that all three boxes are running Kubernetes. There is one k8s cluster on each side of the link, let’s call them “primary” and “secondary”:
> * The primary k8s cluster will only access storage from the primary Ceph cluster, secondary k8s only accesses storage from secondary Ceph.
> * Primary Ceph gets monitors on both sides of the link. Secondary Ceph only has monitors on the secondary side.
>
> In a netsplit situation, the primary Ceph will maintain quorum with both nodes on the primary side. The secondary Ceph cluster only exists separately for this netsplit situation and the secondary k8s cluster can continue unaffected.
>
> With this in place, the primary side can continue operating with either primary node downed for maintenance via a suboptimal quorum over the WAN link. I cannot do that today.
>
> I am sacrificing the case where there is a netsplit at the same time I am doing maintenance.
>
> Thanks for your input!
> Brian
Hi,
Currently running Mimic 13.2.5.
We had reports this morning of timeouts and failures with PUT and GET
requests to our Ceph RGW cluster. I found these messages in the RGW
log:
RGWReshardLock::lock failed to acquire lock on
bucket_name:bucket_instance ret=-16
NOTICE: resharding operation on bucket index detected, blocking
block_while_resharding ERROR: bucket is still resharding, please retry
These were preceded by many of the following, which I think are normal/expected:
check_bucket_shards: resharding needed: stats.num_objects=6415879
shard max_objects=6400000
Our RGW cluster sits behind haproxy, which notified me approx. 90
seconds after the first 'resharding needed' message that no backends
were available. It appears this dynamic reshard process caused the
RGWs to lock up for a period of time. Roughly 2 minutes later the
reshard error messages stopped and operation returned to normal.
Looking back through previous RGW logs, I see a similar event from
about a week ago, on the same bucket. We have several buckets with
shard counts exceeding 1k (this one has only 128) and much larger
object counts, so clearly this isn't the first time dynamic resharding
has been invoked on this cluster.
Has anyone seen this? I expect it will come up again, and can turn up
debugging if that'll help. Thanks for any assistance!
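In case it's useful for anyone looking at this, the reshard state can be
inspected with (a sketch; the bucket name is a placeholder):

# list pending/in-progress resharding operations
radosgw-admin reshard list
# show the reshard status of the affected bucket
radosgw-admin reshard status --bucket=bucket_name
# compare per-bucket object counts against the shard limits
radosgw-admin bucket limit check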
Josh
Hi,
I'm (still) testing upgrading from Luminous to Nautilus and ran into the
following situation:
The lab-setup I'm testing in has three OSD-Hosts.
If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all my
mon nodes starts to grow rapidly until either the OSD host comes back up
or the disks are full.
On another cluster that's still on Luminous I don't see any growth at all.
Is that a difference in behaviour between Luminous and Nautilus, or is it
caused by the lab setup only having three hosts, so that a single lost host
makes all PGs degraded at the same time?
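For reference, the growth can be watched and a compaction triggered like
this (a sketch; the mon id is a placeholder):

# watch the mon store size on each mon node
du -sh /var/lib/ceph/mon/*/store.db
# ask a mon to compact its store (only reclaims space the mon may trim)
ceph tell mon.<id> compact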
--
Cheers,
Hardy
Hi, I’m getting conflicting information from the documentation. It seems that by using the “cluster name”[1], multiple clusters can be run in parallel on the same hardware.
In trying to set this up with `ceph-deploy`, I see the man page[2] says "if it finds the distro.init to be sysvinit (Fedora, CentOS/RHEL etc), it doesn't allow installation with custom cluster name and uses the default name ceph for the cluster”.
Is it possible to run multiple clusters on the same hardware with CentOS 7 as the base OS?
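For context, the kind of invocation I mean is (a sketch; the cluster name
"site2" and host "mon1" are hypothetical):

# bootstrap a second, separately named cluster on the same hosts
ceph-deploy --cluster site2 new mon1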
Thanks, Brian
[1] https://docs.ceph.com/docs/nautilus/install/manual-deployment/#monitor-boot…
[2] https://docs.ceph.com/docs/nautilus/man/8/ceph-deploy/?highlight=ceph-deplo…
Hi,
I'm trying to create a namespace in rados, create a user that has
access to this namespace, and then read and write objects in it with
the rados command line utility using that user.
I can't find an example of how to do this.
Can someone point me to such an example or show me how to do it?
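Here is what I've pieced together so far, in case it is close (a sketch;
"mypool", "ns1" and "client.ns1user" are placeholder names, and my
understanding is that namespaces are created implicitly on first write
rather than with a separate command):

# create a user restricted to namespace ns1 in pool mypool
ceph auth get-or-create client.ns1user mon 'allow r' osd 'allow rw pool=mypool namespace=ns1'
# write, read and list objects in that namespace as the new user
rados --id ns1user -p mypool --namespace ns1 put obj1 ./localfile
rados --id ns1user -p mypool --namespace ns1 get obj1 ./obj1.out
rados --id ns1user -p mypool --namespace ns1 ls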
Regards,
Rodrigo Severo