Hi guys,
I was looking at some Huawei ARM-based servers and the datasheets are
very interesting. The high CPU core counts and the SoC architecture
should, at least in theory, be ideal for distributed storage like Ceph.
I'm planning to build a new Ceph cluster in the future, and my best-case
scenario right now is to buy 7 servers with 2 x Intel Silver 12-core or
2 x Gold 20-core CPUs and 32 SATA drives each, plus of course
SSDs/NVMes for WAL/DB and the metadata pools, 256GB RAM and so on.
I'm curious, however, whether the ARM servers are better or not for this use
case (object storage only). For example, instead of a 2 x Silver/Gold
server, I could use a TaiShan 5280 server with 2 x Kunpeng 920 ARM CPUs
with up to 128 cores in total, so I can have twice as many CPU cores
(or even more) per server compared with x86. The price is probably
lower for the ARM servers as well.
Has anyone tested Ceph in such a scenario? Is Ceph really optimised
for the ARM architecture? What do you think about this?
Thanks!
Hello,
I'm having difficulties setting up the web certificates for the
Dashboard on hostnames ceph01..n.domain.tld.
I set the keys and certificates with ceph config-key. 'ceph config-key get
mgr/dashboard/crt' shows the correct certificate, and the same applies to
mgr/dashboard/key, mgr/cephadm/grafana_key and mgr/cephadm/grafana_crt.
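For reference, the commands I used were along these lines (the file names
are placeholders):

    ceph config-key set mgr/dashboard/crt -i dashboard.crt
    ceph config-key set mgr/dashboard/key -i dashboard.key
    ceph config-key set mgr/cephadm/grafana_crt -i grafana.crt
    ceph config-key set mgr/cephadm/grafana_key -i grafana.key
    # restart the dashboard module so it picks up the new certificate
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard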
The hosts have been rebooted since then, and I have tried different browsers,
so it is not a cache or proxy problem. But:
1. The browser complains about the certificate on the Grafana web page at
https://ceph01.domain.tld:3000, which still uses the self-signed certificate
from cephadm.
2. The Dashboard always redirects from https://ceph01.domain.tld:8443 to
https://ceph01:8443, which means the browser complains again, since the
certificate requires the FQDN.
How do you handle this?
Thanks, Erich
Hi
We're having problems getting our erasure-coded pool ec82pool to balance via upmap.
"ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
nautilus (stable)": 554
The pool consists of 20 nodes in 10 racks, each rack containing a pair
of nodes: one with 45 x 8TB drives and one with 10 x 16TB drives.
https://pastebin.com/YLwu8VVi
The problem is the 8TB drives are roughly 62-74% full, while the 16TB
drives are 84-87% full.
https://pastebin.com/j7Dx883i
Neither osdmaptool nor reweight-by-utilization is able to improve the
distribution.
There's an osdmap at ftp://ftp.mrc-lmb.cam.ac.uk/pub/toby/osdmap.2135441.
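For reference, the kind of osdmaptool run we'd expect to generate upmaps from
that map looks roughly like this (the deviation and max values are just
examples):

    osdmaptool osdmap.2135441 --upmap upmaps.sh \
        --upmap-pool ec82pool --upmap-deviation 1 --upmap-max 100
    # upmaps.sh then contains the 'ceph osd pg-upmap-items ...' commands to apply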
Any thoughts/pointers much appreciated.
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Today I played with a Samba gateway and CephFS. I couldn't get previous versions displayed on a Windows client and found very little info on the net about how to accomplish this. It seems that I need a VFS module called ceph_snapshots, which is not included in the latest Samba version on CentOS 8. Through this I also noticed that there is no vfs ceph module either. Are these modules not stable and therefore not included in CentOS 8? I can compile them, but I would like to know why they are not included. And one more question: are there any plans to add Samba gateway support to cephadm?
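For context, the share definition I have in mind would look roughly like this
(the share name and cephx user are just placeholders):

    [cephfs]
        path = /
        vfs objects = ceph ceph_snapshots
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no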
Best regards,
Oliver
Sent from my iPhone
No, you are not affected. Only clusters with mixed versions are affected.
k
Sent from my iPhone
> On 24 Nov 2020, at 18:25, Rainer Krienke <krienke(a)uni-koblenz.de> wrote:
>
> Hello,
>
> thanks for your answer. If I understand you correctly, then only an upgrade
> from 14.2.11 to 14.2.(12|14) could lead to problems. Does this mean that
> running 14.2.13, which is what I currently have installed, is not affected?
>
> The release notes say:
>
> * BlueStore: Fixes a bug in collection_list_legacy which makes pgs
> inconsistent during scrub when running mixed versions of osds, prior to
> 14.2.12 with newer.
>
> This sounds to me as if, having OSDs created before 14.2.12 and a ceph
> version < 14.2.15 installed (like 14.2.13 in my case), creating a new OSD
> could leave me affected by inconsistent pgs during scrubbing.
>
> Have a nice day
> Rainer
>
>> Am 24.11.20 um 11:18 schrieb Konstantin Shalygin:
>> This bug may affect you when you upgrade from 14.2.11 to 14.2.(12|14) slowly (e.g. one node at a time). If you have already upgraded from 14.2.11, you have simply jumped over this bug.
>>
>>
>> k
>>
>> Sent from my iPhone
>>
>>>> On 24 Nov 2020, at 10:43, Rainer Krienke <krienke(a)uni-koblenz.de> wrote:
>>>
>>> Hello,
>>>
>>> I am running a productive ceph cluster with Nautilus 14.2.13. All OSDs
>>> are bluestore and were created with a ceph version prior to 14.2.12.
>>>
>>> What I would like to know is how urgently I should treat the
>>> collection_list_legacy bug, since at the moment I am not going to add a
>>> brand new OSD to the system. However, a disk could fail at any time, and I
>>> would then have to destroy the OSD with the failed disk and run ceph-volume
>>> with a new disk to create a new bluestore OSD.
>>>
>>> Would this scenario also lead to inconsistent pgs?
>>>
>>> Thanks
>>> Rainer
>>>
>>>> Am 24.11.20 um 02:35 schrieb David Galloway:
>>>> This is the 15th backport release in the Nautilus series. This release
>>>> fixes a ceph-volume regression introduced in v14.2.13 and includes a few
>>>> other fixes. We recommend users update to this release.
>>>>
>>>> For detailed release notes with links and a changelog, please refer to the
>>>> official blog entry at https://ceph.io/releases/v14-2-15-nautilus-released
>>>>
>>>>
>>>> Notable Changes
>>>> ---------------
>>>> * ceph-volume: Fixes lvm batch --auto, which breaks backward
>>>> compatibility when using non rotational devices only (SSD and/or NVMe).
>>>> * BlueStore: Fixes a bug in collection_list_legacy which makes pgs
>>>> inconsistent during scrub when running mixed versions of osds, prior to
>>>> 14.2.12 with newer.
>>>> * MGR: progress module can now be turned on/off, using the commands:
>>>> `ceph progress on` and `ceph progress off`.
>>>>
>>>>
>>>> Getting Ceph
>>>> ------------
>>>> * Git at git://github.com/ceph/ceph.git
>>>> * Tarball at http://download.ceph.com/tarballs/ceph-14.2.15.tar.gz
>>>> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
>>>> * Release git sha1: afdd217ae5fb1ed3f60e16bd62357ca58cc650e5
>>>>
>>>
>>> --
>>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>>> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
>>> Web: http://userpages.uni-koblenz.de/~krienke
>>> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
>>
>
>
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
I am gathering prometheus metrics from my (unhealthy) Octopus (15.2.4)
cluster and noticed a discrepancy (or misunderstanding) with the ceph
dashboard.
In the dashboard, and with ceph -s, it reports 807 million objects:
pgs: 169747/807333195 objects degraded (0.021%)
78570293/807333195 objects misplaced (9.732%)
24/101158245 objects unfound (0.000%)
But in the prometheus metrics (and in ceph df), it reports almost a
factor of 10 fewer objects (dominated by pool 7):
# HELP ceph_pool_objects DF pool objects
# TYPE ceph_pool_objects gauge
ceph_pool_objects{pool_id="4"} 3920.0
ceph_pool_objects{pool_id="5"} 372743.0
ceph_pool_objects{pool_id="7"} 86972464.0
ceph_pool_objects{pool_id="8"} 9287431.0
ceph_pool_objects{pool_id="13"} 8961.0
ceph_pool_objects{pool_id="15"} 0.0
ceph_pool_objects{pool_id="17"} 4.0
ceph_pool_objects{pool_id="18"} 206.0
ceph_pool_objects{pool_id="19"} 8.0
ceph_pool_objects{pool_id="20"} 7.0
ceph_pool_objects{pool_id="21"} 22.0
ceph_pool_objects{pool_id="22"} 203.0
ceph_pool_objects{pool_id="23"} 4415522.0
Why are these two values different? How can I get the total number of
objects (807 million) from the prometheus metrics?
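For what it's worth, the obvious PromQL sum over the per-pool gauge,

    sum(ceph_pool_objects)

only adds up to roughly the 101 million that appears as the denominator of the
"unfound" line above, not to 807 million.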
--Mike
Hi,
We are going to replace our spinning SATA 4TB filestore disks with new
4TB SSD bluestore disks. Our cluster does far more reading than writing.
Comparing options, I found the interesting and cheap Micron 5210 ION
3.84TB SSDs. The way we understand it, there is a performance hit when
it comes to sustained write speeds, but cost-wise those SSDs are very
interesting (only 450 euro each).
Our cluster is only small, consisting of three servers in a 3/2
redundancy config. I was planning to replace the 8 OSDs on one server
and then take some time to check out how well (or not...) they perform.
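The kind of check I have in mind is the usual single-job sync-write fio run,
roughly like this (/dev/sdX is a placeholder, and the run overwrites data on
the device):

    fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based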
We just wanted to ask here: anyone with suggestions on alternative SSDs
we should consider? Or other tips we should take into consideration..?
Thanks,
MJ
Hi all,
I'm upgrading ceph mimic 13.2.8 to 13.2.10 and made a strange observation: when restarting OSDs on the new version, the PGs come back as undersized. They are each missing 1 OSD, and I get a lot of degraded/misplaced objects.
I have only the noout flag set.
Can anyone help me understand why the PGs don't peer back to a complete state?
Is there a flag I can set to get complete PGs before backfill/recovery starts?
Ceph is currently rebuilding objects even though all the data should still be there. Hence, the update now takes an unreasonable amount of time. I remember that with the update from 13.2.2 to 13.2.8, PGs came back complete really quickly; there was no such extended period with incomplete PGs and degraded redundancy.
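The only other flags I can think of are the ones that pause data movement,
though I don't know whether they help with the peering itself; something like:

    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set norecover
    # ... restart the OSDs on the new version ...
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset norebalance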
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I wonder whether anybody has a setup like the one I want to set up.
1st subnet: 10.118.170.0/24 (FE users)
2nd subnet: 10.192.150.0/24 (BE users)
The users come from these subnets, and I want the FE users to arrive on the 1st interface of the load balancer and the BE users on the 2nd interface of the HAProxy load balancer, so maybe I need to create 2 backends in the HAProxy config?
Both groups of users would go to the same RADOS gateways and the same Ceph cluster.
I also want to create static routes on the load balancer, but I'm not sure how to define in the HAProxy config that traffic should go to that specific interface.
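Roughly, what I have in mind for haproxy.cfg is something like this (the bind
addresses and RGW server addresses are made up):

    frontend fe_users
        bind 10.118.170.10:80
        mode http
        default_backend rgw

    frontend be_users
        bind 10.192.150.10:80
        mode http
        default_backend rgw

    backend rgw
        mode http
        balance roundrobin
        server rgw1 10.118.171.11:8080 check
        server rgw2 10.118.171.12:8080 check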
Thanks