Hi guys,
I was looking at some Huawei ARM-based servers and the datasheets are
very interesting. The high CPU core counts and the SoC architecture
should, at least in theory, be ideal for distributed storage like Ceph.
I'm planning to build a new Ceph cluster in the future, and my best-case
scenario right now is to buy 7 servers with 2 x Intel Silver 12-core or
2 x Gold 20-core CPUs and 32 SATA drives each, plus of course
SSDs/NVMes for WAL/DB and the metadata pools, 256GB RAM and so on.
I'm curious, however, whether the ARM servers are better or not for this use
case (object storage only). For example, instead of a 2 x Silver/Gold
server, I could use a TaiShan 5280 server with 2 x Kunpeng 920 ARM CPUs
with up to 128 cores in total, so I can have twice as many CPU cores
(or even more) per server compared with x86. The price is probably
lower for the ARM servers as well.
Has anyone tested Ceph in such a scenario? Is Ceph really optimised
for the ARM architecture? What do you think about this?
Thanks!
Hello,
I'm having difficulties setting up the web certificates for the
Dashboard on hostnames ceph01..n.domain.tld.
I set the keys and certificates with ceph config-key. 'ceph config-key get
mgr/dashboard/crt' shows the correct certificate, and the same applies to
mgr/dashboard/key, mgr/cephadm/grafana_key and mgr/cephadm/grafana_crt.
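For reference, the commands I used were along these lines (the file names
are placeholders):

    ceph config-key set mgr/dashboard/crt -i dashboard.crt
    ceph config-key set mgr/dashboard/key -i dashboard.key
    ceph config-key set mgr/cephadm/grafana_crt -i grafana.crt
    ceph config-key set mgr/cephadm/grafana_key -i grafana.key
    # restart the dashboard module so it picks up the new certificate
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard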
The hosts have been rebooted since then, and I have tried different browsers,
so it is not a cache or proxy problem. But:
1. The browser complains about the certificate on the Grafana web page at
https://ceph01.domain.tld:3000, which still uses the self-signed certificate
from cephadm.
2. The Dashboard always redirects from https://ceph01.domain.tld:8443 to
https://ceph01:8443, which means the browser complains again, since the
certificate requires the FQDN.
How do you handle this?
Thanks, Erich
Hi
We're having problems getting our erasure-coded pool ec82pool to balance via upmap.
"ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf)
nautilus (stable)": 554
The pool consists of 20 nodes in 10 racks, each rack containing a pair
of nodes: one with 45 x 8TB drives and one with 10 x 16TB drives.
https://pastebin.com/YLwu8VVi
The problem is the 8TB drives are roughly 62-74% full, while the 16TB
drives are 84-87% full.
https://pastebin.com/j7Dx883i
Neither osdmaptool nor reweight-by-utilization is able to improve the
distribution.
There's an osdmap at ftp://ftp.mrc-lmb.cam.ac.uk/pub/toby/osdmap.2135441.
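For reference, the kind of osdmaptool run we'd expect to generate upmaps from
that map looks roughly like this (the deviation and max values are just
examples):

    osdmaptool osdmap.2135441 --upmap upmaps.sh \
        --upmap-pool ec82pool --upmap-deviation 1 --upmap-max 100
    # upmaps.sh then contains the 'ceph osd pg-upmap-items ...' commands to apply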
Any thoughts/pointers much appreciated.
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Today I played with a Samba gateway and CephFS. I couldn't get previous versions displayed on a Windows client and found very little info on the net about how to accomplish this. It seems that I need a VFS module called ceph_snapshots, which is not included in the latest Samba version on CentOS 8. Through this I also noticed that there is no vfs ceph module either. Are these modules not stable and therefore not included in CentOS 8? I can compile them, but I would like to know why they are not included. And one more question: are there any plans to add Samba gateway support to cephadm?
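For context, the share definition I have in mind would look roughly like this
(the share name and cephx user are just placeholders):

    [cephfs]
        path = /
        vfs objects = ceph ceph_snapshots
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no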
Best regards,
Oliver
Sent from my iPhone
No, you are not affected. Only clusters with mixed versions are affected.
k
Sent from my iPhone
> On 24 Nov 2020, at 18:25, Rainer Krienke <krienke(a)uni-koblenz.de> wrote:
>
> Hello,
>
> thanks for your answer. If I understand you correctly, then only an upgrade
> from 14.2.11 to 14.2.(12|14) could lead to problems. Does this mean that
> running 14.2.13, which is what I currently have installed, is not affected?
>
> The release notes say:
>
> * BlueStore: Fixes a bug in collection_list_legacy which makes pgs
> inconsistent during scrub when running mixed versions of osds, prior to
> 14.2.12 with newer.
>
> This sounds to me as if, having OSDs created before 14.2.12 and a ceph
> version < 14.2.15 installed (like 14.2.13 in my case), creating a new OSD
> could leave me affected by inconsistent pgs during scrubbing.
>
> Have a nice day
> Rainer
>
>> Am 24.11.20 um 11:18 schrieb Konstantin Shalygin:
>> This bug may affect you when you upgrade from 14.2.11 to 14.2.(12|14) slowly (e.g. one node at a time). If you have already upgraded from 14.2.11, you have simply jumped over this bug.
>>
>>
>> k
>>
>> Sent from my iPhone
>>
>>>> On 24 Nov 2020, at 10:43, Rainer Krienke <krienke(a)uni-koblenz.de> wrote:
>>>
>>> Hello,
>>>
>>> I am running a productive ceph cluster with Nautilus 14.2.13. All OSDs
>>> are bluestore and were created with a ceph version prior to 14.2.12.
>>>
>>> What I would like to know is how urgently I should treat the
>>> collection_list_legacy bug, since at the moment I am not going to add a
>>> brand new OSD to the system. However, a disk could fail at any time, and I
>>> would then have to destroy the OSD with the failed disk and run ceph-volume
>>> with a new disk to create a new bluestore OSD.
>>>
>>> Would this scenario also lead to inconsistent pgs?
>>>
>>> Thanks
>>> Rainer
>>>
>>>> Am 24.11.20 um 02:35 schrieb David Galloway:
>>>> This is the 15th backport release in the Nautilus series. This release
>>>> fixes a ceph-volume regression introduced in v14.2.13 and includes a few
>>>> other fixes. We recommend users update to this release.
>>>>
>>>> For detailed release notes with links and a changelog, please refer to the
>>>> official blog entry at https://ceph.io/releases/v14-2-15-nautilus-released
>>>>
>>>>
>>>> Notable Changes
>>>> ---------------
>>>> * ceph-volume: Fixes lvm batch --auto, which breaks backward
>>>> compatibility when using non rotational devices only (SSD and/or NVMe).
>>>> * BlueStore: Fixes a bug in collection_list_legacy which makes pgs
>>>> inconsistent during scrub when running mixed versions of osds, prior to
>>>> 14.2.12 with newer.
>>>> * MGR: progress module can now be turned on/off, using the commands:
>>>> `ceph progress on` and `ceph progress off`.
>>>>
>>>>
>>>> Getting Ceph
>>>> ------------
>>>> * Git at git://github.com/ceph/ceph.git
>>>> * Tarball at http://download.ceph.com/tarballs/ceph-14.2.15.tar.gz
>>>> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
>>>> * Release git sha1: afdd217ae5fb1ed3f60e16bd62357ca58cc650e5
>>>>
>>>
>>> --
>>> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
>>> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
>>> Web: http://userpages.uni-koblenz.de/~krienke
>>> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
>>
>
>
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
I am gathering prometheus metrics from my (unhealthy) Octopus (15.2.4)
cluster and noticed a discrepancy (or misunderstanding) with the ceph
dashboard.
In the dashboard, and with ceph -s, it reports 807 million objects:
pgs: 169747/807333195 objects degraded (0.021%)
78570293/807333195 objects misplaced (9.732%)
24/101158245 objects unfound (0.000%)
But in the prometheus metrics (and in ceph df), it reports almost a
factor of 10 fewer objects (dominated by pool 7):
# HELP ceph_pool_objects DF pool objects
# TYPE ceph_pool_objects gauge
ceph_pool_objects{pool_id="4"} 3920.0
ceph_pool_objects{pool_id="5"} 372743.0
ceph_pool_objects{pool_id="7"} 86972464.0
ceph_pool_objects{pool_id="8"} 9287431.0
ceph_pool_objects{pool_id="13"} 8961.0
ceph_pool_objects{pool_id="15"} 0.0
ceph_pool_objects{pool_id="17"} 4.0
ceph_pool_objects{pool_id="18"} 206.0
ceph_pool_objects{pool_id="19"} 8.0
ceph_pool_objects{pool_id="20"} 7.0
ceph_pool_objects{pool_id="21"} 22.0
ceph_pool_objects{pool_id="22"} 203.0
ceph_pool_objects{pool_id="23"} 4415522.0
Why are these two values different? How can I get the total number of
objects (807 million) from the prometheus metrics?
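For what it's worth, the obvious PromQL sum over the per-pool gauge,

    sum(ceph_pool_objects)

only adds up to roughly the 101 million that appears as the denominator of the
"unfound" line above, not to 807 million.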
--Mike
Hi,
We are going to replace our spinning SATA 4TB filestore disks with new
4TB SSD bluestore disks. Our cluster does far more reading than writing.
Comparing options, I found the interesting and cheap Micron 5210 ION
3.84TB SSDs. The way we understand it, there is a performance hit when
it comes to sustained write speeds, but cost-wise those SSDs are very
interesting (only 450 euro each).
Our cluster is only small, consisting of three servers in a 3/2
redundancy config. I was planning to replace the 8 OSDs on one server
and then take some time to check out how well (or not...) they perform.
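The kind of check I have in mind is the usual single-job sync-write fio run,
roughly like this (/dev/sdX is a placeholder, and the run overwrites data on
the device):

    fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based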
We just wanted to ask here: anyone with suggestions on alternative SSDs
we should consider? Or other tips we should take into consideration..?
Thanks,
MJ
Hi all,
I'm upgrading ceph mimic 13.2.8 to 13.2.10 and made a strange observation: when restarting OSDs on the new version, the PGs come back as undersized. They are each missing 1 OSD, and I get a lot of degraded/misplaced objects.
I have only the noout flag set.
Can anyone help me understand why the PGs don't peer back to a complete state?
Is there a flag I can set to get complete PGs before backfill/recovery starts?
Ceph is currently rebuilding objects even though all the data should still be there. Hence, the update now takes an unreasonable amount of time. I remember that with the update from 13.2.2 to 13.2.8, PGs came back complete really quickly; there was no such extended period with incomplete PGs and degraded redundancy.
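The only other flags I can think of are the ones that pause data movement,
though I don't know whether they help with the peering itself; something like:

    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set norecover
    # ... restart the OSDs on the new version ...
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset norebalance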
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I wonder whether anybody has a setup like the one I want to set up.
1st subnet: 10.118.170.0/24 (FE users)
2nd subnet: 10.192.150.0/24 (BE users)
The users come from these subnets, and I want the FE users to arrive on the 1st interface of the load balancer and the BE users on the 2nd interface of the HAProxy load balancer, so maybe I need to create 2 backends in the HAProxy config?
Both groups of users would go to the same RADOS gateways and the same Ceph cluster.
I also want to create static routes on the load balancer, but I'm not sure how to define in the HAProxy config that traffic should go to that specific interface.
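Roughly, what I have in mind for haproxy.cfg is something like this (the bind
addresses and RGW server addresses are made up):

    frontend fe_users
        bind 10.118.170.10:80
        mode http
        default_backend rgw

    frontend be_users
        bind 10.192.150.10:80
        mode http
        default_backend rgw

    backend rgw
        mode http
        balance roundrobin
        server rgw1 10.118.171.11:8080 check
        server rgw2 10.118.171.12:8080 check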
Thanks