Hi all,
Does anyone know when we can expect Crimson/Seastar to be "Production
Ready" and/or what level of performance increase can be expected?
thx
Frank
Hello everyone,
Below is our current setup: the master zone is in the master datacenter and the backup zone is in the standby datacenter, which has a very slow internet connection. The client application connects to the master zone load balancer to upload objects to the cluster. Later on, the
gateway nodes in the backup zone synchronize new objects into the backup cluster by requesting the object data from the load balancer of the master zone.
[Image: diagram of the master/backup multisite setup]
The current connections on the gateway nodes of the backup zone:
[Image: connection listing from the backup-zone gateway nodes]
From the above image, you can see that the first gateway node creates a lot of connections to the master zone load balancer (cephmm-03) for checking object status. That means this gateway will mostly take on the role of downloading new objects from the master zone to the backup zone (very high load on this node). On the other hand, the second gateway node has only 2 connections to the master zone load balancer (mostly idle all the time). This second node only opens more connections if the first node (or maybe the third node) is down…
Can you explain this behavior of Ceph in this case, and how can we get all the gateway nodes into active mode?
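For context, here is a sketch of how the zone endpoint list can be inspected and extended, in case it is relevant to how sync work gets distributed; the zone name and URLs below are placeholders, not our actual values:

radosgw-admin zone get --rgw-zone=backup
radosgw-admin zone modify --rgw-zone=backup --endpoints=http://backup-gw1:8080,http://backup-gw2:8080
radosgw-admin period update --commit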
I appreciate any comments from you!
--
Nghia Viet Tran (Mr)
Yes. After the timeout of 600 seconds the OSDs got marked out, all PGs got remapped and recovery/rebalancing started as usual. In the past, I did service on servers with the flag noout set and would expect that mon_osd_down_out_subtree_limit=host has the same effect when shutting down an entire host. Unfortunately, in my case these two settings behave differently.
If I understand the documentation correctly, the OSDs should not get marked out automatically.
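For reference, these are the two settings in play; the "ceph config set" form below assumes a release with the central config store, on older clusters the same options go into ceph.conf:

ceph config set mon mon_osd_down_out_subtree_limit host
ceph config set mon mon_osd_down_out_interval 600

With the subtree limit at "host", my reading of the docs is that the OSDs of a whole down host should stay "in" until an admin marks them out.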
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Anthony D'Atri <anthony.datri(a)gmail.com>
Sent: 14 July 2020 04:32:05
To: Frank Schilder
Subject: Re: [ceph-users] mon_osd_down_out_subtree_limit not working?
Did it start rebalancing?
> On Jul 13, 2020, at 4:29 AM, Frank Schilder <frans(a)dtu.dk> wrote:
>
> if I shut down all OSDs on this host, these OSDs should not be marked out automatically after mon_osd_down_out_interval(=600) seconds. I did a test today and, unfortunately, the OSDs do get marked as out. Ceph status was showing 1 host down as expected.
> If I may ask, which version of the virtio drivers do you use?
https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/latest-vir…
Looks like virtio-win-0.1.185.*
> And do you use caching on libvirt driver level?
In the ONE interface, we use
DISK = [ driver = "raw" , cache = "none"]
which translates to
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
in the XML. We have no qemu settings in the ceph.conf, so it looks like caching is disabled. I'm not sure whether this is the recommended way, though, or why caching is disabled by default.
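For completeness, a full disk element for an RBD-backed disk typically looks like the sketch below; the pool/image, monitor and auth names are placeholders, not our actual setup:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='rbd' name='rbdpool/vm-disk-0'>
    <host name='mon1' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='...'/>
  </auth>
  <target dev='vda' bus='virtio'/>
</disk>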
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: André Gemünd <andre.gemuend(a)scai.fraunhofer.de>
Sent: 13 July 2020 11:18
To: Frank Schilder
Subject: Re: [ceph-users] Re: Poor Windows performance on ceph RBD.
If I may ask, which version of the virtio drivers do you use?
And do you use caching on libvirt driver level?
Greetings
André
----- On 13 Jul 2020 at 10:43, Frank Schilder frans(a)dtu.dk wrote:
>> > To anyone who is following this thread, we found a possible explanation for
>> > (some of) our observations.
>
>> If someone is following this, they probably want the possible
>> explanation and not the knowledge of you having the possible
>> explanation.
>
>> So you are saying that if you do e.g. a core installation (without GUI) of
>> 2016/2019 and disable all services, the fio test results are significantly
>> different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
>> this is not related to other processes writing to disk?
>
> Right, it's not an explanation but rather a further observation. We don't really
> have an explanation yet.
>
> It's an identical installation of both server versions, same services configured.
> Our operators are not really into debugging Windows; that's why we were asking
> here. Their hypothesis is that the VD driver for accessing RBD images has
> problems with Windows servers newer than 2016. I'm not a Windows guy, so I can't
> really comment on this.
>
> The test we do is a simple copy test of a single 10 GB file and we monitor the
> transfer speed. This info was cut out of this e-mail; the original report, for
> reference, is:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT4…
> .
>
> We are very sure that it is not related to other processes writing to disk, we
> monitor that too. There is also no competition on the RBD pool at the time of
> testing.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc Roos <M.Roos(a)f1-outsourcing.eu>
> Sent: 13 July 2020 10:24
> To: ceph-users; Frank Schilder
> Subject: RE: [ceph-users] Re: Poor Windows performance on ceph RBD.
>
>>> To anyone who is following this thread, we found a possible
>>> explanation for (some of) our observations.
>
> If someone is following this, they probably want the possible
> explanation and not the knowledge of you having the possible
> explanation.
>
> So you are saying that if you do e.g. a core installation (without GUI) of
> 2016/2019 and disable all services, the fio test results are significantly
> different from e.g. a CentOS 7 VM doing the same fio test? Are you sure
> this is not related to other processes writing to disk?
>
>
>
> -----Original Message-----
> From: Frank Schilder [mailto:frans@dtu.dk]
> Sent: Monday, 13 July 2020 9:28
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.
>
> To anyone who is following this thread, we found a possible explanation
> for (some of) our observations.
>
> We are running Windows Server versions 2016 and 2019 as storage servers
> exporting data on an rbd image/disk. We recently found that Windows
> Server 2016 runs fine. It is still not as fast as a Linux + Samba share on
> an rbd image (ca. 50%), but it runs with a reasonable sustained bandwidth.
> With Windows Server 2019, however, we observe a near-complete stall of
> file transfers and time-outs using standard copy tools (robocopy). We
> don't have an explanation yet and are downgrading Windows servers where
> possible.
>
> If anyone has a hint what we can do, please let us know.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
--
Dipl.-Inf. André Gemünd, Leiter IT / Head of IT
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend(a)scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
Hi all,
I want to use cephfs-shell for operations like directory creation,
instead of mounting the root directory and creating directories manually.
But I get errors when I execute the command 'cephfs-shell'.
Traceback (most recent call last):
File "./cephfs-shell", line 9, in <module>
import cephfs as libcephfs
ModuleNotFoundError: No module named 'cephfs'
CentOS 7 uses python2 and I've already installed python3 on the system.
What else should I do, reinstall libcephfs?
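For reference, this is how I checked whether the python3 bindings are visible at all; the package-name grep is a guess on my side, since I'm not sure what the el7 builds call them:

python3 -c "import cephfs" && echo bindings OK
yum search cephfs | grep -i python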
Thanks
Hi,
We use the official Ceph RPM repository (http://download.ceph.com/rpm-nautilus/el7) for installing packages on the client nodes running CentOS 7.
But we noticed today that the repo only provides the latest version (2:14.2.10-0.el7) of nautilus, so we couldn't install an older version (2:14.2.7-0.el7) of the ceph-common package.
The ceph.repo file contains:
---
[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
---
and
---
$ yum clean all
$ yum makecache
$ yum list --disablerepo=* --enablerepo=Ceph --showduplicates ceph-common
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Available Packages
ceph-common.x86_64        2:14.2.10-0.el7        Ceph
---
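For completeness, this is roughly the pinned install we tried, which now fails since only 14.2.10 is listed (the name-epoch:version-release spelling below is from memory and may need adjusting):

$ yum install ceph-common-2:14.2.7-0.el7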
Does it mean that the official repo no longer provides RPM packages for
older versions? Thanks!
Cheers, Hong
--
Hurng-Chun (Hong) Lee, PhD
ICT manager
Donders Institute for Brain, Cognition and Behaviour,
Centre for Cognitive Neuroimaging
Radboud University Nijmegen
e-mail: h.lee(a)donders.ru.nl
tel: +31(0) 243610977
web: http://www.ru.nl/donders/
Hi,
I’ve been getting errors in the NFS section of the web interface. I’ve just tried upgrading to 15.2.4 to see if that helped, but no joy.
The initial NFS page loads OK, and when I click Add, a form loads. However, when this form attempts to update its values, I get a red box informing me that the server returned an error 500. It makes four HTTP calls - to daemon, clients, filesystems and fsals - and these all fail.
Any suggestions on what might be wrong or where some useful logs might be?
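In case it helps, this is all I've checked so far; the systemd unit name below assumes a default package install and the mgr id is a placeholder:

ceph mgr services                     # confirm which mgr is serving the dashboard
journalctl -u ceph-mgr@<mgr-id> -e    # the 500 responses should leave a traceback here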
Ta,
Will
Hi All,
I'm investigating what appears to be a bug in RGW stats. This is a brand
new cluster running 15.2.3.
One of our customers reached out, saying they were hitting their quota (S3
error: 403 (QuotaExceeded)). The user-wide max_objects quota we set is 50
million objects, so this would be impossible since the entire cluster isn't
even close to 50 million objects yet:
[root@os1 ~]# ceph status | grep objects
objects: 7.58M objects, 6.8 TiB
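(For reference, the quota was applied with the standard radosgw-admin quota commands, roughly as below; the uid is a placeholder:)

radosgw-admin quota set --quota-scope=user --uid=<customer-uid> --max-objects=50000000
radosgw-admin quota enable --quota-scope=user --uid=<customer-uid>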
The customer in question has three buckets, and if I query the bucket
stats, the total number of objects for all 3 buckets comes to ~372k:
[root@os1 ~]# radosgw-admin bucket stats --bucket=df-fs1 | grep num_objects
"num_objects": 324880
[root@os1 ~]# radosgw-admin bucket stats --bucket=df-oldrepo | grep num_objects
"num_objects": 47476
[root@os1 ~]# radosgw-admin bucket stats --bucket=df-test | grep num_objects
"num_objects": 1
But things get interesting when I query the user stats: