Hello,
I have a Ceph cluster running version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable),
4 nodes, each with 11 HDDs, 1 SSD, and a 10 Gbit network.
The cluster was a fresh, empty install. We filled it with data (small blocks) using RGW.
The cluster is now used for testing, so no clients were using it during the admin operations mentioned below.
After a while (7 TB of data / 40M objects uploaded) we decided to increase pg_num from 128 to 256 to spread the data better. To speed up this operation, I set
ceph config set mgr target_max_misplaced_ratio 1
so that the whole cluster rebalances as quickly as it can.
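For reference, the increase itself was done with the usual pool command (the pool name below is just a placeholder):
ceph osd pool set <pool> pg_num 256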
I have 3 issues/questions below:
1)
I noticed that the manual increase from 128 to 256 caused approximately 6 OSDs to restart, logging
heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f8c84b8b700' had suicide timed out after 150
After a while the OSDs came back, so I continued with my tests.
My question: was increasing the number of PGs with the maximum target_max_misplaced_ratio too much for those OSDs? Is it not recommended to do it
this way? I had no problem with such an increase before, but the cluster configuration was slightly different and it was running Luminous.
2)
The rebuild was still slow, so I increased the number of backfills:
ceph tell osd.* injectargs "--osd-max-backfills 10"
and reduced the recovery sleep time:
ceph tell osd.* injectargs "--osd-recovery-sleep-hdd 0.01"
After a few hours I noticed that some of my OSDs had restarted during recovery; in the log I can see
...
2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:28.343 7fe1f8bee700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.780 7fe1da154700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe1da154700' had timed out after 15
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.7 down, but it is still running
2020-03-21 06:41:36.888 7fe1e7769700 0 log_channel(cluster) log [DBG] : map e3574 wrongly marked me down at e3573
2020-03-21 06:41:36.888 7fe1e7769700 1 osd.7 3574 start_waiting_for_healthy
I watched the network usage graphs, and network utilization was low during recovery (the 10 Gbit link was not saturated).
So does a heavy IOPS load on an OSD also cause heartbeat operations to time out? I thought the OSD used separate threads and that HDD timeouts
would not affect heartbeats to other OSDs and the MONs. It looks like that is not true.
3)
After the OSD was wrongly marked down, I can see that the cluster has degraded objects. There were no degraded objects before that.
Degraded data redundancy: 251754/117225048 objects degraded (0.215%), 8 pgs degraded, 8 pgs undersized
Does this mean the OSD disconnection caused degraded data? How is that possible when no OSD was lost? The data should still be on that OSD, and
after peering everything should be OK. With Luminous I had no such problem: after the OSD came back up, the degraded objects were recovered/found
within a few seconds and the cluster was healthy again within seconds.
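If useful, I can also post the list of affected PGs, e.g. from:
ceph pg ls degraded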
Thank you very much for any additional info. I can perform any additional tests you recommend, because the cluster is currently used for testing purposes.
With regards
Jan Pekar
--
============
Ing. Jan Pekař
jan.pekar(a)imatic.cz
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326
============
--
Has anyone ever tried using this feature? I've added it to the [global]
section of ceph.conf on my POC cluster, but I'm not sure how to tell whether
it's actually working. I did find a reference to this feature via Google, and
they had it in their [OSD] section, so I've tried that too.
TIA
Adam
We have deployed a small test cluster consisting of three nodes. Each node runs a mon/mgr and two OSDs (Samsung PM983 3.84 TB NVMe split into two partitions), so six OSDs in total. We started with Ceph 14.2.7 some weeks ago (upgraded to 14.2.9 later) and ran various tests using fio against some rbd volumes in order to get an overview of what performance we could expect. The configuration is unchanged from the defaults; we only set several debugging options to 0/0.
Yesterday we upgraded the whole cluster to Ceph 15.2.3 following the upgrade guidelines, which worked without any problems so far. Nevertheless, when running the same tests we had previously run on Ceph 14.2.9, we are seeing some clear degradations in write performance (alongside some performance improvements, which are also worth mentioning).
Here are the results of concern, each with the relevant fio settings used (values are 14.2.9 -> 15.2.3):
Test "read-latency-max"
(rw=randread, iodepth=64, bs=4k)
read_iops: 32500 -> 87000
Test "write-latency-max"
(rw=randwrite, iodepth=64, bs=4k)
write_iops: 22500 -> 11500
Test "write-throughput-iops-max"
(rw=write, iodepth=64, bs=4k)
write_iops: 7000 -> 14000
Test "usecase1"
(rw=randrw, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/, rwmixread=1, rate_process=poisson, iodepth=64)
write_iops: 21000 -> 8500
Test "usecase1-readonly"
(rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/, rate_process=poisson, iodepth=64)
read_iops: 28000 -> 58000
The last two tests represent a typical use case on our systems. Therefore we are especially concerned by the drop in performance from 21000 w/ops to 8500 w/ops (about 60%) after upgrading to Ceph 15.2.3.
We ran all tests several times, the values are averaged over all iterations and fairly consistent and reproducible. We even tried wiping the whole cluster, downgrading to Ceph 14.2.9 again, setting up a new cluster/pool, running the tests and upgrading to Ceph 15.2.3 again. The tests have been performed on one of the three cluster nodes using a 50G rbd volume, which had been prefilled with random data before each test-run.
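If it helps to reproduce this elsewhere, the "usecase1" job corresponds roughly to an invocation like the following (the device path, runtime, and the use of a krbd-mapped volume with libaio are assumptions for illustration, not our exact job file):
fio --name=usecase1 --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=1 --iodepth=64 --rate_process=poisson \
    --bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/ \
    --runtime=300 --time_based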
Have any changes been introduced with Octopus that could explain the observed changes in performance?
What we already tried:
- Disabling rbd cache
- Reverting the rbd cache policy to writeback (the default in 14.2); see the example after this list
- Setting rbd io scheduler to none
- Deploying a fresh cluster starting with Ceph 15.2.3
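For reference, the cache-policy change from the list above was applied roughly like this (the pool name is a placeholder; it can also be set client-side in ceph.conf):
rbd config pool set <pool> rbd_cache_policy writeback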
The kernel is 5.4.38 … I don't know whether other system specs would be helpful besides those already mentioned (since we are talking about a relative change in performance after upgrading Ceph without any other changes) - if so, please let us know.
Dear all,
maybe someone can give me a pointer here. We are running OpenNebula with Ceph RBD as a back-end store. We have a pool of spinning disks for creating large, low-demand data disks, mainly for backups and other cold storage. Everything is fine when using Linux VMs. However, Windows VMs perform poorly: they are roughly a factor of 20 slower than a similarly created Linux VM.
If anyone has pointers what to look for, we would be very grateful.
The OpenNebula installation is more or less default. The current OS and libvirt versions we use are:
Centos 7.6 with stock kernel 3.10.0-1062.1.1.el7.x86_64
libvirt-client.x86_64 4.5.0-23.el7_7.1 @updates
qemu-kvm-ev.x86_64 10:2.12.0-33.1.el7 @centos-qemu-ev
Some benchmark results from good to worse workloads:
rbd bench --io-size 4M --io-total 4G --io-pattern seq --io-type write --io-threads 16 : 450MB/s
rbd bench --io-size 4M --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 230MB/s
rbd bench --io-size 1M --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 190MB/s
rbd bench --io-size 64K --io-total 4G --io-pattern seq --io-type write --io-threads 1 : 150MB/s
rbd bench --io-size 64K --io-total 1G --io-pattern rand --io-type write --io-threads 1 : 26MB/s
dd with conv=fdatasync gives an impressive 500 MB/s inside a Linux VM for a sequential write of 4 GB.
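For reference, that dd test was along these lines (block size and target path are from memory and may have differed slightly):
dd if=/dev/zero of=/mnt/test/ddfile bs=4M count=1024 conv=fdatasync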
We copied a couple of large ISO files inside the Windows VM and for the first ca. 1 to 1.5G it performs as expected. Thereafter, however, write speed drops rapidly to ca. 25MB/s and does not recover. It is almost as if Windows translates large sequential writes to small random writes.
If anyone has seen and solved this before, please let us know.
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello,
I've upgraded Ceph to Octopus (15.2.3 from the repo) on one of the Ubuntu 18.04 host servers. The upgrade caused a problem with libvirtd, which hangs when it tries to access the storage pools; the problem doesn't exist on Nautilus. The libvirtd process simply hangs and nothing seems to happen. The libvirtd log file shows:
2020-06-29 19:30:51.556+0000: 12040: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 11
2020-06-29 19:30:51.556+0000: 12040: debug : virNetlinkEventCallback:720 : event not handled.
2020-06-29 19:30:51.556+0000: 12040: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 11
2020-06-29 19:30:51.556+0000: 12040: debug : virNetlinkEventCallback:720 : event not handled.
2020-06-29 19:30:51.557+0000: 12040: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 11
2020-06-29 19:30:51.557+0000: 12040: debug : virNetlinkEventCallback:720 : event not handled.
2020-06-29 19:30:51.591+0000: 12040: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 11
2020-06-29 19:30:51.591+0000: 12040: debug : virNetlinkEventCallback:720 : event not handled.
Running strace on the libvirtd process shows:
root@ais-cloudhost1:/home/andrei# strace -p 12040
strace: Process 12040 attached
restart_syscall(<... resuming interrupted poll ...>
Nothing happens after that point.
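One more thing I can still try (not done yet): strace follows only the main thread by default, so attaching to all threads might show where it is actually stuck:
strace -f -p 12040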
The same host server can still access the Ceph cluster and the pools, for example by running ceph -s or rbd -p <pool> ls -l.
I need some help getting the host servers working again with Octopus.
Cheers
As a follow-up to our recent memory problems with OSDs (with high pglog
values:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJ…
), we also see high buffer_anon values, e.g. more than 4 GB with "osd
memory target" set to 3 GB. Is there a way to restrict it?
As it is called "anon", I guess that it would first be necessary to find
out what exactly is behind this?
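For reference, the per-OSD mempool breakdown (including buffer_anon) can be dumped via the admin socket, e.g.:
ceph daemon osd.0 dump_mempools
and the 3 GB target mentioned above corresponds to a setting along the lines of (value in bytes):
ceph config set osd osd_memory_target 3221225472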
Well, maybe it is just as Wido said: with lots of small objects there
will be several problems.
Cheers
Harry
Thanks Ramana and David.
So we are using the Shaman search API to get the latest build for
ceph_nautilus flavor of NFS Ganesha, and that's how we get to the mentioned
build. We are doing this since it's part of our CI and it's better for
automation.
Should we use different repos?
Thanks,
V
On Wed, Jun 24, 2020 at 3:33 PM Victoria Martinez de la Cruz <
vkmc(a)redhat.com> wrote:
> Thanks Ramana and David.
>
> So we are using the Shaman search API to get the latest build for
> ceph_nautilus flavor of NFS Ganesha, and that's how we get to the mentioned
> build. We are doing this since it's part of our CI and it's better for
> automation.
>
> Should we use different repos?
>
> Thanks,
>
> V
>
> On Tue, Jun 23, 2020 at 2:42 PM David Galloway <dgallowa(a)redhat.com>
> wrote:
>
>>
>>
>> On 6/23/20 1:21 PM, Ramana Venkatesh Raja wrote:
>> > On Tue, Jun 23, 2020 at 6:59 PM Victoria Martinez de la Cruz
>> > <victoria(a)redhat.com> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I'm hitting issues with the nfs-ganesha-stable packages [0], the repo
>> url
>> >> [1] is broken. Is there a known issue for this?
>> >>
>> >
>> > The missing packages in chacra could be due to the recent mishap in
>> > the sepia long running cluster,
>> >
>> https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/YQMAHTB7MUHL25QP7V…
>>
>> Hi Victoria,
>>
>> Ramana is correct. Do you need 2.7.4 specifically? If not, signed
>> nfs-ganesha packages can also be found here:
>> http://download.ceph.com/nfs-ganesha/
>>
>> >
>> >> Thanks,
>> >>
>> >> Victoria
>> >>
>> >> [0]
>> >>
>> https://shaman.ceph.com/repos/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c…
>> >> [1]
>> >>
>> https://chacra.ceph.com/r/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c1bac…
Thanks for your reply Anastasios,
I was waiting for an answer.
My /etc/apt/sources.list.d/ceph.list content is:
deb https://download.ceph.com/debian-nautilus/ buster main
Even if I do “apt-get update”, the packages stay the same.
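If it helps, I can also post the output of, for example:
apt-cache policy ceph-common
to show which candidate version apt picks.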
The Ceph client (CephFS mount) is working well, but I can't deploy new OSDs.
The error that I posted occurs when I run: “ceph-deploy osd create --data /dev/sdb node1”
I appreciate any help.
Rafael.
From: Anastasios Dados <tdados(a)hotmail.com>
Sent: Monday, 29 June 2020 20:01
To: Rafael Quaglio <quaglio(a)bol.com.br>; ceph-users(a)ceph.io
Subject: Re: [ceph-users] Debian install
Hello Rafael,
Can you check the apt sources list on your ceph-deploy node? Maybe you have the Luminous Debian package versions configured there?
Regards,
Anastasios
On Mon, 2020-06-29 at 06:59 -0300, Rafael Quaglio wrote:
Hi,
We have installed a new Debian (10.4) server and I need to add it to a
Ceph cluster.
When I execute the command to install Ceph on this node:
ceph-deploy install --release nautilus node1
it starts to install version 12.x on my node...
(...)
[serifos][DEBUG ] After this operation, 183 MB of additional disk space will
be used.
[serifos][DEBUG ] Selecting previously unselected package python-cephfs.
(Reading database ... 30440 files and directories currently installed.)
[serifos][DEBUG ] Preparing to unpack
.../python-cephfs_12.2.11+dfsg1-2.1+b1_amd64.deb ...
[serifos][DEBUG ] Unpacking python-cephfs (12.2.11+dfsg1-2.1+b1) ...
[serifos][DEBUG ] Selecting previously unselected package ceph-common.
[serifos][DEBUG ] Preparing to unpack
.../ceph-common_12.2.11+dfsg1-2.1+b1_amd64.deb ...
[serifos][DEBUG ] Unpacking ceph-common (12.2.11+dfsg1-2.1+b1) ...
(...)
How do I upgrade these packages?
Even with the packages installed at this version, the installation
completes without errors.
The question is due to an error message that I'm receiving
when deploying a new OSD:
ceph-deploy osd create --data /dev/sdb node1
At this point:
[ceph_deploy.osd][INFO ] Distro info: debian 10.4 buster
[ceph_deploy.osd][DEBUG ] Deploying osd to node1
[node1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[node1][DEBUG ] find the location of an executable
[node1][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph
lvm create --bluestore --data /dev/sdb
[node1][WARNIN] --> RuntimeError: Unable to create a new OSD id
[node1][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[node1][DEBUG ] Running command: /bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i -
osd new 76da6c51-8385-4ffc-9a8e-0dfc11e31feb
[node1][DEBUG ] stderr:
/build/ceph-qtARip/ceph-12.2.11+dfsg1/src/mon/MonMap.cc: In function 'void
MonMap::sanitize_mons(std::map<std::__cxx11::basic_string<char>,
entity_addr_t>&)' thread 7f2bc7fff700 time 2020-06-29 06:56:17.331350
[node1][DEBUG ] stderr:
/build/ceph-qtARip/ceph-12.2.11+dfsg1/src/mon/MonMap.cc: 77: FAILED
assert(mon_info[p.first].public_addr == p.second)
[node1][DEBUG ] stderr: ceph version 12.2.11
(26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
[node1][DEBUG ] stderr: 1: (ceph::__ceph_assert_fail(char const*, char
const*, int, char const*)+0xf5) [0x7f2bdaff5f75]
[node1][DEBUG ] stderr: 2:
(MonMap::sanitize_mons(std::map<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >, entity_addr_t,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const, entity_addr_t> >
&)+0x568) [0x7f2bdb050038]
[node1][DEBUG ] stderr: 3:
(MonMap::decode(ceph::buffer::list::iterator&)+0x4da) [0x7f2bdb05500a]
[node1][DEBUG ] stderr: 4: (MonClient::handle_monmap(MMonMap*)+0x216)
[0x7f2bdb042a06]
[node1][DEBUG ] stderr: 5: (MonClient::ms_dispatch(Message*)+0x4ab)
[0x7f2bdb04729b]
[node1][DEBUG ] stderr: 6: (DispatchQueue::entry()+0xeba) [0x7f2bdb06bf5a]
[node1][DEBUG ] stderr: 7: (DispatchQueue::DispatchThread::entry()+0xd)
[0x7f2bdb1576fd]
[node1][DEBUG ] stderr: 8: (()+0x7fa3) [0x7f2be499dfa3]
[node1][DEBUG ] stderr: 9: (clone()+0x3f) [0x7f2be45234cf]
[node1][DEBUG ] stderr: NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
[node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume
--cluster ceph lvm create --bluestore --data /dev/sdb
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
I think this error occurs because the wrong package was installed.
Thanks,
Rafael
Hello. This is the first time I need to use the lifecycle feature. I created a
bucket and set it to expire in one day with s3cmd:
s3cmd expire --expiry-days=1 s3://bucket
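For reference, the resulting rule can be checked with, e.g.:
s3cmd getlifecycle s3://bucket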
The rgw_lifecycle_work_time is set to the default value (00:00-06:00), but
I noticed a lot of messages like these in the RGW logs:
2020-06-16 00:00:00.311369 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.8
2020-06-16 00:00:00.311623 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.16
2020-06-16 00:00:00.311862 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.4
2020-06-16 00:00:00.319424 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.10
2020-06-16 00:00:00.319647 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.18
2020-06-16 00:00:00.320682 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.16
2020-06-16 00:00:00.327770 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.6
2020-06-16 00:00:00.328941 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.17
2020-06-16 00:00:00.332463 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.20
2020-06-16 00:00:00.336788 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.1
2020-06-16 00:00:00.336924 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.24
2020-06-16 00:00:00.340915 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.2
The object was deleted, but these messages keep appearing.
Is it safe to ignore them?
For the record, I'm using Red Hat Luminous 12.2.12.
Thanks, Marcelo.