Hi,
I am using the Ceph development cluster through the vstart.sh script. I would
like to measure/benchmark read and write performance (benchmark Ceph at a
low level). For that I want to use the fio tool.
Can we use fio on the development cluster? AFAIK we can: I have seen
the fio option in the CMakeLists.txt of the Ceph source code.
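For reference, fio's built-in rbd engine can be pointed at a vstart cluster with a job file along these lines (a sketch only: the pool name `rbd` and image name `fio_test` are assumptions, and the image must be created first, e.g. with `rbd create fio_test --size 1G`; the WITH_FIO cmake option, if I remember correctly, additionally builds fio plugins for benchmarking the objectstore layer directly):

```ini
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
direct=1

[rand-write-4k]
rw=randwrite
bs=4k
size=256m
iodepth=32
```

Running it from the build directory with the vstart configuration, e.g. `CEPH_CONF=ceph.conf fio rbd.fio`, should make fio talk to the development cluster.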
Thanks in advance.
BR
Hi Ceph Dev Team,
I am new to Ceph and I am wondering if it would be possible to implement a command that lists all objects in a pool/namespace/rbd image that have been modified after a certain point in time.
The background of this question is that I would like to implement incremental backup and restore of rbd images over a long period (e.g. 90 days of daily backups) without keeping a snapshot for each of the backups.
I would instead like to store some extra info alongside the backups that later gives me the possibility to issue a call like: "ceph, tell me which objects of the rbd image have changed since
I made this backup", and then I would like to have the opportunity to restore only those segments of the rbd image that have changed since then.
I have learnt that each object has an mtime, but also that mtime is not a good choice and that it would be better to have something that is strictly monotonically increasing.
If there is nothing like an epoch or version that can be used, would you consider mtime stable enough for this purpose if some extra time margin is added (e.g. does the rados object mtime
have an update_interval like the rbd image mtime)?
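To make the question concrete, the selection logic I have in mind looks roughly like this (illustrative Python only; in practice the name/mtime pairs would come from the python-rados bindings by iterating the pool and stat()-ing each object, and the function name and the `slack` margin are my own invention):

```python
from datetime import datetime, timedelta

def objects_modified_since(object_mtimes, since, slack=timedelta(seconds=0)):
    """Return names of objects whose mtime falls at or after `since`.

    `object_mtimes` maps object name -> mtime (datetime), e.g. collected
    by stat()-ing every object in the pool.  Because object mtime is not
    guaranteed to be strictly monotonic, `slack` widens the window so
    borderline objects are re-backed-up rather than silently missed.
    """
    cutoff = since - slack
    return sorted(name for name, mtime in object_mtimes.items()
                  if mtime >= cutoff)
```

The `slack` parameter is exactly the "extra time" I am asking about: how large would it need to be for mtime to be safe?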
Thanks for your advice/hints,
Peter
Hello everyone,
When I rebased my branch on master and tried to build it, I am
getting this error:
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:79:5: error: ‘lua_seti’ was
not declared in this scope; did you mean ‘luaL_setn’?
79 | lua_seti(L, -2, i);
| ^~~~~~~~
| luaL_setn
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:86:32: error: ‘LUA_OK’ was not
declared in this scope; did you mean ‘LUA_QS’?
86 | if (lua_pcall(L, 0, 1, 0) != LUA_OK) {
| ^~~~~~
| LUA_QS
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:100:10: error: ‘lua_isinteger’
was not declared in this scope; did you mean ‘lua_tointeger’?
100 | if (!lua_isinteger(L, -2) || !lua_isnumber(L, -1)) {
| ^~~~~~~~~~~~~
| lua_tointeger
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc: In constructor
‘Mantle::Mantle()’:
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:123:21: error:
‘luaopen_coroutine’ was not declared in this scope; did you mean
‘luaopen_string’?
123 | {LUA_COLIBNAME, luaopen_coroutine},
| ^~~~~~~~~~~~~~~~~
| luaopen_string
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:127:6: error:
‘LUA_UTF8LIBNAME’ was not declared in this scope; did you mean
‘LUA_STRLIBNAME’?
127 | {LUA_UTF8LIBNAME, luaopen_utf8},
| ^~~~~~~~~~~~~~~
| LUA_STRLIBNAME
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:127:23: error: ‘luaopen_utf8’
was not declared in this scope; did you mean ‘luaopen_math’?
127 | {LUA_UTF8LIBNAME, luaopen_utf8},
| ^~~~~~~~~~~~
| luaopen_math
/home/abhinav/GSOC/PR/ceph/src/mds/Mantle.cc:133:7: error: ‘luaL_requiref’
was not declared in this scope; did you mean ‘luaL_unref’?
133 | luaL_requiref(L, lib->name, lib->func, 1);
OS - Ubuntu 18.04
Lua & Lua dev - 5.1
ceph 16.0.0-6381-g4304ebeca8 (4304ebeca8a7c55b7c583eaf35a0aede807692be)
pacific (dev)
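All of the "not declared" names in the log (`lua_seti`, `lua_isinteger`, `luaopen_coroutine`, `LUA_UTF8LIBNAME`, `luaL_requiref`, `LUA_OK`) were added in Lua 5.2/5.3 and do not exist in Lua 5.1, so the build is most likely picking up the Lua 5.1 headers while Mantle expects a newer Lua. A possible fix on Ubuntu 18.04 (package names assumed; adjust for your setup):

```shell
# Install the Lua 5.3 development headers alongside (or instead of) 5.1
sudo apt-get install liblua5.3-dev

# Rebuild from a clean build directory so cmake re-detects Lua
rm -rf build
./do_cmake.sh
cd build && make
```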
Thank You,
Abhinav Singh
Hi all,
Thanks to Jason, Josh and others, we discussed the replicated persistent
write-back cache during the last CDM. This email continues that
discussion with detailed info about error handling.
The following describes the background and the handling of each error
case; any comments are welcome.
Current implementation:
======================
A persistent write-back cache [1] is implemented in librbd, which
provides an LBA-based, ordered write-back cache using NVDIMM as cache
medium.
The data layout on the cache device is split into three parts: a header,
a vector of log entries, and the customer data, where the customer data
part stores all the customer data.
Every update request (write/discard, etc.) is mapped to a log entry, and
these log entries are stored sequentially into the vector. The vector
acts like a ring buffer and is reused repeatedly.
The header part records the overall information about the cache pool,
in particular the head and tail of the ring: the head indicates the
first valid entry in the log-entry vector, and the tail indicates the
next free entry.
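The head/tail bookkeeping can be sketched with a toy model (illustrative Python only, not the actual librbd code; the class and method names are mine):

```python
class LogRing:
    """Toy model of the cache's ring of log entries.

    The header stores `head` (first valid entry) and `tail` (next free
    entry); the fixed-size vector of entries is reused circularly.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = [None] * capacity
        self.head = 0   # first valid (oldest unflushed) entry
        self.tail = 0   # next free slot
        self.count = 0  # number of valid entries

    def append(self, log_entry):
        """Record a new update request at the tail."""
        if self.count == self.capacity:
            raise BufferError("ring full: retire flushed entries first")
        self.entries[self.tail] = log_entry
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1

    def retire(self):
        """Advance the head after its entry has been flushed to the OSDs."""
        if self.count == 0:
            raise BufferError("ring empty")
        entry, self.entries[self.head] = self.entries[self.head], None
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return entry
```

Retiring an entry frees its slot, which a later append then reuses, which is what makes the vector behave as a ring buffer.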
Replicated write-back cache
===========================
The above is the overall implementation of the persistent write-back
cache in librbd; currently the data is stored on the local compute
server as a single copy. To improve redundancy, we are planning to add
more copies across different servers: a replicated client-side
write-back cache built on NVDIMM + RDMA.
Besides librbd, replica daemon services will be started on other
servers, providing management of the NVDIMM devices in those servers.
When librbd starts and a persistent write-back cache is required, it
allocates a cache pool on the local NVDIMM device. Meanwhile, it talks
with the replica daemons to allocate remote replica copies. After
initialization, the replica daemons register the replica pools and expose
them through RDMA connections. All the cache metadata is
stored as part of the rbd image’s metadata. librbd sets up RDMA
connections with the corresponding replica daemons and accesses the data.
With NVDIMM + RDMA, all the copies will have exactly the same data
layout and data. The basic idea is to register the NVDIMM through RDMA
and then use RDMA read/write to access the data, which doesn’t need
the involvement of the CPUs in the remote servers. These parts will use
the RPMA library [2].
When an update request comes in, librbd caches the request in the local
NVDIMM and meanwhile persists it at the same position in the remote
replica copies.
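The "same position in every copy" write path can be illustrated with a toy model (plain Python bytearrays standing in for the local NVDIMM pool and the RPMA-exposed replica pools; the function names are mine):

```python
def replicate_write(copies, offset, data):
    """Write `data` at the same offset into every copy.

    `copies` is a list of bytearrays standing in for the local NVDIMM
    pool and the remote replica pools.  Because all copies share one
    layout, a single offset addresses the same log slot everywhere, so
    a remote write needs no CPU on the replica side to interpret it.
    """
    for buf in copies:
        buf[offset:offset + len(data)] = data

def copies_in_sync(copies):
    """True when all copies hold identical contents."""
    return all(buf == copies[0] for buf in copies)
```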
The rest of this email focuses on how to handle various failure scenarios.
1. Librbd crashes or local NVDIMM error
As the local cache pool is mmapped into the librbd application, the
librbd process crashes when an error happens in the NVDIMM, so the
NVDIMM-error case is the same as a librbd crash.
Once the librbd process crashes, the RDMA connections to the replicas
are lost. The replica daemons monitor the connection status. Once
they detect the disconnection and a timeout has elapsed, they try to
take the exclusive lock of the rbd image. Only one replica
daemon can get the exclusive lock, and it starts to flush the
cached data to the OSDs. Once the flush is complete, it does the
following:
a. The cache metadata of the volume is updated to none and the
exclusive lock is released.
b. Other replica daemons are notified to release their cache pools.
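The single-flusher guarantee above can be sketched as follows (a toy Python model; `try_takeover`, `exclusive_lock` and `flush_fn` are stand-ins of my own, not real Ceph APIs):

```python
import threading

def try_takeover(exclusive_lock, flush_fn):
    """One of several replica daemons wins the image's exclusive lock
    and flushes the cached data; the rest back off."""
    if not exclusive_lock.acquire(blocking=False):
        return False  # another holder exists: someone else flushes, or no flush
    try:
        flush_fn()
        return True
    finally:
        exclusive_lock.release()
```

Note that if the lock is still held, e.g. because librbd itself is alive and only the connection dropped, the non-blocking acquire fails and no flush happens.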
2. Librbd restarts.
When the librbd process restarts, its corresponding replica daemons
see that the RDMA connection is lost, wait some time, and try to
flush.
To prevent unnecessary flushes by the replica daemons, the timeout can
be configured by users. Only after the timeout has expired do the
replica daemons start to flush the cached data.
3. Replica daemon crashes
As above, a timeout needs to be defined. When librbd detects
the disconnection, it tries to recreate the connection. If it still
fails after the timeout, it starts to find a new replica and sync the
data.
Based on our tests, it takes about 1 s to sync 1 GB of data through two
ports of a 100 Gb/s connection.
The failover time includes 1) the time to detect the error, 2) the
timeout, 3) the time to allocate a new replica copy, and 4) the time to
sync the data. The overall time won’t exceed 300 s (the common IO timeout).
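As a rough sanity check of that budget (illustrative arithmetic only: the ~1 GB/s sync rate is the measurement quoted above, while the detection, timeout, and allocation figures below are assumptions):

```python
def failover_time(detect_s, timeout_s, alloc_s, cache_gb, sync_gb_per_s=1.0):
    """Total failover time = detection + configured timeout +
    replica allocation + data sync (at the measured ~1 GB/s)."""
    return detect_s + timeout_s + alloc_s + cache_gb / sync_gb_per_s

# e.g. a 64 GB cache pool with a 30 s configured timeout:
total = failover_time(detect_s=5, timeout_s=30, alloc_s=5, cache_gb=64)
```

Even with generous assumptions the total stays comfortably inside the 300 s IO timeout.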
If the failed replica daemon recovers in time, librbd checks data
integrity by comparing the pool headers. If the data is in sync, IO
handling resumes.
4. RDMA connections between librbd and replica daemons lost
If only the connections are lost, the replica daemons try to take the
exclusive lock. As the exclusive lock is still held by librbd, they
fail to take it, and as a result no flush happens.
librbd also detects the disconnection; its behavior is the same as in
the failure case ‘replica daemon crashes’.
[1] https://github.com/ceph/ceph/pull/35060
[2] https://github.com/pmem/rpma
--
Best wishes
Lisa
Hi Folks,
The weekly performance meeting will start in approx 15 minutes! Ben
England will be presenting his work on Container networking performance
today, and Gabi will also be presenting some of his work looking at
rocksdb performance with different column family sharding and compaction
options.
Hope to see you there!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark
Details of this release summarized here:
https://tracker.ceph.com/issues/48200#note-1
Asking dev leads for early approval as we target the release date -
early next week.
Some suites are still in progress (will try to finish over the weekend)
rados - approved Neha?
rgw - approved Casey?
rbd - approved Jason?
krbd - approved Jason, Ilya?
fs - approved Patrick?
multimds - approved Patrick?
ceph-deploy - in progress
upgrade/client-upgrade-jewel-nautilus (nautilus) - in progress
upgrade/client-upgrade-mimic (nautilus) - in progress
upgrade/client-upgrade-luminous-nautilus (nautilus) - in progress
upgrade/client-upgrade-nautilus-octopus-octopus (octopus) - in progress
upgrade/nautilus-p2 - in progress
upgrade/luminous-x (nautilus) - in progress
upgrade/mimic-x (nautilus) - in progress
upgrade/nautilus-x (octopus) - in progress
ceph-volume - in progress (Jan pls see)
Thx
YuriW
This is the 6th backport release in the Octopus series. This release
fixes a security flaw affecting Messenger V2 for Octopus & Nautilus. We
recommend all users update to this release.
Notable Changes
---------------
* CVE-2020-25660: Fix a regression in Messenger V2 replay attacks
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.6.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: cb8c61a60551b72614257d632a574d420064c17a