The next DocuBetter meeting is scheduled for tomorrow. This is at
the following time:
1800 PST 26 Feb 2020
0200 UTC 27 Feb 2020
1200 AEST 27 Feb 2020
Agenda: This week we will be discussing documentation requests,
the ceph osd df man page and documentation, the PG repair issue (finally),
and a request to include information in the bootstrapping procedure about
how long the bootstrapping command takes to execute.
I am Duplex Kamdjou (you can call me Duplex), a web developer with good
I was looking for a good project to improve my C programming skills and
came across this project.
I would like to learn more about Ceph and its applications. I would
appreciate any guidance you can give me on how to get involved and contribute
to this project.
Thanks in advance and I hope to hear from you soon.
Kamdjou Temfack Duplex M Phone: +237 670274538
Software Engineer / Full-Stack developer
Twitter: @tony14pro <https://twitter.com/Tony14Pro>
Website: http://bproo.com <https://bproo.com>
Hey all, we're excited to be returning properly to SCaLE in
Pasadena this year (March 5-8) with a Thursday Birds-of-a-Feather
session and a booth in the expo hall. Please come by if you're
attending the conference or are in the area to get face time with
other area users and Ceph developers. :)
Also, I got drafted into organizing this so if you'd be willing to
help man the booth in exchange for an Expo pass, shoot me an email! I
think I've got 3 spots left.
I ran "make check" on senta03.front.sepia.ceph.com but I get the following
error on stdout:
Ignoring mock: markers 'python_version <= "3.3"' don't match your environment
Ignoring ipaddress: markers 'python_version < "3.3"' don't match your
Looking in links: file:///home/rishabh/master/src/pybind/mgr/wheelhouse
Obtaining file:///home/rishabh/master/src/python-common (from -r
requirements.txt (line 4))
ERROR: Command errored out with exit status 1: python setup.py
egg_info Check the logs for full command output.
Here's the requirements.txt being referred to above. Do I need a Python
version > 2.7 but lower than 3.3? I don't see any such version on a
different machine (it runs Fedora 31) where "make check" launched
successfully (although the tests there failed). I see every version
from python3.4 to python3.9 and python2.7 on my machine but nothing
that matches "python_version <= 3.3". On senta03 I can see python3.6,
python3.7 and python2.7. Also, where and under what name is the log
mentioned in the error message saved?
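For what it's worth, the "Ignoring ...: markers ... don't match your environment" lines are informational. pip prints them when a requirement's environment marker evaluates to false for the interpreter it is running under, and then skips that requirement. They do not ask for a Python between 2.7 and 3.3. The actual failure appears to be the "python setup.py egg_info" step for python-common. Below is a rough sketch of how such a marker is evaluated. It is a simplified stand-in, not pip's code (pip relies on the packaging library for this), and marker_matches is a hypothetical helper that only understands python_version comparisons.

```python
import operator
import re
import sys

# Comparison operators that may appear in a python_version marker.
_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Matches e.g.:  python_version <= "3.3"
_MARKER_RE = re.compile(
    r"""^\s*python_version\s*(<=|>=|==|!=|<|>)\s*["'](\d+(?:\.\d+)*)["']\s*$"""
)


def _as_tuple(version, width):
    """Turn '3.6' into (3, 6), zero-padded to `width` components."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (width - len(parts))


def marker_matches(marker, python_version=None):
    """Hypothetical helper: evaluate a python_version-only marker.

    `python_version` defaults to the running interpreter's major.minor.
    """
    if python_version is None:
        python_version = "%d.%d" % sys.version_info[:2]
    match = _MARKER_RE.match(marker)
    if match is None:
        raise ValueError("unsupported marker: %r" % marker)
    op, wanted = match.groups()
    width = max(len(wanted.split(".")), len(python_version.split(".")))
    return _OPS[op](_as_tuple(python_version, width), _as_tuple(wanted, width))


if __name__ == "__main__":
    # The interpreters mentioned for senta03.
    for version in ("2.7", "3.6", "3.7"):
        matched = marker_matches('python_version <= "3.3"', version)
        print(version, "installs mock" if matched else "ignores mock")
```

Under this reading, only a Python 2.7 run would pull in mock, and the python3.6/3.7 runs skip it, which is what the "Ignoring" lines report.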
I tried running run-make-check.sh too. It also aborted before launching
any tests. On CentOS 8 it complained that packages
colm-0.13.0.7-1.el8.x86_64.rpm and ragel-126.96.36.199-2.el8.x86_64.rpm are
unsigned, and on Fedora 31 it complained that python37-coverage is
I've attached the output of make check and of run-make-check.sh on senta03 and
Fedora 31 as make-check.log, run-make-check-centos8.log and
My name is Zac Dover and I was hired by Sage to improve the Ceph
For the past few months, I have been reading the documentation that exists
and making bugfixes where I am able. Now I think it's time to ask the
general Ceph community for complaints about and requests for improvements to
There is a general documentation meeting called the "DocuBetter Meeting",
and it is held every two weeks. The next DocuBetter Meeting will be on
February 26, 2020 at 6 PM PST, and will run for thirty minutes. Everyone
with a documentation-related request or complaint is invited. The meeting
will be held here: https://bluejeans.com/908675367
Send documentation-related requests and complaints to me by replying to
this email and CCing me at zac.dover(a)gmail.com.
This message will be sent to dev(a)ceph.io every Monday morning, North
Josh and I are back from PTO so it's time to get the perf meeting going
again! Today I'd like to talk a little bit about some testing I did
while on PTO looking at CephFS performance using the HPC io500
benchmark. If Igor is able to make it, I'm hoping we can also talk a
little bit about his new hybrid AVL/bitmap allocator for bluestore and
deferred write PR. Hope to see you there!
I would like to see https://github.com/ceph/ceph/pull/28848 backported to Nautilus, as I'm currently unable to use devicehealth on 14.2.7 because a smartctl exit code > 0 is not handled properly.
I cherry-picked those commits on the nautilus branch, and they all apply cleanly, but when I try to follow https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#cr…, I'm stuck because (as far as I can tell) the "master tracker issue" doesn't exist.
What would be the best way forward in this case? Submit a PR without a backport tracker issue? Manually create the backport issue?
We are using RBD snapshots as periodic backups for DBs: 24 hourly
snapshots + 30 daily snapshots are kept for each RBD image. It worked perfectly at
the beginning; however, as the number of volumes increased, more and more
significant pitfalls appeared. We are at ~700 volumes, which means creating
700 snapshots and rotating out 700 snapshots every hour.
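To put rough numbers on that schedule, here is a back-of-the-envelope sketch. It uses only the figures quoted above (about 700 volumes, 24 hourly and 30 daily snapshots kept per volume) and assumes one snapshot creation and one removal per volume per hour in steady state.

```python
# Figures taken from the description above; ~700 is an approximate count.
VOLUMES = 700
HOURLY_KEPT = 24
DAILY_KEPT = 30

# Snapshots retained at any one time.
retained_per_volume = HOURLY_KEPT + DAILY_KEPT
retained_total = VOLUMES * retained_per_volume

# Steady-state churn (assumption: one create and one removal per volume per hour).
creates_per_hour = VOLUMES
removes_per_hour = VOLUMES
removes_per_day = removes_per_hour * 24

print("snapshots retained per volume:", retained_per_volume)
print("snapshots retained in total:  ", retained_total)
print("snapshot removals per hour:   ", removes_per_hour)
print("snapshot removals per day:    ", removes_per_day)
```

Every removal eventually shows up as a snap id in removed_snaps and as work for snaptrim, so the per-day removal count is the figure that matters for the issues listed below.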
1. Huge and frequent OSDMap updates
The OSDMap is ~640K in size, with a long and scattered
"removed_snaps". The holes in the removed_snaps interval set are from two
- In our use case we keep daily snapshots for longer, so each daily
snapshot turns out to be a hole in the removed_snaps interval set.
a new snapid for each snapshot removal; according to the comment, the new
snapid is intended to keep the interval_set contiguous. However, I cannot
understand how that works; it seems to me that this behavior creates more
holes when creates and deletes interleave with each other.
- After processing 4 or 5 versions of map, the rocksdb write-ahead log
(WAL) is full and the corresponding memtable has to be flushed to disk.
2. pgstat updates burn out the MGR
Starting from Mimic, each PG by default reports up to 500
purged_snapshot intervals to the MGR, which significantly inflates the size of
pg_stat and causes the MGR to use 20GB+ of memory and 260%+ CPU (mostly in
messenger threads and the MGR_FIN thread) and to become very unresponsive. Reducing
the value to 10 fixed the issue in our env.
3. SnapTrim IO overhead
Although there are tuning knobs to control the speed of snaptrim, it
still needs to keep up with the snapshot creation rate. What is more,
snaptrim introduces huge amplification in the RocksDB WAL, maybe due to
4K alignment in the WAL. We observed that 156GB of WAL was written while trimming 100
snapshots, whereas the generated L0 is 4.63GB, which seems related to WAL
page-alignment amplification. The PG purges snapshots from snaptrim_q one by
one. We are wondering whether several purged snapshots for a given volume could
be compacted and trimmed together, which might give better efficiency (we
would only need to change the snapset for a given object once).
4. Deep-scrub on objects with hundreds of snapshots is very slow and
caused osd_op_w_latency to surge 10x in our env; we have not yet dug into it.
5. How does cache tiering work with snapshots? Does a cache tier help with write
performance in this case?
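As a rough check on the WAL figures in item 3, the sketch below computes the ratio of WAL bytes written to L0 bytes generated. It then asks what average record size would give the same ratio if every WAL record were padded to a 4K block. The padding model is only the hypothesis stated in item 3, not measured RocksDB behavior, and L0 size is not an exact measure of WAL payload (memtable merging and compression also play a part).

```python
import math

# Figures quoted in item 3.
WAL_WRITTEN_GB = 156.0
L0_GENERATED_GB = 4.63
SNAPSHOTS_TRIMMED = 100

# Assumed alignment unit for the padding hypothesis.
BLOCK_BYTES = 4096

# Observed ratio of WAL bytes to L0 bytes, and WAL written per trimmed snapshot.
ratio = WAL_WRITTEN_GB / L0_GENERATED_GB
wal_per_snapshot_gb = WAL_WRITTEN_GB / SNAPSHOTS_TRIMMED


def padded_amplification(payload_bytes, block=BLOCK_BYTES):
    """Bytes written per payload byte if each record is padded to `block`."""
    padded = math.ceil(payload_bytes / block) * block
    return padded / payload_bytes


# Average record size that would reproduce the observed ratio under this model.
implied_payload_bytes = BLOCK_BYTES / ratio

print("WAL / L0 ratio:             %.1fx" % ratio)
print("WAL per trimmed snapshot:   %.2f GB" % wal_per_snapshot_gb)
print("implied avg record payload: %.0f bytes" % implied_payload_bytes)
```

The ratio comes out at roughly 33.7x. If the padding hypothesis holds, that corresponds to small records on the order of a hundred bytes each, which would fit per-object snapset updates being written one at a time.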
There are several outstanding PRs, such as
https://github.com/ceph/ceph/pull/28330, that optimize snaptrim and in particular
get rid of removed_snaps. We believe they will help partly with #1 but are
not sure how much they help with the others. As this is a production environment,
upgrading to the Octopus RC is not feasible at the moment; we will try it out
once a stable release is available.
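To illustrate why long-lived daily snapshots fragment removed_snaps (item 1), here is a toy simulation. It is not Ceph's interval_set. It assumes snap ids are handed out sequentially pool-wide and that all volumes snapshot back to back each hour. It does not model the extra snapid allocation on removal mentioned above. IntervalSet and simulate are hypothetical names.

```python
import bisect


class IntervalSet:
    """Sorted, non-overlapping half-open integer intervals [start, end)."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def insert(self, value):
        """Add a single id, merging with adjacent intervals where possible."""
        i = bisect.bisect_right(self.starts, value)
        if i > 0 and value < self.ends[i - 1]:
            return  # already covered
        merge_left = i > 0 and self.ends[i - 1] == value
        merge_right = i < len(self.starts) and self.starts[i] == value + 1
        if merge_left and merge_right:
            # The new id closes a hole between two intervals.
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif merge_left:
            self.ends[i - 1] = value + 1
        elif merge_right:
            self.starts[i] = value
        else:
            self.starts.insert(i, value)
            self.ends.insert(i, value + 1)

    def num_intervals(self):
        return len(self.starts)


def simulate(volumes, days, hourly_kept=24, daily_kept=30):
    """Return the number of intervals in the removed set after `days` days."""
    removed = IntervalSet()
    live = []  # (expiry_hour, snap_id)
    next_snap_id = 1
    for hour in range(days * 24):
        # Remove every snapshot whose retention has run out.
        still_live = []
        for expiry, snap_id in live:
            if expiry <= hour:
                removed.insert(snap_id)
            else:
                still_live.append((expiry, snap_id))
        live = still_live
        # The first snapshot of each day is kept as a "daily".
        lifetime = daily_kept * 24 if hour % 24 == 0 else hourly_kept
        for _ in range(volumes):
            live.append((hour + lifetime, next_snap_id))
            next_snap_id += 1
    return removed.num_intervals()


if __name__ == "__main__":
    print("with 30-day dailies:", simulate(volumes=3, days=40))
    print("hourly-only policy: ", simulate(volumes=3, days=40, daily_kept=1))
```

In this toy model the removed set stays a single interval when everything expires after 24 hours, and it grows to about one interval per retained daily generation once dailies are kept for 30 days. In a real pool, creations and removals from hundreds of volumes interleave, so the fragmentation can be expected to be worse than this.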
I noticed a locking issue in the kernel device.
When I stop the ceph cluster and all daemons, the kernel device _lock is somehow still held, and the line below returns r < 0:
int r = ::flock(fd_directs[WRITE_LIFE_NOT_SET], LOCK_EX | LOCK_NB);
The way I stop the cluster and daemons:
sudo bin/init-ceph --verbose forcestop
This error happens even after a reboot when I try to use vstart:
bdev _lock flock failed on ceph/build/dev/osd0/block
bdev open failed to lock /home/yzhan298/ceph/build/dev/osd0/block: (11) Resource temporarily unavailable
OSD::mkfs: couldn't mount ObjectStore: error (11) Resource temporarily unavailable
** ERROR: error creating empty object store in ceph/build/dev/osd0: (11) Resource temporarily unavailable
Please advise. (On master branch)
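The failure mode can be reproduced outside Ceph. The sketch below is Linux-only Python, not Ceph code. It takes the same LOCK_EX | LOCK_NB lock through two independent opens of one file. flock() locks belong to the open file description, so the second attempt fails with EAGAIN (error 11, "Resource temporarily unavailable") until the first descriptor is closed. try_exclusive_lock is a hypothetical helper that mirrors the r < 0 convention of the call quoted above.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(fd):
    """Return 0 on success, or a negative errno like the C++ code's r < 0."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return 0
    except OSError as e:
        return -e.errno


with tempfile.NamedTemporaryFile() as tmp:
    # Two separate open() calls give two independent open file descriptions.
    fd_a = os.open(tmp.name, os.O_RDWR)
    fd_b = os.open(tmp.name, os.O_RDWR)
    try:
        r_first = try_exclusive_lock(fd_a)   # lock acquired
        r_second = try_exclusive_lock(fd_b)  # denied while fd_a holds it
    finally:
        os.close(fd_a)                       # closing fd_a releases its lock
    try:
        r_after_release = try_exclusive_lock(fd_b)
    finally:
        os.close(fd_b)

print("first lock:        ", r_first)
print("second lock:       ", r_second, errno.errorcode.get(-r_second, ""))
print("after first closed:", r_after_release)
```

A flock() lock does not survive the exit of the process that holds it, let alone a reboot. An EAGAIN seen after a reboot therefore suggests that some live process has the block file open and locked at that moment. One possibility is an OSD that vstart already started, and checking with something like lsof or fuser on the block path may help narrow it down.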