The document formerly known as "HACKING.rst" is now live on the docs
website at the following location:
https://docs.ceph.com/docs/master/dev/developer_guide/dash-devel/
This represents a significant addition to the Developer Guide. If you
haven't seen it before, take a look. It's big.
Please direct pull requests against this new file and not against
HACKING.rst.
Zac Dover
Upstream Docs
Ceph
Sorry for cross-posting. I sent this mail to ceph-maintainers two
months ago but have received no responses so far. After reading the
comments in https://github.com/ceph/ceph-deploy/pull/496, I think I
should check with ceph-devel as well, so I am forwarding this mail to
ceph-devel for more input.
---------- Forwarded message ---------
From: kefu chai <tchaikov(a)gmail.com>
Date: Thu, Jun 4, 2020 at 6:39 PM
Subject: is ceph-deploy still used?
To: <ceph-maintainers(a)ceph.io>
Cc: Neha Ojha <nojha(a)redhat.com>, Josh Durgin <jdurgin(a)redhat.com>,
Brad Hubbard <bhubbard(a)redhat.com>, James Page <james.page(a)ubuntu.com>
Hi ceph maintainers,
When reviewing ceph-deploy PRs, I wonder why we are still
maintaining this tool. As I understand it, we are supposed to deploy Ceph
using the Ansible playbooks offered by ceph-ansible[0], and in the future
we are more likely to deploy a Ceph cluster using cephadm[1].
So the question is: are you still packaging / using ceph-deploy?
cheers,
--
[0] https://github.com/ceph/ceph-ansible
[1] https://ceph.io/ceph-management/introducing-cephadm/
--
Regards
Kefu Chai
Hi folks,
here are the links for slides/sheets I presented at yesterday's perf call.
Slides:
https://docs.google.com/presentation/d/1Qid__UuHmE5PhVmFT8aviZADuiLp32zzbhq…
Sheets:
https://docs.google.com/spreadsheets/d/1ngQA-x7Qpk0HARlkfZIOVFW8TGuAYhmhoW4…
@Josh - some feedback to one of your comments at the call:
Today I ran another experiment: the original(!) delete with sleep=1s
(dropped from the default 2s just to complete faster).
The second stage's parallel writes ran for 2500s (the initial
ones had run for 1000s, as before). Pool removal completed in 2127
seconds, and one can also observe a write-performance drop for a few
seconds before completion in this scenario.
See the "long original deleting" sheet under the second link above. Hence it
looks like bulk removals are no worse than the original approach in this respect...
Thanks,
Igor
Hello everyone,
When I start my RGW server, I get the error "couldn't init
storage provider".
Command I used : "RGW=1 ../src/vstart.sh -d -n -x"
OS:ubuntu : 18.04
Ceph version : ceph version 15.1.0-1866-g053fd8f816
(053fd8f816ec0583fddfc63918dda521a3cf821e) octopus (rc)
I see that the error occurs inside "rgw_sal_rados.cc" in the function
`rgw::sal::RadosStore::init_storage_provider(...)`, but I don't understand
why it is happening.
I tried restarting my system, but it is still not working.
I am not sure which file I should share with you all to diagnose the issue,
so please ask me for any specific file :)
Thank You,
Abhinav Singh
Hi,
I have changed most of the pools in my cluster from 3-replica to EC 4+2.
When I use the ceph df command to show the used capacity of the cluster:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.8 PiB 788 TiB 1.0 PiB 1.0 PiB 57.22
ssd 7.9 TiB 4.6 TiB 181 GiB 3.2 TiB 41.15
ssd-cache 5.2 TiB 5.2 TiB 67 GiB 73 GiB 1.36
TOTAL 1.8 PiB 798 TiB 1.0 PiB 1.0 PiB 56.99
POOLS:
    POOL                             ID   STORED    OBJECTS   USED      %USED   MAX AVAIL
    default-oss.rgw.control           1   0 B             8   0 B           0     1.3 TiB
    default-oss.rgw.meta              2   22 KiB         97   3.9 MiB       0     1.3 TiB
    default-oss.rgw.log               3   525 KiB       223   621 KiB       0     1.3 TiB
    default-oss.rgw.buckets.index     4   33 MiB         34   33 MiB        0     1.3 TiB
    default-oss.rgw.buckets.non-ec    5   1.6 MiB        48   3.8 MiB       0     1.3 TiB
    .rgw.root                         6   3.8 KiB        16   720 KiB       0     1.3 TiB
    default-oss.rgw.buckets.data      7   274 GiB   185.39k   450 GiB    0.14     212 TiB
    default-fs-metadata               8   488 GiB   153.10M   490 GiB   10.65     1.3 TiB
    default-fs-data0                  9   374 TiB     1.48G   939 TiB   74.71     212 TiB
    ...
For 3-replica pools, USED = 3 * STORED, which is exactly right. But for the
EC 4+2 pool (default-fs-data0), USED should be 1.5 * STORED
(374 TiB * 1.5 = 561 TiB), yet ceph df reports 939 TiB, roughly 2.5 * STORED.
P.S. I have another cluster with the same config, and its ceph df output is right.
The difference between them is that this cluster has HDD OSDs of different sizes (8 TB and 12 TB).
I'm not sure whether this is a bug, but the reported usage does not look reasonable.
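For reference, the expected overhead for each pool type can be checked with a quick calculation (a sketch; the numbers are taken from the ceph df output above):

```python
def expected_used(stored, k, m):
    """Expected raw usage for an erasure-coded pool with profile k+m."""
    return stored * (k + m) / k

def expected_used_replicated(stored, replicas):
    """Expected raw usage for a replicated pool."""
    return stored * replicas

# default-fs-data0 stores 374 TiB in an EC 4+2 pool:
print(expected_used(374, 4, 2))   # 561.0 TiB expected USED
# ...but ceph df reports 939 TiB, i.e. roughly 2.5x STORED:
print(939 / 374)
```

So the reported ratio is close to what a 2.5x overhead would give, not the 1.5x an EC 4+2 profile implies.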
Hi Folks,
The weekly performance meeting will be starting in 5 minutes! Today, we
are going to continue discussing refactoring onodes in bluestore to
improve memory usage and CPU overhead. See you there!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark
Hi cephers,
At the CLT meeting today there was agreement to *make Ceph API tests
"required" *again for Pull Requests to be merged:
- The current approach (*"honoring the agreement not to merge failing
PRs"*) is simply not working: PRs have been merged with API tests in
red. While most of these are harmless due to random failures (*we are
working to improve this*), other times API tests warned about real
issues... which eventually slipped into the code. [1]
<https://tracker.ceph.com/issues/47306> [2]
<https://tracker.ceph.com/issues/45717> [3]
<https://github.com/ceph/ceph/pull/36091>
- The cost & risk of debugging issues a posteriori is usually higher
than the pain of retriggering the API tests (*we are working to improve
this*).
- Ceph API tests, even with their downsides, are providing true
integration testing at CI time: this doesn't simply mean complex unit tests
or component testing, it means running a vstart Ceph cluster and actually
testing RADOS, RBD, RGW, CephFS...
*What does this mean?*
If the Ceph API tests are green, great! It's not that hard to achieve: *~75%
of PRs pass the Ceph API tests from the beginning.*
What if they *are NOT* passing?
From GitHub you can access the Ceph API test results in Jenkins by clicking
on *"Details"*, and you'll see a report:
   1. The test may fail due to multiple causes: issues in a Jenkins node,
   GitHub repo fetching, the "make" stage, ... (if this is the case you may
   easily retrigger the Ceph API tests by adding a comment to the PR with
   the text "jenkins test api").
2. If the failure actually happens as a result of the Ceph API tests
themselves, the report will look like this
<https://jenkins.ceph.com/job/ceph-api/2726/>:
From there:
- You can quickly check whether this has already been reported
<https://tracker.ceph.com/search?q=FAIL:%20test_all%20%28tasks.mgr.dashboard…>
(a known issue or a flapping test) or otherwise raise a new issue report
<https://tracker.ceph.com/projects/mgr/issues/new?issue[subject]=FAIL:%20tes…>
.
- If the failure looks like a flapping one, you may retrigger the tests.
   - If, however, the failure is caused by an intentional change in
   behaviour, please reach out to the Dashboard team for help.
*What may you expect from the Dashboard team?*
- We are working to harden Ceph API tests, increase their coverage and
make them more stable. You may check our backlog
<https://pad.ceph.com/p/dashboard-api-test-improvements> of
improvements. You are welcome to contribute with ideas or, even better,
working code ;-)
- We are monitoring every day how Ceph API tests are doing: failure
rate, runtime, ...
   - You can find us on IRC (#ceph-dashboard), GitHub (@ceph/dashboard),
   on this very mailing list, or by pinging us directly: Lenz (in CC) is the
   component lead, Laura (also in CC) is taking care of Dashboard QA, or myself.
Kind regards,
Ernesto
@Haomai,
Does HAVE_IBV_EXP still work with any RNIC in current Ceph repository?
@Nasution:
I have never used the options below:
ms_async_rdma_roce_ver = 0 #RoCEv1, all nodes with same networks. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:**** #should I use 0000:0000:0000:0000:0000 :****:****:**** one?
To use RDMA, you may need:
1) configure “ulimit -l” to be unlimited
2) For an RNIC with SRQ support:
a. the configuration below should work:
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
b. If you need to distinguish RoCEv1 from RoCEv2, you need to configure "ms_async_rdma_gid_idx".
Reference: https://github.com/ceph/ceph/pull/31517/commits/b971cff51a9179c02f85a27cc19…
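Putting 2a and 2b together, a minimal [global] section for RoCEv2 might look like the sketch below. The gid index value 3 is a placeholder, not a recommendation: the right index is whichever entry in your NIC's GID table maps to RoCEv2 on the network you use.

```ini
[global]
ms_cluster_type = async+rdma
ms_public_type = async+posix
ms_async_rdma_device_name = mlx5_bond_0
# Select RoCEv2 via the GID table entry instead of ms_async_rdma_roce_ver;
# 3 is a placeholder -- check your device's GID table for the RoCEv2 entry.
ms_async_rdma_gid_idx = 3
```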
From: Lazuardi Nasution <mrxlazuardin(a)gmail.com>
Sent: Thursday, September 10, 2020 12:23 AM
To: Liu, Changcheng <changcheng.liu(a)intel.com>
Subject: Ceph with RDMA
Hi,
I'm reading your post regarding Ceph with RDMA. Have you solved your problem? I'm trying the same approach, but currently I'm facing a problem: some OSDs automatically go down not long after they come up, due to no heartbeat reply, even on a newly installed cluster. I'm using the following RDMA-related configuration.
[global]
.......
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
# rbd does not support rdma
ms_async_rdma_polling_us = 0
ms_async_rdma_roce_ver = 0 #RoCEv1, all nodes with same networks. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:**** #should I use 0000:0000:0000:0000:0000 :****:****:**** one?
[mgr]
ms_type = async+posix
I have set LimitMEMLOCK in the OSD systemd unit file (because the OSD is the only daemon that failed to start without it). Would you mind sharing your configuration of a working Ceph-with-RDMA setup? Am I missing something?
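For reference, the memlock change mentioned above can also be applied as a systemd drop-in rather than by editing the unit file directly (a sketch; the drop-in path follows the usual systemd override convention):

```ini
# /etc/systemd/system/ceph-osd@.service.d/memlock.conf
[Service]
LimitMEMLOCK=infinity
```

After creating the drop-in, run `systemctl daemon-reload` and restart the OSDs so the new limit takes effect.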
Best regards,
Ceph Developers,
There is a proper format for Merge Commits, which has been documented here:
https://docs.ceph.com/docs/master/dev/developer_guide/basic-workflow/#prope…
Kefu is quite keen for us to adhere to this format.
If this needs to be beefed up or slimmed down, let me know.
Zac Dover
Upstream Docs
Ceph