Hi *,
I encountered a rather strange (or at least unexpected) behaviour of
the rbd-mirror. Maybe I don't fully understand the feature so please
correct me if my assumptions are wrong.
My two (virtual one-node) clusters are still on ceph version
15.2.4-864-g0f510cb110, and the following happens when I try to
configure snapshot-based rbd-mirroring:
- I create a one-way replication from site A to site B
- Mirror mode is image (for snapshot-based mirroring)
- Import an image on site A (journaling feature not enabled, to be sure)
- Image is replicated as soon as I enable the mirror mode on that image with:
> rbd mirror image enable pool/image3 snapshot
- rbd info shows "snapshot_count: 1" although I haven't created a
snapshot of this image yet, and there's also no schedule:
> siteA:~ # rbd mirror snapshot schedule ls
> siteA:~ #
This is the third image in this test; I only created one snapshot, for
image1, not for the other images. Is this expected? From the docs I
assumed I have to either create mirror snapshots manually or configure
a mirror snapshot schedule, as sketched below. Could anyone please clarify?
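For reference, these are the commands I expected to need (pool/image
names are from my test above):
> rbd mirror image snapshot pool/image3
> rbd mirror snapshot schedule add --pool pool 1h
> rbd mirror snapshot schedule ls --pool pool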
Thank you!
Eugen
On 2020-09-21 21:12, Wout van Heeswijk wrote:
> Hi Rene,
>
> Yes, cephfs is a good filesystem for concurrent writing. When using CephFS with ganesha you can even scale out.
>
> It will perform better but why don't you mount CephFS inside the VM?
^^ This. But it depends on the VMs you are going to use as clients. Do
you trust those clients enough to let them be part of your cluster?
Clients really are part of the cluster, at least that is how I see it.
If possible, you want to use modern (5.7, 5.8) Linux kernels for
cephfs (rm operations are slower on 4.15/5.3/5.4 for files created
with a 5.3/5.4 kernel). We sometimes have issues with older kernel
clients (Ubuntu Xenial, 4.15 kernel) and MDS "client failed to rdlock"
messages, but we don't have 100% proof yet that it is because of the
kernel version. They generally fix themselves though, so not a big issue.
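If you do let such VMs mount cephfs directly, you can at least restrict
their caps to a subpath; a minimal sketch (client name and path are
just examples):
  ceph fs authorize cephfs client.vm1 /vmdata rw
  mount -t ceph mon1:6789:/vmdata /mnt -o name=vm1,secretfile=/etc/ceph/vm1.secret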
Gr. Stefan
People,
I am interested in experimenting with Ceph on, say, 4 or 8 small form
factor computers (SBCs?) - any suggestions about how to get started?
I haven't bought anything yet - I have some working Fedora Workstations
and Servers and a laptop, but I don't want to experiment on them . .
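(Software-wise, I am guessing cephadm is the current way to bootstrap a
small cluster once the boxes arrive, e.g. with a placeholder IP:
  cephadm bootstrap --mon-ip 192.168.1.10
but corrections welcome.)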
Thanks,
Phil.
--
Philip Rhoades
PO Box 896
Cowra NSW 2794
Australia
E-mail: phil(a)pricom.com.au
Hi,
Our Ceph cluster is reporting several PGs that have not been scrubbed
or deep-scrubbed in time; it has been over a week since these PGs were
last scrubbed. When I checked `ceph health detail`, there are 29 pgs
not deep-scrubbed in time and 22 pgs not scrubbed in time. I tried to
manually start a scrub on the PGs, but it appears that they are
actually in an unclean state that needs to be resolved first.
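For reference, I kicked off the manual scrubs with commands along these lines:
```
ceph pg scrub 8.3a
ceph pg deep-scrub 8.3a
```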
This is a cluster running:
ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)
Following the information at [Troubleshooting
PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-…),
I checked for PGs that are stuck stale | inactive | unclean. There
were no PGs that are stale or inactive, but there are several that are
stuck unclean:
```
PG_STAT  STATE                          UP                                UP_PRIMARY  ACTING                            ACTING_PRIMARY
8.3c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.3e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.40     active+remapped+backfill_wait  [19,131,80,76,42,101,61,3,144]    19          [28,106,132,3,151,36,65,60,83]    28
8.3a     active+remapped+backfilling    [32,72,151,30,103,131,62,84,120]  32          [91,60,7,133,101,117,78,20,158]   91
8.7e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3b     active+remapped+backfill_wait  [34,113,148,63,18,95,70,129,13]   34          [66,17,132,90,14,52,101,47,115]   66
8.7f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.78     active+remapped+backfill_wait  [96,113,159,63,29,133,73,8,89]    96          [138,121,15,103,55,41,146,69,18]  138
8.7d     active+remapped+backfilling    [0,90,60,124,159,19,71,101,135]   0           [150,72,124,129,63,10,94,29,41]   150
8.7c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.79     active+remapped+backfill_wait  [59,15,41,82,131,20,73,156,113]   59          [13,51,120,102,29,149,42,79,132]  13
```
If I query one of the PGs that is backfilling, 8.3a, it shows its state as:
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2020-09-19T20:45:44.027759+0000",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [
"30(3)",
"32(0)",
"62(6)",
"72(1)",
"84(7)",
"103(4)",
"120(8)",
"131(5)",
"151(2)"
],
Q1: Is there anything that I should check/fix to enable the PGs to
resolve from the `unclean` state?
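In case it is relevant: my working assumption is that `unclean` here
just reflects the pending backfill, and that I could watch it or speed
it up with something like the following (the osd_max_backfills value is
a guess on my part):
```
# list PGs still backfilling / waiting to backfill
ceph pg dump pgs_brief | grep backfill
# allow more concurrent backfills per OSD (default is 1)
ceph config set osd osd_max_backfills 2
```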
Q2: I have also seen that the podman containers on one of our OSD
servers are taking large amounts of disk space. Is there a way to
limit the growth of disk space for podman containers, when
administering a Ceph cluster using `cephadm` tools? At last check, a
server running 16 OSDs and 1 MON is using 39G of disk space for its
running containers. Can restarting containers help to start with a
fresh slate or reduce the disk use?
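What I was considering, though I am not sure it is safe on a
cephadm-managed host:
```
# show where the container storage space goes
podman system df
# remove unused images/containers
podman system prune
```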
Thanks,
Matt
------------------------
Matt Larson
Associate Scientist
Computer Scientist/System Administrator
UW-Madison Cryo-EM Research Center
433 Babcock Drive, Madison, WI 53706
Hello,
Being new to Ceph, I need some advice on how to set up a cluster.
Given a node that has multiple disks, should I create one OSD for
all disks, or is it better to have one OSD per disk?
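For context, I was assuming something like this (device names made up),
which as far as I understand creates one OSD per disk:
  ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd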
--
Kees Bakker
When I create a new encrypted OSD with ceph-volume [1],
I assume something like the following is being done; please correct
whatever is wrong.
- it creates the pv on the block device
- it creates the ceph vg on the block device
- it creates the osd lv in the vg
- it uses cryptsetup to encrypt this lv
(or is there some internal support for luks in lvm?)
- it sets all the tags on the vg (shown by: lvs -o lv_tags vg)
- it creates and enables ceph-volume@lvm-osdid-osdfsid
- it creates and enables ceph-osd@osdid
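So far I have been checking these assumptions with (unit and tag names
will differ per OSD):
  lvs -o lv_name,lv_tags
  systemctl list-units 'ceph-volume@*' 'ceph-osd@*'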
When a node is restarted, these lvm osds are started with
- running ceph-volume@lvm-osdid-osdfsid (creating this tmpfs mount?)
- running ceph-osd@osdid
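I assume the same activation can be triggered by hand, something like
(osd id 40 is from my node, the fsid is a placeholder):
  ceph-volume lvm activate 40 <osd-fsid>
  ceph-volume lvm activate --all
and the LUKS passphrase seems to come from the mon config-key store:
  ceph config-key ls | grep dm-crypt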
Q1: I had to create bootstrap-osd/ceph.keyring (ownership root.root).
What is it used for? Does it need to exist after a node restart?
Q2: I had some issues with a node not starting, which I solved by
adding nofail to the fstab. How is this handled with ceph-volume?
Q3: Why these strange permissions on the mounted folder?
drwxrwxrwt 2 ceph ceph 340 Sep 19 15:24 ceph-40
Q4: Where is this luks passphrase stored?
Q5: Where does this tmpfs+content come from? How can I mount this myself
from the command line?
Q6: My lvm tags show ceph.crush_device_class=None, while ceph osd tree
shows the correct class. Is this correct?
Q7: I saw in my ceph-volume output sometimes 'disabling cephx', what
does this mean? How can I verify this and fix it?
Links to manuals are also welcome; the ceph-volume docs [2] are not
too clear about this.
[1]
ceph-volume lvm create --data /dev/sdk --dmcrypt
[2]
https://docs.ceph.com/en/latest/ceph-volume/lvm/activate/
I want to trace a request through Ceph processing, on version 14.2.8.
I enabled blkin with do_cmake.sh -DWITH_BLKIN=ON, but compilation
fails with the error below:
../lib/libblkin.a(tp.c.o): undefined reference to symbol 'lttng_probe_register'
//lib64/liblttng-ust.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/ceph-osd.dir/build.make:130: bin/ceph-osd] Error 1
make[1]: *** Waiting for unfinished jobs.......
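From the "DSO missing from command line" part I guess liblttng-ust just
has to be added to the link line explicitly; I am going to try something
like this (untested, the extra linker flags are my own guess):
  cmake -DWITH_BLKIN=ON -DCMAKE_EXE_LINKER_FLAGS='-llttng-ust -ldl' ..
  make ceph-osd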