Hi again, hopefully for the last time with problems.
We had an MDS crash earlier, with the MDS stuck in the failed state, and used a command to reset the filesystem (this was wrong, I know now; thanks to Patrick Donnelly for pointing this out). A full scrub of the filesystem found two damaged files. One of those was repaired, but the following file keeps giving errors and can't be removed.
What can I do now? Some information is below.
# ceph tell mds.atlassian-prod:0 damage ls
[
{
"damage_type": "backtrace",
"id": 2244444901,
"ino": 1099534008829,
"path": "/app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01"
}
]
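To make sure the damage entry points at the same file, I also dumped the inode (the decimal ino is the one from the damage listing above):
----------
# ceph tell mds.atlassian-prod:0 dump inode 1099534008829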
Trying to repair the error (online research suggests this should work for the backtrace damage type):
----------
# ceph tell mds.atlassian-prod:0 scrub start /app1/shared/data/repositories/11271 recursive,repair,force
{
"return_code": 0,
"scrub_tag": "d10ead42-5280-4224-971e-4f3022e79278",
"mode": "asynchronous"
}
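While the scrub runs, progress can be polled against the same MDS target:
----------
# ceph tell mds.atlassian-prod:0 scrub status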
Cluster logs after this
----------
1/2/24 9:37:05 AM
[INF]
scrub summary: idle
1/2/24 9:37:02 AM
[INF]
scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub summary: active paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub queued for path: /app1/shared/data/repositories/11271
But the error doesn't disappear, and the file still can't be removed.
On the client, trying to remove the file (we have a backup):
----------
$ rm -f /mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01
rm: cannot remove '/mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01': Input/output error
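If a repair ever succeeds, my understanding is that the stale entry could then be cleared from the damage table using the id from the listing above. I haven't tried this yet, as I'm not sure it's safe while the backtrace is still damaged:
----------
# ceph tell mds.atlassian-prod:0 damage rm 2244444901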
Best regards,
Sake
Hi!
While reading through the documentation about subtree pinning, I was wondering whether the following is possible.
We've got the following directory structure.
/
/app1
/app2
/app3
/app4
Can I pin /app1 to MDS ranks 0 and 1, the directory /app2 to rank 2, and finally /app3 and /app4 to rank 3?
I would like to load-balance the subfolders of /app1 across two (or three) MDS servers.
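For context, what I have in mind is something like the following (the /mnt/cephfs mount point is just an example). As far as I understand, ceph.dir.pin accepts only a single rank, which is why I'm asking about spreading /app1; the docs also mention ephemeral distributed pinning, which might cover that case:

$ setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/app2
$ setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/app3
$ setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/app4
$ setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/app1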
Best regards,
Sake
Hi all,
I have a problem upgrading a Ceph cluster from Pacific to Quincy with
cephadm. I successfully upgraded the cluster to the latest Pacific
(16.2.11), but when I run the following command to upgrade the cluster to
17.2.5, the upgrade process stops with "Unexpected error" after upgrading
3/4 mgrs (everything is on a private network):
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v17.2.5
I also tried the 17.2.4 version.
cephadm fails to check the hosts' status and marks them as offline:
cephadm 2023-04-06T10:19:59.998510+0000 mgr.host9.arhpnd (mgr.4516356) 5782
: cephadm [DBG] host host4 (x.x.x.x) failed check
cephadm 2023-04-06T10:19:59.998553+0000 mgr.host9.arhpnd (mgr.4516356) 5783
: cephadm [DBG] Host "host4" marked as offline. Skipping daemon refresh
cephadm 2023-04-06T10:19:59.998581+0000 mgr.host9.arhpnd (mgr.4516356) 5784
: cephadm [DBG] Host "host4" marked as offline. Skipping gather facts
refresh
cephadm 2023-04-06T10:19:59.998609+0000 mgr.host9.arhpnd (mgr.4516356) 5785
: cephadm [DBG] Host "host4" marked as offline. Skipping network refresh
cephadm 2023-04-06T10:19:59.998633+0000 mgr.host9.arhpnd (mgr.4516356) 5786
: cephadm [DBG] Host "host4" marked as offline. Skipping device refresh
cephadm 2023-04-06T10:19:59.998659+0000 mgr.host9.arhpnd (mgr.4516356) 5787
: cephadm [DBG] Host "host4" marked as offline. Skipping osdspec preview
refresh
cephadm 2023-04-06T10:19:59.998682+0000 mgr.host9.arhpnd (mgr.4516356) 5788
: cephadm [DBG] Host "host4" marked as offline. Skipping autotune
cluster 2023-04-06T10:20:00.000151+0000 mon.host8 (mon.0) 158587 : cluster
[ERR] Health detail: HEALTH_ERR 9 hosts fail cephadm check; Upgrade: failed
due to an unexpected exception
cluster 2023-04-06T10:20:00.000191+0000 mon.host8 (mon.0) 158588 : cluster
[ERR] [WRN] CEPHADM_HOST_CHECK_FAILED: 9 hosts fail cephadm check
cluster 2023-04-06T10:20:00.000202+0000 mon.host8 (mon.0) 158589 : cluster
[ERR] host host7 (x.x.x.x) failed check: Unable to reach remote host
host7. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000213+0000 mon.host8 (mon.0) 158590 : cluster
[ERR] host host2 (x.x.x.x) failed check: Unable to reach remote host
host2. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000220+0000 mon.host8 (mon.0) 158591 : cluster
[ERR] host host8 (x.x.x.x) failed check: Unable to reach remote host
host8. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000228+0000 mon.host8 (mon.0) 158592 : cluster
[ERR] host host4 (x.x.x.x) failed check: Unable to reach remote host
host4. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000240+0000 mon.host8 (mon.0) 158593 : cluster
[ERR] host host3 (x.x.x.x) failed check: Unable to reach remote host
host3. Process exited with non-zero exit status 3
Here are some relevant command outputs:
[root@host8 ~]# ceph -s
cluster:
id: xxx
health: HEALTH_ERR
9 hosts fail cephadm check
Upgrade: failed due to an unexpected exception
services:
mon: 5 daemons, quorum host8,host1,host7,host2,host9 (age 2w)
mgr: host9.arhpnd(active, since 105m), standbys: host8.jowfih,
host1.warjsr, host2.qyavjj
mds: 1/1 daemons up, 3 standby
osd: 37 osds: 37 up (since 8h), 37 in (since 3w)
data:
io:
client:
progress:
Upgrade to 17.2.5 (0s)
[............................]
[root@host8 ~]# ceph orch upgrade status
{
"target_image": "my-private-repo/quay-io/ceph/ceph@sha256
:34c763383e3323c6bb35f3f2229af9f466518d9db926111277f5e27ed543c427",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "3/59 daemons upgraded",
"message": "Error: UPGRADE_EXCEPTION: Upgrade: failed due to an
unexpected exception",
"is_paused": true
}
[root@host8 ~]# ceph cephadm check-host host7
check-host failed:
Host 'host7' not found. Use 'ceph orch host ls' to see all managed hosts.
[root@host8 ~]# ceph versions
{
"mon": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 5
},
"mgr": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 1,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
quincy (stable)": 3
},
"osd": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 37
},
"mds": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 4
},
"overall": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 47,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
quincy (stable)": 3
}
}
The strange thing is that I can roll back the cluster status by failing
over to a not-yet-upgraded mgr, like this:
ceph mgr fail
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v16.2.11
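For completeness, this is how I raised the cephadm log level to capture the debug lines above (standard commands from the cephadm troubleshooting docs):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

and reset it afterwards with:

ceph config set mgr mgr/cephadm/log_to_cluster_level info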
Would you happen to have any idea about this?
Best regards,
Reza
Hi Ceph users
We are using Ceph Pacific (16) in this specific deployment.
In our use case we do not want our users to be able to generate signature v4 URLs, because these bypass the policies that we set on buckets (e.g. IP restrictions).
Currently we have a sidecar reverse proxy running that filters out requests carrying the signature-URL-specific request parameters (a sketch of the matching rule is below).
This is obviously not very efficient, and we are looking to replace it somehow in the future.
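To illustrate, the filter boils down to rejecting any request whose query string contains the standard SigV4 presign parameters. A minimal sketch of the rule, not our actual proxy config:

#!/usr/bin/env bash
# Reject requests whose query string carries AWS SigV4 presign parameters.
query_string="$1"
if printf '%s' "$query_string" | grep -qiE 'X-Amz-(Algorithm|Credential|Signature)='; then
    echo "403 Forbidden: presigned URLs are not allowed"
    exit 1
fi
echo "request allowed"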
1. Is there an option in RGW to disable these signed URLs (e.g. by returning status 403)?
2. If not, is this planned, or would it make sense to add it as a configuration option?
3. Or is RGW's behaviour of not respecting bucket policies for signature v4 URLs a bug, and should they actually be applied?
Thank you for your help, and let me know if you have any questions.
Marc Singer
Hi folks,
I'm currently testing erasure-code-lrc (1) in a multi-room, multi-rack setup.
The idea is to be able to repair disk failures within the rack
itself to lower bandwidth usage:
```bash
ceph osd erasure-code-profile set lrc_hdd \
plugin=lrc \
crush-root=default \
crush-locality=rack \
crush-failure-domain=host \
crush-device-class=hdd \
mapping=__DDDDD__DDDDD__DDDDD__DDDDD \
layers='
[
[ "_cDDDDD_cDDDDD_cDDDDD_cDDDDD", "" ],
[ "cDDDDDD_____________________", "" ],
[ "_______cDDDDDD______________", "" ],
[ "______________cDDDDDD_______", "" ],
[ "_____________________cDDDDDD", "" ],
]' \
crush-steps='[
[ "choose", "room", 4 ],
[ "choose", "rack", 1 ],
[ "chooseleaf", "host", 7 ],
]'
```
The rule picks 4 out of 5 rooms and keeps the PG in one rack, as expected!
However, it looks like the PG will not move to another room if the PG
is undersized or an entire room or rack is down!
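For reference, this is how I checked the resulting placements (the rule id 2 is an assumption here; it can be looked up with `ceph osd crush rule dump lrc_hdd`):

```bash
# Dump the compiled CRUSH map and test the LRC rule against it;
# --num-rep 28 matches the 28 shards in the mapping above.
ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule 2 --num-rep 28 --show-mappings
```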
Questions:
* Am I missing something to allow LRC PGs to move across racks/rooms for repair?
* Is it even possible to build such a 'multi-stage' crushmap?
Thanks for your help,
Ansgar
1) https://docs.ceph.com/en/quincy/rados/operations/erasure-code-jerasure/
Just in case anybody is interested: Using dm-cache works and boosts
performance -- at least for my use case.
The "challenge" was to get 100 (identical) Linux-VMs started on a three
node hyperconverged cluster. The hardware is nothing special, each node
has a Supermicro server board with a single CPU with 24 cores and 4 x 4
TB hard disks. And there's that extra 1 TB NVMe...
I know that the general recommendation is to use the NVMe for WAL and
metadata, but this didn't seem appropriate for my use case and I'm still
not quite sure about failure scenarios with this configuration. So
instead I made each drive a logical volume (managed by an OSD) and added
85 GiB of NVMe to each LV as a read-only cache.
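For anyone wanting to reproduce this: the per-OSD setup was plain LVM, along these lines (the VG/LV names are made up here; writethrough is the closest dm-cache mode to a read-only cache):

# carve an 85 GiB cache volume from the NVMe and attach it to the OSD's LV
lvcreate -L 85G -n osd0_cache vg_osd0 /dev/nvme0n1p1
lvconvert --type cache --cachevol osd0_cache --cachemode writethrough vg_osd0/osd0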
Each VM uses an RBD image, cloned from a snapshot of the master image, as
its system disk. The idea was that with this configuration, all VMs should
share most (actually almost all) of the data on their system disks, and
this data should be available from the cache.
Well, it works. When booting the 100 VMs, almost all read operations are
satisfied from the cache. So I get close to NVMe speed but have paid
for conventional hard drives only (well, SSDs aren't that much more
expensive nowadays, but the hardware is 4 years old).
So, nothing sophisticated, but as I couldn't find anything about this
kind of setup, it might be of interest nevertheless.
- Michael
I have logged this as https://tracker.ceph.com/issues/64213
On 16/01/2024 14:18, DERUMIER, Alexandre wrote:
> Hi,
>
>>> ImportError: PyO3 modules may only be initialized once per
>>> interpreter
>>> process
>>>
>>> and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
>>> modules may only be initialized once per interpreter process
> We have the same problem on Proxmox 8 (based on Debian 12) with Ceph
> Quincy or Reef.
>
> It seems to be related to the Python version on Debian 12.
>
> (we have no fix for this currently)