Hello,
We recently upgraded our cluster to version 18 and I've noticed some things
that I'd like feedback on before I go down a rabbit hole for
non-issues. cephadm was used for the upgrade and there were no issues.
The cluster has 56 OSDs, all spinners for now, and is only used for RBD images.
I've noticed a much larger number of active scrubs/deep scrubs. I don't
remember seeing that many before, usually around 20-30 scrubs and 15 deep
scrubs I think, whereas now I will have 70 scrubs and 70 deep scrubs
happening. I thought scrubs were limited to 1 per OSD, or am I
misunderstanding osd_max_scrubs? Everything on the cluster is currently at
default values.
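For reference, this is how I'm checking the values (osd.0 is just an
example daemon):

    ceph config get osd osd_max_scrubs        # value in the mon config db (or the default)
    ceph config show osd.0 osd_max_scrubs     # value the running daemon is actually using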
The other thing I've noticed since the upgrade is that any time backfill
happens, client IO drops, and neither is high to begin with: 30 MiB/s of
read/write client IO drops to 10-15 MiB/s with 200 MiB/s of backfill.
Before the upgrade, backfill would hit 500-600 MiB/s with the same 30 MiB/s
of client IO. I realize lots of things could affect this and it could be
unrelated to the cluster, and I'm still investigating, but I wanted to
mention it in case someone could recommend a check or knows of a change in
Reef that could cause this. The mclock profile is client_io.
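Similarly, I'm looking at the mclock settings like this (again, osd.0 is
just an example):

    ceph config show osd.0 osd_mclock_profile
    ceph config show osd.0 osd_mclock_max_capacity_iops_hdd   # IOPS capacity mclock is assuming for the OSD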
Thanks,
Curt
Hi,
Several years ago the diskprediction module was added to the MGR,
collecting SMART data from the OSDs.
There were local and cloud modes available, claiming different
accuracies. Now only the local mode remains.
What is the current status of that MGR module (diskprediction_local)?
We have a cluster where SMART data is available from the disks (tested
with smartctl and visible in the Ceph dashboard), but even with the
diskprediction_local module enabled, no health or lifetime info is shown.
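For reference, this is roughly what we did, with <devid> being one of the
IDs reported by "ceph device ls":

    ceph mgr module enable diskprediction_local
    ceph device ls
    ceph device get-health-metrics <devid>
    ceph device predict-life-expectancy <devid>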
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hello guys,
We are seeing an unexpected mark on one of our pools. Do you guys
know what "removed_snaps_queue" means? We see some notation such as
"d5~3" after this tag. What does that mean? We tried to look into the docs,
but could not find anything meaningful.
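For context, the tag shows up in the pool details, i.e. in the output of
commands like:

    ceph osd pool ls detail
    ceph osd dump | grep removed_snaps_queue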
We are running Ceph Octopus on top of Ubuntu 18.04.
Hello,
We have been running into an issue installing the Pacific Windows RBD
driver on Windows Server 2016. It has no issues with either 2019 or 2022. It
looks like it fails at checkpoint creation. We are installing it as
admin. Has anyone seen this before or know of a solution?
The closest thing I can find as to why it won't install:
******* Product: D:\software\ceph_pacific_beta.msi
******* Action: INSTALL
******* CommandLine: **********
MSI (s) (CC:24) [12:31:30:315]: Machine policy value
'DisableUserInstalls' is 0
MSI (s) (CC:24) [12:31:30:315]: Note: 1: 2203 2:
C:\windows\Installer\inprogressinstallinfo.ipi 3: -2147287038
MSI (s) (CC:24) [12:31:30:315]: Machine policy value
'LimitSystemRestoreCheckpointing' is 0
MSI (s) (CC:24) [12:31:30:315]: Note: 1: 1715 2: Ceph for Windows
MSI (s) (CC:24) [12:31:30:315]: Calling SRSetRestorePoint API.
dwRestorePtType: 0, dwEventType: 102, llSequenceNumber: 0,
szDescription: "Installed Ceph for Windows".
MSI (s) (CC:24) [12:31:30:315]: The call to SRSetRestorePoint API
failed. Returned status: 0. GetLastError() returned: 127
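For reference, we are invoking the installer roughly like this to capture
the verbose log (the paths are just examples):

    msiexec /i D:\software\ceph_pacific_beta.msi /L*v C:\temp\ceph_install.log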
--
Robert Ford
GoDaddy | SRE III
9519020587
Phoenix, AZ
rford(a)godaddy.com
Hi,
I'm using rbd export and import to copy an image from one cluster to another,
and export-diff and import-diff to update the image in the remote cluster.
For example, "rbd --cluster local export-diff ... | rbd --cluster remote import-diff ...".
Sometimes the whole command gets stuck, and I can't tell which end of the pipe it's stuck on.
I did some searching; [1] seems to be the same issue and [2] is also related.
I wonder if there is any way to identify where it's stuck and get more debugging info.
Given [2], I'd suspect the import-diff is stuck, since the rbd client is importing to the
remote cluster. Could network latency be involved here? Ping latency is 7~8 ms.
Any comments are appreciated!
[1] https://bugs.launchpad.net/cinder/+bug/2031897
[2] https://stackoverflow.com/questions/69858763/ceph-rbd-import-hangs
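One thing I'm considering is raising the client debug level on each end
separately and sending it to a log file, roughly like this (log paths are
just examples):

    rbd --cluster local --debug-rbd 20 --log-file /tmp/rbd-export.log export-diff ... | \
      rbd --cluster remote --debug-rbd 20 --log-file /tmp/rbd-import.log import-diff ...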
Thanks!
Tony
Hi,
Say the source image has snapshots s1, s2 and s3.
I expect "export" to behave the same as "deep cp": when a snapshot is
specified together with "--export-format 2", only the specified snapshot and
the snapshots earlier than it should be exported.
What I see is that, no matter which snapshot I specify, "export" with
"--export-format 2" always exports the whole image with all snapshots.
Is this expected?
Could anyone help to clarify?
Thanks!
Tony
Hey ceph-users,
I was wondering if ceph-volume does anything with regard to the management
(creation, setting metadata, ...) of LVs which are used for the
DB / WAL of an OSD?
Reading the documentation at
https://docs.ceph.com/en/latest/man/8/ceph-volume/#new-db, it seems to
indicate that the LV to be used as e.g. the DB needs to be created manually
(without ceph-volume) and exist prior to using ceph-volume to move the
DB to that LV? I suppose the same is true for "ceph-volume lvm create"
or "ceph-volume lvm prepare" with "--block.db".
It's not that creating a few LVs is hard... it's just that ceph-volume
does apply some structure to the naming of the LVM VGs and LVs on the OSD
device and also adds metadata. That would then be up to the user, right?
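To illustrate, my reading of the docs is that the workflow would be roughly
the following (VG/LV names, sizes and the OSD ID/FSID below are made up):

    # create the VG/LV for the DB manually, outside of ceph-volume
    vgcreate ceph-db-vg /dev/nvme0n1
    lvcreate -L 60G -n osd-0-db ceph-db-vg
    # then attach it to the existing OSD
    ceph-volume lvm new-db --osd-id 0 --osd-fsid <osd-fsid> --target ceph-db-vg/osd-0-db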
Regards
Christian
Folks,
I have 3 nodes, each with 1x NVMe (1TB) and 3x 2.9TB SSDs. I'm trying to
build Ceph storage using cephadm on the Ubuntu 22.04 distro.
If I want to use the NVMe for journaling (WAL/DB) for my SSD-based OSDs,
how does cephadm handle it?
I'm trying to find a document that explains how to tell cephadm to deploy the
WAL/DB on the NVMe so it can speed up writes. Do I need to create a partition
for each OSD myself, or will cephadm create them?
Help me understand how this works and whether it is worth doing.
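From what I've found so far, I think this is done with an OSD service spec
that has device filters, something like the following (the size filters are
only my guess for this hardware), applied with "ceph orch apply -i osd-spec.yaml":

    service_type: osd
    service_id: ssd_osds_with_nvme_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        size: '2TB:'      # match the 2.9TB SSDs
      db_devices:
        size: ':1.5TB'    # match the 1TB NVMe
      db_slots: 3         # carve the NVMe into one DB LV per SSD OSD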