December 2020 - ceph-users

by Chris Palmer

I just went to setup an iscsi gateway on a Debian Buster / Octopus cluster and hit a brick wall with packages. I had perhaps naively assumed they were in with the rest. Now I understand that it can exist separately, but then so can RGW. I found some ceph-iscsi rpm builds for Centos, but nothing for Debian. Are they around somewhere? The prerequisite packages python-rtslib-2.1.fb68 and tcmu-runner-1.4.0 also don't seem to be readily available for Debian. Has anyone done this for Debian? Thanks, Chris

3 years, 3 months

4
5
0 0

device management and failure prediction

by Suresh Rama

Dear All, Hope you all had a great Christmas and much needed time off with family! Have any of you used "*device management and failure prediction"* in Nautilus? If yes, what is your feedback? Do you use LOCAL or CLOUD prediction models? https://ceph.io/update/new-in-nautilus-device-management-and-failure-predic… Your feedback and input is valuable. -- Regards, Suresh

3 years, 3 months

2
1
0 0

Nautilus Health Metrics

by DHilsbos＠performair.com

All; I turned on device health metrics in one of our Nautilus clusters. Unfortunately, it doesn't seem to be collecting any information. When I do "ceph device get-health-metrics <device>, I get the following; { "20200821-223626": { "dev": "/dev/sdc", "error": "smartctl failed", "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1", "nvme_smart_health_information_add_log_error_code": -22, "nvme_vendor": "samsung_ssd_860_evo_4tb", "smartctl_error_code": -22, "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" } } The cluster is Nautilus 14.2.16 (updated from 14.2.11 just after turning on health metrics). Smartctl is release 7.0 dated 2018-12-30 at 14:47:55 UTC. Thoughts? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. DHilsbos(a)PerformAir.com www.PerformAir.com

3 years, 3 months

1
0
0 0

Copying data from OneFS source to CEPHFS both shared via SAMBA

by Oskari Koivisto

Hi, I have case where storage backend is about to change from OneFS to Ceph. Both are mainly for windows clients used as object storage. Both ends have SAMBA configured as a GW with integration to AD. Both of the backends work and can be accessed and utilized but the problem is the data-transfer in between. So far I have tested only robocopy but that would be a preferred tool since xfer is done via Windows machine mounting both source and target. The copying it self works, no problem but copying the ACL for the files and folders doesn’t. If I provide “robocopy source target /MIR /SEC” I get access denied, only target top folder is created and it has wrong previleges. The CEPHFS should be correctly mounted via SAMBA since I can create and copy files and folders when they are created by, in this case, the administrator. And I can change the permission after wards but this is not the way the copying is to be done. There is a lot of data and permissions are crucial to preserve. I’ve tried many ways to save the permissions, among robocopy I’ve tried saving the permissions with "icacls save/restore” and with powershells command also. None of those seem to work. Here’s a conclusion of what works: creating new folder/files with administrator to source and robocopying them to target with /MIR and /SEC. copying without /SEC. Here’s also what doesn’t work or help: Taking over permission for Administrator. Remove non-existent users from source folder/files. Removing in-heritage for the folder in source. Changing owner to administrator. The CEPH is Nautilus. Any ideas or suggestiong for the topic? Best regards and happy new year! -Oskari

3 years, 4 months

1
0
0 0

To be able to post to Ceph-users mailing list

by EKTA SINGHAI

email id: esinghai(a)es.iitr.ac.in

3 years, 4 months

1
0
0 0

Random heartbeat_map timed out

by Seena Fallah

Hi, All my OSD nodes in the SSD tier are getting heartbeat_map timed out randomly and I don't find why! 7ff2ed3f2700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff2c8943700' had timed out after 15 It occurs many times in a day and causes my cluster to be down. Is there any way to find why the OSDs get time out? I don't think it's because of heartbeat and there is an issue with OSD that came to the heartbeat to be timeout because ODSs don't suicide and OSDs get too slow and cause downtime on RBD and S3 gateway because the queue is full! Thanks.

3 years, 4 months

2
3
0 0

post

by mariyam sofi

3 years, 4 months

1
0
0 0

cephfs flags question

by Stefan Kooman

Hi List, In order te reproduce an issue we see on a production cluster (cephFS client: ceph-fuse outperform kernel client by a factor of 5) we would like to have a test cluster to have the same cephfs "flags" as production. However, it's not completely clear how certain features influence the cephfs flags. What I could find in the source code, cephfs_features.h, is that it *seems* to correspond to the Ceph release. For example CEPHFS_FEATURE_NAUTILUS gets a "12" as feature bit. An upgraded (Luminous -> Mimic -> Nautilus) cephfs gives us the following cephfs flags: "1c". A (newly installed) Nautilus cluster gives "10" when new snapshots are not allowed (ceph fs set cephfs allow_new_snaps false) and "12" when new snapshots are allowed (ceph fs set cephfs allow_new_snaps true). We would like to have the test cluster get the "1c" flags and see if we can reproduce the issue. How can we achieve that? Any info on how those cephfs flags are constructed is welcome. Thanks, Gr. Stefan

3 years, 4 months

2
8
0 0

after octopus cluster reinstall, rbd map fails with timeout

by Philip Brown

More banging on my prototype cluster, and ran into an odd problem. Used to be, when I create an rbd device, then try to map it, it would initially fail, saying I have to disable some features. Then I just run the suggested disable line -- usually rbd feature disable poolname/rbdname object-map fast-diff deep-flatten and then I can map it fine. but now after the latest cluster recreation, when I try to map, I just get # rbd map testpool/zfs02 rbd: sysfs write failed In some cases useful info is found in syslog - try "dmesg | tail". rbd: map failed: (110) Connection timed out and no errors in dmesg output if I try to disable those features anyway, I get librbd::Operations: one or more requested features are already disabled(22) Invalid argument nothing in /var/log/ceph/cephadm.log either Any suggestions? -- Philip Brown| Sr. Linux System Administrator | Medata, Inc. 5 Peters Canyon Rd Suite 250 Irvine CA 92606 Office 714.918.1310| Fax 714.918.1325 pbrown(a)medata.com| www.medata.com

3 years, 4 months

1
1
0 0

Recreate pool device_health_metrics

by Victor Rodriguez

Hello, TL;DR How can I recreate the device_health_metrics pool? I'm experimenting with Ceph Octopus v15.2.8 in a 3 node cluster under Proxmox 6.3. After initializing CEPH the usual way, a "device_health_metrics" pool is created as soon as I create the first manager. That pool has just 1 PG but no OSD assigned, as OSD have not been created yet. After creating a few OSD and waiting for a couple days, that PG is still in "stale+undersized+peered" state. So I thought I could just disable monitoring (ceph device monitoring off), delete that pool and create it again with something like: ceph osd pool create device_health_metrics 1 --autoscale-mode=off The issue is that after recreating the pool and enabling monitoring (ceph device monitoring on), I get no data stored in it regardind my devices, even after running a manual scraping with ceph device scrape-health-metrics. Thank you in advance. Victor

3 years, 4 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users December 2020