I just went to set up an iSCSI gateway on a Debian Buster / Octopus
cluster and hit a brick wall with packages. I had perhaps naively
assumed they were in with the rest. Now I understand that the gateway
can exist separately, but then so can RGW.
I found some ceph-iscsi RPM builds for CentOS, but nothing for Debian.
Are they around somewhere? The prerequisite packages
python-rtslib-2.1.fb68 and tcmu-runner-1.4.0 also don't seem to be
readily available for Debian.
Has anyone done this for Debian?
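For what it's worth, a hedged sketch of building the pieces from source on Debian, since no packages seem to exist. The repositories below are the upstream GitHub projects; the dependency list is an assumption and may need adjusting for your Buster install:

```shell
# Build dependencies (assumed set; adjust as configure/cmake complains)
apt-get install -y build-essential cmake git libnl-3-dev libnl-genl-3-dev \
    libglib2.0-dev libkmod-dev librbd-dev

# tcmu-runner from upstream (disable the gluster handler we don't need)
git clone https://github.com/open-iscsi/tcmu-runner
cd tcmu-runner && cmake -Dwith-glfs=false . && make && make install && cd ..

# python-rtslib-fb and ceph-iscsi are pure Python and can be installed
# from their upstream repos as well
git clone https://github.com/open-iscsi/rtslib-fb
git clone https://github.com/ceph/ceph-iscsi
```

This is untested on Buster specifically; you would still need to create systemd units for tcmu-runner, rbd-target-api and rbd-target-gw by hand.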
Thanks, Chris
Dear All,
Hope you all had a great Christmas and much needed time off with family!
Have any of you used "device management and failure prediction" in
Nautilus? If yes, what is your feedback? Do you use LOCAL or CLOUD
prediction models?
https://ceph.io/update/new-in-nautilus-device-management-and-failure-predic…
Your feedback and input is valuable.
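For anyone who wants to try it, a hedged sketch of turning the feature on, as of Nautilus (verify the exact names against your release's docs):

```shell
# Enable collection of SMART metrics from OSD/mon devices
ceph device monitoring on

# Choose the prediction backend: "local" runs a model inside the mgr,
# "cloud" sends anonymized metrics to an external prediction service
ceph config set global device_failure_prediction_mode local

# Inspect what is being collected
ceph device ls
ceph device get-health-metrics <devid>
```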
--
Regards,
Suresh
All;
I turned on device health metrics in one of our Nautilus clusters. Unfortunately, it doesn't seem to be collecting any information.
When I do "ceph device get-health-metrics <device>", I get the following:
{
"20200821-223626": {
"dev": "/dev/sdc",
"error": "smartctl failed",
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "samsung_ssd_860_evo_4tb",
"smartctl_error_code": -22,
"smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
}
}
The cluster is Nautilus 14.2.16 (updated from 14.2.11 just after turning on health metrics). Smartctl is release 7.0 dated 2018-12-30 at 14:47:55 UTC.
Thoughts?
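A hedged way to narrow this down, since the error is really coming from sudo rather than smartctl: reproduce on the OSD host what the mgr module does. The sudoers file name below is an example; the packaged rule's path may differ on your distro:

```shell
# Run as the user the ceph daemons run as (often "ceph").
# smartmontools >= 7.0 is needed for the --json output the module parses.
sudo -u ceph sudo smartctl -x --json /dev/sdc

# If sudo itself exits 1, check which commands the ceph user may run;
# a rule like /etc/sudoers.d/ceph-smartctl (name is an assumption)
# should allow passwordless smartctl/nvme invocations:
sudo -l -U ceph
```

If `sudo -l` shows no smartctl rule, adding one and re-running the scrape would be the first thing I'd try.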
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
Hi,
I have a case where the storage backend is about to change from OneFS to Ceph. Both are used mainly as object storage for Windows clients.
Both ends have Samba configured as a gateway with integration to AD. Both backends work and can be accessed and utilized, but the problem is the data transfer in between.
So far I have tested only robocopy, but that would be the preferred tool, since the transfer is done via a Windows machine mounting both source and target.
The copying itself works, no problem, but copying the ACLs for the files and folders doesn't. If I run "robocopy source target /MIR /SEC" I get access denied; only the target top folder is created, and it has the wrong privileges.
CephFS should be correctly mounted via Samba, since I can create and copy files and folders when they are created by, in this case, the administrator. And I can change the permissions afterwards, but this is not the way the copying is to be done. There is a lot of data, and the permissions are crucial to preserve.
I've tried many ways to preserve the permissions; besides robocopy I've tried saving and restoring them with "icacls /save" and "icacls /restore", and with PowerShell commands as well. None of those seem to work.
Here's a summary of what works:
creating new folders/files as administrator on the source and robocopying them to the target with /MIR and /SEC.
copying without /SEC.
And here's what doesn't work or help:
Taking ownership as Administrator.
Removing non-existent users from source folders/files.
Removing inheritance for the folder on the source.
Changing the owner to administrator.
The Ceph release is Nautilus.
Any ideas or suggestions on the topic?
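One thing worth checking is whether the Samba share on the CephFS side stores full NT security descriptors at all; plain POSIX permissions cannot hold Windows ACLs, which would explain the access-denied on /SEC. A hedged sketch of an smb.conf share section (share name and path are examples, not from your setup):

```
[share]
    path = /mnt/cephfs/share
    # vfs_acl_xattr stores the complete NT security descriptor in an
    # extended attribute, so Windows ACLs survive independently of the
    # POSIX mode bits CephFS itself enforces
    vfs objects = acl_xattr
    map acl inherit = yes
    store dos attributes = yes
```

On the Windows side, running robocopy from an elevated prompt with /B (backup mode) in addition to /SEC or /COPYALL lets it use the SeBackup/SeRestore privileges, which often resolves access-denied on security descriptors.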
Best regards and happy new year!
-Oskari
Hi,
All my OSD nodes in the SSD tier randomly hit heartbeat_map timeouts,
and I can't find out why:
7ff2ed3f2700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread
0x7ff2c8943700' had timed out after 15
It occurs many times a day and causes my cluster to go down.
Is there any way to find out why the OSDs time out? I don't think the
heartbeat mechanism itself is the problem; rather, something makes the
OSDs so slow that the heartbeats time out. The OSDs don't suicide, but
they get too slow and cause downtime on RBD and the S3 gateway because
the queue fills up.
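Since osd_op_tp timing out means a worker thread was stuck on an op for more than 15 s, a hedged sketch of where I would look first, via the OSD admin socket (run on the OSD host; command availability can vary by release):

```shell
# Ops currently blocked in the OSD, with how long and at which stage
ceph daemon osd.<id> dump_ops_in_flight
ceph daemon osd.<id> dump_blocked_ops

# Recent slow ops kept in history, if your release supports it
ceph daemon osd.<id> dump_historic_slow_ops

# Internal counters can hint whether the time goes to BlueStore,
# compaction, or the network
ceph daemon osd.<id> perf dump | less
```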
Thanks.
Hi List,
To reproduce an issue we see on a production cluster (the ceph-fuse
CephFS client outperforms the kernel client by a factor of 5), we would
like a test cluster with the same CephFS "flags" as production.
However, it's not completely clear how certain features influence the
CephFS flags. From what I could find in the source code
(cephfs_features.h), the feature bits *seem* to correspond to Ceph
releases; for example CEPHFS_FEATURE_NAUTILUS gets feature bit 12. An
upgraded (Luminous -> Mimic -> Nautilus) CephFS gives us the following
cephfs flags: "1c".
A (newly installed) Nautilus cluster gives "10" when new snapshots are
not allowed (ceph fs set cephfs allow_new_snaps false) and "12" when new
snapshots are allowed (ceph fs set cephfs allow_new_snaps true).
We would like to have the test cluster get the "1c" flags and see if we
can reproduce the issue. How can we achieve that?
Any info on how those cephfs flags are constructed is welcome.
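As far as I can tell, the fs "flags" value is just a bitmask over the CEPH_MDSMAP_* constants in ceph's src/include/ceph_fs.h, which is a different thing from the CEPHFS_FEATURE_* client feature bits. A small sketch that decodes the hex values you quoted; the bit assignments below are my reading of that header for Nautilus-era code and should be double-checked against your release:

```python
# Bit names assumed from src/include/ceph_fs.h (CEPH_MDSMAP_*);
# the two "legacy" bits were set on filesystems created before Luminous
# and persist across upgrades, which would explain 0x1c vs 0x10.
FLAGS = {
    1 << 0: "NOT_JOINABLE",
    1 << 1: "ALLOW_SNAPS",            # ceph fs set <fs> allow_new_snaps
    1 << 2: "ALLOW_MULTIMDS",         # legacy
    1 << 3: "ALLOW_DIRFRAGS",         # legacy
    1 << 4: "ALLOW_MULTIMDS_SNAPS",
    1 << 5: "ALLOW_STANDBY_REPLAY",
}

def decode(flags: int) -> list:
    """Return the names of the bits set in an fs 'flags' value."""
    return [name for bit, name in sorted(FLAGS.items()) if flags & bit]

print(decode(0x1c))  # upgraded cluster
print(decode(0x10))  # fresh Nautilus fs, snapshots off
print(decode(0x12))  # fresh Nautilus fs, snapshots on
```

If that reading is right, "1c" differs from a fresh cluster only by the two legacy bits, which are set at filesystem creation time on old releases rather than by any `ceph fs set` toggle, so reproducing them may require creating the fs on Luminous and upgrading.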
Thanks,
Gr. Stefan
More banging on my prototype cluster, and ran into an odd problem.
It used to be that when I created an RBD device and then tried to map it, the map would initially fail, telling me to disable some features. Then I would just run the suggested disable line -- usually
rbd feature disable poolname/rbdname object-map fast-diff deep-flatten
and then I could map it fine.
But now, after the latest cluster recreation, when I try to map, I just get
# rbd map testpool/zfs02
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (110) Connection timed out
and no errors in the dmesg output.
If I try to disable those features anyway, I get
librbd::Operations: one or more requested features are already disabled (22) Invalid argument
There's nothing in /var/log/ceph/cephadm.log either.
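A hedged sketch of what I would check: a (110) timeout on map, as opposed to the unsupported-features error, usually means the kernel client never manages to talk to the mons/OSDs at all, so the features are a red herring. Command names are standard, but interpret the output against your own setup:

```shell
# Which features are actually enabled on the image?
rbd info testpool/zfs02

# What the cluster demands of clients (an upmap/crush requirement can
# lock out older kernels)
ceph osd dump | grep require
ceph osd crush show-tunables | head

# msgr2-only mons (port 3300) are unreachable for older kernel clients,
# which expect v1 on 6789
ceph mon dump | grep -E 'v1|v2'

# Kernel-side view of the failed map attempt
dmesg | tail -n 30
```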
Any suggestions?
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hello,
TL;DR How can I recreate the device_health_metrics pool?
I'm experimenting with Ceph Octopus v15.2.8 in a 3-node cluster under
Proxmox 6.3. After initializing Ceph the usual way, a
"device_health_metrics" pool is created as soon as I create the first
manager. That pool has just 1 PG but no OSDs assigned, as the OSDs had
not been created yet. After creating a few OSDs and waiting for a
couple of days, that PG is still in the "stale+undersized+peered" state.
So I thought I could just disable monitoring (ceph device monitoring
off), delete that pool and create it again with something like:
ceph osd pool create device_health_metrics 1 --autoscale-mode=off
The issue is that after recreating the pool and re-enabling monitoring
(ceph device monitoring on), no data about my devices gets stored in
it, even after running a manual scrape with ceph device
scrape-health-metrics.
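In case it helps others, a hedged sketch of the full recreate-and-rescrape sequence as of Octopus (verify the subcommands against "ceph device -h" on your node):

```shell
ceph device monitoring off

# Deleting a pool needs mon_allow_pool_delete=true and the double name
ceph osd pool delete device_health_metrics device_health_metrics \
    --yes-i-really-really-mean-it

ceph osd pool create device_health_metrics 1
ceph device monitoring on

# Devices must be known to some daemon before a scrape can store data
ceph device ls

# Global scrape, then a per-daemon variant if the global one stays empty
ceph device scrape-health-metrics
ceph device scrape-daemon-health-metrics osd.0
```

If `ceph device ls` is empty, the scrape has nothing to store, and the pool itself is not the problem.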
Thank you in advance.
Victor