December 2020 - ceph-users

Clearing contents of OSDs without removing them?

by Dallas Jones

Stumbling closer toward a usable production cluster with Ceph, but I have yet another stupid n00b question I'm hoping you all will tolerate. I have 38 OSDs up and in across 4 hosts. I (maybe prematurely) removed my test filesystem as well as the metadata and data pools used by the deleted filesystem. This leaves me with 38 OSDs with a bunch of data on them. Is there a simple way to just whack all of the data on all of those OSDs before I create new pools and a new filesystem? Version: ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable) As you can see from the partial output of ceph -s, I left a bunch of crap spread across the OSDs... pools: 8 pools, 32 pgs objects: 219 objects, 1.2 KiB usage: 45 TiB used, 109 TiB / 154 TiB avail pgs: 32 active+clean Thanks in advance for a shove in the right direction. -Dallas

3 years, 4 months

3
2
0 0

Fwd: ceph-fuse false passed X_OK check

by Alex Taylor

cc to the list On Thu, Dec 17, 2020 at 11:39 AM Patrick Donnelly <pdonnell(a)redhat.com> wrote: > > On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor <alexu4993(a)gmail.com> wrote: > > > > Hi Cephers, > > > > I'm using VSCode remote development with a docker server. It worked OK > > but fails to start the debugger after /root mounted by ceph-fuse. The > > log shows that the binary passes access X_OK check but cannot be > > actually executed. see: > > > > ``` > > strace_log: access("/root/.vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7", > > X_OK) = 0 > > > > root@develop:~# ls -alh > > .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7 > > -rw-r--r-- 1 root root 978 Dec 10 13:06 > > .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7 > > ``` > > > > I also test the access syscall on ext4, xfs and even cephfs kernel > > client, all of them return -EACCES, which is expected (the extension > > will then explicitly call chmod +x). > > > > After some digging in the code, I found it is probably caused by > > https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L5549-L5550. > > So here come two questions: > > 1. Is this a bug or is there any concern I missed? > > I tried reproducing it with the master branch and could not. It might > be due to an older fuse/ceph. I suggest you upgrade! > I tried the master(332a188d9b3c4eb5c5ad2720b7299913c5a772ee) as well and the issue still exists. My test program is: ``` #include <stdio.h> #include <unistd.h> int main() { int r; const char path[] = "test"; r = access(path, F_OK); printf("file exists: %d\n", r); r = access(path, X_OK); printf("file executable: %d\n", r); return 0; } ``` And the test result: ``` # local filesystem: ext4 root@f626800a6e85:~# ls -l test -rw-r--r-- 1 root root 6 Dec 19 06:13 test root@f626800a6e85:~# ./a.out file exists: 0 file executable: -1 root@f626800a6e85:~# findmnt -t fuse.ceph-fuse TARGET SOURCE FSTYPE OPTIONS /root/mnt ceph-fuse fuse.ceph-fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other root@f626800a6e85:~# cd mnt # ceph-fuse root@f626800a6e85:~/mnt# ls -l test -rw-r--r-- 1 root root 6 Dec 19 06:10 test root@f626800a6e85:~/mnt# ./a.out file exists: 0 file executable: 0 root@f626800a6e85:~/mnt# ./test bash: ./test: Permission denied ``` Again, ceph-fuse says file `test` is executable but in fact it can't be executed. The kernel version I'm testing on is: ``` root@f626800a6e85:~/mnt# uname -ar Linux f626800a6e85 4.9.0-7-amd64 #1 SMP Debian 4.9.110-1 (2018-07-05) x86_64 GNU/Linux ``` Please try the program above and make sure you're running it as root user, thank you. And if the reproduction still fails, please let me know the kernel version. > > 2. It works again with fuse_default_permissions=true, any drawbacks if > > this option is set? > > Correctness (ironically, for you) and performance. > > -- > Patrick Donnelly, Ph.D. > He / Him / His > Principal Software Engineer > Red Hat Sunnyvale, CA > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D >

3 years, 4 months

1
0
0 0

OSD spec db_devices: rotational: 0 not working in 15.2.7

by David Orman

Hi, We deployed a cluster with an OSD spec like this: service_type: osd service_id: osd_spec_test1 placement: host_pattern: '*' data_devices: rotational: 1 db_devices: rotational: 0 db_slots: 12 This worked on an earlier version of 15 (I think we deployed with 15.2.5). However, this same drive spec results in the following in ceph orch ls --export: --- service_type: osd service_id: osd_spec_test1 service_name: osd.osd_spec_test1 placement: host_pattern: '*' spec: data_devices: rotational: 1 db_slots: 12 filter_logic: AND objectstore: bluestore Now, if we change rotational to 1 for db_devices: service_type: osd service_id: osd_spec_test1 service_name: osd.osd_spec_test1 placement: host_pattern: '*' spec: data_devices: rotational: 1 db_devices: rotational: 1 db_slots: 12 filter_logic: AND objectstore: bluestore For some reason, db_devices is ignored if rotational: 0 (our intended use case). This definitely worked before, our entire cluster was provisioned this way. We ran into this when attempting to replace a broken OSD. Any ideas what might be going on?

3 years, 4 months

1
0
0 0

Change MON IP in containerized environment

by Eugen Block

Hi Cephers, there have been several threads in this mailing list about changing IP addresses (especially MON services) and the docs [1] are based on pre-container deployments. I tried to achieve the same for an Octopus cluster (with podman). The "right way" mentioned in the docs is quite easy so I focused on the "messy way". I wrote a blog post [2] about it, maybe it is helpful to some of you. Best regards, Eugen [1] https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a… [2] https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors…

3 years, 4 months

1
0
0 0

Setting up NFS with Octopus

by Jens Hyllegaard (Soft Design A/S)

Hi. We are completely new to Ceph, and are exploring using it as an NFS server at first and expand from there. However we have not been successful in getting a working solution. I have set up a test environment with 3 physical servers, each with one OSD using the guide at: https://docs.ceph.com/en/latest/cephadm/install/ I created a new replicated pool: ceph osd pool create objpool replicated And then I deployed the gateway: ceph orch apply nfs objstore objpool nfs-ns I then created a new CephFS volume: ceph fs volume create objstore So far so good 😊 My problem is when I try to create the NFS export The settings are as follows: Cluster: objstore Daemons: nfs.objstore Storage Backend: CephFS CephFS User ID: admin CephFS Name: objstore CephFS Path: /objstore NFS Protocol: NFSV3 Access Type: RW Squash: all_squash Transport protocol: both UDP & TCP Client: Any client can access However when I click on Create NFS export, I get: Failed to create NFS 'objstore:/objstore' error in mkdirs /objstore: Permission denied [Errno 13] Has anyone got an idea as to why this is not working? If you need any further information, do not hesitate to say so. Best regards, Jens Hyllegaard Senior consultant Soft Design Rosenkaeret 13 | DK-2860 Søborg | Denmark | +45 39 66 02 00 | softdesign.dk<http://www.softdesign.dk/> | synchronicer.com

3 years, 4 months

2
1
0 0

Erasure Space not showing on Octopus

by Jeremi Avenant

Good day I currently have a problem where my octopus cluster shows cephfs EC free space differently from my luminous cluster cephfs EC datapool. The only difference I notice is the get application per pool. Mounting the volume from a vm in production Luminous 12.2.13 EC3+1: 10.102.25.18:6789,10.102.25.19:6789,10.102.25.28:6789:/volumes/_nogroup/6f485332-da3d-4f6d-b5aa-a68e9566d1dc 2.8P 2.5P 327T 89% /ilifu 2.8P EC available space 2.5P EC used space 327T available on the EC cephfs_data ceph osd pool application get cephfs_data { "cephfs": {} } Mounting the volume from a vm in production Octopus 15.2.8 EC8+2: 10.102.36.3:6789,10.102.36.5:6789,10.102.36.7:6789:/volumes/_nogroup/<longno> 4.3P 20T 4.3P 1% /new 4.3P is the RAW space of the whole of the ceph cluster, I'm expecting 3.0P Usable Space as show on ceph df cephfs_data 14 16 422 B 3 422 B 0 3.0 PiB ceph osd pool application get cephfs_data { "cephfs": { "data": "cephfs" } } If anyone has an idea how to get the EC volume to show the EC space and not the raw space. Thank you -- *Jeremi-Ernst Avenant, Mr.*Cloud Infrastructure Specialist Inter-University Institute for Data Intensive Astronomy 5th Floor, Department of Physics and Astronomy, University of Cape Town Tel: 021 959 4137 <0219592327> Web: www.idia.ac.za <http://www.uwc.ac.za/> E-mail (IDIA): jeremi(a)idia.ac.za <mfundo(a)idia.ac.za> Rondebosch, Cape Town, 7600, South Africa

3 years, 4 months

1
0
0 0

v14.2.15 Nautilus released

by David Galloway

This is the 15th backport release in the Nautilus series. This release fixes a ceph-volume regression introduced in v14.2.13 and includes few other fixes. We recommend users to update to this release. For a detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/releases/v14-2-15-nautilus-released Notable Changes --------------- * ceph-volume: Fixes lvm batch --auto, which breaks backward compatibility when using non rotational devices only (SSD and/or NVMe). * BlueStore: Fixes a bug in collection_list_legacy which makes pgs inconsistent during scrub when running mixed versions of osds, prior to 14.2.12 with newer. * MGR: progress module can now be turned on/off, using the commands: `ceph progress on` and `ceph progress off`. Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-14.2.15.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: afdd217ae5fb1ed3f60e16bd62357ca58cc650e5

3 years, 4 months

4
4
0 0

Bucket operations an issue with C# AWSSDK.S3 client

by Szabo, Istvan (Agoda)

Dear Community, We are having issues with bucket operations with ceph octopus 15.2.7. The client library used is : AWSSDK.S3 Version 3.5.6.6 Also used an old version of client: AWSSDK Version 2.3.55.2 Used it in both .net core and simple .net projects but the same result # Note : All methods mentioned below work perfectly fine with nodejs client Node js Client: https://www.npmjs.com/package/ceph Nodejs client doc: https://github.com/YounGoat/nodejs.osapi/blob/2f9d82092589bb50e452c57131499… # ALSO Note : All methods mentioned below work alright with another old version of ceph (luminous 12.2.8) with the same c# client Here are the details: The client object below is new AmazonS3Client( "<Our_Access_Key>", "<Our_Secret_Key>", new AmazonS3Config {ServiceURL = "<Our_Service_Url>" } ); 1. List Buckets. Result: OK Library method used : client.ListBucketsAsync(); [cid:image001.png@01D6D465.D7957510] 1. Create a New Bucket Result: Error 405 MethodNotAllowed (You might think it's a permission issue but it's not, because the same creds work in nodejs) Library method used : var request = new PutBucketRequest {BucketName = "seo"}; // seo is an existing bucket created by command line client.PutBucketAsync(request); Result error snapshot: [cid:image002.png@01D6D465.D7957510] 1. Create Object in an existing bucket Result: Error 501 Not Implemented Library method used : var request = new PutObjectRequest { BucketName = "seo", Key = "test2", ContentType = "text/plain", ContentBody = value, }; client.PutObjectAsync(request); We tried adding headers for content length as well but does not work as well Result error snapshot: [cid:image003.png@01D6D465.D7957510] 1. Read object existing in a bucket Existing bucket name: seo Existing object name(key): test Result: No Such Bucket Library code method: var request = new GetObjectRequest {BucketName = "seo", Key = "test"}; client.GetObjectAsync(request); Result error snapshot: [cid:image004.png@01D6D465.D7957510] Thank you in advance your help. ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 4 months

1
1
0 0

changing OSD IP addresses in octopus/docker environment

by Philip Brown

I was wondering how to change the IPs used for the OSD servers, in my new Octopus based environment, which uses all those docker/podman images by default. imiting date range to within a year, doesnt seem to hit anything. unlimited google search pulled up http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020503.h… but that references editing /etc/ceph/ceph.conf and changing the [osd.x] sections. which dont exist with Octopus, as far as I've found so far. It doesnt exist in the top level host's /etc/ceph/ceph.conf nor does it exist in the container's conf file, as viewed via "cephadm shell" So, what are the options here? -- Philip Brown| Sr. Linux System Administrator | Medata, Inc. 5 Peters Canyon Rd Suite 250 Irvine CA 92606 Office 714.918.1310| Fax 714.918.1325 pbrown(a)medata.com| www.medata.com

3 years, 4 months

2
1
0 0

MDS Corruption: ceph_assert(!p) in MDCache::add_inode

by Brandon Lyon

This is attempt #3 to submit this issue to this mailing list. I don't expect this to be received. I give up. I have an issue with MDS corruption which so far I haven't been able to resolve using the recovery steps I've found online. I'm on v15.2.6. I've tried all the recovery steps mentioned here, except copying the pool: https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/ When I try to start an MDS instance, it crashes after a few seconds. It logs a bunch of "bad backtrace on directory inode" errors before failing on an assertion in MDCache::add_inode, line 313: https://github.com/ceph/ceph/blob/cb8c61a60551b72614257d632a574d420064c17a/… Here's the output of journalctl -xe: https://pastebin.com/9g1UJaKQ I asked in the IRC channel, and it was suggested I might be able to manually delete the duplicate inodes using the RADOS API, though I don't know specifically how I would do that. I have also cloned the code and built Ceph with the problem assertion replaced with a return, but I haven't tried using it yet and I'm saving that as my last resort. I'd appreciate any help you all can give. Thank you, - Brandon Lyon

3 years, 4 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users December 2020