Stumbling closer toward a usable production cluster with Ceph, but I have
yet another stupid n00b question I'm hoping you all will tolerate.
I have 38 OSDs up and in across 4 hosts. I (maybe prematurely) removed my
test filesystem as well as the metadata and data pools used by the deleted
filesystem.
This leaves me with 38 OSDs with a bunch of data on them.
Is there a simple way to just whack all of the data on all of those OSDs
before I create new pools and a new filesystem?
Version:
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus
(stable)
As you can see from the partial output of ceph -s, I left a bunch of crap
spread across the OSDs...
    pools:   8 pools, 32 pgs
    objects: 219 objects, 1.2 KiB
    usage:   45 TiB used, 109 TiB / 154 TiB avail
    pgs:     32 active+clean
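The only approach I've come up with so far (no idea if it's the sane one) is to just delete whatever pools are left over and let the OSDs clean up after themselves, roughly:
```
# list the leftover pools
ceph osd pool ls
# pool deletion is disabled by default; allow it temporarily
ceph config set mon mon_allow_pool_delete true
# delete each leftover pool (the name is given twice on purpose)
ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it
```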
Thanks in advance for a shove in the right direction.
-Dallas
cc to the list
On Thu, Dec 17, 2020 at 11:39 AM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
>
> On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor <alexu4993(a)gmail.com> wrote:
> >
> > Hi Cephers,
> >
> > I'm using VSCode remote development with a Docker server. It worked OK
> > but fails to start the debugger after /root is mounted by ceph-fuse. The
> > log shows that the binary passes the access X_OK check but cannot
> > actually be executed; see:
> >
> > ```
> > strace_log: access("/root/.vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7",
> > X_OK) = 0
> >
> > root@develop:~# ls -alh
> > .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7
> > -rw-r--r-- 1 root root 978 Dec 10 13:06
> > .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7
> > ```
> >
> > I also tested the access syscall on ext4, xfs and even the cephfs kernel
> > client; all of them return -EACCES, which is expected (the extension
> > will then explicitly call chmod +x).
> >
> > After some digging in the code, I found it is probably caused by
> > https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L5549-L5550.
> > So here come two questions:
> > 1. Is this a bug or is there any concern I missed?
>
> I tried reproducing it with the master branch and could not. It might
> be due to an older fuse/ceph. I suggest you upgrade!
>
I tried master (332a188d9b3c4eb5c5ad2720b7299913c5a772ee) as well
and the issue still exists. My test program is:
```
#include <stdio.h>
#include <unistd.h>

int main() {
    int r;
    const char path[] = "test";

    /* does the file exist? */
    r = access(path, F_OK);
    printf("file exists: %d\n", r);

    /* is the file executable by this process? */
    r = access(path, X_OK);
    printf("file executable: %d\n", r);

    return 0;
}
```
And the test result:
```
# local filesystem: ext4
root@f626800a6e85:~# ls -l test
-rw-r--r-- 1 root root 6 Dec 19 06:13 test
root@f626800a6e85:~# ./a.out
file exists: 0
file executable: -1
root@f626800a6e85:~# findmnt -t fuse.ceph-fuse
TARGET SOURCE FSTYPE OPTIONS
/root/mnt ceph-fuse fuse.ceph-fuse
rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other
root@f626800a6e85:~# cd mnt
# ceph-fuse
root@f626800a6e85:~/mnt# ls -l test
-rw-r--r-- 1 root root 6 Dec 19 06:10 test
root@f626800a6e85:~/mnt# ./a.out
file exists: 0
file executable: 0
root@f626800a6e85:~/mnt# ./test
bash: ./test: Permission denied
```
Again, ceph-fuse says file `test` is executable but in fact it can't
be executed.
The kernel version I'm testing on is:
```
root@f626800a6e85:~/mnt# uname -ar
Linux f626800a6e85 4.9.0-7-amd64 #1 SMP Debian 4.9.110-1 (2018-07-05)
x86_64 GNU/Linux
```
Please try the program above and make sure you're running it as the root
user, thank you. If the reproduction still fails, please let me know
your kernel version.
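For completeness, this is how I build and run it (the source file name access_test.c is arbitrary; the probe file gets the same 644 mode as in the listings above):
```
touch test && chmod 644 test
gcc access_test.c -o a.out
./a.out
```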
> > 2. It works again with fuse_default_permissions=true, any drawbacks if
> > this option is set?
>
> Correctness (ironically, for you) and performance.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
Hi,
We deployed a cluster with an OSD spec like this:
service_type: osd
service_id: osd_spec_test1
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 12
This worked on an earlier version of 15 (I think we deployed with
15.2.5). However, this same drive spec results in the following in ceph
orch ls --export:
---
service_type: osd
service_id: osd_spec_test1
service_name: osd.osd_spec_test1
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_slots: 12
  filter_logic: AND
  objectstore: bluestore
Now, if we change rotational to 1 for db_devices:
service_type: osd
service_id: osd_spec_test1
service_name: osd.osd_spec_test1
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 1
  db_slots: 12
  filter_logic: AND
  objectstore: bluestore
For some reason, db_devices is ignored if rotational: 0 (our intended use
case). This definitely worked before; our entire cluster was provisioned
this way. We ran into this when attempting to replace a broken OSD.
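In case it helps, one check we have been meaning to run is a cephadm dry run of the spec (assuming the spec above is saved as osd_spec_test1.yml; I'm not certain the dry-run flag is available in every Octopus release):
```
ceph orch apply -i osd_spec_test1.yml --dry-run
```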
Any ideas what might be going on?
Hi.
We are completely new to Ceph, and are exploring using it as an NFS server at first and expanding from there.
However, we have not been successful in getting a working solution.
I have set up a test environment with 3 physical servers, each with one OSD using the guide at: https://docs.ceph.com/en/latest/cephadm/install/
I created a new replicated pool:
ceph osd pool create objpool replicated
And then I deployed the gateway:
ceph orch apply nfs objstore objpool nfs-ns
I then created a new CephFS volume:
ceph fs volume create objstore
So far so good 😊
My problem is when I try to create the NFS export.
The settings are as follows:
Cluster: objstore
Daemons: nfs.objstore
Storage Backend: CephFS
CephFS User ID: admin
CephFS Name: objstore
CephFS Path: /objstore
NFS Protocol: NFSV3
Access Type: RW
Squash: all_squash
Transport protocol: both UDP & TCP
Client: Any client can access
However when I click on Create NFS export, I get:
Failed to create NFS 'objstore:/objstore'
error in mkdirs /objstore: Permission denied [Errno 13]
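One thing we are unsure about is whether the path has to exist in the filesystem before the export can be created. A rough sketch of what we were planning to try (assuming the admin keyring is available on the host and /mnt/cephfs is a free mount point):
```
# mount the CephFS volume with ceph-fuse and pre-create the /objstore directory
mkdir -p /mnt/cephfs
ceph-fuse /mnt/cephfs
mkdir /mnt/cephfs/objstore
umount /mnt/cephfs
```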
Has anyone got an idea as to why this is not working?
If you need any further information, do not hesitate to say so.
Best regards,
Jens Hyllegaard
Senior consultant
Soft Design
Rosenkaeret 13 | DK-2860 Søborg | Denmark | +45 39 66 02 00 | softdesign.dk<http://www.softdesign.dk/> | synchronicer.com
Good day
I currently have a problem where my Octopus cluster shows CephFS EC free
space differently from my Luminous cluster's CephFS EC data pool. The only
difference I notice is the `ceph osd pool application get` output per pool.
Mounting the volume from a VM in production Luminous 12.2.13 EC 3+1:
10.102.25.18:6789,10.102.25.19:6789,10.102.25.28:6789:/volumes/_nogroup/6f485332-da3d-4f6d-b5aa-a68e9566d1dc
2.8P 2.5P 327T 89% /ilifu
2.8P total EC space
2.5P EC space used
327T available on the EC cephfs_data pool
ceph osd pool application get cephfs_data
{
"cephfs": {}
}
Mounting the volume from a VM in production Octopus 15.2.8 EC 8+2:
10.102.36.3:6789,10.102.36.5:6789,10.102.36.7:6789:/volumes/_nogroup/<longno>
4.3P 20T 4.3P 1% /new
4.3P is the raw space of the whole Ceph cluster; I'm expecting the 3.0 PiB of
usable space shown for the pool in ceph df:
cephfs_data 14 16 422 B 3 422 B 0 3.0 PiB
ceph osd pool application get cephfs_data
{
"cephfs": {
"data": "cephfs"
}
}
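In case it is relevant, the next thing I plan to check is the layout of the mounted directory, to confirm which data pool the client is actually writing to (ceph.dir.layout is a virtual xattr exposed by CephFS; /new is the mount point from the df output above):
```
getfattr -n ceph.dir.layout /new
```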
If anyone has an idea how to get the EC volume to show the EC space and not
the raw space, I would appreciate it. Thank you.
--
Jeremi-Ernst Avenant, Mr.
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za
E-mail (IDIA): jeremi(a)idia.ac.za
Rondebosch, Cape Town, 7600, South Africa
This is the 15th backport release in the Nautilus series. This release
fixes a ceph-volume regression introduced in v14.2.13 and includes a few
other fixes. We recommend that users update to this release.
For detailed release notes with links and a changelog, please refer to the
official blog entry at https://ceph.io/releases/v14-2-15-nautilus-released
Notable Changes
---------------
* ceph-volume: Fixes lvm batch --auto, which broke backward
compatibility when using only non-rotational devices (SSD and/or NVMe).
* BlueStore: Fixes a bug in collection_list_legacy which made PGs
inconsistent during scrub when OSDs older than 14.2.12 were run alongside
newer ones.
* MGR: progress module can now be turned on/off, using the commands:
`ceph progress on` and `ceph progress off`.
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.15.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: afdd217ae5fb1ed3f60e16bd62357ca58cc650e5
Dear Community,
We are having issues with bucket operations with Ceph Octopus 15.2.7.
The client library used is: AWSSDK.S3 version 3.5.6.6
We also tried an older version of the client: AWSSDK version 2.3.55.2
We used it in both .NET Core and plain .NET projects, with the same result.
# Note: all methods mentioned below work perfectly fine with the Node.js client
Node.js client: https://www.npmjs.com/package/ceph
Node.js client doc: https://github.com/YounGoat/nodejs.osapi/blob/2f9d82092589bb50e452c57131499…
# Also note: all methods mentioned below work fine with an older version of Ceph (Luminous 12.2.8) and the same C# client
Here are the details:
The client object used below is:
    var client = new AmazonS3Client(
        "<Our_Access_Key>",
        "<Our_Secret_Key>",
        new AmazonS3Config { ServiceURL = "<Our_Service_Url>" }
    );
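One thing we have not tried yet is forcing path-style addressing (our endpoint has no wildcard DNS for bucket subdomains, so virtual-hosted-style requests could be a factor). Below is a sketch of how we understand that would look with this SDK; ForcePathStyle is an existing AmazonS3Config property, and the credentials/URL are placeholders as above:
    var config = new AmazonS3Config
    {
        ServiceURL = "<Our_Service_Url>",
        ForcePathStyle = true  // request http://host/bucket instead of http://bucket.host
    };
    var client = new AmazonS3Client("<Our_Access_Key>", "<Our_Secret_Key>", config);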
1. List buckets
Result: OK
Library method used: client.ListBucketsAsync();
2. Create a new bucket
Result: Error 405 MethodNotAllowed (you might think it's a permission issue, but it's not, because the same credentials work in Node.js)
Library method used:
    var request = new PutBucketRequest { BucketName = "seo" }; // seo is an existing bucket created via the command line
    client.PutBucketAsync(request);
3. Create an object in an existing bucket
Result: Error 501 Not Implemented
Library method used:
    var request = new PutObjectRequest
    {
        BucketName = "seo",
        Key = "test2",
        ContentType = "text/plain",
        ContentBody = value,
    };
    client.PutObjectAsync(request);
We tried adding a Content-Length header as well, but that does not work either.
4. Read an existing object in a bucket
Existing bucket name: seo
Existing object name (key): test
Result: No Such Bucket
Library method used:
    var request = new GetObjectRequest { BucketName = "seo", Key = "test" };
    client.GetObjectAsync(request);
Thank you in advance for your help.
I was wondering how to change the IPs used for the OSD servers in my new Octopus-based environment, which uses all those docker/podman images by default.
Limiting the search date range to within a year doesn't seem to hit anything.
An unrestricted Google search pulled up
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020503.h…
but that references editing /etc/ceph/ceph.conf and changing the [osd.x] sections,
which don't exist with Octopus, as far as I've found so far.
They don't exist in the top-level host's /etc/ceph/ceph.conf,
nor in the container's conf file, as viewed via "cephadm shell".
So, what are the options here?
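The closest equivalent I've found so far is the centralized config database, though I'm not sure whether it actually covers re-addressing OSDs that already exist (the subnets below are just placeholders):
```
# see what is currently set
ceph config dump | grep -E 'public_network|cluster_network'
# set the new networks centrally instead of editing ceph.conf
ceph config set osd public_network 192.168.10.0/24
ceph config set osd cluster_network 192.168.20.0/24
```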
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
This is attempt #3 to submit this issue to this mailing list. I don't
expect this to be received. I give up.
I have an issue with MDS corruption which so far I haven't been able to
resolve using the recovery steps I've found online. I'm on v15.2.6. I've
tried all the recovery steps mentioned here, except copying the pool:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
When I try to start an MDS instance, it crashes after a few seconds. It
logs a bunch of "bad backtrace on directory inode" errors before failing on
an assertion in MDCache::add_inode, line 313:
https://github.com/ceph/ceph/blob/cb8c61a60551b72614257d632a574d420064c17a/…
Here's the output of journalctl -xe: https://pastebin.com/9g1UJaKQ
I asked in the IRC channel, and it was suggested I might be able to
manually delete the duplicate inodes using the RADOS API, though I don't
know specifically how I would do that. I have also cloned the code and
built Ceph with the problem assertion replaced with a return, but I haven't
tried using it yet and I'm saving that as my last resort. I'd appreciate
any help you all can give.
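For what it's worth, my (unverified) understanding is that directory metadata lives as omap entries on objects in the metadata pool named after the directory inode, so merely inspecting a suspect inode would look roughly like this (the pool name and inode number below are placeholders):
```
# list the fragment objects for a given inode number (in hex) in the metadata pool
rados -p cephfs_metadata ls | grep '^10000000000\.'
# dump the dentry keys stored on one directory fragment object
rados -p cephfs_metadata listomapkeys 10000000000.00000000
```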
Thank you,
- Brandon Lyon