In my experiments with ceph so far, setting up a new cluster goes fairly well... so long as I only use a single network.
But when I try to use separate networks, things stop functioning in various ways.
(For example, I can "
So I thought I'd ask for pointers to any multi-network setup guides.
My goal:
* have a 3+ node ceph cluster that each has local SSD storage only.
* have an RBD mapped on each node, which will then have a non-cephfs filesystem on it, shared out via NFS
* have each node share out NFS on one interface, but communicate to the cluster on a separate interface
In the old way, this was theoretically straightforward, in that you could specify the "public" network vs. the cluster network.
But in the new cephadm-driven world, I haven't found the magic that works.
For example, in my current iteration, I have successfully added all three hosts, and have 3 "mon"s...
but "ceph orch device ls --refresh"
only shows the devices from the node I'm running it on.
I'm still trying to cycle through different bootstrap options, and I'm experimenting with overriding naming in /etc/hosts, for things like which IP addresses get mapped to the real hostname vs. which get given "hostname-datainterface" type naming.
For example: on the one hand, I'm wondering if I need to name ALL IP addresses for a host with the same hostname.
But on the other hand, my sysadmin instincts whisper to me that that sounds like a terrible idea.
So, tips from people who have done multi-homing under Octopus would be appreciated.
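For reference, this is roughly the split I'm hoping to express. Whether public_network/cluster_network are still the right knobs under cephadm is exactly what I'm unsure about, and the subnets and hostnames below are just placeholders for my NFS-facing and cluster-facing networks:

  ceph config set global public_network 192.168.1.0/24
  ceph config set global cluster_network 192.168.2.0/24
  # add hosts by an explicit, ceph-facing address instead of relying on DNS resolution
  ceph orch host add node2 192.168.2.12
  ceph orch host add node3 192.168.2.13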
Note that my initial proof-of-concept cluster is just 3 physical nodes, so everything needs to live on them.
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hi,
I want to enable the firewall on my Ceph nodes with ufw. Does anyone have
experience with it causing any performance regression?
Is there any way to block the exporter ports (such as node exporter and the
Ceph exporter) in a Ceph cluster without using a firewall?
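For concreteness, what I have in mind is something along these lines (the port numbers are the defaults I believe the Ceph daemons and exporters use, and the subnet is a placeholder for our internal network):

  sudo ufw allow from 10.0.0.0/24 to any port 3300,6789 proto tcp   # mons
  sudo ufw allow from 10.0.0.0/24 to any port 6800:7300 proto tcp   # OSDs / mgr
  sudo ufw allow from 10.0.0.0/24 to any port 9100 proto tcp        # node exporter
  sudo ufw allow from 10.0.0.0/24 to any port 9283 proto tcp        # mgr prometheus module
  sudo ufw deny 9100/tcp                                            # block node exporter from everyone else
  sudo ufw deny 9283/tcp                                            # block the mgr prometheus module from everyone else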
Thanks.
Hi Folks
We've noticed that in a cluster of 21 nodes (5 mgrs & mons, and 504 OSDs with
24 per node), the mgrs are dropping out of the cluster after a non-specific
period of time. The logs only show the following:
debug 2020-12-10T02:02:50.409+0000 7f1005840700 0 log_channel(cluster) log
[DBG] : pgmap v14163: 4129 pgs: 4129 active+clean; 10 GiB data, 31 TiB
used, 6.3 PiB / 6.3 PiB avail
debug 2020-12-10T03:20:59.223+0000 7f10624eb700 -1 monclient:
_check_auth_rotating possible clock skew, rotating keys expired way too
early (before 2020-12-10T02:20:59.226159+0000)
debug 2020-12-10T03:21:00.223+0000 7f10624eb700 -1 monclient:
_check_auth_rotating possible clock skew, rotating keys expired way too
early (before 2020-12-10T02:21:00.226310+0000)
The _check_auth_rotating message repeats approximately every second. The
instances are all syncing their time with NTP and have no issues on that
front. A restart of the mgr fixes the issue.
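(For reference, the restart is nothing special; something along the lines of

  ceph mgr fail <active-mgr-name>

with the name taken from 'ceph mgr dump', or restarting the mgr service/container on the affected host, is enough to bring it back.)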
It appears that this may be related to https://tracker.ceph.com/issues/39264.
The suggestion there seems to be to disable prometheus metrics; however, this
obviously isn't realistic for a production environment where metrics are
critical for operations.
Please let us know what additional information we can provide to assist in
resolving this critical issue.
Cheers
Welby
I am sorry, but I am not sure how to do that. We have just started working with Ceph.
-----Original Message-----
From: Eugen Block <eblock(a)nde.ag>
Sent: 18. december 2020 12:06
To: Jens Hyllegaard (Soft Design A/S) <jens.hyllegaard(a)softdesign.dk>
Subject: Re: [ceph-users] Re: Setting up NFS with Octopus
Oh, you're right, it worked for me; I just tried that with a new path and it was created for me.
Can you share the client keyrings? I have two nfs daemons running and they have these permissions:
client.nfs.ses7-nfs.host2
key: AQClNNJf5KHVERAAAzhpp9Mclh5wplrcE9VMkQ==
caps: [mon] allow r
caps: [osd] allow rw pool=nfs-test namespace=ganesha
client.nfs.ses7-nfs.host3
key: AQCqNNJf4rlqBhAARGTMkwXAldeprSYgmPEmJg==
caps: [mon] allow r
caps: [osd] allow rw pool=nfs-test namespace=ganesha
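You can dump yours with something like this (the exact client name depends on what cephadm called your nfs daemons; 'ceph auth ls | grep nfs' should show it):

  ceph auth get client.nfs.objstore.<hostname>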
Zitat von "Jens Hyllegaard (Soft Design A/S)" <jens.hyllegaard(a)softdesign.dk>:
> On the Create NFS export page it says the directory will be created.
>
> Regards
>
> Jens
>
>
> -----Original Message-----
> From: Eugen Block <eblock(a)nde.ag>
> Sent: 18. december 2020 11:52
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Setting up NFS with Octopus
>
> Hi,
>
> is the path (/objstore) present within your CephFS? If not, you need to
> mount the CephFS root first and create your directory so that NFS can
> access it.
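> A rough sketch of what I mean (mount point and using the admin keyring are
> just examples, adjust to your environment):
>
>   mount -t ceph <mon-host>:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
>   mkdir /mnt/cephfs/objstore
>   umount /mnt/cephfs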
>
>
> Zitat von "Jens Hyllegaard (Soft Design A/S)"
> <jens.hyllegaard(a)softdesign.dk>:
>
>> Hi.
>>
>> We are completely new to Ceph, and are exploring using it as an NFS
>> server at first and expanding from there.
>>
>> However we have not been successful in getting a working solution.
>>
>> I have set up a test environment with 3 physical servers, each with
>> one OSD using the guide at:
>> https://docs.ceph.com/en/latest/cephadm/install/
>>
>> I created a new replicated pool:
>> ceph osd pool create objpool replicated
>>
>> And then I deployed the gateway:
>> ceph orch apply nfs objstore objpool nfs-ns
>>
>> I then created a new CephFS volume:
>> ceph fs volume create objstore
>>
>> So far so good 😊
>>
>> My problem is when I try to create the NFS export. The settings are as
>> follows:
>> Cluster: objstore
>> Daemons: nfs.objstore
>> Storage Backend: CephFS
>> CephFS User ID: admin
>> CephFS Name: objstore
>> CephFS Path: /objstore
>> NFS Protocol: NFSV3
>> Access Type: RW
>> Squash: all_squash
>> Transport protocol: both UDP & TCP
>> Client: Any client can access
>>
>> However when I click on Create NFS export, I get:
>> Failed to create NFS 'objstore:/objstore'
>>
>> error in mkdirs /objstore: Permission denied [Errno 13]
>>
>> Has anyone got an idea as to why this is not working?
>>
>> If you need any further information, do not hesitate to say so.
>>
>>
>> Best regards,
>>
>> Jens Hyllegaard
>> Senior consultant
>> Soft Design
>> Rosenkaeret 13 | DK-2860 Søborg | Denmark | +45 39 66 02 00 |
>> softdesign.dk<http://www.softdesign.dk/> | synchronicer.com
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
>> email to ceph-users-leave(a)ceph.io
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
> email to ceph-users-leave(a)ceph.io
Is there a command to update a client with a newly generated key?
Something like:
ceph auth new-key client.rbd
Could be useful if you accidentally did a 'ceph auth ls', because that
still displays keys ;)
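(If nothing like 'ceph auth new-key' exists, I guess the fallback would be to
export the entity, swap in a freshly generated secret and import it back,
though I haven't verified that an import really overwrites the existing key:

  ceph auth export client.rbd -o client.rbd.keyring
  ceph-authtool --gen-print-key     # paste this over the key = ... line
  ceph auth import -i client.rbd.keyring
)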
Hello.
I'm trying to write some Python code to analyse the storage usage of my RBD
images; the rbd and rados package versions are 14.2.16.
Basically I want the same data that I can acquire from the shell commands
'rbd du <image>' and 'rbd info <image>', but through the Python API.
At the moment I can connect to the cluster and get an image list from the
pool, but that's about it.
As far as I can read, there's no API method to obtain image info such as
provisioned/used storage, etc.
Is this really not possible with the current Python API, or am I missing
something?
Documentation used:
https://docs.ceph.com/en/latest/rbd/api/librbdpy/
https://docs.ceph.com/en/latest/rados/api/python/
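The closest I've pieced together so far is below: image.size() gives the
provisioned size, and my guess is that diff_iterate() is the way to
approximate what 'rbd du' reports as used space, so please treat that part as
an assumption on my side:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')   # pool name is a placeholder

try:
    for name in rbd.RBD().list(ioctx):
        with rbd.Image(ioctx, name, read_only=True) as image:
            provisioned = image.size()
            used = [0]

            # sum up the extents that actually exist on disk
            def count_used(offset, length, exists):
                if exists:
                    used[0] += length

            image.diff_iterate(0, provisioned, None, count_used,
                               whole_object=True)
            print('%s: provisioned=%d used~=%d' % (name, provisioned, used[0]))
finally:
    ioctx.close()
    cluster.shutdown()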
Hi,
We have an Octopus installation using cephadm/containers in podman. We're
trying to upgrade from 15.2.7 to 15.2.8. The mgrs upgrade successfully, but
we then get a failure that cephadm failed to pull an image:
[WRN] UPGRADE_FAILED_PULL: Upgrade: failed to pull target image
Failed to pull docker.io/ceph/ceph:v15.2.8 on ceph02:
Looking at this server and trying to run the cephadm pull manually:
root@ceph02:~# cephadm pull
Using recent ceph image docker.io/ceph/ceph:v15.2.8
Pulling container image docker.io/ceph/ceph:v15.2.8...
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 6111, in <module>
r = args.func()
File "/usr/sbin/cephadm", line 1381, in _infer_image
return func()
File "/usr/sbin/cephadm", line 2676, in command_pull
return command_inspect_image()
File "/usr/sbin/cephadm", line 1381, in _infer_image
return func()
File "/usr/sbin/cephadm", line 2716, in command_inspect_image
info_from = get_image_info_from_inspect(out.strip(), args.image)
File "/usr/sbin/cephadm", line 2727, in get_image_info_from_inspect
image_id, digests = out.split(',', 1)
ValueError: not enough values to unpack (expected 2, got 1)
root@ceph02:~# cephadm version
Using recent ceph image docker.io/ceph/ceph:v15.2.8
ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus
(stable)
root@ceph02:~# podman images
REPOSITORY                     TAG       IMAGE ID      CREATED         SIZE
docker.io/ceph/ceph            v15.2.8   5553b0cb212c  3 days ago      965 MB
docker.io/ceph/ceph            v15.2.7   2bc420ddb175  2 weeks ago     979 MB
docker.io/ceph/ceph            v15       2bc420ddb175  2 weeks ago     979 MB
docker.io/prom/node-exporter   v0.18.1   e5a616e4b9cf  18 months ago   24.3 MB
root@ceph02:~#
We've upgraded this cluster multiple times before without issue, so this
appears to be unique to the 15.2.7 -> 15.2.8 upgrade. If we pass --image
with docker.io/ceph/ceph:v15.2.8, the same behavior occurs. It seems like
cephadm in the 15.2.8 image isn't playing nicely with the pull command for
some reason.
Interestingly, I did a fresh install of v15.2.8 on one server, and after
the installation, trying a 'cephadm pull' also fails on that 15.2.8 setup,
so it seems like the pull functionality may be broken on these images.
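For what it's worth, the split that blows up is parsing the output of a
container image inspect. Dumping something similar by hand (I'm guessing at
the exact format string cephadm uses) at least shows the string the
comma-split is operating on:

  podman image inspect docker.io/ceph/ceph:v15.2.8 --format '{{.Id}},{{.RepoDigests}}'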
Hello,
I'm new to Ceph and am setting up my first cluster and playing around with
it. I followed the steps in the cephadm guide
(https://docs.ceph.com/en/latest/cephadm/install/).
Here is my implementation of the instructions, from a Ubuntu Server 20.04.1
install:
sudo apt-get update && sudo apt-get dist-upgrade -y
sudo apt-get install curl git nano docker docker-compose docker.io attr ntp
bash-completion -y
sudo usermod -aG docker $USER
curl --silent --remote-name --location
https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
sudo ./cephadm add-repo --release octopus
sudo ./cephadm install
sudo mkdir -p /etc/ceph
sudo cephadm install ceph-common ceph
sudo cephadm bootstrap --mon-ip <IP>
ceph orch daemon add osd server-node1:/dev/sda
ceph orch daemon add osd server-node1:/dev/sdc
ceph orch daemon add osd server-node1:/dev/sdd
ceph orch daemon add osd server-node1:/dev/sde
ceph orch daemon add osd server-node1:/dev/sdf
Then I log into the ceph dashboard, change password, click around. Take a
break...
...HOURS Later...
My OSDs are down.
I notice this in ceph-volume.log:
[ceph_volume.main][INFO ] Running command: ceph-volume lvm deactivate 1
7a0fb8df-2d12-4e96-9def-5b2c195f6af4
So I run:
ceph-volume lvm activate --all
And my OSDs are back and 'cephadm ls' shows them as legacy. However,
running 'cephadm adopt --style legacy --name osd.0' causes the osd to go
down again.
What is going on?
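For reference, the checks I've been running between attempts are roughly
these (suggestions for better ones welcome):

  cephadm ls
  ceph orch ps
  systemctl list-units 'ceph*'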
PS: The only two other issues I see in the logs are
/usr/bin/docker:stderr Error: No such object: ceph-<ID>-osd.0
and
[ceph_volume.util.system][INFO ] /var/lib/ceph/osd/ceph-0 does not appear
to be a tmpfs mount
Jie
Hi,
I used 'radosgw-admin reshard process' to run a manual bucket reshard; after
it completes, it logs the error below:
ERROR: failed to process reshard logs, error=(2) No such file or directory
I've added the bucket to the resharding queue with 'radosgw-admin reshard add
--bucket bucket-tmp --num-shards 2053'.
Is anything wrong with it?
Using nautilus 14.2.14.
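For what it's worth, these are the commands I'm using to check whether the
reshard itself actually went through (bucket name as above):

  radosgw-admin reshard status --bucket bucket-tmp
  radosgw-admin bucket stats --bucket bucket-tmp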
Thanks.
Stumbling closer toward a usable production cluster with Ceph, but I have
yet another stupid n00b question I'm hoping you all will tolerate.
I have 38 OSDs up and in across 4 hosts. I (maybe prematurely) removed my
test filesystem as well as the metadata and data pools used by the deleted
filesystem.
This leaves me with 38 OSDs with a bunch of data on them.
Is there a simple way to just whack all of the data on all of those OSDs
before I create new pools and a new filesystem?
Version:
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus
(stable)
As you can see from the partial output of ceph -s, I left a bunch of crap
spread across the OSDs...
pools: 8 pools, 32 pgs
objects: 219 objects, 1.2 KiB
usage: 45 TiB used, 109 TiB / 154 TiB avail
pgs: 32 active+clean
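What I'm tempted to do is simply delete the leftover pools one at a time and
let Ceph clean up the OSDs itself, something like the following (pool name is
a placeholder, and I believe mon_allow_pool_delete has to be enabled first),
but I'm not sure that's the sanctioned approach, hence the question:

  ceph osd pool ls
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it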
Thanks in advance for a shove in the right direction.
-Dallas