Hi Oliver
Review this "step by step" guide to see if you forgot something:
BR
NFS:
1. chmod +x cephadm
2. ./cephadm bootstrap
   - Record the dashboard user & password printed out at the end
3. Add the other hosts (assuming 3+ hosts total after adding); see the sketch after this list
4. ./cephadm shell
5. ceph orch apply osd --all-available-devices
6. ceph fs volume create test 1
7. ceph orch apply mds test 3
8. ceph nfs cluster create cephfs testnfs
9. ceph nfs cluster info testnfs
   - Verify that hostname, IP and port are listed
   - Record the IP and port for later
10. ceph nfs export create cephfs test testnfs /cephfs
11. ceph auth ls
    - Check that the "client.testnfs1" keyring is present
12. ceph nfs export get testnfs /cephfs
    - Should produce output
13. rados -p nfs-ganesha -N testnfs get export-1 - testnfs/cephnfs
    - Check that the export was successfully created
14. ceph nfs export ls testnfs
    - Should show the pseudo path "/cephfs"
15. Verify that the NFS export exists on the dashboard
    - Log in to the dashboard with the credentials from bootstrap
    - The URL will be https://{host-ip}:8443/
    - Navigate to the NFS page
    - The table should contain the export you just created
16. Exit the shell
    - The command is just "exit"
17. systemctl status nfs-server
    - If the service is listed as inactive, run "systemctl start nfs-server"
    - Run "systemctl status nfs-server" again; it should now be active
18. sudo mount -t nfs -o port={nfs-port} {nfs-ip}:/cephfs /mnt
    - The port and IP should come from the "ceph nfs cluster info testnfs" command run earlier
    - Example:
      mount -t nfs -o port=2049 10.8.128.94:/cephfs /mnt/cephfs/
    - Then run "mount" to check that it is mounted:
      # mount
      Output:
      10.8.128.94:/cephfs on /mnt/cephfs type nfs4 (rw,relatime,seclabel,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.8.128.94,local_lock=none,addr=10.8.128.94)
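
For step 3, the exact commands depend on your hostnames; a minimal sketch, assuming two extra hosts named host2 and host3 (names are placeholders):

  # copy the cluster's public SSH key to each new host
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@host2
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@host3
  # then, from inside "./cephadm shell", register them with the orchestrator
  ceph orch host add host2
  ceph orch host add host3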
--
Juan Miguel Olmo Martínez
Senior Software Engineer
Red Hat <https://www.redhat.com/>
jolmomar(a)redhat.com
Dear Community,
Since Nautilus, we have had two mechanisms for notifying third parties of changes
to buckets and objects: "bucket notifications" [1] and "pubsub" [2].
With "bucket notifications" (= "push mode") the events are sent from the RGW
to an external entity (Kafka, RabbitMQ, etc.), while with "pubsub" (= "pull
mode") the events are synced to a special zone, where they are stored
and can later be fetched by an external app.
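As a concrete illustration of "push mode": a topic with a push endpoint is created through the RGW's SNS-compatible API and attached to a bucket. A minimal sketch using the AWS CLI, where the endpoint URL, broker address, topic, bucket, and zonegroup names are all placeholders:

  # create a topic that pushes events to a Kafka broker
  aws --endpoint-url http://rgw-host:8000 sns create-topic \
      --name mytopic --attributes push-endpoint=kafka://kafka-host:9092
  # subscribe a bucket to the topic for object-creation events
  aws --endpoint-url http://rgw-host:8000 s3api put-bucket-notification-configuration \
      --bucket mybucket \
      --notification-configuration '{"TopicConfigurations": [{"Id": "notif1",
        "TopicArn": "arn:aws:sns:default::mytopic", "Events": ["s3:ObjectCreated:*"]}]}'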
From the communications I have seen so far, users prefer "bucket
notifications" over "pubsub". Since supporting both modes carries maintenance
overhead, I am considering deprecating "pubsub".
However, before doing that I would like to hear what the community has to
say!
So, if you are currently using pubsub, or plan to use it because "pull mode"
fits your use case better than "push mode", please chime in.
Yuval
[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
Hi,
I set up a small 3-node cluster as a POC. I bootstrapped the cluster with
separate networks for the frontend (public network 192.168.30.0/24) and
the backend (cluster network 192.168.41.0/24).
First small question:
After the bootstrap, I noticed that I had mixed up the cluster and public
networks. :( Is there a way to fix this on a running cluster? As a last resort
I would rebuild the cluster. Nevertheless, I can't mount cephfs on a
Linux client using either of the two networks. My Linux client is CentOS 7
(latest updates) and has 3 NICs, two of which are on the public and
cluster networks respectively.
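On the first question: the subnets themselves can be swapped at runtime with the standard config commands; a minimal sketch (note that the mons keep the addresses they were deployed with, so they may still need to be redeployed afterwards):

  # swap the two subnets back to the layout intended above
  ceph config set global public_network 192.168.30.0/24
  ceph config set global cluster_network 192.168.41.0/24
  # verify
  ceph config get mon public_network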
I bootstrapped the cluster using the following conf file to have two
networks:
/root/ceph.conf:
[global]
public network = 192.168.41.0/24
cluster network = 192.168.30.0/24
cephadm bootstrap -c /root/ceph.conf --mon-ip 192.168.30.11
I have 2 mons, one running on the bootstrap host (192.168.30.11 /
192.168.41.11) and one (gedaopl01, 192.168.30.12 / 192.168.41.12) running
on one of the 3 OSD nodes:
[root@gedasvl02 ~]# ceph -s
  cluster:
    id:     dad3c9fa-1ec7-11eb-94d6-005056b703af
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum gedasvl02,gedaopl01 (age 5h)
    mgr: gedasvl02.cspuee(active, since 12h), standbys: gedaopl01.llogef
    mds: cephfs:1 {0=cephfs.gedaopl03.prrkll=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 11h), 3 in (since 11h)

  task status:
    scrub status:
      mds.cephfs.gedaopl03.prrkll: idle

  data:
    pools:   3 pools, 81 pgs
    objects: 29 objects, 2.2 KiB
    usage:   450 GiB used, 407 GiB / 857 GiB avail
    pgs:     81 active+clean
[root@gedasvl02 ~]# ceph osd metadata 2 | grep addr
    "back_addr": "[v2:192.168.30.12:6800/3112350288,v1:192.168.30.12:6801/3112350288]",
    "front_addr": "[v2:192.168.41.12:6800/3112350288,v1:192.168.41.12:6801/3112350288]",
    "hb_back_addr": "[v2:192.168.30.12:6802/3112350288,v1:192.168.30.12:6803/3112350288]",
    "hb_front_addr": "[v2:192.168.41.12:6802/3112350288,v1:192.168.41.12:6803/3112350288]",
Now when I try to mount cephfs from the Linux client, the mount command
just hangs and runs into a timeout. I can ping the mon from the
client on both IPs, public (192.168.41.12) and cluster (192.168.30.12),
and I can also see packets coming in on the mon using tcpdump. What
could be wrong here? I'm using ceph-fuse.
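For reference, the client only needs to reach the Ceph public network; the mount attempts look roughly like this (mount point and client ID are placeholders, and the keyring is assumed to be in place):

  # ceph-fuse goes through the mon's public-network address
  ceph-fuse --id admin -m 192.168.41.11:6789 /mnt/cephfs
  # equivalent kernel-client mount
  mount -t ceph 192.168.41.11:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret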
One more question regarding rebuilding the cluster using cephadm: is
there a simple tear-down command? My bootstrap host is a VM, so I can use
snapshots, but the other nodes I have to clean manually by removing all pods
and Ceph directories.
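cephadm does ship a tear-down subcommand that wipes a host; a minimal sketch, using the fsid from the "ceph -s" output above and run on each node:

  # remove all daemons and data belonging to this cluster from the host
  cephadm rm-cluster --fsid dad3c9fa-1ec7-11eb-94d6-005056b703af --force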
Best Regards,
Oliver
Hi,
In my test Ceph Octopus cluster I was trying to simulate a failure case:
with cephfs mounted through the kernel client and reads and writes in
progress, I shut down the entire cluster with the OSD flags nodown, noout,
nobackfill and norecovery set (see the sketch below).
The cluster has 4 nodes, composed of 3 mons, 2 mgrs, 2 MDSs and 48 OSDs.
Public IP range: 10.0.103.0 and cluster IP range: 10.0.104.0.
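For reference, the flags mentioned above are the standard OSD flags, set and cleared like this:

  # keep OSDs from being marked down/out and pause data movement
  ceph osd set nodown
  ceph osd set noout
  ceph osd set nobackfill
  ceph osd set norecovery
  # clear them again afterwards with the matching unset commands
  ceph osd unset norecovery
  ceph osd unset nobackfill
  ceph osd unset noout
  ceph osd unset nodown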
Writes and reads stalled; after some time the cluster was brought back up
and healthy. But when reading a file through the kernel mount, the read starts
at above 100 MB/s, then suddenly drops to a few bytes/s and stays there for a long time.
The only error messages I could see on the client machine:
[ 167.591095] ceph: loaded (mds proto 32)
[ 167.600010] libceph: mon0 10.0.103.1:6789 session established
[ 167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-0cc47a5ec98a
[ 272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state CONNECTING)
What went wrong, and why does this issue occur?
regards
Amudhan P
Hi,
This same error keeps happening to me: after writing some amount of data to an RBD image it gets stuck and no read or write operation on it works. Every operation hangs. I cannot resize, alter features, read or write data. I can mount it, but using parted or fdisk hangs indefinitely. In the end all I can do is remove the image.
Again, I see no errors in the logs and Ceph's status is OK. I tried raising some log levels, but still no helpful info.
Is there anything I should check? Rados?
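A few low-level checks that might narrow this down; a minimal sketch, with pool and image names as placeholders:

  # look for slow or blocked requests cluster-wide
  ceph health detail
  # basic image status, including any clients still watching it
  rbd status mypool/myimage
  rbd info mypool/myimage
  # sanity-check the pool at the RADOS level
  rados -p mypool ls | head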
--
Salsa
Sent with ProtonMail Secure Email.
Hi,
I am experiencing the same problem. Could you please advise how
to resolve this issue?
Will the fix be shipped with version 15.2.6 of "ceph-common", or with the
Ceph release itself?
I have my cluster in Docker containers with systemd services.
How can I upgrade the cluster to 15.2.6 if the upgrade command fails?
sudo ceph orch upgrade start --ceph-version 15.2.5
Error ENOENT: Module not found
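"Error ENOENT: Module not found" usually means no orchestrator backend is enabled in the mgr; a minimal sketch of checking and enabling it (standard commands, assuming cephadm is the intended backend):

  # see which orchestrator backend (if any) is active
  ceph orch status
  # enable the cephadm mgr module and select it as the backend
  ceph mgr module enable cephadm
  ceph orch set backend cephadm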
Hi list,
I see a few changes in the (minor) version changelogs to the default of the
bluefs_buffered_io setting. Sometimes it is set to true; in our version
(14.2.11) it is set to false.
Can someone shed some light on this setting? I fail to find any documentation
on it, and "ceph config help" is not entirely clear to me either.
- What does it do exactly when true?
- If false, does that mean the Linux buffer cache is always skipped,
and caching happens in the OSD process only?
- If enabled, should we lower osd_memory_target to leave more space for
the Linux buffer cache? What percentage of memory should we
then assign to osd_memory_target? (See the sketch below.)
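For reference, the setting can be inspected and flipped at runtime; a minimal sketch (the memory value is only an example):

  # show the current value and its built-in description
  ceph config get osd bluefs_buffered_io
  ceph config help bluefs_buffered_io
  # change it at runtime for all OSDs
  ceph config set osd bluefs_buffered_io true
  # and, if needed, lower the per-OSD memory target (example: 3 GiB)
  ceph config set osd osd_memory_target 3221225472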
Marcel
Thank you for the suggestion. It does indeed seem to explain why the OSD nodes are no longer using the buffer cache.
Unfortunately, changing the value of bluefs_buffered_io does not seem to make any difference in performance. I will keep looking for clues.