Hi,
I have a test cluster now running on Pacific with the cephadm
orchestrator and upstream container images.
In the Dashboard on the services tab I created a new service for NFS.
The containers got deployed.
But when I go to the NFS tab and try to create a new NFS share, the
Dashboard only returns a 500 error:
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.146+0000 7f64468d1700 0 [dashboard ERROR exception] Internal Server Error
Apr 05 19:38:49 ceph01 bash[35064]: Traceback (most recent call last):
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 46, in dashboard_exception_handler
Apr 05 19:38:49 ceph01 bash[35064]: return handler(*args, **kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
Apr 05 19:38:49 ceph01 bash[35064]: return self.callable(*self.args, **self.kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 694, in inner
Apr 05 19:38:49 ceph01 bash[35064]: ret = func(*args, **kwargs)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/controllers/nfsganesha.py", line 265, in fsals
Apr 05 19:38:49 ceph01 bash[35064]: return Ganesha.fsals_available()
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/ganesha.py", line 154, in fsals_available
Apr 05 19:38:49 ceph01 bash[35064]: if RgwClient.admin_instance().is_service_online() and \
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 301, in admin_instance
Apr 05 19:38:49 ceph01 bash[35064]: return RgwClient.instance(daemon_name=daemon_name)
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 241, in instance
Apr 05 19:38:49 ceph01 bash[35064]: RgwClient._daemons = _get_daemons()
Apr 05 19:38:49 ceph01 bash[35064]: File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 53, in _get_daemons
Apr 05 19:38:49 ceph01 bash[35064]: raise NoRgwDaemonsException
Apr 05 19:38:49 ceph01 bash[35064]: dashboard.services.rgw_client.NoRgwDaemonsException: No RGW service is running.
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.150+0000 7f64468d1700 0 [dashboard ERROR request] [::ffff:10.0.44.42:39898] [GET] [500] [0.030s] [admin] [513.0B] /ui-api/nfs-ganesha/fsals
Apr 05 19:38:49 ceph01 bash[35064]: debug 2021-04-05T17:38:49.150+0000 7f64468d1700 0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "e89b8519-352f-4e44-a364-6e6faf9dc533"} ']
I have no RADOS Gateways in that cluster (currently). The pools for
radosgw (.rgw.root etc.) exist, but there is no running instance.
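For completeness, the orchestrator also shows no RGW service or daemon
(standard orch commands; nothing is listed here):

[root@ceph01 ~]# ceph orch ls rgw
[root@ceph01 ~]# ceph orch ps --daemon_type rgw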
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin
Hello everyone,
I currently have a CephFS running with about 60TB of data. I created it
with a replicated pool as the default pool and an erasure-coded one as
an additional data pool, as described in the docs. Now I want to
migrate the data from the replicated pool to the new erasure-coded one.
I couldn't find any docs on this and was wondering if it's even
possible currently.
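To make the question concrete: since the EC pool is already attached as
a data pool, my understanding is that a directory layout can point
newly created files at it, but existing files would have to be
rewritten. Is something along these lines the right direction? (A
sketch; the pool name cephfs_ec and the mount point are placeholders.)

# point a directory at the EC pool; this only affects *new* files
setfattr -n ceph.dir.layout.pool -v cephfs_ec /mnt/cephfs/data
# existing files would then have to be rewritten one by one, e.g.
cp -a /mnt/cephfs/data/file /mnt/cephfs/data/file.tmp && \
    mv /mnt/cephfs/data/file.tmp /mnt/cephfs/data/file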
Thank you very much,
Fionera
Good morning all,
I'm experimenting with ceph orchestration and cephadm after using
ceph-deploy for several years, and I have a hopefully simple question.
I've converted a basic nautilus cluster over to cephadm+orchestration
and I tried adding, then removing a monitor. However, when I removed the
host using 'ceph orch host rm', it removed two mons. I may have missed
something in the adoption/upgrade that has left the cluster in a bad
state. Any advice/pointers/clarification would be appreciated.
Details:
A nautilus cluster with two mons (I know this is not correct for
quorum), a mgr, and a handful of osds. I went through the adoption
process and enabled the ceph orch backend.
[root@osdev-ctrl2 ~]# ceph orch ps
NAME             HOST         STATUS         REFRESHED  AGE  VERSION  IMAGE NAME                    IMAGE ID      CONTAINER ID
mgr.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  e73c19b51a09
mon.osdev-ctrl2  osdev-ctrl2  running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  a6bfc27221f0
mon.osdev-net1   osdev-net1   running (18h)  50s ago    18h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  f66e2bef3d44
osd.0            osdev-stor1  running (17h)  50s ago    17h  15.2.10  docker.io/ceph/ceph:v15.2.10  5b724076c58f  ac59dbdc267c
...
[root@osdev-ctrl2 ~]# ceph orch status
Backend: cephadm
Available: True
[root@osdev-ctrl2 ~]# ceph orch host ls
HOST         ADDR         LABELS   STATUS
osdev-ctrl2  osdev-ctrl2  mon mgr
osdev-net1   osdev-net1   mon
osdev-stor1  osdev-stor1  osd
[root@osdev-ctrl2 ~]# ceph orch ls
NAME  RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                    IMAGE ID
mgr   1/1      9m ago     20h  label:mgr  docker.io/ceph/ceph:v15.2.10  5b724076c58f
mon   2/2      9m ago     20h  label:mon  docker.io/ceph/ceph:v15.2.10  5b724076c58f
I then added a new mon host:
[root@osdev-ctrl2 ~]# ceph orch host add osdev-ctrl3 mon
It did not spawn a mon container on osdev-ctrl3 until I defined the
public network in the config:
[root@osdev-ctrl2 ~]# ceph config set global public_network 10.10.10.0/24
At this point all is good with three running mons as expected. Now I
wanted to delete the mon using:
[root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3
This had the effect of:
1. The osdev-ctrl3 mon was removed from 'ceph orch ls' and 'ceph orch ps'.
2. The mon on osdev-ctrl3 is still running and still shows up in
'ceph -s', but is reported as not managed by cephadm.
3. (Big issue) The mon running on osdev-net1 was completely destroyed.
Any ideas what is going on? Sorry for the long post, but I tried to be
as clear as possible.
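For reference, the removal sequence I would have expected from reading
the docs looks roughly like this (I have not re-tested it since the
failure, so treat it as a sketch only):

# drop the mon label so the 'label:mon' placement no longer matches
[root@osdev-ctrl2 ~]# ceph orch host label rm osdev-ctrl3 mon
# remove the stray daemon, then the host itself
[root@osdev-ctrl2 ~]# ceph orch daemon rm mon.osdev-ctrl3 --force
[root@osdev-ctrl2 ~]# ceph orch host rm osdev-ctrl3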
--
Gary Molenkamp Computer Science/Science Technology Services
Systems Administrator University of Western Ontario
molenkam(a)uwo.ca http://www.csd.uwo.ca
(519) 661-2111 x86882 (519) 661-3566
Hello. I had a one-way multisite S3 cluster, and we've seen issues
with RGW sync due to sharding problems, so I've stopped the multisite
sync. This is not the topic, just some background to my story.
I have some leftover 0-byte objects in the destination and I'm trying
to overwrite them with rclone "path to path". But somehow I cannot
overwrite these objects. If I delete one with rclone or "rados rm" and
do "rclone copy" again, I get the result below. Rclone reports an
error, but the object is created again as "0 byte" with pending attrs.
Why is this happening?
I think I somehow need to clean these objects and copy them from the
source again, but how?
What is "user.rgw.olh.pending"?
[root@SRV1]# radosgw-admin --id radosgw.prod1 object stat
--bucket=mybucket
--object=images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
{
"name": "images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f",
"size": 0,
"tag": "713li30rvcrjfwhctx894mj7vf1wa1a8",
"attrs": {
"user.rgw.manifest": "",
"user.rgw.olh.idtag": "v1m9jy4cjck38ptel09qebsbb10pe2af",
"user.rgw.olh.info": "\u0001\u0001�",
"user.rgw.olh.pending.00000000606b04728gs23ecq11b3i3l1":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472bfhdzxeb9wesd8t7":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472fv06t1dob3vmo4da":
"\u0001\u0001\u0008",
"user.rgw.olh.pending.00000000606b0472lql6c9o88rt211r9":
"\u0001\u0001\u0008",
"user.rgw.olh.ver": ""
}
}
[root@SRV1]# rados listxattr -p prod.rgw.buckets.data
c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
user.rgw.idtag
user.rgw.olh.idtag
user.rgw.olh.info
user.rgw.olh.ver
[root@SRV1]# rados -p prod.rgw.buckets.data stat
c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
prod.rgw.buckets.data/c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
mtime 2021-04-05 17:10:55.000000, size 0
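To make the second question concrete, this is the kind of cleanup I am
considering, assuming the leftover head object is really orphaned
(untested, please correct me if this is dangerous):

# drop the stale object via RGW, which should also update the bucket index
[root@SRV1]# radosgw-admin --id radosgw.prod1 object rm --bucket=mybucket --object=images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f
# or remove the leftover RADOS head object directly
[root@SRV1]# rados -p prod.rgw.buckets.data rm c106b26b-xxx-xxxx-xxx-dee3ca5c0968.121384004.3_images/2019/05/29/ad4ba79c-bb66-4ff6-847a-09a1e0cff49f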
Hi! How/where can I change the image configured for a service?
I tried to modify /var/lib/ceph/<fsid>/<service_name>/unit.{image,run},
but after restarting, ceph orch ps shows that the service uses the
same old image.
What other configuration locations are there for the Ceph components
besides /etc/ceph (which is quite sparse) and /var/lib/ceph/<fsid>?
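In case it helps to answer: are any of these the intended way? These
are guesses from skimming the docs, untested on my cluster:

# set the image cephadm uses for (re)deployed daemons
ceph config set global container_image docker.io/ceph/ceph:v15.2.10
# redeploy a single daemon so it picks up the new image
ceph orch daemon redeploy <daemon_name>
# or move the whole cluster to a new image
ceph orch upgrade start --image docker.io/ceph/ceph:v15.2.10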
Thank you!
Adrian
Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu nodes and have a few questions, mostly regarding planning of the infra.
1) The OS requirements in the documentation mention Ubuntu 18.04 LTS; is it ok to use Ubuntu 20.04 instead, or should I stick with 18.04?
2) The documentation recommends using cephadm for new deployments, so I will use that. But I read that with cephadm everything runs in containers, so is this the new way to go? Or is Ceph in containers still somewhat experimental?
3) As I will be needing CephFS, I will also need MDS servers, so with a total of 6 nodes I am planning the following layout (see the spec sketch after these questions):
Node 1: MGR+MON+MDS
Node 2: MGR+MON+MDS
Node 3: MGR+MON+MDS
Node 4: OSD
Node 5: OSD
Node 6: OSD
Does this make sense? I am mostly interested in stability and HA with this setup.
4) Is there any special kind of demand in terms of disks on the MGR+MON+MDS nodes? Or can I just use my OS disks on these nodes? As far as I understand, the MDS will create a metadata pool on the OSDs.
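Regarding question 3, this is the kind of MDS placement spec I have in
mind (a sketch with placeholder hostnames and filesystem name, to be
applied with 'ceph orch apply -i mds.yaml'):

service_type: mds
service_id: cephfs
placement:
  hosts:
    - node1
    - node2
    - node3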
Thanks for the hints.
Best,
Mabi
Good morning,
I was wondering if there are any timing indications as to how long a PG
should "usually" stay in a certain state?
For instance, how long should a pg stay in
- peering (seconds - minutes?)
- activating (seconds?)
- scrubbing (+deep)
The scrub process obviously depends on the number of objects in the PG;
however, is the same true for peering and activation? Since Nautilus we
see longer (minutes-long) peering states in the cluster, which we did
not see before.
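For reference, such PGs can be spotted with the standard commands, e.g.:

# PGs stuck in an inactive state (covers peering/activating)
ceph pg dump_stuck inactive
# quick per-state summary
ceph pg stat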
Thanks for your input and have a good start into the week!
Best regards,
Nico
--
Sustainable and modern Infrastructures by ungleich.ch
Hi,
I hope this mailing list is ok for this kind of question; if not,
please ignore.
I'm currently in the process of planning a smaller Ceph cluster, mostly
for CephFS use.
The budget still allows for some SSDs in addition to the required hard
disks.
I see two options for how to use them:
a) Make SSD-only pools for the CephFS metadata
b) Give every OSD an SSD for the BlueStore cache (i.e. as a DB/WAL device)
I was not able to find any suggestions or benchmarks so far; does
anyone have further resources or insight into these options?
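To make option a) concrete, my understanding is that it boils down to a
device-class based CRUSH rule, roughly like this (a sketch; the pool
name cephfs_metadata is a placeholder):

# replicated rule that only selects OSDs with device class 'ssd'
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule replicated-ssd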
Greetings,
Kai