On Sat, Jul 4, 2020 at 11:27, Dave Hall <kdhall(a)binghamton.edu> wrote:
>
> Rodrigo,
>
> I tried to send this to the list last night, but it looks like it didn't go through. I had a problem very much like this when I was first setting up my cluster. It turned out to be a missing systemd unit file. I would suggest that you check to see that you have an instance of ceph-volume@.service and an instance of ceph-osd@.service running for each OSD.
I manually started my ceph-volume@ instances and then I managed to
successfully restart my ceph-osd@ instances.
My OSDs are back.
Does anybody know who is responsible for calling the ceph-volume@
instances on boot?
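For reference, this is roughly how I started them by hand (the OSD id
and fsid below are placeholders, not my real values):

    systemctl list-units --all 'ceph-volume@*'
    systemctl start ceph-volume@lvm-6-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.service
    systemctl restart ceph-osd@6.service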
Regards,
Rodrigo
On Sat, Jul 4, 2020 at 11:27, Dave Hall <kdhall(a)binghamton.edu> wrote:
>
> Rodrigo,
>
> I tried to send this to the list last night, but it looks like it didn't go through. I had a problem very much like this when I was first setting up my cluster. It turned out to be a missing systemd unit file. I would suggest that you check to see that you have an instance of ceph-volume@.service and an instance of ceph-osd@.service running for each OSD.
>
> ceph-volume@lvm-10-a63a2465-9de2-497e-bf60-72ed4c3c4c33.service
> ceph-volume@lvm-11-a2151f1f-84f5-407b-a730-9d2a9502a85f.service
> ceph-volume@lvm-12-7a16123b-7a0f-40a2-9159-5555724d9978.service
> ceph-volume@lvm-13-bda32c07-ecc7-40c4-8066-383977c5e795.service
> ceph-volume@lvm-14-4cc89d3a-5390-415d-b12b-9a557b7b5950.service
> ceph-volume@lvm-15-21bbdaf5-1994-43a9-aa80-982689f51438.service
> ceph-volume@lvm-8-cda4394b-e132-4530-8044-3cbbfbcbea19.service
> ceph-volume@lvm-9-72ff199d-a4e4-4d26-9a4f-00337f8fdc7c.service
>
> ceph-osd@10.service
> ceph-osd@12.service
> ceph-osd@14.service
> ceph-osd@8.service
> ceph-osd@11.service
> ceph-osd@13.service
> ceph-osd@15.service
> ceph-osd@9.service
>
> In my case I found that the proto-unit for one of these (ceph-volume@.service, I think) was missing from /var/lib/systemd/system. In fact, it was missing from the Debian install package for some reason. I think I had to unpack a copy of the Debian SRC package to retrieve the file. Once I added the missing file and made sure the unit was enabled, things started working better and making more sense. I don't recall if all of the instances created themselves or whether I had to do something additional, but it worked.
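>
> A quick way to check that the proto-unit and the per-OSD instances are in place (the unit names here reuse the examples above; substitute your own ids and fsids):
>
>     systemctl cat ceph-volume@.service
>     systemctl is-enabled ceph-volume@lvm-8-cda4394b-e132-4530-8044-3cbbfbcbea19.service
>     systemctl is-enabled ceph-osd@8.service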
Hi Dave,
I'm looking around but can't identify any missing file.
ceph-osd@.service and ceph-volume@.service are present. I'm looking on
the other servers for some other file that might be missing but can't
find anything.
Thanks for your help,
Rodrigo
On Fri, Jul 3, 2020 at 17:41, Marc Roos
<M.Roos(a)f1-outsourcing.eu> wrote:
>
> So mount it, if it is empty
Sure. That was my first impulse but, as I said, on my other OSD
servers these mounts are tmpfs filesystems.
It's easy to mount them manually, but how would I populate them?
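In case it helps anyone later: the tmpfs mounts appear to be created
and populated by ceph-volume when it activates an OSD from the LVM
metadata, so something along these lines might rebuild them (the OSD
id and fsid below are placeholders, not my real values):

    ceph-volume lvm activate --all
    # or, for a single OSD:
    ceph-volume lvm activate 6 aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee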
Regards,
Rodrigo Severo
>
>
>
> -----Original Message-----
> To: ceph-users
> Subject: [ceph-users] Ceph OSD not mounting after reboot
>
> Hi,
>
>
> Just rebooted one of my OSD servers after upgrading Ceph from 14.2.9 to
> 14.2.10 and its OSDs won't come up.
>
> I find the following messages in my log:
>
> 4991 Jul 3 17:24:03 osdserver1-df ceph-osd[1272]: 2020-07-03 17:24:03.036 7fcc497f1c00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-6/keyring: (2) No such file or directory
> 4992 Jul 3 17:24:03 osdserver1-df ceph-osd[1272]: 2020-07-03 17:24:03.036 7fcc497f1c00 -1 AuthRegistry(0x55e2ff810140) no keyring found at /var/lib/ceph/osd/ceph-6/keyring, disabling cephx
>
> and my /var/lib/ceph/osd/ceph-6 directory is empty.
>
> I see that on my other servers these /var/lib/ceph/osd/ceph-?
> directories are tmpfs mounts but I can't understand who is responsible
> for mounting them as there are no entries for them in /etc/fstab.
>
> How can I fix this osd server?
>
>
> Regards,
>
> Rodrigo Severo
>
>
Hi,
Just rebooted one of my OSD servers after upgrading Ceph from 14.2.9 to
14.2.10 and its OSDs won't come up.
I find the following messages in my log:
4991 Jul 3 17:24:03 osdserver1-df ceph-osd[1272]: 2020-07-03 17:24:03.036 7fcc497f1c00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-6/keyring: (2) No such file or directory
4992 Jul 3 17:24:03 osdserver1-df ceph-osd[1272]: 2020-07-03 17:24:03.036 7fcc497f1c00 -1 AuthRegistry(0x55e2ff810140) no keyring found at /var/lib/ceph/osd/ceph-6/keyring, disabling cephx
and my /var/lib/ceph/osd/ceph-6 directory is empty.
I see that on my other servers these /var/lib/ceph/osd/ceph-? directories
are tmpfs mounts but I can't understand who is responsible for mounting
them as there are no entries for them in /etc/fstab.
How can I fix this osd server?
Regards,
Rodrigo Severo
Hi all,
we are currently experiencing a problem with the Object Gateway part of the dashboard not working anymore:
We had a working setup where the RGW servers only had one network interface with an IP address that was reachable by the monitor servers, and the dashboard was working as expected.
After our initial tests everything was working great and we decided to add another physical link to the RGW servers for the traffic to the clients.
With that network change we also had to set the default gateway to the new interface while adding static routes for the rest of the Ceph environment.
To avoid issues with hostnames (the old hostname now resolves to the new interface) we added another hostname for the internal traffic, purged the gateways from Ceph and added them again via ceph-deploy rgw create with the new hostname.
The S3 communication is working perfectly fine as it did before; we can reach all buckets and the monitors can communicate with the gateway. The dashboard, however, throws the following error whenever we navigate to any of the Object Gateway menus:
—————————————————————————————————
2020-07-03 10:33:41.871 7fa0f9dbc700 0 mgr[dashboard] [03/Jul/2020:10:33:41] HTTP Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in respond
response.body = self.handler()
File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, in __call__
self.body = self.oldhandler(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line 221, in wrap
return self.newhandler(innerfunc, *args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in __call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 661, in inner
ret = func(*args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 28, in status
if not instance.is_service_online():
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 507, in func_wrapper
**kwargs)
File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 321, in is_service_online
_ = request({'format': 'json'})
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 313, in __call__
data, raw_content)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 445, in do_request
ex.args[0].reason.args[0])
File "/usr/lib64/python2.7/re.py", line 137, in match
return _compile(pattern, flags).match(string)
TypeError: expected string or buffer
2020-07-03 10:33:41.872 7fa0f9dbc700 0 mgr[dashboard] [2a02:2e0:13::a05:42784] [GET] [500] [45.044s] [plusline] [1.8K] /api/rgw/status
2020-07-03 10:33:41.872 7fa0f9dbc700 0 mgr[dashboard] ['{"status": "500 Internal Server Error", "version": "3.2.2", "traceback": "Traceback (most recent call last):\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line 656, in respond\\n response.body = self.handler()\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line 188, in __call__\\n self.body = self.oldhandler(*args, **kwargs)\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\", line 221, in wrap\\n return self.newhandler(innerfunc, *args, **kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line 88, in dashboard_exception_handler\\n return handler(*args, **kwargs)\\n File \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line 34, in __call__\\n return self.callable(*self.args, **self.kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 661, in inner\\n ret = func(*args, **kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/controllers/rgw.py\\", line 28, in status\\n if not instance.is_service_online():\\n File \\"/usr/share/ceph/mgr/dashboard/rest_client.py\\", line 507, in func_wrapper\\n **kwargs)\\n File \\"/usr/share/ceph/mgr/dashboard/services/rgw_client.py\\", line 321, in is_service_online\\n _ = request({\'format\': \'json\'})\\n File \\"/usr/share/ceph/mgr/dashboard/rest_client.py\\", line 313, in __call__\\n data, raw_content)\\n File \\"/usr/share/ceph/mgr/dashboard/rest_client.py\\", line 445, in do_request\\n ex.args[0].reason.args[0])\\n File \\"/usr/lib64/python2.7/re.py\\", line 137, in match\\n return _compile(pattern, flags).match(string)\\nTypeError: expected string or buffer\\n", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "e0d6ff11-4dad-496a-9ee7-9db036c46ab7"}']
—————————————————————————————————
We are running ceph version 14.2.9 on CentOS 7.7. Any help on how to debug this would be greatly appreciated.
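In case it helps with suggestions: a sketch of what we are considering
trying next, based on the dashboard settings available in Nautilus (the
hostname and port below are placeholders, not our real values):

    ceph dashboard set-rgw-api-host rgw-internal.example.com
    ceph dashboard set-rgw-api-port 8080

This should pin the endpoint the dashboard uses instead of letting it
resolve the RGW hostname on its own.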
Best Regards,
Hendrik
Hello.
I have tried to follow the documented writeback cache tier
removal procedure
(https://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-…)
on a test cluster, and failed.
I have successfully executed this command:
ceph osd tier cache-mode alex-test-rbd-cache proxy
Next, I am supposed to run this:
rados -p alex-test-rbd-cache ls
rados -p alex-test-rbd-cache cache-flush-evict-all
The failure mode is that, while the client I/O is still going on, I
cannot get to zero objects in the cache pool, even with the help of
"rados -p alex-test-rbd-cache cache-flush-evict-all". And yes, I have
waited more than 20 minutes (my cache tier has hit_set_count 10 and
hit_set_period 120).
I also tried to set both cache_target_dirty_ratio and
cache_target_full_ratio to 0, but it didn't help.
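In case it helps, the ratio changes mentioned above were along these
lines, and the remaining steps of the documented procedure (which I
never get to, because the pool won't drain) would be:

    ceph osd pool set alex-test-rbd-cache cache_target_dirty_ratio 0
    ceph osd pool set alex-test-rbd-cache cache_target_full_ratio 0
    # per the docs, once the cache pool is finally empty:
    ceph osd tier remove-overlay alex-test-rbd-data
    ceph osd tier remove alex-test-rbd-data alex-test-rbd-cache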
Here is the relevant part of the pool setup:
# ceph osd pool ls detail
pool 25 'alex-test-rbd-metadata' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 10973111 lfor 0/10971347/10971345 flags hashpspool,nodelete stripe_width 0 application rbd
pool 26 'alex-test-rbd-data' erasure size 6 min_size 5 crush_rule 12 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 10973112 lfor 10971705/10971705/10971705 flags hashpspool,ec_overwrites,nodelete,selfmanaged_snaps tiers 27 read_tier 27 write_tier 27 stripe_width 16384 application rbd
        removed_snaps [1~3]
pool 27 'alex-test-rbd-cache' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 10973113 lfor 10971705/10971705/10971705 flags hashpspool,incomplete_clones,nodelete,selfmanaged_snaps tier_of 26 cache_mode proxy target_bytes 10000000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 120s x10 decay_rate 0 search_last_n 0 stripe_width 0 application rbd
        removed_snaps [1~3]
The relevant CRUSH rules select SSDs for the
alex-test-rbd-cache and alex-test-rbd-metadata pools (plain old
"replicated size 3" pools), and HDDs for alex-test-rbd-data (which is
EC 4+2).
The client workload, which seemingly outpaces the eviction and flushing, is:
for a in `seq 1000 2000` ; do
    time rbd import --data-pool alex-test-rbd-data ./Fedora-Cloud-Base-32-1.6.x86_64.raw alex-test-rbd-metadata/Fedora-copy-$a
done
The ceph version is "ceph version 14.2.9
(2afdc1f644870fb6315f25a777f9e4126dacc32d) nautilus (stable)" on all
OSDs.
The relevant part of "ceph df" is:

RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       23 TiB      20 TiB      2.9 TiB     3.0 TiB     12.99
    ssd       1.7 TiB     1.7 TiB     19 GiB      23 GiB      1.28
    TOTAL     25 TiB      22 TiB      2.9 TiB     3.0 TiB     12.17

POOLS:
    POOL                      ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
    <irrelevant pools omitted>
    alex-test-rbd-metadata    25    237 KiB    2.37k      59 MiB     0        564 GiB
    alex-test-rbd-data        26    691 GiB    198.57k    1.0 TiB    6.52     9.7 TiB
    alex-test-rbd-cache       27    5.1 GiB    2.99k      15 GiB     0.90     564 GiB
The total size and the number of stored objects in the
alex-test-rbd-cache pool oscillate around 5 GB and 3K, respectively,
while "rados -p alex-test-rbd-cache cache-flush-evict-all" is running
in a loop. Without it, the size grows to 6 GB and stays there.
# ceph -s
  cluster:
    id:     <omitted for privacy>
    health: HEALTH_WARN
            1 cache pools at or near target size

  services:
    mon:         3 daemons, quorum xx-4a,xx-3a,xx-2a (age 10d)
    mgr:         xx-3a(active, since 5w), standbys: xx-2b, xx-2a, xx-4a
    mds:         cephfs:1 {0=xx-4b=up:active} 2 up:standby
    osd:         89 osds: 89 up (since 7d), 89 in (since 7d)
    rgw:         3 daemons active (xx-2b, xx-3b, xx-4b)
    tcmu-runner: 6 daemons active (<only irrelevant images here>)

  data:
    pools:   15 pools, 1976 pgs
    objects: 6.64M objects, 1.3 TiB
    usage:   3.1 TiB used, 22 TiB / 25 TiB avail
    pgs:     1976 active+clean

  io:
    client: 290 KiB/s rd, 251 MiB/s wr, 366 op/s rd, 278 op/s wr
    cache:  123 MiB/s flush, 72 MiB/s evict, 31 op/s promote, 3 PGs flushing, 1 PGs evicting
Is there any workaround, short of somehow telling the client to stop
creating new rbds?
--
Alexander E. Patrakov
CV: http://pc.cd/PLz7
Thanks Ramana and David.
So we are using the Shaman search API to get the latest build for the
ceph_nautilus flavor of NFS Ganesha, and that's how we got to the mentioned
build. We are doing this since it's part of our CI and it's better for
automation.
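For reference, the query is along these lines (a sketch of what our CI
does; the exact parameters and the chacra_url field are from memory,
so treat them as an approximation):

    curl -s 'https://shaman.ceph.com/api/search/?project=nfs-ganesha-stable&flavor=ceph_nautilus&status=ready' \
        | jq -r '.[0].chacra_url'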
Should we use different repos?
Thanks,
V
On Wed, Jun 24, 2020 at 3:33 PM Victoria Martinez de la Cruz <
vkmc(a)redhat.com> wrote:
> Thanks Ramana and David.
>
> So we are using the Shaman search API to get the latest build for
> ceph_nautilus flavor of NFS Ganesha, and that's how we get to the mentioned
> build. We are doing this since it's part of our CI and it's better for
> automation.
>
> Should we use different repos?
>
> Thanks,
>
> V
>
> On Tue, Jun 23, 2020 at 2:42 PM David Galloway <dgallowa(a)redhat.com>
> wrote:
>
>>
>>
>> On 6/23/20 1:21 PM, Ramana Venkatesh Raja wrote:
>> > On Tue, Jun 23, 2020 at 6:59 PM Victoria Martinez de la Cruz
>> > <victoria(a)redhat.com> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I'm hitting issues with the nfs-ganesha-stable packages [0], the repo
>> url
>> >> [1] is broken. Is there a known issue for this?
>> >>
>> >
>> > The missing packages in chacra could be due to the recent mishap in
>> > the sepia long running cluster,
>> >
>> https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/YQMAHTB7MUHL25QP7V…
>>
>> Hi Victoria,
>>
>> Ramana is correct. Do you need 2.7.4 specifically? If not, signed
>> nfs-ganesha packages can also be found here:
>> http://download.ceph.com/nfs-ganesha/
>>
>> >
>> >> Thanks,
>> >>
>> >> Victoria
>> >>
>> >> [0]
>> >>
>> https://shaman.ceph.com/repos/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c…
>> >> [1]
>> >>
>> https://chacra.ceph.com/r/nfs-ganesha-stable/V2.7-stable/1a1fb71cdb811c1bac…
>> >>
>> >
>>
>>