Hello,
As far as I know, the mon service can only bind to a single IP.
But I have to make it reachable from two networks, because both internal and external servers have to mount the CephFS.
The internal IP is 10.99.10.1 and the external one is a public IP.
I tried NAT'ing it with: "firewall-cmd --zone=public --add-forward-port=port=6789:proto=tcp:toport=6789:toaddr=10.99.10.1 --permanent"
The NAT itself seems to work, because I get a "ceph v027" banner (along with some binary gibberish) when I run "telnet *public-ip* 6789".
But when I try to mount, I just get a timeout:
mount -vvvv -t ceph *public-ip*:6789:/testing /mnt -o name=test,secretfile=/root/ceph.client.test.key
mount error 110 = Connection timed out
tcpdump also shows a "Ceph Connect" packet coming from the mon.
How can I get around this problem?
Is there something I have missed?
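For reference, the complete firewall setup I would sanity-check looks roughly like this (the port 3300 forward for msgr2 and the masquerade rule are guesses on my part, not verified):
firewall-cmd --zone=public --permanent --add-forward-port=port=6789:proto=tcp:toport=6789:toaddr=10.99.10.1
firewall-cmd --zone=public --permanent --add-forward-port=port=3300:proto=tcp:toport=3300:toaddr=10.99.10.1
firewall-cmd --zone=public --permanent --add-masquerade
firewall-cmd --reload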
Specs:
Latest Octopus 15.2.4
CentOS 8
8 Nodes
No health warnings.
Thanks in advance,
Simon
Hello,
root@ceph02:~# ceph orch ps
NAME        HOST    STATUS         REFRESHED  AGE  VERSION    IMAGE NAME                   IMAGE ID      CONTAINER ID
mgr.ceph01  ceph01  running (18m)  6s ago     4w   15.2.4     docker.io/ceph/ceph:v15.2.4  54fa7e66fb03  7deebe09f6fd
(...)
mgr.cph02   ceph02  error          3s ago     4w   <unknown>  docker.io/ceph/ceph:v15.2.4  <unknown>     <unknown>
I must have been drunk when I added "mgr.cph02".
Now I am sober again, but I get:
root@ceph02:~# ceph orch rm mgr.cph02 --force
Failed to remove service. <mgr.cph02> was not found.
:-(
Any hint?
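The only other thing I can think of trying (not yet verified on 15.2.4) is removing it at the daemon level instead of the service level:
root@ceph02:~# ceph orch daemon rm mgr.cph02 --force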
Cheers,
Michael
I have a production cluster on Jewel, with RBD & MDS, running under Gentoo. Building
Luminous with the mgr is now problematic (mostly because Python 3.5 was dropped at the eclass
level). But to reach Nautilus etc. I must go through Luminous as a transitional step. Can I temporarily
run Luminous without a mgr (at least long enough to wait for scrub)? What might happen?
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/
Hi all,
we are in the process of upgrading one of our three containerised Ceph clusters from Luminous (12.2.12) to Nautilus (14.2.9). We have already upgraded all ceph-mons, ceph-mgrs and all of our ceph-osd hosts that are full-flash nodes with Optane cache and NVMe data devices. Our process for upgrading OSDs is to completely purge a whole host and redeploy its OSDs with Nautilus and LVM underneath.
This all went fine until we started touching our HDD nodes, which serve a pool that provides a CephFS with erasure coding. The problem we are facing at the moment is that when we set a single HDD OSD out, the ceph command starts hanging for a couple of tens of seconds and the mon quorum gets degraded, because one of the ceph-mons drops out with a lease_timeout. In the logs we could see some slow ops from the failing ceph-mon, which were mon_subscribe events from OSDs (full-flash and HDD) hanging between all_read and dispatched for around 10 seconds.
In our metrics we can see that the memory consumption of that one ceph-mon (not the leader!) increases to up to 60 GB and the CPU usage also rises dramatically. Looking into the logs does not show any obvious problem: we can see that the cluster sets the OSD out and starts backfilling to other OSDs, but at some point the failing mon stops logging completely for around one minute, then resumes after rejoining the quorum and keeps logging normal backfilling behaviour.
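For completeness, the way we reproduce and observe it is roughly the following (the OSD id and mon name are placeholders):
ceph osd out 42            # set a single hdd osd out
ceph -s                    # this is what then hangs for tens of seconds
ceph daemon mon.mon2 ops   # on the affected mon host, shows the stuck mon_subscribe ops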
Here are some heap stats from after the cluster was responsive and the ceph-mon was behaving normally again:
MALLOC: 711286936 ( 678.3 MiB) Bytes in use by application
MALLOC: + 27156045824 (25898.0 MiB) Bytes in page heap freelist
MALLOC: + 17420216 ( 16.6 MiB) Bytes in central cache freelist
MALLOC: + 9370880 ( 8.9 MiB) Bytes in transfer cache freelist
MALLOC: + 25277104 ( 24.1 MiB) Bytes in thread cache freelists
MALLOC: + 104857600 ( 100.0 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 28024258560 (26726.0 MiB) Actual memory used (physical + swap)
MALLOC: + 75189862400 (71706.6 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 103214120960 (98432.7 MiB) Virtual address space used
MALLOC:
MALLOC: 33647 Spans in use
MALLOC: 23 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
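For reference, we pull these stats (and can release the page heap freelist back to the OS) with the heap admin commands, e.g. (the mon name is a placeholder):
ceph tell mon.mon2 heap stats
ceph tell mon.mon2 heap release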
We tested all this in a test cluster without any problems.
Does anyone have an idea of what could be going on, or where to look for further debugging?
Best Regards
Max
While continuing my saga with the RGW orphans and dozens of terabytes of wasted space, I have used the rgw-orphan-list tool. After about 45 minutes the tool crashed (((
# time rgw-orphan-list .rgw.buckets
Pool is ".rgw.buckets".
Note: output files produced will be tagged with the current timestamp -- 202008241403.
running 'rados ls' at Mon Aug 24 15:03:29 BST 2020
running 'radosgw-admin bucket radoslist' at Mon Aug 24 15:26:37 BST 2020
/usr/bin/rgw-orphan-list: line 64: 31745 Aborted (core dumped) radosgw-admin bucket radoslist > "$rgwadmin_out" 2> "$rgwadmin_err"
An error was encountered while running 'radosgw-admin radoslist'. Aborting.
Review file './radosgw-admin-202008241403.error' for details.
***
*** WARNING: The results are incomplete. Do not use! ***
***
I've kept the error file with more information on the crash, in case anyone is interested in improving the tool.
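If a full backtrace would help, I can also re-run the step from line 64 of the script by hand, something like:
radosgw-admin bucket radoslist > /tmp/radoslist.out 2> /tmp/radoslist.err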
Cheers
Andrei
On Tue, Aug 25, 2020 at 6:54 AM huxiaoyu(a)horebdata.cn
<huxiaoyu(a)horebdata.cn> wrote:
>
> Dear Ceph folks,
>
> I am running OpenStack Queens to host a variety of apps, with Ceph Luminous 12.2.13 as backend storage.
>
> Is there a solution to support IOPS constraints on a specific RBD volume from the Ceph side? I know Nautilus may support it, but I am a bit hesitant to upgrade; I would like to give Nautilus more time to mature.
>
> thanks a lot in advance,
There is no current support for backend Ceph QoS (it's in development) --
just frontend QoS throttles, which OpenStack handles via QEMU.
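One common way to wire up the frontend throttles is a Cinder QoS spec attached to the volume type, roughly like this (the names and limits below are just placeholders):
openstack volume qos create --consumer front-end \
  --property total_iops_sec=1000 \
  --property total_bytes_sec=104857600 \
  limited-rbd
openstack volume qos associate limited-rbd <volume-type>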
> samuel
>
>
>
>
> huxiaoyu(a)horebdata.cn
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
Jason
Dear Ceph folks,
I am running OpenStack Queens to host a variety of apps, with Ceph Luminous 12.2.13 as backend storage.
Is there a solution to support IOPS constraints on a specific RBD volume from the Ceph side? I know Nautilus may support it, but I am a bit hesitant to upgrade; I would like to give Nautilus more time to mature.
thanks a lot in advance,
samuel
huxiaoyu(a)horebdata.cn
Hi everyone,
I have a serious problem: my entire Ceph cluster is currently unable to provide service. Yesterday I added 10 OSDs, 2 per node; the rebalance started and took some IO but seemed to be doing its work. This morning the cluster was still processing the rebalance and taking so much IO that nearly all OSDs were flagged with "slow ops", and from there everything went wrong. To free as much IO as possible for the rebalance I stopped all the clients and waited for the rebalance to finish. After it finished, the cluster remained extremely slow and unusable. While trying to debug, I restarted several services and nodes to find the problem. Now the cluster has entered a state where multiple OSDs remain slow, various OSDs show a "BADAUTHORIZER" message, and the mgr on all nodes also reports "verify_authorizer" errors.
I verified the clocks on all servers; they are synced to the same NTP service and look good.
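For what it's worth, the clock check was roughly the following (chronyc stands in for whichever NTP client the hosts actually run):
chronyc tracking        # on every node
ceph time-sync-status   # ceph's own view of mon clock skew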
Please, please advise: 13 straight hours of debugging have got me nowhere.
Current version: ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)
Mgr error example:
2020-08-24 20:19:12.865 7f2baf56a700 0 cephx: verify_authorizer could not get service secret for service mgr secret_id=12230
2020-08-24 20:19:13.043 7f2bb056c700 0 auth: could not find secret_id=12230
2020-08-24 20:19:13.043 7f2bb056c700 0 cephx: verify_authorizer could not get service secret for service mgr secret_id=12230
2020-08-24 20:19:13.210 7f2bb056c700 0 auth: could not find secret_id=12230
2020-08-24 20:19:13.210 7f2bb056c700 0 cephx: verify_authorizer could not get service secret for service mgr secret_id=12230
OSD error example:
2020-08-24 19:47:15.777 7f9957d79700 -1 osd.19 41255 get_health_metrics reporting 72 slow ops, oldest is osd_op(mds.0.1510:4 12.a6 12.4b2c82a6 (undecoded) ondisk+retry+read+known_if_redirected+full_force e41119)
2020-08-24 19:47:15.833 7f995c88b700 0 auth: could not find secret_id=12230
2020-08-24 19:47:15.833 7f995c88b700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=12230
2020-08-24 19:47:15.833 7f995c88b700 0 --1- [v2:10.201.1.17:6814/1030299,v1:10.201.1.17:6815/1030299] >> v1:10.201.1.20:6823/1023281 conn(0x55affb8e7c00 0x55b007441000 :6815 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: got bad authorizer, auth_reply_len=0
> -----Original Message-----
> From: Anthony D'Atri <anthony.datri(a)gmail.com>
> Sent: Monday, August 24, 2020 7:30 PM
> To: Tony Liu <tonyliu0592(a)hotmail.com>
> Subject: Re: [ceph-users] Re: Add OSD with primary on HDD, WAL and DB on
> SSD
>
> Why such small HDDs? Kinda not worth the drive bays and power. Instead
> of the complexity of putting WAL+DB on a shared SSD, might you have been
> able to just buy SSDs and not split? YMMV.
2TB is for testing, it will bump up to 10TB for production.
> The limit is a function of the way the DB levels work, it’s not
> intentional.
>
> WAL by default takes a fixed size, like 512 MB or something.
>
> 64 GB is a reasonable size, it accommodates the WAL and allows space for
> DB compaction without overflowing.
For each 10TB HDD, what's the recommended DB device size for both
DB and WAL? The doc recommends 1% - 4%, meaning 100GB - 400GB for
each 10TB HDD. But given the WAL data size and DB data size, I am
not sure if that 100GB - 400GB will be used efficiently.
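For context, this is roughly how each OSD is being created (the device, VG and LV names below are placeholders):
lvcreate -L 64G -n db-0 ceph-db-vg
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db-vg/db-0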
> With this commit the situation should be improved, though you don’t
> mention what release you’re running
>
> https://github.com/ceph/ceph/pull/29687
I am using ceph version 15.2.4 octopus (stable).
Thanks!
Tony
> >>> I don't need to create a
> >>> WAL device, just primary on HDD and DB on SSD, and WAL will be using the
> >>> DB device because it's faster. Is that correct?
> >>
> >> Yes.
> >>
> >>
> >> But be aware that the DB sizes are limited to 3GB, 30GB and 300GB.
> >> Anything less than those sizes will have a lot of unutilised space,
> >> e.g. a 20GB device will only utilise 3GB.
> >
> > I have 1 480GB SSD and 7 2TB HDDs. 7 LVs are created on SSD, each is
> > about 64GB, for 7 OSDs.
> >
> > Since it's shared by DB and WAL, DB will take 30GB and WAL will take
> > the remaining 34GB. Is that correct?
> >
> > Is that size of DB and WAL good for a 2TB HDD (block store and object
> > store cases)?
> >
> > Could you share a bit more about the intention of such limit?
> >
> >
> > Thanks!
> > Tony
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
> > email to ceph-users-leave(a)ceph.io
On 25/08/2020 6:07 am, Tony Liu wrote:
> I don't need to create a
> WAL device, just primary on HDD and DB on SSD, and WAL will be
> using the DB device because it's faster. Is that correct?
Yes.
But be aware that the DB sizes are limited to 3GB, 30GB and 300GB.
Anything less than those sizes will have a lot of unutilised space, e.g. a
20GB device will only utilise 3GB.
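If I remember right, those numbers simply fall out of the default RocksDB level sizing used by BlueStore (roughly 256 MB at the base level, growing 10x per level), and a level is only kept on the DB device when it fits entirely, so treat them as approximate:
L1 ~ 0.25 GB, L2 ~ 2.5 GB, L3 ~ 25 GB, L4 ~ 250 GB
0.25 + 2.5            ~   3 GB
0.25 + 2.5 + 25       ~  30 GB
0.25 + 2.5 + 25 + 250 ~ 300 GB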
--
Lindsay