Hi,
I have Ceph 15.2.4 running in Docker. How do I configure a client to use a
specific data pool? I tried adding the following line to ceph.conf, but the
change has no effect:
[client.myclient]
rbd default data pool = Mydatapool
I need this in order to use an erasure-coded pool with CloudStack.
Can anyone help me? Also, which ceph.conf do I need to edit?
Thanks.
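For completeness, here is the kind of setup I am assuming should work (a sketch only; the pool, image and client names are the placeholders from this mail, and I assume the ceph.conf in question is the one read by the RBD client, i.e. on the CloudStack/libvirt host, not inside the Ceph containers):

# Create the erasure-coded data pool and allow RBD to write to it:
ceph osd pool create Mydatapool 64 64 erasure
ceph osd pool set Mydatapool allow_ec_overwrites true
ceph osd pool application enable Mydatapool rbd
# Set the default data pool for this client via the central config store ...
ceph config set client.myclient rbd_default_data_pool Mydatapool
# ... or set it per image, without touching any config at all:
rbd create rbd/myimage --size 10G --data-pool Mydatapool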
Hi
Thanks for the reply.
cephadm starts the Ceph containers automatically. How can I set privileged
mode on a container that cephadm manages?
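In case it helps to clarify what I am after: since cephadm generates a unit.run file containing the full container run command for each daemon, my best guess (untested; <fsid> and <daemon> below are placeholders that depend on the cluster) would be something like:

# Locate the generated run file for the ganesha daemon:
ls /var/lib/ceph/<fsid>/nfs.<daemon>/unit.run
# Add --privileged to the podman/docker run line in that file, then restart
# the daemon's systemd unit (note: cephadm may rewrite unit.run on redeploy):
systemctl restart ceph-<fsid>@nfs.<daemon>.service

Is that the supported way, or is there a cephadm option for this?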
> On 23/9/20 at 13:24, Daniel Gryniewicz wrote:
>> NFSv3 needs privileges to connect to the portmapper. Try running
>> your docker container in privileged mode, and see if that helps.
>>
>> Daniel
>>
>> On 9/23/20 11:42 AM, Gabriel Medve wrote:
>>> Hi,
>>>
>>> I have Ceph 15.2.5 running in Docker. I configured NFS Ganesha for
>>> NFS version 3, but I cannot mount the export.
>>> If I configure Ganesha for NFS version 4, I can mount it without
>>> problems, but I need version 3.
>>>
>>> The error is mount.nfs: Protocol not supported
>>>
>>> Can anyone help me?
>>>
>>> Thanks.
>>>
Is it possible to disable the check for 'x pool(s) have no replicas
configured', so that I don't get this HEALTH_WARN constantly?
Or is there some other disadvantage to keeping a few empty 1x-replication
test pools around?
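If it matters, what I had in mind was something along these lines (assuming the release in use has the mon_warn_on_pool_no_redundancy option and, for the second command, supports health mutes):

# Disable the check behind the 'no replicas configured' warning entirely:
ceph config set mon mon_warn_on_pool_no_redundancy false
# Or mute just this particular health code instead of disabling the check:
ceph health mute POOL_NO_REDUNDANCY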
Hi,
We are running a ceph cluster on Ubuntu 18.04 machines with ceph 14.2.4.
Our cephfs clients are using the kernel module and we have noticed that
some of them are sometimes (at least once) hanging after an MDS restart.
The only way to resolve this is to unmount and remount the mountpoint,
or reboot the machine if unmounting is not possible.
After some investigation, the problem seems to be that the MDS denies
reconnect attempts from some clients during restart even though the
reconnect interval is not yet reached. In particular, I see the following
log entries. Note that there are supposedly 9 sessions. 9 clients
reconnect (one client has two mountpoints) and then two more clients
reconnect after the MDS already logged "reconnect_done". These two
clients were hanging after the event. The kernel log of one of them is
shown below too.
Running `ceph tell mds.0 client ls` after the clients have been
rebooted/remounted also shows 11 clients instead of 9.
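In case it is relevant, I assume the stale sessions could be removed manually with something like the following (a sketch only; the id is one of the denied clients from the log below):

# List the sessions the MDS still holds, then evict a stale one by id:
ceph tell mds.0 client ls
ceph tell mds.0 client evict id=24167394

But that of course does not explain why the reconnects were denied in the first place.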
Do you have any ideas what is wrong here and how it could be fixed? My
guess is that the MDS has an incorrect session count and therefore stops
the reconnect phase too soon. Is this indeed a bug, and if so, do you know
what is broken?
Regardless, I also think that the kernel should be able to deal with a
denied reconnect and that it should try again later. Yet, even after
10 minutes, the kernel does not attempt to reconnect. Is this a known
issue or maybe fixed in newer kernels? If not, is there a chance to get
this fixed?
Thanks,
Florian
MDS log:
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.server reconnect_clients -- 9 sessions
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24197043 v1:10.1.4.203:0/990008521 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.30487144 v1:10.1.4.146:0/483747473 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21019865 v1:10.1.7.22:0/3752632657 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21020717 v1:10.1.7.115:0/2841046616 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24171153 v1:10.1.7.243:0/1127767158 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.23978093 v1:10.1.4.71:0/824226283 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24209569 v1:10.1.4.157:0/1271865906 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190930 v1:10.1.4.240:0/3195698606 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190912 v1:10.1.4.146:0/852604154 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.59 reconnect_done
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.24167394 v1:10.1.67.49:0/1483641729 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1087700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.49:0/1483641729 conn(0x55af50053f80 0x55af50140800 :6801 s=OPENED pgs=21 cs=1 l=0).fault server, going to standby
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.30586072 v1:10.1.67.140:0/3664284158 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1888700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.140:0/3664284158 conn(0x55af50055600 0x55af50143000 :6801 s=OPENED pgs=8 cs=1 l=0).fault server, going to standby
Hanging client (10.1.67.49) kernel log:
> 2019-09-26T16:08:27.481676+02:00 hostnamefoo kernel: [708596.227148] ceph: mds0 reconnect start
> 2019-09-26T16:08:27.488943+02:00 hostnamefoo kernel: [708596.233145] ceph: mds0 reconnect denied
> 2019-09-26T16:16:17.541041+02:00 hostnamefoo kernel: [709066.287601] libceph: mds0 10.1.4.203:6801 socket closed (con state NEGOTIATING)
> 2019-09-26T16:16:18.068934+02:00 hostnamefoo kernel: [709066.813064] ceph: mds0 rejected session
> 2019-09-26T16:16:18.068955+02:00 hostnamefoo kernel: [709066.814843] ceph: get_quota_realm: ino (10000000008.fffffffffffffffe) null i_snap_realm
We have a fairly old cluster that has over time been upgraded to Nautilus. While digging through some things, we found 3 bucket indexes without a corresponding bucket. The buckets should have been deleted, but the indexes were somehow left behind. When we try to delete such a bucket index, the command refuses because the bucket is not found; the bucket index list command, however, works fine even without the bucket. Is there a way to delete these indexes? Or maybe somehow relink the bucket so it can be deleted again?
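What we were considering, unless there is a proper radosgw-admin way, is removing the stale index objects directly from the index pool (a sketch only; the pool name and bucket marker are placeholders, the marker being taken from `radosgw-admin metadata list bucket.instance`):

# Find the orphaned bucket index shard objects for the old bucket marker:
rados -p default.rgw.buckets.index ls | grep <bucket-marker>
# Remove one stale index shard object:
rados -p default.rgw.buckets.index rm .dir.<bucket-marker>.0

Is that safe, or is there a better way?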
Thanks,
Kevin
Hi,
I'm facing something strange! One of the PGs in my pool became inconsistent,
but when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
the `inconsistents` key was empty. What does this mean? Is it a bug in Ceph, or..?
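What I plan to try next, in case the scrub result is simply stale, is re-running a deep scrub and checking again:

# Trigger a fresh deep scrub of the affected PG, then re-check:
ceph pg deep-scrub $PG_ID
ceph health detail
rados list-inconsistent-obj $PG_ID --format=json-pretty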
Thanks.
Hi all,
I am trying to run the ceph client tools on an Odroid XU4 (armhf) with
Ubuntu 20.04 and Python 3.8.5.
Unfortunately, every "ceph" command (even ceph --help) fails with the
following error:
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1275, in <module>
    retval = main()
  File "/usr/bin/ceph", line 981, in main
    cluster_handle = run_in_thread(rados.Rados,
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1342, in run_in_thread
    raise Exception("timed out")
Exception: timed out
With this server I access an existing Ceph cluster that runs on the same
hardware.
I checked the relevant code; it just starts a thread and joins it (waiting
for a RadosThread to finish).
Maybe this is a Python issue in combination with the armhf architecture?
Maybe someone can help.
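If it helps, my next step would be to raise the CLI connect timeout and enable client-side debugging to see where it hangs (just a sketch on my side):

# Raise the connect timeout and enable monclient/messenger debugging:
ceph --connect-timeout 60 --debug-monc=10 --debug-ms=1 -s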
Thanks and greetings
Dominik
Hi,
We have a ceph 15.2.7 deployment using cephadm under podman w/ systemd.
We've run into what we believe is:
https://github.com/ceph/ceph-container/issues/1748
https://tracker.ceph.com/issues/47875
In our case, eventually the mgr container stops emitting output/logging. We
are polling with external prometheus clusters, which is likely what
triggers the issue, as it appears some amount of time after the container
is spawned.
Unfortunately, setting limits in the systemd service file for the mgr
service on the host OS doesn't work, nor does modifying the unit.run file
which is used to start the container under podman to include the --ulimit
settings as suggested. Looking inside the container:
lib/systemd/system/ceph-mgr@.service:LimitNOFILE=1048576
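For reference, this is roughly how we check which limit the mgr process actually ends up with inside the running container (the container name is whatever `podman ps` shows for the mgr on that host):

# Find the mgr container and inspect the effective open-files limit:
podman ps --format '{{.Names}}' | grep mgr
podman exec <mgr-container-name> cat /proc/1/limits | grep 'open files'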
This prevents us from deploying medium to large ceph clusters, so I would
argue it's a high priority bug that should not be closed, unless there is a
workaround that works until EPEL 8 contains the fixed version of cheroot
and the ceph containers include it.
My understanding is this was fixed in cheroot 8.4.0:
https://github.com/cherrypy/cheroot/issues/249
https://github.com/cherrypy/cheroot/pull/301
Thank you in advance for any suggestions,
David
Dear Ceph contributors
While our (new) rgw secondary zone is doing the initial data sync from our master zone,
we noticed that the reported capacity usage was getting higher than on the primary zone:
Master Zone:
ceph version 14.2.5
zone parameters:
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "false",
"sync_from": [],
"redirect_zone": ""
bucket stats:
=> "size_actual GiB": 269868.9823989868,
"size_utilized GiB": 17180102071.868008, <= (cf. other issue below)
"num_objects": 100218788,
"Compression Rate": 63660.899148714125
pool stats:
=> "stored": 403834132234240,
"objects": 191530416,
"kb_used": 692891724288,
"bytes_used": 709521125670912,
"percent_used": 0.7570806741714478,
"max_avail": 136595529269248
erasure-code-profile:
crush-device-class=nvme
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=8
m=2
plugin=jerasure
technique=reed_sol_van
w=8
osd parameters:
bluefs_alloc_size 1048576 default
bluefs_shared_alloc_size 65536 default
bluestore_extent_map_inline_shard_prealloc_size 256 default
bluestore_max_alloc_size 0 default
bluestore_min_alloc_size 0 default
bluestore_min_alloc_size_hdd 65536 default
bluestore_min_alloc_size_ssd 16384 default
rgw compression: no
Secondary Zone:
ceph version 14.2.12
zone parameters:
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 11,
"read_only": "true",
"tier_type": "",
"sync_from_all": "true",
"sync_from": [],
"redirect_zone": ""
bucket stats:
=> "size_actual GiB": 65282.37313461304,
"size_utilized GiB": 60779.72828538809,
"num_objects": 23074921,
"Compression Rate": 0.9310281683550253
pool stats:
=> "stored": 407816305115136,
"objects": 118637638,
"kb_used": 497822635396,
"bytes_used": 509770378645504,
"percent_used": 0.7275146245956421,
"max_avail": 152744706965504
erasure-code-profile:
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
osd parameters:
EC k=8 m=2
bluefs_alloc_size 1048576 default
bluefs_shared_alloc_size 65536 default
bluestore_extent_map_inline_shard_prealloc_size 256 default
bluestore_max_alloc_size 0 default
bluestore_min_alloc_size 0 default
bluestore_min_alloc_size_hdd 65536 default
bluestore_min_alloc_size_ssd 4096 default
rgw compression: yes
As you can see, the secondary zone is using 408 TB vs. 404 TB on the master zone.
Summing size_actual over all buckets gives only 65 TB vs. 270 TB on the master zone.
Any idea about what could cause such a difference?
Is it a known issue?
There are some known issues with space overhead on EC pools for alloc_size > 4 KiB, cf.:
https://www.mail-archive.com/ceph-users@ceph.io/msg06191.html
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/OHPO43J54TPBEUISYC…
https://www.spinics.net/lists/ceph-users/msg59587.html
But our secondary zone is on NVMe OSDs with bluestore_min_alloc_size_ssd=4096, so that should be fine.
I will also investigate further with "radosgw-admin bucket radoslist" and rgw-orphan-list.
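Concretely, something like this on the secondary zone (pool and bucket names are placeholders):

# List the rados objects a bucket is supposed to reference:
radosgw-admin bucket radoslist --bucket=<bucket-name>
# Compare with what is actually stored in the data pool to spot orphans:
rgw-orphan-list <data-pool-name>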
Thank you in advance for any help.
And Happy New Year very soon ;-)
Best Regards
Francois
PS:
The value on the master zone, "size_utilized GiB": 17180102071.868008, is wrong.
This is due to a bucket with wrong stats:
{
"bucket": "XXX",
"tenant": "",
"zonegroup": "d29ea82c-4f77-40af-952b-2ab0705ad268",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "cb1594b3-a782-49d0-a19f-68cd48870a63.3119460.1",
"marker": "cb1594b3-a782-49d0-a19f-68cd48870a63.63841.1481",
"index_type": "Normal",
"owner": "e13d054f6e9c4eea881f687923d7d380",
"ver": "0#373075,1#286613,2#290913,3#341069,4#360862,5#341416,6#279526,7#352172,8#255944,9#314797,10#317650,11#305698,12#289557,13#345344,14#345273,15#294708,16#241001,17#298577,18#274866,19#293952,20#307635,21#334606,22#265355,23#302567,24#277505,25#307278,26#266963,27#297452,28#332274,29#319133,30#361027,31#314294,32#282887,33#324849,34#278560,35#307506,36#287269,37#344789,38#345389,39#323814,40#386483,41#280319,42#358072,43#336651,44#339176,45#248079,46#356784,47#381496,48#295152,49#251661,50#318661,51#330530,52#263564,53#332005,54#332937,55#320163,56#300485,57#296138,58#343271,59#359351,60#295711,61#275751,62#332264,63#351532",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0",
"mtime": "2020-06-03 12:13:01.207610Z",
"max_marker": "0#,1#,2#,3#00000341068.137793799.5,4#00000360861.156871075.5,5#00000341415.97799619.5,6#00000279525.155275260.5,7#,8#,9#00000314796.95564320.5,10#,11#,12#00000289556.137784091.5,13#,14#,15#,16#00000241000.126242121.5,17#,18#00000274865.124884405.5,19#,20#00000307634.137793798.5,21#00000334605.93836734.5,22#,23#00000302566.125226103.5,24#,25#,26#,27#00000297451.125229375.5,28#,29#00000319132.155275278.5,30#00000361026.98341455.5,31#,32#,33#00000324848.126242117.5,34#00000278559.124884417.5,35#,36#,37#00000344788.125945123.5,38#00000345388.97796715.5,39#,40#00000386482.98341457.5,41#,42#00000358071.124884415.5,43#,44#00000339175.135084366.5,45#00000248078.155263912.5,46#00000356783.98341461.5,47#00000381495.94538350.5,48#00000295151.138701826.5,49#00000251660.137793803.5,50#00000318660.93848186.5,51#,52#,53#00000332004.126242119.5,54#00000332936.138701824.5,55#,56#00000300484.156871073.5,57#,58#00000343270.98341459.5,59#00000359350.94257302.5,60#,61#,62#,63#00000351531.135084368.5",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551604
},
"rgw.main": {
"size": 22615692324233,
"size_actual": 22617004142592,
"size_utilized": 18446736471689615156,
"size_kb": 22085637036,
"size_kb_actual": 22086918108,
"size_kb_utilized": 18014391085634390,
"num_objects": 521927
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 977
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
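We will probably also try to have the stats of that bucket recalculated, roughly like this (not yet tested on our side):

# Recheck the bucket and, if needed, rebuild its index/stats:
radosgw-admin bucket check --bucket=XXX
radosgw-admin bucket check --bucket=XXX --check-objects --fix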
This is another issue, probably not related to our sync issue.
--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheurer(a)everyware.ch
web: http://www.everyware.ch