Dear all,
I am experimenting with Ceph as a replacement for the Andrew File System (https://en.wikipedia.org/wiki/Andrew_File_System). In my current setup, I am using AFS as a distributed filesystem for approximately 1000 users to store personal data and to let them access their home directories and other shared data from multiple locations across different buildings. Authentication is managed by Kerberos (plus an LDAP server). My goal is to replace AFS with CephFS but keep the current Kerberos database.
Right now I've managed to set up a test Ceph cluster with 6 nodes and 11 OSDs, and I can mount CephFS using the kernel driver + CephX.
However, from the Ceph docs, I can't tell whether this is a suitable use case for Ceph, since the default authentication method, CephX, doesn't provide a standard username/password authentication protocol. As far as I understand, it requires creating a keyring with a randomly generated secret, which can then be used to mount the filesystem with the CephFS kernel module (https://docs.ceph.com/en/latest/cephfs/mount-using-kernel-driver/#mounting-…).
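For reference, this is the per-user workflow as I understand it (only a rough sketch; the filesystem name "cephfs", the client name "alice" and the paths are just placeholders from my test setup):
ceph fs authorize cephfs client.alice /home/alice rw        # create a key limited to that path
ceph auth get-key client.alice > /etc/ceph/alice.secret     # extract the generated secret to a root-only file
mount -t ceph mon1:6789:/home/alice /mnt/alice -o name=alice,secretfile=/etc/ceph/alice.secret
So every user would need a keyring/secret distributed to them instead of typing a password, which is exactly what I am trying to avoid.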
As for the Kerberos integration, I found this page in the docs, https://docs.ceph.com/en/latest/dev/ceph_krb_auth/, which is still marked as a draft even though the last update was almost 2 years ago. From that page, I can't tell whether the current version of Ceph supports full integration with GSSAPI/Kerberos/LDAP. Since the docs only refer to keytab files, I was wondering whether Kerberos can only be used as an authentication protocol between Ceph monitors/OSDs/metadata servers and not for mounting the filesystem.
Therefore I am asking:
- whether anyone has tried Ceph for a similar use case,
- what the current status of the Kerberos integration is,
- whether there are alternatives to CephX for mounting CephFS with the kernel driver that use a username/password protocol.
Thank you and best regards,
Alessandro Piazza
Hello,
I am trying to debug slow operations in our cluster running Nautilus
14.2.13. I am analysing the output of the "ceph daemon osd.N dump_historic_ops"
command, and I am noticing that most of the time is spent between the "header_read"
and "throttled" events. For example, below is an operation that took ~160
seconds to complete, and almost all of that time was spent between these two
events.
Going by the descriptions at
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/#…
- header_read: When the messenger first started reading the message off the wire.
- throttled: When the messenger tried to acquire memory throttle space to read the message into memory.
- all_read: When the messenger finished reading the message off the wire.
Does this mean that the slowness I am observing is because the OSD's messaging
layer is not able to acquire the memory required for the message fast
enough?
The system has plenty of available memory (over 300 GB), so how do I tune the OSD
to perform better here?
Appreciate any feedback on this.
{
"description": "osd_op(client.405792.0:98299 3.313
3:c8c63189:::rbd_data.51b046b8b4567.0000000000000180:head [set-alloc-hint
object_size 4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e1073)",
"initiated_at": "2020-11-06 16:16:40.924448",
"age": 164.32155802899999,
"duration": 159.57800813,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.405792",
"client_addr": "v1:x.y.156.101:0/3840080733",
"tid": 98299
},
"events": [
{
"time": "2020-11-06 16:16:40.924448",
"event": "initiated"
},
{
"time": "2020-11-06 16:16:40.924448",
"event": "header_read"
},
{
"time": "2020-11-06 16:19:20.481593",
"event": "throttled"
},
{
"time": "2020-11-06 16:19:20.487331",
"event": "all_read"
},
{
"time": "2020-11-06 16:19:20.487333",
"event": "dispatched"
},
{
"time": "2020-11-06 16:19:20.487340",
"event": "queued_for_pg"
},
{
"time": "2020-11-06 16:19:20.487372",
"event": "reached_pg"
},
{
"time": "2020-11-06 16:19:20.487507",
"event": "started"
},
{
"time": "2020-11-06 16:19:20.487586",
"event": "waiting for subops from 1,94"
},
{
"time": "2020-11-06 16:19:20.491873",
"event": "op_commit"
},
{
"time": "2020-11-06 16:19:20.501164",
"event": "sub_op_commit_rec"
},
{
"time": "2020-11-06 16:19:20.502423",
"event": "sub_op_commit_rec"
},
{
"time": "2020-11-06 16:19:20.502438",
"event": "commit_sent"
},
{
"time": "2020-11-06 16:19:20.502456",
"event": "done"
}
]
}
}
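For what it's worth, this is how I was planning to check whether the message throttler is actually the limit; the counter and option names below are my assumptions from the docs and a perf dump, please correct me if they differ on Nautilus:
ceph daemon osd.N perf dump | grep -A 7 throttle-osd_client    # compare "max" against "val" / get_or_fail_fail
# if the byte throttle really is exhausted, I assume the cap can be raised, e.g. to 1 GiB:
ceph config set osd osd_client_message_size_cap 1073741824     # default is ~500 MB; may need an OSD restart to apply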
Hello,
I am running a Nautilus cluster. Is there a way to force the cluster to use
msgr-v1 instead of msgr-v2?
I am debugging an issue and it seems like it could be related to the msgr
layer, so I want to test it using msgr-v1.
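From reading the docs, my assumption is that disabling the v2 binding and restarting the daemons should force everything back to v1 (ms_bind_msgr2 is the option name I found; I have not verified this end to end):
ceph config set global ms_bind_msgr2 false
systemctl restart ceph-osd.target ceph-mon.target    # restart daemons so they re-advertise v1-only addresses
Is that the supported way, or is there something better?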
Thanks,
Shridhar
Hi Anthony
Thank you for your response.
I am looking at the "OSDs highest latency of write operations" panel of the
Grafana dashboard found in the Ceph source in
./monitoring/grafana/dashboards/osds-overview.json. It is a topk graph
that uses ceph_osd_op_w_latency_sum / ceph_osd_op_w_latency_count.
During normal operations we sometimes see latency spikes of 4 seconds at most,
but while bringing the rack back we saw a consistent increase in
latency for a lot of OSDs, into the 20-second range.
The cluster has 1139 OSDs in total, of which we had 5 x 9 = 45 in maintenance.
We did not throttle the backfilling process because we had successfully done the
same maintenance before on a few occasions for other racks without
problems. I will throttle backfills next time we have the same sort of
maintenance in the next rack.
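For the record, this is roughly what I intend to set before the next maintenance window (the values are just a conservative starting point; I am using "ceph config set" so that restarted OSDs pick the values up as they boot):
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_sleep_hdd 0.1    # small sleep between recovery ops on HDD OSDs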
Can you elaborate a bit more on what exactly happens during the peering
process? I understand that the OSDs need to catch up. I also see that the
number of scrubs increases a lot when OSDs are brought back online. Is that
part of the peering process?
Thx, Marcel
> HDDs and concern for latency don't mix. That said, you don't specify
> what you mean by "latency". Does that mean average client write
> latency? median? P99? Something else?
>
> If you have a 15 node cluster and you took a third of it down for two
> hours then yeah you'll have a lot to catch up on when you come back.
> Bringing the nodes back one at a time can help, to spread out the peering.
> Did you throttle backfill/recovery tunables all the way down to 1? In a
> way that the restarted OSDs would use the throttled values as they boot?
>
>
>
>
>> On Nov 5, 2020, at 6:47 AM, Marcel Kuiper <ceph(a)mknet.nl> wrote:
>>
>> Hi
>>
>> We had a rack down for 2 hours for maintenance. 5 storage nodes were
>> involved. We had the noout and norebalance flags set before the start of the
>> maintenance.
>>
>> When the systems were brought back online we noticed a lot of OSDs with
>> high latency (in the 20-second range), mostly OSDs that are not on the
>> storage nodes that were down. It took about 20 minutes for things to
>> settle down.
>>
>> We're running Nautilus 14.2.11. The storage nodes run bluestore and have
>> 9 x 8 TB HDDs and 3 x SSDs for RocksDB, each SSD with 3 x 123 GB LVs.
>>
>> - Can anyone give a reason for these high latencies?
>> - Is there a way to avoid or lower these latencies when bringing systems
>> back into operation?
>>
>> Best Regards
>>
>> Marcel
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
Hi,
Has anybody tried to migrate data from Hadoop to Ceph?
If yes, what is the right way?
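(For context, the route I keep seeing suggested is Hadoop's distcp against the RGW S3 endpoint via the S3A connector; below is only a sketch with placeholder endpoint, bucket and credentials, not something we have verified ourselves:)
hadoop distcp \
  -Dfs.s3a.endpoint=http://rgw.example.com:7480 \
  -Dfs.s3a.access.key=ACCESS_KEY \
  -Dfs.s3a.secret.key=SECRET_KEY \
  -Dfs.s3a.path.style.access=true \
  hdfs://namenode:8020/data/warehouse s3a://target-bucket/warehouse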
Thank you
Hello List,
I think 3 of our 6 nodes have too little memory. This triggers the effect
that the nodes swap a lot and almost kill themselves. That
triggers OSDs to go down, which triggers a rebalance, which does not
really help :D
I already ordered more RAM. Can I temporarily turn down the RAM usage of
the OSDs so we don't get into that vicious cycle, and just suffer from small but
stable performance?
This is ceph version 15.2.5 with bluestore.
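What I am considering, assuming osd_memory_target is the right knob for bluestore OSDs and that ~2 GiB per OSD is still workable ("node4" below is just a hypothetical host name):
ceph config set osd osd_memory_target 2147483648              # 2 GiB per OSD (default is 4 GiB)
# I think a per-host override is also possible, if only the small nodes need it:
ceph config set osd/host:node4 osd_memory_target 2147483648
Would that be enough to stop the swapping until the new RAM arrives?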
Thanks,
Michael
Hi,
I've got a problem on an Octopus (15.2.3, Debian packages) install: the bucket's
S3 index shows a file:
s3cmd ls s3://upvid/255/38355 --recursive
2020-07-27 17:48 50584342
s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
radosgw-admin bi list also shows it
{
"type": "plain",
"idx":
"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
"entry": { "name":
"255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
"instance": "", "ver": {
"pool": 11,
"epoch": 853842
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 50584342,
"mtime": "2020-07-27T17:48:27.203008Z",
"etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
"storage_class": "",
"owner": "filmweb-app",
"owner_display_name": "filmweb app user",
"content_type": "",
"accounted_size": 50584342,
"user_data": "",
"appendable": "false"
},
"tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
},
but trying to download it via curl (I've set permissions to public) only gets me:
<?xml version="1.0"
encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>upvid</BucketName><RequestId>tx0000000000000000e716d-005f1f14cb-e478a-pl-war1</RequestId><HostId>e478a-pl-war1-pl</HostId></Error>
(actually nonexistent files return Access Denied in the same context)
Same with other tools:
$ s3cmd get s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 /tmp
download: 's3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' -> '/tmp/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)
Cluster health is OK.
Any ideas what is happening here?
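Things I was planning to check next (bucket/object names as above; the data pool name and the --check-objects flag are my assumptions from a default setup and the radosgw-admin help):
radosgw-admin object stat --bucket=upvid \
  --object=255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
radosgw-admin bucket check --bucket=upvid --check-objects
# see whether the backing rados objects are still in the data pool at all:
rados -p default.rgw.buckets.data ls | grep juz_nie_zyjesz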
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
NOC: [+48] 22 380 10 20
E: admin(a)efigence.com
Hello,
I am running version 14.2.13-1xenial and I am seeing a lot of logs from the msgr-v2
layer on the OSDs. Attached are some of the logs. It looks like these logs
are not controlled by the standard log level configuration, so I couldn't
find a way to disable them.
I am concerned that these logs may be hurting the system performance.
Any input on how I can disable these logs or lower their level?
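What I was going to try, assuming these messages fall under the "ms" debug subsystem (please correct me if msgr-v2 logs them regardless of that setting):
ceph tell 'osd.*' injectargs '--debug_ms 0/0'    # runtime change on all OSDs
ceph config set osd debug_ms 0/0                 # persist it across restarts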
Regards,
Shridhar
Hi,
we upgraded our Ceph cluster from 14.2.9 to 15.2.5, but OSDs on 15.2.5
are not joining the cluster after a restart. They hang with "1234 tick
checking mon for new map".
The systems are CentOS 7.8.
I tried everything I could think of, but nothing helped. The mons and
mgrs are 15.2.5.
Anyone have an idea what could cause this?
2020-11-05T15:18:13.142+0100 7f02fce81700 1 osd.114 pg_epoch: 1234
pg[5.c9s0( v 839'39171 (839'37583,839'39171] local-lis/les=1129/1130
n=495 ec=813/813 lis/c=1129/1081 les/c/f=1130/1082/0 sis=1210)
[NONE,NONE,NONE,NONE,124,NONE,157,NONE,NONE,149,NONE]p124(4) r=-1
lpr=1212 pi=[1081,1210)/1 crt=839'39171 lcod 0'0 mlcod 0'0 unknown
mbc={} ps=[1~3]] state<Start>: transitioning to Stray
2020-11-05T15:18:13.143+0100 7f02fce81700 1 osd.114 pg_epoch: 1234
pg[1.736( v 1105'9685 (815'8100,1105'9685] local-lis/les=1131/1132 n=6
ec=719/719 lis/c=1131/1019 les/c/f=1132/1020/0 sis=1210
pruub=11.135073235s) [] r=-1 lpr=1212 pi=[1019,1210)/1 crt=1105'9685
lcod 0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Stray
2020-11-05T15:18:13.143+0100 7f02fce81700 1 osd.114 pg_epoch: 1234
pg[4.3fd( v 1109'105689 (1109'102600,1109'105689]
local-lis/les=1131/1132 n=32 ec=740/740 lis/c=1131/1089
les/c/f=1132/1090/0 sis=1210 pruub=11.134586488s) [] r=-1 lpr=1212
pi=[1089,1210)/1 crt=1109'105689 lcod 0'0 mlcod 0'0 unknown mbc={}]
state<Start>: transitioning to Stray
2020-11-05T15:18:13.144+0100 7f02fce81700 1 osd.114 pg_epoch: 1234
pg[5.123s10( v 839'38976 (839'37410,839'38976] local-lis/les=1119/1120
n=498 ec=813/813 lis/c=1119/1064 les/c/f=1120/1065/0 sis=1210)
[NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE,NONE]p? r=-1 lpr=1212
pi=[1064,1210)/1 crt=839'38976 lcod 0'0 mlcod 0'0 unknown mbc={}
ps=[1~3]] state<Start>: transitioning to Stray
2020-11-05T15:18:13.145+0100 7f03140b5700 1 osd.114 1234
set_numa_affinity public network bond0 numa node 2
2020-11-05T15:18:13.145+0100 7f03140b5700 1 osd.114 1234
set_numa_affinity cluster network bond1 numa node 0
2020-11-05T15:18:13.145+0100 7f03140b5700 1 osd.114 1234
set_numa_affinity public and cluster network numa nodes do not match
2020-11-05T15:18:13.145+0100 7f03140b5700 1 osd.114 1234
set_numa_affinity not setting numa affinity
2020-11-05T15:18:14.072+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
2020-11-05T15:18:44.397+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
2020-11-05T15:19:15.370+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
2020-11-05T15:19:46.206+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
2020-11-05T15:20:16.400+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
2020-11-05T15:20:46.483+0100 7f031713a700 1 osd.114 1234 tick checking
mon for new map
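In the meantime I am gathering more detail roughly like this (standard admin-socket commands, nothing version-specific as far as I know):
ceph versions                                   # confirm what the mons/mgrs/OSDs actually report
ceph daemon osd.114 status                      # the osdmap epoch the stuck OSD thinks it is on
ceph daemon osd.114 config set debug_monc 10    # more detail on the mon session and osdmap subscription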
Regards
--
Ingo Ebel
Human knowledge belongs to the world.
RadioTux.de - Internet-Radio rund um Linux und Open Source
## https://twitter.com/ingoebel
## https://keybase.io/savar
## Jabber: ingo.ebel(a)ingoebel.de
Hi
We had a rack down for 2 hours for maintenance. 5 storage nodes were
involved. We had the noout and norebalance flags set before the start of the
maintenance.
When the systems were brought back online we noticed a lot of OSDs with
high latency (in the 20-second range), mostly OSDs that are not on the
storage nodes that were down. It took about 20 minutes for things to
settle down.
We're running Nautilus 14.2.11. The storage nodes run bluestore and have
9 x 8 TB HDDs and 3 x SSDs for RocksDB, each SSD with 3 x 123 GB LVs.
- Can anyone give a reason for these high latencies?
- Is there a way to avoid or lower these latencies when bringing systems
back into operation?
Best Regards
Marcel