Hi,
on a large cluster with ~1600 OSDs, 60 servers and using 16+3 erasure
coded pools, recovery after an OSD failure (HDD) is quite slow. Typical
values are 4GB/s with 125 ops/s and 32MB object sizes; recovery then
takes 6-8 hours, during which the PGs stay degraded. I tried to speed
it up with
osd advanced osd_max_backfills 32
osd advanced osd_recovery_max_active 10
osd advanced osd_recovery_op_priority 63
osd advanced osd_recovery_sleep_hdd 0.000000
which at least kept the IOPS at a constant level. The recovery does
not seem to be CPU- or memory-bound. Is there any way to speed it up?
While testing the recovery on replicated pools, it reached 50GB/s.
In contrast, replacing the failed drive with a new one and re-adding the
OSD is quite fast, with a 1GB/s recovery rate for misplaced PGs, or
~120MB/s average HDD write speed, which is not far from raw HDD throughput.
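For reference, the overrides listed above correspond to these runtime commands (a sketch, assuming the centralized config database of Octopus or later; on older releases the same values can be injected with `ceph tell osd.* injectargs`):

```shell
# Raise concurrent backfills and recovery ops per OSD (defaults are much lower)
ceph config set osd osd_max_backfills 32
ceph config set osd osd_recovery_max_active 10
# Give recovery ops the highest priority and remove the HDD recovery throttle
ceph config set osd osd_recovery_op_priority 63
ceph config set osd osd_recovery_sleep_hdd 0
# Watch the effect on recovery throughput
ceph -s
```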
Regards,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-425-7074
-------------------------------------------------------------
Hello List,
all of a sudden I cannot mount a specific RBD device anymore:
root@proxmox-backup:~# rbd map backup-proxmox/cluster5 -k
/etc/ceph/ceph.client.admin.keyring
/dev/rbd0
root@proxmox-backup:~# mount /dev/rbd0 /mnt/backup-cluster5/
(the mount just hangs and never times out)
Any idea how to debug that mount? Tcpdump does show some active traffic.
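A few places to look while the mount is hanging (a sketch, using the pool/image name from above): the kernel log usually says why the rbd client is blocked, and the cluster side can show stale watchers or blocked requests:

```shell
# Kernel messages often show why the rbd client or filesystem is blocked
dmesg | tail -50
# Blocked requests or down OSDs will stall rbd I/O cluster-wide
ceph -s
# Check for active/stale watchers on the image header
rbd status backup-proxmox/cluster5
# In-flight requests the kernel client is still waiting on
cat /sys/kernel/debug/ceph/*/osdc
```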
Cheers,
Michael
Dear all,
some time ago I reported that the kernel client resorts to a copy instead of a move when moving a file across quota domains. I was told that the FUSE client does not have this problem. If enough space is available, a move should be a move, not a copy.
Today, I tried to move a large file across quota domains, testing both the kernel and the FUSE client. Both still resort to a copy, even though this issue was addressed quite a while ago (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/44AEIHNEGKV…). The versions I'm using are (CentOS 7):
# yum list installed | grep ceph-fuse
ceph-fuse.x86_64 2:13.2.10-0.el7 @ceph
# uname -r
3.10.0-1160.31.1.el7.x86_64
Any suggestions on how to get this to work? I have to move directories containing 100+ TB.
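A minimal reproducer, as a sketch with hypothetical paths, assuming quota domains are defined by the `ceph.quota.max_bytes` xattr on two sibling directories: a true rename is near-instant, while a copy fallback takes time proportional to the file size:

```shell
# Two directories, each its own quota domain (paths are hypothetical)
setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/a
setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/b
# Create a 10 GiB test file in the first domain
dd if=/dev/zero of=/mnt/cephfs/a/big.bin bs=1M count=10240
# A real rename is O(1); minutes instead of seconds means a copy happened
time mv /mnt/cephfs/a/big.bin /mnt/cephfs/b/
```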
Many thanks,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
We have a containerised Ceph cluster on version 16.2.4 (15 hosts, 180 OSDs) deployed with ceph-ansible.
Our hosts run CentOS 7 (kernel 3.10) with the ceph-daemon Docker image, which is based on CentOS 8.
I cannot find in the documentation which host distribution is recommended; should it be the same as the Docker image (CentOS 8)?
Given the announced end of support for CentOS 8 at the end of the year, which distribution will Ceph use in its Docker images?
Thanks
Hi
I'm using Ceph Pacific 16.2.1
I'm creating a topic as a user which belongs to a non-default tenant.
I'm using AWS CLI 2 with v3 authentication enabled:
aws --profile=ceph-myprofile --endpoint=$HOST_S3_API --region="" sns create-topic --name=fishtopic --attributes='{"push-endpoint": "http://my-ceph-source-svc.default.svc.cluster.local"}'
{
"TopicArn": "arn:aws:sns:default::fishtopic"
}
The topic is created in the default tenant, though.
The user can list topics, but sees only topics from the default tenant:
aws --profile=ceph-myprofile --endpoint=$HOST_S3_API --region="" sns list-topics
{
"Topics": [
{
"TopicArn": "arn:aws:sns:default::fishtopic"
}
]
}
The topic is in the default tenant:
# radosgw-admin topic list --uid none
{
"topics": [
{
"topic": {
"user": "",
"name": "fishtopic",
"dest": {
"bucket_name": "",
"oid_prefix": "",
"push_endpoint": "http://my-ceph-source-svc.default.svc.cluster.local",
"push_endpoint_args": "Attributes.entry.1.key=push-endpoint&Attributes.entry.1.value=http://my-ceph-source-svc.default.svc.cluster.local&Version=2010-03-31&push-endpoint=http://my-ceph-source-svc.default.svc.cluster.local",
"push_endpoint_topic": "fishtopic",
"stored_secret": "false",
"persistent": "false"
},
"arn": "arn:aws:sns:default::fishtopic",
"opaqueData": ""
},
"subs": []
}
]
}
When I create a topic over HTTP with a federated user, the topic is created
in the correct (user's) tenant.
For some reason the "user" below is "marvel", which is actually the name of
the tenant.
Possibly the topic is not owned by the user but rather by the tenant.
radosgw-admin topic list --tenant marvel --uid none
{
"topics": [
{
"topic": {
"user": "marvel",
"name": "MyTopic",
"dest": {
"bucket_name": "",
"oid_prefix": "",
"push_endpoint": "amqp://127.0.0.1",
"push_endpoint_args": "amqp-exchange=rgw-exchange&push-endpoint=amqp://127.0.0.1&use-ssl=false&verify-ssl=false",
"push_endpoint_topic": "MyTopic",
"stored_secret": "false",
"persistent": "false"
},
"arn": "arn:aws:sns:default:marvel:MyTopic",
"opaqueData": ""
},
"subs": []
}
]
}
Also, what permissions are checked when creating a topic?
So far it seems I can create a topic without granting any special permissions.
Regards
Daniel
Hello.
Today we experienced a complete Ceph cluster outage: a total loss of
power across the whole infrastructure.
Six OSD nodes and three monitors went down at the same time. Ceph 14.2.10.
This resulted in unfound objects, which were "reverted" in a hurry with
ceph pg <pg_id> mark_unfound_lost revert
In retrospect that was probably a mistake, as the "have" part stated 0'0.
But then deep scrubs started and found inconsistent PGs. We tried
repairing them, but they just switched to failed_repair.
Here's a log example:
2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6
3:3163e703:::rbd_data.be08c566ef438d.0000000000002445:head : missing
2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c
3:3163e2ee:::rbd_data.efa86358d15f4a.000000000000004b:6ab1 : is an
unexpected clone
2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0 inconsistent
objects
2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed
I tried manually deleting conflicting objects from secondary OSDs
with ceph-objectstore-tool, like this:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c rbd_data.efa86358d15f4a.000000000000004b:6ab1 remove
It removes the object, but without any positive impact. I'm pretty sure
I don't understand the concept.
So currently I have the following thoughts:
- is there any documentation on object placement specifics and what all the
numbers in an object's name mean? I've seen objects with a similar prefix and
middle part but different suffixes, and I have no idea what that means;
- I'm actually not sure what the production impact is at this point,
because everything seems to work so far. So I'm wondering whether it's possible
to delete the replicas on the secondary OSDs with ceph-objectstore-tool and just let
Ceph re-create them from the primary PG?
I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid
that further deep scrubs will reveal more errors.
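Before removing any replicas, it may help to see exactly which shards disagree. `rados list-inconsistent-obj` prints, per object, which OSD is missing it or holds a mismatched copy (a sketch using the PG from the log above):

```shell
# PGs currently flagged inconsistent
ceph health detail | grep inconsistent
# Per-object detail for one PG: which shard is missing or corrupt, and why
rados list-inconsistent-obj 3.c --format=json-pretty
# Snapshot-level view, relevant for "unexpected clone" errors
rados list-inconsistent-snapset 3.c --format=json-pretty
```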
Any thoughts appreciated.
I notice on
https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/
that it lists a requirement of
"VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."
Could anyone enlighten me as to why this specific limit is in place?
Officially knowing something like "you have to use v6.5 or later, because X happens" would be very helpful to me when writing up potential deployment plans.
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Dear Ceph Folks,
Does anyone have real experience of using RBD mirroring for disaster recovery over 1000 miles?
I am planning to use the Ceph RBD mirroring feature for DR, but have no real-world experience with it. Could anyone share good or bad experiences here? I am thinking of using iSCSI over an rbd-nbd map, with RBD mirroring to a remote site over a dedicated 200Mb/s link.
The Ceph version will be Luminous 12.2.13.
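For context, journal-based one-way mirroring (the only mode available on Luminous) is set up roughly like this; a sketch with hypothetical pool, image, and cluster names, and it assumes an rbd-mirror daemon running at the DR site:

```shell
# On both clusters: enable mirroring on the pool (per-image mode also exists)
rbd mirror pool enable mypool pool
# Journaling (and exclusive-lock) must be enabled on each mirrored image
rbd feature enable mypool/myimage exclusive-lock journaling
# On the DR cluster: register the primary cluster as a peer
rbd mirror pool peer add mypool client.mirror@primary
# Check replication health and lag
rbd mirror pool status mypool --verbose
```

Over a 200Mb/s link, the sustained write rate of the mirrored images is the main constraint: if writes exceed roughly 25MB/s for long periods, the journal backlog will keep growing.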
Any sharing, suggestions and comments are highly appreciated.
best regards,
samuel
huxiaoyu(a)horebdata.cn
Hi everyone,
The Ceph Month June schedule is now available:
https://pad.ceph.com/p/ceph-month-june-2021
We have great sessions: component updates, performance best
practices, Ceph on different architectures, BoF sessions to get more
involved with working groups in the community, and more! You may also
leave open discussion topics for the listed talks, which we'll get to
in each Q&A portion.
I will provide the video stream link on this thread and on the etherpad once
it's available. You can also add the Ceph community calendar, in which
the Ceph Month sessions are prefixed with "Ceph Month", to get
local timezone conversions.
https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%4…
Thank you to our speakers for taking the time to share with us all the
latest best practices and usage with Ceph!
--
Mike Perez