Hi,
on a large cluster with ~1600 OSDs, 60 servers and using 16+3 erasure
coded pools, recovery after an OSD failure (HDD) is quite slow. Typical
values are 4GB/s with 125 ops/s and 32MB object sizes; recovery then
takes 6-8 hours, during which the PGs stay degraded. I tried to speed
it up with
osd advanced osd_max_backfills 32
osd advanced osd_recovery_max_active 10
osd advanced osd_recovery_op_priority 63
osd advanced osd_recovery_sleep_hdd 0.000000
which at least kept the IOPS at a constant level. The recovery does
not seem to be CPU- or memory-bound. Is there any way to speed it up?
While testing the recovery on replicated pools, it reached 50GB/s.
In contrast, replacing the failed drive with a new one and re-adding the
OSD is quite fast, with a 1GB/s recovery rate for misplaced PGs, or
~120MB/s average HDD write speed, which is not far from raw HDD throughput.
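For reference, the overrides listed above correspond to these runtime commands (a sketch, assuming the centralized config database of Octopus or later; on older releases the same values can be injected with `ceph tell osd.* injectargs`):

```shell
# Raise concurrent backfills and recovery ops per OSD (defaults are much lower)
ceph config set osd osd_max_backfills 32
ceph config set osd osd_recovery_max_active 10
# Give recovery ops the highest priority and remove the HDD recovery throttle
ceph config set osd osd_recovery_op_priority 63
ceph config set osd osd_recovery_sleep_hdd 0
# Watch the effect on recovery throughput
ceph -s
```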
Regards,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-425-7074
-------------------------------------------------------------
Hello List,
all of a sudden I cannot mount a specific RBD device anymore:
root@proxmox-backup:~# rbd map backup-proxmox/cluster5 -k
/etc/ceph/ceph.client.admin.keyring
/dev/rbd0
root@proxmox-backup:~# mount /dev/rbd0 /mnt/backup-cluster5/
(the mount just hangs and never times out)
Any idea how to debug that mount? Tcpdump does show some active traffic.
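A few places to look while the mount is hanging (a sketch, using the pool/image name from above): the kernel log usually says why the rbd client is blocked, and the cluster side can show stale watchers or blocked requests:

```shell
# Kernel messages often show why the rbd client or filesystem is blocked
dmesg | tail -50
# Blocked requests or down OSDs will stall rbd I/O cluster-wide
ceph -s
# Check for active/stale watchers on the image header
rbd status backup-proxmox/cluster5
# In-flight requests the kernel client is still waiting on
cat /sys/kernel/debug/ceph/*/osdc
```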
Cheers,
Michael
Dear all,
some time ago I reported that the kernel client resorts to a copy instead of a move when moving a file across quota domains. I was told that the FUSE client does not have this problem. If enough space is available, a move should be a move, not a copy.
Today, I tried to move a large file across quota domains, testing both the kernel and the FUSE client. Both still resort to a copy, even though this issue was addressed quite a while ago (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/44AEIHNEGKV…). The versions I'm using are (CentOS 7):
# yum list installed | grep ceph-fuse
ceph-fuse.x86_64 2:13.2.10-0.el7 @ceph
# uname -r
3.10.0-1160.31.1.el7.x86_64
Any suggestions on how to get this to work? I have to move directories containing 100+ TB.
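A minimal reproducer, as a sketch with hypothetical paths, assuming quota domains are defined by the `ceph.quota.max_bytes` xattr on two sibling directories: a true rename is near-instant, while a copy fallback takes time proportional to the file size:

```shell
# Two directories, each its own quota domain (paths are hypothetical)
setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/a
setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/b
# Create a 10 GiB test file in the first domain
dd if=/dev/zero of=/mnt/cephfs/a/big.bin bs=1M count=10240
# A real rename is O(1); minutes instead of seconds means a copy happened
time mv /mnt/cephfs/a/big.bin /mnt/cephfs/b/
```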
Many thanks,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
We have a containerised Ceph cluster on version 16.2.4 (15 hosts, 180 OSDs) deployed with ceph-ansible.
Our hosts run CentOS 7 (kernel 3.10) with the ceph-daemon Docker image, which is based on CentOS 8.
I cannot find in the documentation which host distribution is recommended; should it be the same as the Docker image (CentOS 8)?
Given the announced end of support for CentOS 8 at the end of the year, which distribution will Ceph use in its Docker images?
Thanks
Hi
I'm using Ceph Pacific 16.2.1
I'm creating a topic as a user which belongs to a non-default tenant.
I'm using AWS CLI 2 with v3 authentication enabled:
aws --profile=ceph-myprofile --endpoint=$HOST_S3_API --region="" sns create-topic --name=fishtopic --attributes='{"push-endpoint": "http://my-ceph-source-svc.default.svc.cluster.local"}'
{
"TopicArn": "arn:aws:sns:default::fishtopic"
}
The topic is created in the default tenant, though.
The user can list topics, but sees only topics from the default tenant:
aws --profile=ceph-myprofile --endpoint=$HOST_S3_API --region="" sns list-topics
{
"Topics": [
{
"TopicArn": "arn:aws:sns:default::fishtopic"
}
]
}
The topic is in the default tenant:
# radosgw-admin topic list --uid none
{
"topics": [
{
"topic": {
"user": "",
"name": "fishtopic",
"dest": {
"bucket_name": "",
"oid_prefix": "",
"push_endpoint": "http://my-ceph-source-svc.default.svc.cluster.local",
"push_endpoint_args": "Attributes.entry.1.key=push-endpoint&Attributes.entry.1.value=http://my-ceph-source-svc.default.svc.cluster.local&Version=2010-03-31&push-endpoint=http://my-ceph-source-svc.default.svc.cluster.local",
"push_endpoint_topic": "fishtopic",
"stored_secret": "false",
"persistent": "false"
},
"arn": "arn:aws:sns:default::fishtopic",
"opaqueData": ""
},
"subs": []
}
]
}
When I create a topic over HTTP with a federated user, the topic is created
in the correct (user's) tenant.
For some reason the "user" below is "marvel", which is actually the name of
the tenant.
Possibly the topic is not owned by the user but rather by the tenant.
radosgw-admin topic list --tenant marvel --uid none
{
"topics": [
{
"topic": {
"user": "marvel",
"name": "MyTopic",
"dest": {
"bucket_name": "",
"oid_prefix": "",
"push_endpoint": "amqp://127.0.0.1",
"push_endpoint_args": "amqp-exchange=rgw-exchange&push-endpoint=amqp://127.0.0.1&use-ssl=false&verify-ssl=false",
"push_endpoint_topic": "MyTopic",
"stored_secret": "false",
"persistent": "false"
},
"arn": "arn:aws:sns:default:marvel:MyTopic",
"opaqueData": ""
},
"subs": []
}
]
}
Also, what permissions are checked when creating a topic?
So far it seems I can create a topic without granting any special permissions.
Regards
Daniel
Hello.
Today we experienced a complete Ceph cluster outage: a total loss of
power across the whole infrastructure.
Six OSD nodes and three monitors went down at the same time. Ceph 14.2.10.
This resulted in unfound objects, which were "reverted" in a hurry with
ceph pg <pg_id> mark_unfound_lost revert
In retrospect that was probably a mistake, as the "have" part stated 0'0.
But then deep scrubs started and found inconsistent PGs. We tried
repairing them, but they just switched to failed_repair.
Here's a log example:
2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6
3:3163e703:::rbd_data.be08c566ef438d.0000000000002445:head : missing
2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c
3:3163e2ee:::rbd_data.efa86358d15f4a.000000000000004b:6ab1 : is an
unexpected clone
2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0 inconsistent
objects
2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed
I tried manually deleting conflicting objects from secondary OSDs
with ceph-objectstore-tool, like this:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c rbd_data.efa86358d15f4a.000000000000004b:6ab1 remove
It removes the object, but without any positive impact. I'm pretty sure
I don't understand the concept.
So currently I have the following thoughts:
- is there any documentation on object placement specifics and what all the
numbers in an object's name mean? I've seen objects with a similar prefix and
middle part but different suffixes, and I have no idea what that means;
- I'm actually not sure what the production impact is at this point,
because everything seems to work so far. So I'm wondering whether it's possible
to delete the replicas on the secondary OSDs with ceph-objectstore-tool and just let
Ceph re-create them from the primary PG?
I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid
that further deep scrubs will reveal more errors.
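Before removing any replicas, it may help to see exactly which shards disagree. `rados list-inconsistent-obj` prints, per object, which OSD is missing it or holds a mismatched copy (a sketch using the PG from the log above):

```shell
# PGs currently flagged inconsistent
ceph health detail | grep inconsistent
# Per-object detail for one PG: which shard is missing or corrupt, and why
rados list-inconsistent-obj 3.c --format=json-pretty
# Snapshot-level view, relevant for "unexpected clone" errors
rados list-inconsistent-snapset 3.c --format=json-pretty
```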
Any thoughts appreciated.
I notice on
https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/
that it lists a requirement of
"VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."
Could anyone enlighten me as to why this specific limit is in place?
Officially knowing something like "you have to use v6.5 or later, because X happens" would be very helpful to me when writing up potential deployment plans.
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Dear Ceph Folks,
Does anyone have real experience of using RBD mirroring for disaster recovery over 1000 miles?
I am planning to use the Ceph RBD mirroring feature for DR, but have no real-world experience with it. Could anyone share good or bad experiences here? I am thinking of using iSCSI over an rbd-nbd map, with RBD mirroring to a remote site over a dedicated 200Mb/s link.
The Ceph version will be Luminous 12.2.13.
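For context, journal-based one-way mirroring (the only mode available on Luminous) is set up roughly like this; a sketch with hypothetical pool, image, and cluster names, and it assumes an rbd-mirror daemon running at the DR site:

```shell
# On both clusters: enable mirroring on the pool (per-image mode also exists)
rbd mirror pool enable mypool pool
# Journaling (and exclusive-lock) must be enabled on each mirrored image
rbd feature enable mypool/myimage exclusive-lock journaling
# On the DR cluster: register the primary cluster as a peer
rbd mirror pool peer add mypool client.mirror@primary
# Check replication health and lag
rbd mirror pool status mypool --verbose
```

Over a 200Mb/s link, the sustained write rate of the mirrored images is the main constraint: if writes exceed roughly 25MB/s for long periods, the journal backlog will keep growing.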
Any sharing, suggestions and comments are highly appreciated.
best regards,
samuel
huxiaoyu(a)horebdata.cn
Hi everyone,
The Ceph Month June schedule is now available:
https://pad.ceph.com/p/ceph-month-june-2021
We have great sessions: component updates, performance best
practices, Ceph on different architectures, BoF sessions to get more
involved with working groups in the community, and more! You may also
leave open discussion topics for the listed talks, which we'll get to
in each Q&A portion.
I will provide the video stream link on this thread and on the etherpad once
it's available. You can also add the Ceph community calendar, in which
the Ceph Month sessions are prefixed with "Ceph Month", to get
local timezone conversions.
https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%4…
Thank you to our speakers for taking the time to share with us all the
latest best practices and usage with Ceph!
--
Mike Perez