Hi,
I've configured the all.yml like this:
grep -v "^#\|^$" ceph/ceph-ansible-playbooks/ceph-ansible-stable-4.0/group_vars/all.yml
---
dummy:
nautilus: 14
cluster: ceph
mon_group_name: mons
osd_group_name: osds
mgr_group_name: mgrs
configure_firewall: False
centos_package_dependencies:
- epel-release
- libselinux-python
ceph_origin: repository
ceph_repository: community
ceph_mirror: http://hk-repo-2001/repo/ceph/
ceph_stable_key: http://hk-repo-2001/repo/ceph/release.asc
ceph_stable_release: nautilus
ceph_stable_redhat_distro: el7
monitor_interface: bond0
ip_version: ipv4
public_network: 10.121.58.0/24
cluster_network: 192.168.58.0/24
osd_objectstore: bluestore
dashboard_enabled: False
I'd like to install from our repo server, but when the playbook gets to this step it complains that it can't reach the EPEL repository:
TASK [ceph-common : install redhat ceph packages] ********************************************************************************************************
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
fatal: [hk-ceph-2c09]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
fatal: [hk-ceph-2c08]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
fatal: [hk-ceph-2c10]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
fatal: [hk-ceph-2c07]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: failure: repodata/repomd.xml from epel: [Errno 256] No more mirrors to try.\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.x…: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\""}
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
On hk-ceph-2c07 I've tried uncommenting baseurl and commenting out metalink in the epel.repo file, and cleaning the yum cache, but nothing helped; I guess something is wrong with my internal mirror.
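For reference, the edit looked roughly like this (only the relevant lines shown; everything else in the stock epel.repo left as it was):
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
#metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch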
When I install something from the EPEL repo manually on the server it works, but via Ansible it doesn't.
I'm deploying Ceph as the ansible user, and its permissions are correct.
Here is the hosts file for the playbook:
[all:vars]
ansible_ssh_user=ansible
ansible_become=true
ansible_become_method=sudo
ansible_become_user=root
[mons]
hk-cephm-2007
hk-cephm-2008
hk-cephm-2009
[mgrs]
hk-cephm-2007
hk-cephm-2008
hk-cephm-2009
[osds]
hk-ceph-2c07
hk-ceph-2c08
hk-ceph-2c09
hk-ceph-2c10
Why am I getting the repository error and "Network is unreachable"?
Thank you
Hi all,
When running containerized Ceph (Nautilus), is anyone else seeing a constant
memory leak in the ceph-mgr pod, along with constant ms_handle_reset errors
in the logs for the backup (standby) mgr instance?
---
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
---
I see a couple of related reports with no activity:
https://tracker.ceph.com/issues/36471
https://tracker.ceph.com/issues/40260
and one related merge that doesn't seem to have corrected the issue:
https://github.com/ceph/ceph/pull/24233
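In case anyone wants to compare on their side, something like this should show the standby mgr and its memory footprint (run on/inside the mgr host or container):
ceph mgr dump | grep -E '"active_name"|"standbys"'
ps -o rss,cmd -C ceph-mgr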
thx
Frank
Hi guys,
I have a Ceph cluster with three MDS servers, two of them in active status
while the remaining one is in standby-replay mode. Today the message '1
MDSs are read only' showed up when I checked the cluster status with 'ceph -s',
details below:
# ceph -s
  cluster:
    id:     3d43e9a5-50dc-4f84-9493-656bf4f06f8c
    health: HEALTH_WARN
            5 clients failing to advance oldest client/flush tid
            1 MDSs are read only
            2 MDSs report slow requests
            2 MDSs behind on trimming
            BlueFS spillover detected on 33 OSD(s)

  services:
    mon: 3 daemons, quorum bjcpu-001,bjcpu-002,bjcpu-003 (age 3M)
    mgr: bjcpu-001.xxxx.io(active, since 3M), standbys: bjcpu-003.xxxx.io, bjcpu-002.xxxx.io
    mds: cephfs:2 {0=bjcpu-003.xxxx.io=up:active,1=bjcpu-001.xxxx.io=up:active} 1 up:standby-replay
    osd: 48 osds: 48 up (since 7w), 48 in (since 7M)

  data:
    pools:   3 pools, 2304 pgs
    objects: 301.35M objects, 70 TiB
    usage:   246 TiB used, 280 TiB / 527 TiB avail
    pgs:     2295 active+clean
             9    active+clean+scrubbing+deep

  io:
    client: 254 B/s rd, 44 MiB/s wr, 0 op/s rd, 15 op/s wr
What should I do to fix this error? The cluster still seems to work fine
(reads and writes succeed).
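Would the right way to dig in be something along these lines (the daemon name below is just a placeholder)?
ceph health detail                   # see which MDS rank is read-only and the related warnings
ceph tell mds.<name> damage ls       # check whether metadata damage has been recorded
systemctl restart ceph-mds@<name>    # restart the read-only MDS so the standby-replay daemon takes over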
Many thanks
--
Regards
Frank Yu
Hi All,
radosgw is configured via ceph-deploy and I created a few buckets from the
Ceph dashboard, but when accessing it through Java AWS S3 code to create a
new bucket I am facing the issue below.
Exception in thread "main" com.amazonaws.SdkClientException: Unable to execute HTTP request: firstbucket.rgwhost
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
    at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394)
    at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950)
    at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1710)
    at org.S3.App.main(App.java:71)
Caused by: java.net.UnknownHostException: firstbucket.rgwhost
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
    at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy3.connect(Unknown Source)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1330)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    ... 15 more
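From searching around, the UnknownHostException on firstbucket.rgwhost makes me think the SDK is using virtual-hosted-style bucket addressing, which needs a DNS entry per bucket. Would forcing path-style access be the right direction? A rough sketch of what I mean (endpoint, port, region and keys below are placeholders, not my real config):

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3PathStyleSketch {
    public static void main(String[] args) {
        // Path-style access sends requests to http://rgwhost:7480/firstbucket/...
        // instead of resolving firstbucket.rgwhost, which is what fails above.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration("http://rgwhost:7480", "us-east-1"))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                .withPathStyleAccessEnabled(true)
                .build();

        s3.createBucket("secondbucket");
        s3.putObject("firstbucket", "hello.txt", "uploaded via path-style access");
    }
}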
--
Thanks,
Vutukuri Sathvik,
8197748291.
Hi All,
I'm kind of crossposting this from here:
https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-cep…
But since I'm more and more sure that it's a ceph problem I'll try my
luck here.
Since updating from Luminous to Nautilus I have a big problem.
I have a 3-node cluster. Each node has 2 NVMe SSDs and a 10GBASE-T network
for Ceph.
Every few minutes an OSD seems to compact its RocksDB. While doing this it
uses a lot of I/O and blocks.
This basically blocks the whole cluster, and no VM/container can read data
for some seconds (sometimes minutes).
While it happens "iostat -x" looks like this:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.00 0.00 24.00 0.00 46.00 0.00 95.83 0.00 0.00 0.00 0.00 12.00 2.00 0.40
nvme1n1 0.00 1495.00 0.00 3924.00 0.00 6099.00 0.00 80.31 0.00 352.39 523.78 0.00 2.62 0.67 100.00
And iotop:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 1573.47 K/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 3.43 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
2306 be/4 ceph 0.00 B/s 1533.22 K/s 0.00 % 99.99 % ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph [rocksdb:low1]
In the ceph-osd log I see that rocksdb is compacting.
https://gist.github.com/qwasli/3bd0c7d535ee462feff8aaee618f3e08
The pool and one OSD are nearfull. I'd planned to move some data away to
another Ceph pool, but now I'm not sure anymore whether I should stay with Ceph.
I'll move some data away today anyway to see if that helps, but before the
upgrade there was the same amount of data and I didn't have this problem.
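One thing I'm considering is triggering the compaction myself during a quiet window instead of letting it hit us mid-day; roughly this (OSD id and data path are just examples):
ceph daemon osd.3 compact
# or, with the OSD stopped:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 compact
Though I suspect that only treats the symptom.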
Any hints to solve this are appreciated.
Cheers
Raffael
Greetings,
I have a question regarding the use of cephadm and disk partitions. The cephadm documentation mentions that a device cannot have partitions to be considered "available" for use. In my situation I don't want to use a whole device, but rather a single partition as an OSD. I've noticed that partitions do not show up when using `ceph orch device ls`, yet they can still be used as OSDs by running something like `ceph orch daemon add osd node1:/dev/sda4` (see the snippet below). My question is: should I? Am I going to run into trouble by using a partition for an OSD instead of a full device?
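Concretely, the sequence I'm talking about (host and device names are just from my test setup):
ceph orch device ls                        # the partition /dev/sda4 is not listed as available
ceph orch daemon add osd node1:/dev/sda4   # but this still brings it up as an OSD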
Thanks,
Jason
Hi,
I'm trying to have clients read the 'rbd_default_data_pool' config
option from the config store when creating an RBD image.
This doesn't seem to work, and I'm wondering if somebody knows why.
I tried:
$ ceph config set client rbd_default_data_pool rbd-data
$ ceph config set global rbd_default_data_pool rbd-data
They both show up under:
$ ceph config dump
However, newly created RBD images with the 'rbd' CLI tool do not use the
data pool.
If I set this in ceph.conf it works:
[client]
rbd_default_data_pool = rbd-data
Somehow librbd isn't fetching these configuration options. Any hints on
how to get this working?
The end result is that libvirt (which doesn't read ceph.conf) should
also be able to create RBD images with a different data pool.
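For reference, passing the data pool explicitly per image is the workaround I'd use for the CLI case (image name is just an example), but that obviously doesn't help libvirt:
$ rbd create --size 10G --data-pool rbd-data rbd/test-image
$ rbd info rbd/test-image     # 'data_pool: rbd-data' should show up here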
Wido
I'm having a hard time understanding EC usable space vs. raw capacity.
https://ceph.io/geen-categorie/ceph-erasure-coding-overhead-in-a-nutshell/
indicates "nOSD * k / (k+m) * OSD Size" is how you calculate usable space,
but that's not lining up with what I'd expect just from k data chunks + m
parity chunks.
So, for example, with k=4, m=2 you'd expect every 4 bytes written to consume
6 bytes on disk, i.e. 50% overhead. However, the formula above, for a
7-server cluster using 4+2 encoding, indicates 66.67% usable capacity vs.
raw storage.
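To make the comparison concrete with example numbers (say 7 OSDs of 10 TB each, one per server):
usable   = nOSD * k/(k+m) * OSD size = 7 * 4/6 * 10 TB ≈ 46.7 TB  (66.67% of the 70 TB raw)
overhead = (k+m)/k - 1 = 6/4 - 1 = 50%  (relative to the data written)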
What am I missing here?
On Tue, Jul 28, 2020 at 01:28:14PM +0000, Alex Hussein-Kershaw wrote:
> Hello,
>
> I have a problem that old versions of S3 objects are not being deleted. Can anyone advise as to why? I'm using Ceph 14.2.9.
How many objects are in the bucket? If it's a lot, then you may be running into
RGW's lifecycle performance limitations: listing each bucket is a very
slow operation for lifecycle prior to improvements made in later
versions (Octopus, with maybe a backport to Nautilus?).
If the bucket doesn't see a lot of operations, you could try running
'radosgw-admin lc process' directly with debug logging and see
where it gets bogged down.
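Something along these lines (debug levels are just a suggestion):
radosgw-admin lc list                                      # per-bucket lifecycle status
radosgw-admin lc process --debug-rgw=20 --debug-ms=1 2>&1 | tee lc-debug.log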
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2(a)gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136