Hi,
I've configured the all.yml like this:
grep -v "^#\|^$" ceph/ceph-ansible-playbooks/ceph-ansible-stable-4.0/group_vars/all.yml
---
dummy:
nautilus: 14
cluster: ceph
mon_group_name: mons
osd_group_name: osds
mgr_group_name: mgrs
configure_firewall: False
centos_package_dependencies:
- epel-release
- libselinux-python
ceph_origin: repository
ceph_repository: community
ceph_mirror: http://hk-repo-2001/repo/ceph/
ceph_stable_key: http://hk-repo-2001/repo/ceph/release.asc
ceph_stable_release: nautilus
ceph_stable_redhat_distro: el7
monitor_interface: bond0
ip_version: ipv4
public_network: 10.121.58.0/24
cluster_network: 192.168.58.0/24
osd_objectstore: bluestore
dashboard_enabled: False
I'd like to install from our repo server, but when the playbook gets to this step it complains that it can't reach the EPEL repository:
TASK [ceph-common : install redhat ceph packages] ********************************************************************************************************
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
fatal: [hk-ceph-2c09]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
fatal: [hk-ceph-2c08]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
fatal: [hk-ceph-2c10]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again"}
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (3 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
fatal: [hk-ceph-2c07]: FAILED! => {"attempts": 3, "changed": false, "msg": "Failure talking to yum: failure: repodata/repomd.xml from epel: [Errno 256] No more mirrors to try.\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.x…: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed7: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\"\nhttp://download.fedoraproject.org/pub/epel/7/x86_64/repodata/repomd.xml: [Errno 14] curl#7 - \"Failed to connect to 2620:52:3:1:dead:beef:cafe:fed6: Network is unreachable\""}
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (2 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
FAILED - RETRYING: install redhat ceph packages (1 retries left).
On hk-ceph-2c07 I've tried uncommenting baseurl and commenting out metalink in the epel.repo file, and cleaning the yum cache, but nothing helped; I guess something is wrong with my internal mirror.
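For reference, the edit looked roughly like this (only the relevant lines shown; everything else in the stock epel.repo left as it was):
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
#metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch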
When I install something from the EPEL repo manually on the server it works, but via Ansible it doesn't.
I'm deploying Ceph as the ansible user, and its permissions are correct.
Here is the hosts file for the playbook:
[all:vars]
ansible_ssh_user=ansible
ansible_become=true
ansible_become_method=sudo
ansible_become_user=root
[mons]
hk-cephm-2007
hk-cephm-2008
hk-cephm-2009
[mgrs]
hk-cephm-2007
hk-cephm-2008
hk-cephm-2009
[osds]
hk-ceph-2c07
hk-ceph-2c08
hk-ceph-2c09
hk-ceph-2c10
Why am I getting the repository error and "Network is unreachable"?
Thank you
Hi all,
When running containerized Ceph (Nautilus), is anyone else seeing a constant
memory leak in the ceph-mgr pod, along with constant ms_handle_reset errors
in the logs for the backup (standby) mgr instance?
---
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
0 client.0 ms_handle_reset on v2:172.29.1.13:6848/1
---
I see a couple of related reports with no activity:
https://tracker.ceph.com/issues/36471
https://tracker.ceph.com/issues/40260
and one related merge that doesn't seem to have corrected the issue:
https://github.com/ceph/ceph/pull/24233
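In case anyone wants to compare on their side, something like this should show the standby mgr and its memory footprint (run on/inside the mgr host or container):
ceph mgr dump | grep -E '"active_name"|"standbys"'
ps -o rss,cmd -C ceph-mgr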
thx
Frank
Hi guys,
I have a Ceph cluster with three MDS servers, two of them in active status
while the remaining one is in standby-replay mode. Today the message '1
MDSs are read only' showed up when I checked the cluster status with 'ceph -s',
details below:
# ceph -s
  cluster:
    id:     3d43e9a5-50dc-4f84-9493-656bf4f06f8c
    health: HEALTH_WARN
            5 clients failing to advance oldest client/flush tid
            1 MDSs are read only
            2 MDSs report slow requests
            2 MDSs behind on trimming
            BlueFS spillover detected on 33 OSD(s)

  services:
    mon: 3 daemons, quorum bjcpu-001,bjcpu-002,bjcpu-003 (age 3M)
    mgr: bjcpu-001.xxxx.io(active, since 3M), standbys: bjcpu-003.xxxx.io, bjcpu-002.xxxx.io
    mds: cephfs:2 {0=bjcpu-003.xxxx.io=up:active,1=bjcpu-001.xxxx.io=up:active} 1 up:standby-replay
    osd: 48 osds: 48 up (since 7w), 48 in (since 7M)

  data:
    pools:   3 pools, 2304 pgs
    objects: 301.35M objects, 70 TiB
    usage:   246 TiB used, 280 TiB / 527 TiB avail
    pgs:     2295 active+clean
             9    active+clean+scrubbing+deep

  io:
    client: 254 B/s rd, 44 MiB/s wr, 0 op/s rd, 15 op/s wr
What should I do to fix this error? The cluster still seems to work fine
(reads and writes succeed).
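Would the right way to dig in be something along these lines (the daemon name below is just a placeholder)?
ceph health detail                   # see which MDS rank is read-only and the related warnings
ceph tell mds.<name> damage ls       # check whether metadata damage has been recorded
systemctl restart ceph-mds@<name>    # restart the read-only MDS so the standby-replay daemon takes over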
Many thanks
--
Regards
Frank Yu
Hi All,
radosgw is configured via ceph-deploy and I created a few buckets from the
Ceph dashboard, but when accessing it through Java AWS S3 code to create a
new bucket I am facing the issue below.
Exception in thread "main" com.amazonaws.SdkClientException: Unable to execute HTTP request: firstbucket.rgwhost
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
    at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:394)
    at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:5950)
    at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1812)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1772)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1710)
    at org.S3.App.main(App.java:71)
Caused by: java.net.UnknownHostException: firstbucket.rgwhost
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
    at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy3.connect(Unknown Source)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1330)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    ... 15 more
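From searching around, the UnknownHostException on firstbucket.rgwhost makes me think the SDK is using virtual-hosted-style bucket addressing, which needs a DNS entry per bucket. Would forcing path-style access be the right direction? A rough sketch of what I mean (endpoint, port, region and keys below are placeholders, not my real config):

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3PathStyleSketch {
    public static void main(String[] args) {
        // Path-style access sends requests to http://rgwhost:7480/firstbucket/...
        // instead of resolving firstbucket.rgwhost, which is what fails above.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration("http://rgwhost:7480", "us-east-1"))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                .withPathStyleAccessEnabled(true)
                .build();

        s3.createBucket("secondbucket");
        s3.putObject("firstbucket", "hello.txt", "uploaded via path-style access");
    }
}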
--
Thanks,
Vutukuri Sathvik,
8197748291.
Hi All,
I'm kind of crossposting this from here:
https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-cep…
But since I'm more and more sure that it's a ceph problem I'll try my
luck here.
Since updating from Luminous to Nautilus I have a big problem.
I have a 3-node cluster. Each node has 2 NVMe SSDs and a 10GBASE-T network
for Ceph.
Every few minutes an OSD seems to compact its RocksDB. While doing this it
uses a lot of I/O and blocks.
This basically blocks the whole cluster, and no VM/container can read data
for some seconds (sometimes minutes).
While it happens "iostat -x" looks like this:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.00 0.00 24.00 0.00 46.00 0.00 95.83 0.00 0.00 0.00 0.00 12.00 2.00 0.40
nvme1n1 0.00 1495.00 0.00 3924.00 0.00 6099.00 0.00 80.31 0.00 352.39 523.78 0.00 2.62 0.67 100.00
And iotop:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 1573.47 K/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 3.43 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
2306 be/4 ceph 0.00 B/s 1533.22 K/s 0.00 % 99.99 % ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph [rocksdb:low1]
In the ceph-osd log I see that rocksdb is compacting.
https://gist.github.com/qwasli/3bd0c7d535ee462feff8aaee618f3e08
The pool and one OSD are nearfull. I'd planned to move some data away to
another Ceph pool, but now I'm not sure anymore whether I should stay with Ceph.
I'll move some data away today anyway to see if that helps, but before the
upgrade there was the same amount of data and I didn't have this problem.
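One thing I'm considering is triggering the compaction myself during a quiet window instead of letting it hit us mid-day; roughly this (OSD id and data path are just examples):
ceph daemon osd.3 compact
# or, with the OSD stopped:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 compact
Though I suspect that only treats the symptom.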
Any hints to solve this are appreciated.
Cheers
Raffael
Greetings,
I have a question regarding the use of cephadm and disk partitions. The cephadm documentation mentions that a device cannot have partitions to be considered "available" for use. In my situation I don't want to use a whole device, but rather a single partition as an OSD. I've noticed that partitions do not show up when using `ceph orch device ls`, yet they can still be used as OSDs by running something like `ceph orch daemon add osd node1:/dev/sda4` (see the snippet below). My question is: should I? Am I going to run into trouble by using a partition for an OSD instead of a full device?
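Concretely, the sequence I'm talking about (host and device names are just from my test setup):
ceph orch device ls                        # the partition /dev/sda4 is not listed as available
ceph orch daemon add osd node1:/dev/sda4   # but this still brings it up as an OSD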
Thanks,
Jason
Hi,
I'm trying to have clients read the 'rbd_default_data_pool' config
option from the config store when creating an RBD image.
This doesn't seem to work, and I'm wondering if somebody knows why.
I tried:
$ ceph config set client rbd_default_data_pool rbd-data
$ ceph config set global rbd_default_data_pool rbd-data
They both show up under:
$ ceph config dump
However, newly created RBD images with the 'rbd' CLI tool do not use the
data pool.
If I set this in ceph.conf it works:
[client]
rbd_default_data_pool = rbd-data
Somehow librbd isn't fetching these configuration options. Any hints on
how to get this working?
The end result is that libvirt (which doesn't read ceph.conf) should
also be able to create RBD images with a different data pool.
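For reference, passing the data pool explicitly per image is the workaround I'd use for the CLI case (image name is just an example), but that obviously doesn't help libvirt:
$ rbd create --size 10G --data-pool rbd-data rbd/test-image
$ rbd info rbd/test-image     # 'data_pool: rbd-data' should show up here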
Wido
I'm having a hard time understanding EC usable space vs. raw capacity.
https://ceph.io/geen-categorie/ceph-erasure-coding-overhead-in-a-nutshell/
indicates "nOSD * k / (k+m) * OSD Size" is how you calculate usable space,
but that's not lining up with what I'd expect just from k data chunks + m
parity chunks.
So, for example, with k=4, m=2 you'd expect every 4 bytes written to consume
6 bytes on disk, i.e. 50% overhead. However, the formula above, for a
7-server cluster using 4+2 encoding, indicates 66.67% usable capacity vs.
raw storage.
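To make the comparison concrete with example numbers (say 7 OSDs of 10 TB each, one per server):
usable   = nOSD * k/(k+m) * OSD size = 7 * 4/6 * 10 TB ≈ 46.7 TB  (66.67% of the 70 TB raw)
overhead = (k+m)/k - 1 = 6/4 - 1 = 50%  (relative to the data written)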
What am I missing here?
On Tue, Jul 28, 2020 at 01:28:14PM +0000, Alex Hussein-Kershaw wrote:
> Hello,
>
> I have a problem that old versions of S3 objects are not being deleted. Can anyone advise as to why? I'm using Ceph 14.2.9.
How many objects are in the bucket? If it's a lot, then you may be running into
RGW's lifecycle performance limitations: listing each bucket is a very
slow operation for lifecycle prior to improvements made in later
versions (Octopus, with maybe a backport to Nautilus?).
If the bucket doesn't see a lot of operations, you could try running
'radosgw-admin lc process' directly with debug logging and see
where it gets bogged down.
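Something along these lines (debug levels are just a suggestion):
radosgw-admin lc list                                      # per-bucket lifecycle status
radosgw-admin lc process --debug-rgw=20 --debug-ms=1 2>&1 | tee lc-debug.log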
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail : robbat2(a)gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136