I'm trying to follow the directions in ceph-ansible for having it automatically set up the crush map.
I've also looked at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/i…
But my setup isn't working. Ansible complains:
[WARNING]: While constructing a mapping from /etc/ansible/hosts.yml, line 5, column 9, found a duplicate dict key (hostA2).
Using last defined value only.
Could anyone explain what I'm doing wrong, please?
Here are the relevant hosts.yml snippets:
all:
  children:
    osds:
      hosts:
        hostA1:
          osd_crush_location:
            host: 'hostA1'
            chassis: 'hostA'
            root: 'default'
        hostA2:
          osd_crush_location:
            host: 'hostA2'
            chassis: 'hostA'
            root: 'default'
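For what it's worth, the warning means the loader found hostA2 defined twice in the same mapping and kept only the second definition. A minimal sketch of that last-one-wins behaviour, using a plain Python dict as a stand-in for the YAML mapping (the second, conflicting hostA2 entry here is hypothetical, not from the file above):

```python
# A mapping with a duplicate key silently keeps only the last
# definition -- Python dict literals behave the same way as the
# YAML loader does in the warning above.
inventory = {
    "hostA1": {"osd_crush_location": {"chassis": "hostA"}},
    "hostA2": {"osd_crush_location": {"chassis": "hostA"}},  # first definition
    "hostA2": {"osd_crush_location": {"chassis": "hostB"}},  # duplicate: this one wins
}

print(len(inventory))                                        # 2 keys survive
print(inventory["hostA2"]["osd_crush_location"]["chassis"])  # hostB
```

So if hostA2 really appears twice under hosts: in the full hosts.yml (the warning points at line 5, column 9), only the later block takes effect, and the earlier one is silently discarded.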
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hello,
I would like to install Ceph 15.2.10 using cephadm and just found the following table by checking the requirements on the host:
https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-p…
Do I understand this table correctly that I should be using podman version 2.1?
And what happens if I use the latest podman version, 3.0?
Best regards,
Mabi
Hi
I'm in a bit of a panic :-(
Recently we started attempting to configure a radosgw for our ceph
cluster, which was until now only doing cephfs (and rbd was working as
well). We were messing about with ceph-ansible, as this was how we
originally installed the cluster. Anyway, it installed nautilus 14.2.18
on the radosgw, and I thought it would be good to pull the rest of the
cluster up to that level as well using our tried and tested ceph upgrade
script (it basically updates all ceph nodes one by one and
checks whether ceph is OK again before doing the next).
After the 3rd mon/mgr was done, all pg's were unavailable :-(
obviously, the script is not continuing, but ceph is also broken now...
The message is, deceptively:
HEALTH_WARN Reduced data availability: 5568 pgs inactive
That's all PGs!
As a desperate measure I tried to upgrade one ceph OSD node, but that
broke as well; the osd service on that node gets an interrupt from the
kernel...
The versions are now:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}
12 OSDs are down
# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]
Hi,
I noticed while using rclone to migrate some data from a Swift
cluster into a RADOSGW cluster that sometimes when listing a
bucket RADOSGW will not always return as many results as specified
by the "limit" parameter, even when more objects remain to list.
This results in rclone believing on subsequent runs that the
objects do not exist, since it performs an initial comparison
based on bucket listings, and so it needlessly recopies data.
This seems contrary to how pagination is specified by Swift:
https://docs.openstack.org/swift/latest/api/pagination.html
Is this known behaviour, or should I go ahead and file a bug?
I believe the cluster is running 15.2.8 or so, but will confirm.
Thanks,
Paul
---
Further observations:
* Here's a summary of the reply lengths I got when listing
various buckets in our RADOSGW cluster. (This is not all of
the buckets in the tenant; the other 100 or so are fine.)
reply lengths: 1000 999 1000 1000 1000 1000 1000 1000 1000 1000 119
reply lengths: 1000 992 1000 1000 1000 1000 1000 935 1000 1000 257
reply lengths: 1000 1000 1000 1000 1000 975 1000 948
reply lengths: 953 1000 1000 1000 1000 1000 954 1000 1000 70
reply lengths: 1000 1000 1000 1000 998 15
reply lengths: 1000 1000 1000 1000 974 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 939 1000 1000 1000 1000 949 1000 1000 1000 644
reply lengths: 1000 1000 1000 1000 999 1000 1000 937 1000 1000 538
reply lengths: 1000 998 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 551
reply lengths: 1000 1000 1000 1000 1000 1000 1000 931 1000 986 1000 1000 1000 975 1000 989 1000 1000 1000 966 1000 998 921 994 1000 1000 973 58
reply lengths: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 976 1000 366
reply lengths: 1000 1000 1000 1000 1000 983 1000 1000 1000 1000 1000 1000 1000 517
reply lengths: 1000 1000 1000 984 1000 1000 971 1000 1000 401
reply lengths: 949 1000 1000 1000 1000 1000 1000 403
reply lengths: 1000 998 532
reply lengths: 951 1000 1000 1000 1000 1000 976 1000 877
* rclone uses a default $limit of 1,000, in contrast to the
Python swiftclient's default of 10,000.
* The Swift API doc seems clear that $limit results should always
be returned if at least $limit results are available, and that
receiving less than $limit results indicates no more exist.
(It doesn't *explicitly* say the last, but the document could
be a lot shorter if it were not intended for that to follow.)
* When swiftclient is asked to fetch a listing, and full_listing
is set to True, instead of implementing pagination as
described in the document above, swiftclient simply keeps
fetching pages until it receives an empty page.
So Swift API implementations that don't strictly implement
paging per the docs may not even be noticed by most users.
* From a review of its code, swiftclient seems to have done this
since the very beginning. Perhaps the code was written first
and then pagination on the server side was nailed down later?
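To make the failure mode concrete, here is a toy sketch (not rclone's or swiftclient's actual code; make_backend and list_page are made up) of the two termination strategies against a hypothetical backend whose second page comes back short even though more objects remain, as in the listings above:

```python
def make_backend(objects):
    """Toy listing backend; its second page is artificially short."""
    calls = {"n": 0}
    def list_page(marker, limit):
        calls["n"] += 1
        start = objects.index(marker) + 1 if marker else 0
        page = objects[start:start + limit]
        if calls["n"] == 2 and len(page) == limit:
            page = page[:-1]          # a short page mid-listing
        return page
    return list_page

def list_all_strict(list_page, limit):
    """Stop at the first page shorter than `limit` -- what the Swift
    pagination doc implies is safe, and roughly what rclone assumes."""
    out, marker = [], None
    while True:
        page = list_page(marker, limit)
        out += page
        if len(page) < limit:
            return out
        marker = page[-1]

def list_all_until_empty(list_page, limit):
    """Keep fetching until an empty page -- what swiftclient does."""
    out, marker = [], None
    while True:
        page = list_page(marker, limit)
        if not page:
            return out
        out += page
        marker = page[-1]

objects = ["obj%04d" % i for i in range(35)]
print(len(list_all_strict(make_backend(objects), 10)))       # 19 -- objects silently missed
print(len(list_all_until_empty(make_backend(objects), 10)))  # 35 -- complete
```

The empty-page strategy survives a server that returns short pages mid-listing; the strict strategy silently truncates the listing, which would explain rclone deciding the remaining objects don't exist.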
--
Paul Collins
Wellington, New Zealand
Now that the hybrid allocator appears to be enabled by default in
Octopus, is it safe to change bluestore_min_alloc_size_hdd to 4k from
64k on Octopus 15.2.10 clusters, and then redeploy every OSD to switch
to the smaller allocation size, without massive performance impact for
RBD? We're seeing a lot of storage usage amplification on EC 8+3
clusters which are HDD backed that lines up with a lot of the mailing
list posts we've seen here. Upgrading to Pacific before making this
change is also a possibility once a more stable release arrives, if
that's necessary.
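For a sense of scale, here is my own back-of-envelope model of the amplification (not authoritative BlueStore accounting): assume each object is split into k data chunks plus m coding chunks, and each chunk on disk is rounded up to bluestore_min_alloc_size.

```python
def ceil_div(a, b):
    return -(-a // b)

def ec_allocated(object_size, k, m, min_alloc):
    """Bytes allocated for one EC k+m object if every chunk is
    rounded up to min_alloc (simplified model)."""
    chunk = ceil_div(object_size, k)
    return (k + m) * ceil_div(chunk, min_alloc) * min_alloc

KiB, MiB = 2**10, 2**20

# A full 4 MiB RBD object under EC 8+3: overhead is just the nominal 11/8.
print(ec_allocated(4 * MiB, 8, 3, 64 * KiB))   # 5767168 (5.5 MiB)

# A small 16 KiB object: every 2 KiB chunk rounds up to 64 KiB.
print(ec_allocated(16 * KiB, 8, 3, 64 * KiB))  # 720896 (704 KiB, ~44x)

# The same small object with min_alloc of 4 KiB.
print(ec_allocated(16 * KiB, 8, 3, 4 * KiB))   # 45056 (44 KiB, ~2.75x)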
Second part of this question - we are using RBDs currently on the
clusters impacted. These have XFS filesystems on top, which detect the
sector size of the RBD as 512byte, and XFS has a block size of 4k.
With the default of 64k for bluestore_min_alloc_size_hdd, let's say a
1G file is written out to the XFS filesystem backed by the RBD. On the
ceph side, is this seen as a lot of 4k objects, wasting significant
space, or is RBD able to coalesce these into 64k objects even though
XFS is using a 4k block size?
XFS details below; you can see the allocation groups are quite large:
meta-data=/dev/rbd0              isize=512    agcount=501, agsize=268435440 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=134217728000, imaxpct=1
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
I'm curious if people have been tuning XFS on RBD for better
performance, as well.
Thank you!
We just updated one of our ceph clusters from 14.2.15 to 14.2.19, and
see some unexpected behavior by radosgw - it seems to ignore parameters
set by the ceph config database. Specifically this is making it start up
listening only on port 7480, and not the configured 80 and 443 (ssl) ports.
Downgrading ceph on the rgw nodes back to 14.2.15 restores the expected
behavior (I haven't yet tried any intermediate versions). The host OS is
CentOS 7, if that matters...
Here's a ceph config dump for one of the affected nodes, along with the
radosgw startup log:
> # ceph config dump|grep tier2-gw02
> client.rgw.tier2-gw02 basic log_file /var/log/ceph/radosgw.log *
> client.rgw.tier2-gw02 advanced rgw_dns_name s3.msi.umn.edu *
> client.rgw.tier2-gw02 advanced rgw_enable_usage_log true
> client.rgw.tier2-gw02 basic rgw_frontends beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/civetweb.pem *
> client.rgw.tier2-gw02 basic rgw_thread_pool_size 512
> # tail /var/log/ceph/radosgw.log
> 2021-04-08 11:51:07.956 7f420b78f700 -1 received signal: Terminated from /usr/lib/systemd/systemd --switched-root --system --deserialize 22 (PID: 1) UID: 0
> 2021-04-08 11:51:07.956 7f420b78f700 1 handle_sigterm
> 2021-04-08 11:51:07.956 7f4220bc5900 -1 shutting down
> 2021-04-08 11:51:07.956 7f420b78f700 1 handle_sigterm set alarm for 120
> 2021-04-08 11:51:08.010 7f4220bc5900 1 final shutdown
> 2021-04-08 11:51:08.159 7f2ac6105900 0 deferred set uid:gid to 167:167 (ceph:ceph)
> 2021-04-08 11:51:08.159 7f2ac6105900 0 ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable), process radosgw, pid 88256
> 2021-04-08 11:51:08.300 7f2ac6105900 0 starting handler: beast
> 2021-04-08 11:51:08.302 7f2ac6105900 0 set uid:gid to 167:167 (ceph:ceph)
> 2021-04-08 11:51:08.317 7f2ac6105900 1 mgrc service_daemon_register rgw.tier2-gw02 metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable),ceph_version_short=14.2.19,cpu=AMD EPYC 7302P 16-Core Processor,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=beast port=7480,frontend_type#0=beast,hostname=tier2-gw02.msi.umn.edu,kernel_description=#1 SMP Tue Mar 16 18:28:22 UTC 2021,kernel_version=3.10.0-1160.21.1.el7.x86_64,mem_swap_kb=4194300,mem_total_kb=131754828,num_handles=1,os=Linux,pid=88256,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}
BTW I can also change "rgw_frontends" to specify a civetweb frontend
instead and it will still start the default beast...
I haven't seen anyone else report such a problem so I wonder if this is
something local to us - like perhaps I'm using "ceph config" incorrectly
in a way which happened to be accepted before? Has anyone else seen this
behavior?
Graham
--
Graham Allan - gta(a)umn.edu
Associate Director of Operations - Minnesota Supercomputing Institute
I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
Yesterday our monitor cluster went nuts and started constantly calling
elections because monitor nodes were at 100% and wouldn't respond to
heartbeats. I reduced the monitor cluster to one to prevent the constant
elections and that let the system limp along until the backfills finished.
There are long stretches of time where ceph commands hang while the CPU is at
100%; when the CPU drops I see a lot of work getting done in the monitor
logs, which stops as soon as the CPU is at 100% again.
I did a `perf top` on the node to see what's taking all the time and it
appears to be in the rocksdb code path. I've set `mon_compact_on_start =
true` in the ceph.conf but that does not appear to help. The
`/var/lib/ceph/mon/` directory is 311MB which is down from 3.0 GB while the
backfills were going on. I've tried adding a second monitor, but it goes
back to the constant elections. I tried restarting all the services without
luck. I also pulled the monitor off the network and tried restarting
the mon service isolated (this helped a couple of weeks ago when `ceph -s`
would cause 100% CPU and lock up the service much worse than this) and
didn't see the high CPU load. So I'm guessing it's triggered by some
external source.
I'm happy to provide more info, just let me know what would be helpful.
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hi everyone,
I cleaned up the CFP coordination etherpad with some events coming up.
Please add other events you think the community should be considering
proposing content on Ceph or adjacent projects like Rook.
KubeCon NA CFP, for example, is ending April 11. Take a look:
https://pad.ceph.com/p/cfp-coordination
I have also added this to our wiki for discovery.
https://tracker.ceph.com/projects/ceph/wiki/Community
--
Mike Perez
Hi everyone,
We encountered an issue mounting a KRBD device after mapping it to the host with the read-only option.
We have tried to pinpoint where the problem is, but have not been able to.
The image mounts fine if we map it without the "read-only" option.
This leads to an issue where a pod in k8s cannot use the snapshotted persistent volume created by the ceph-csi rbd provisioner.
Thank you for reading.
I have reported the bug here: Bug #50234: krbd failed to mount after map image with read-only option - Ceph - Ceph<https://tracker.ceph.com/issues/50234>
Context
- Using admin keyring
- Linux Kernel: 3.10.0-1160.15.2.el7.x86_64
- Linux Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
- Ceph version: "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)"
rbd image 'csi-vol-85919409-9797-11eb-80ba-720b2b57c790':
        size 10 GiB in 2560 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 533a03bba388ea
        block_name_prefix: rbd_data.533a03bba388ea
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Wed Apr 7 13:51:02 2021
        access_timestamp: Wed Apr 7 13:51:02 2021
        modify_timestamp: Wed Apr 7 13:51:02 2021
Bug Reproduction
# Map RBD image WITH read-only option: CANNOT mount either read-only or read-write
sudo rbd device map -p k8s-sharedpool csi-vol-85919409-9797-11eb-80ba-720b2b57c790 -ro
/dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: cannot mount /dev/rbd0 read-only
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: /dev/rbd0 is write-protected, mounting read-only
mount: cannot mount /dev/rbd0 read-only
# Map RBD image WITHOUT read-only option: CAN mount both read-only and read-write
sudo rbd device map -p k8s-sharedpool csi-vol-85919409-9797-11eb-80ba-720b2b57c790
/dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system. For more details see restorecon(8) and mount(8).
mount: /dev/rbd0 mounted on /mnt/test1.
sudo mount -v -t ext4 /dev/rbd0 /mnt/test1
mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system. For more details see restorecon(8) and mount(8).
mount: /dev/rbd0 mounted on /mnt/test1.
With my best regards,
Son Hai Ha
--
KPMG IT Service GmbH
Hi! I have a single-machine ceph installation, and after trying to upgrade to
Pacific the upgrade is stuck with:
ceph -s
  cluster:
    id:     d9f4c810-8270-11eb-97a7-faa3b09dcf67
    health: HEALTH_WARN
            Upgrade: Need standby mgr daemon

  services:
    mon: 1 daemons, quorum sev.spacescience.ro (age 3w)
    mgr: sev.spacescience.ro.wpozds(active, since 2w)
    mds: sev-ceph:1 {0=sev-ceph.sev.vmvwrm=up:active}
    osd: 2 osds: 2 up (since 2w), 2 in (since 2w)

  data:
    pools:   4 pools, 194 pgs
    objects: 32 objects, 8.4 KiB
    usage:   2.0 GiB used, 930 GiB / 932 GiB avail
    pgs:     194 active+clean

  progress:
    Upgrade to docker.io/ceph/ceph:v16.2.0 (0s)
      [............................]
How can I get a standby mgr daemon? So far I did not find anything relevant...
Thanks a lot!
Adrian