Hi Cephers,
I have a number of questions after reading a lot about BlueStore, min_alloc_size, and the impact of writing small files through CephFS with erasure coding.
In a setup using VMware ESX with its VMFS6 and its 1 MB block size on an iSCSI LUN mapped from Ceph RBD (replicated):
1. Wouldn't it be better to reduce the RBD object size to 1 MB as well?
2. When a file smaller than 4k is written to a filesystem within a virtual machine (and passes through all the layers below), will the consumed space be 4k? In other words, does the 4 MB object size of RBD combine many small files into one big object?
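For reference, I assume the object size is chosen at image creation time, along these lines (pool/image names and size are just placeholders):

  # create an image with 1 MB objects instead of the default 4 MB
  rbd create rbd/esx-lun1 --size 2T --object-size 1M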
When using CephFS and erasure coding:
1. I assume using a 4k min_alloc_size_hdd would reduce wasted space but increase fragmentation, as Igor wrote.
2. What is the official way to deal with fragmentation in BlueStore? Is there a defrag tool available or planned?
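For context, what I am considering (assuming these are the right knobs; if I understand correctly, min_alloc_size only takes effect for newly created OSDs):

  # must be set before an OSD is (re)created; existing OSDs keep their value
  ceph config set osd bluestore_min_alloc_size_hdd 4096
  # inspect an OSD's fragmentation rating (0 = none, 1 = fully fragmented)
  ceph daemon osd.0 bluestore allocator score block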
From a performance perspective: my cluster runs on good old FileStore with NVMe journals. I am about to migrate to BlueStore.
1. With a MaxIOSize of 512 KB in VMware, wouldn't bluestore_prefer_deferred_size_hdd = 524288 give me FileStore-like behavior? My aim is to get FileStore-like write latency, because we have a lot of databases.
2. Are there any tradeoffs in doing this?
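For reference, what I have in mind is simply (assuming ceph config set is available on the target version):

  # defer writes up to 512 KB through the WAL, similar to FileStore journaling
  ceph config set osd bluestore_prefer_deferred_size_hdd 524288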
Regards,
Dennis
As a follow-up to our recent memory problems with OSDs (with high pglog
values:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LJPJZPBSQRJ…
), we also see high buffer_anon values, e.g. more than 4 GB with "osd memory target" set to 3 GB. Is there a way to restrict it?
As it is called "anon", I guess it would first be necessary to find out what exactly is behind it?
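For reference, this is how we are reading the per-pool numbers (osd.0 is just an example):

  # per-mempool memory accounting inside the OSD, including buffer_anon
  ceph daemon osd.0 dump_mempools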
Well, maybe it is just as Wido said: with lots of small objects, there will be several problems.
Cheers
Harry
Hello All,
We are trying to get the Zabbix module of our Ceph cluster working, but I'm encountering an issue that has me stuck.
The configuration of the module looks OK, and we can send data using zabbix_sender to the host that is configured in Zabbix. We can also see this data/metric in the graphs that come with the template. So far so good.
However, when testing from the ceph module by running "ceph zabbix send", we get "Sending Data to Zabbix", but when we check the ceph-mgr logs we see:
7fad8b61e700 0 mgr[zabbix] Exception when sending: [Errno 2] No such file or directory
and no data is seen in Zabbix itself. Any clue what we can check? This Ceph cluster runs with an OpenStack cluster, and the ceph-mgr instance is a Docker container; I'm not sure if we need to do anything in particular because of this architecture.
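One thing we plan to verify, in case the module shells out to a zabbix_sender binary that is missing inside the container (the container name below is just a placeholder):

  # show the module configuration, including the configured zabbix_sender path
  ceph zabbix config-show
  # check that the binary actually exists inside the mgr container
  docker exec ceph-mgr which zabbix_sender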
Ceph Version : 12.2.4
Thank you,
Etienne.
Hi All,
A customer of ours has upgraded their cluster from Nautilus to Octopus after experiencing issues with OSDs not being able to connect to each other or to clients/mons/mgrs. The connectivity issues were related to msgr v2 and the require_osd_release setting not being set to nautilus. After fixing this, the OSDs were restarted and all placement groups became active again.
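For reference, I believe the relevant check and fix are along these lines (not necessarily the exact commands that were run):

  # verify what release the cluster currently requires of OSDs
  ceph osd dump | grep require_osd_release
  # raise the required release to nautilus
  ceph osd require-osd-release nautilus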
After unsetting the norecover and nobackfill flags, some OSDs started crashing every few minutes. The OSD log, even with high debug settings, doesn't seem to reveal anything; it just stops logging mid log line.
I've created a bug report: https://tracker.ceph.com/issues/46366
Has anyone experienced something similar?
--
kind regards,
Wout
42on
Hi all,
We're planning to upgrade an existing test 14.2.6 Ceph cluster to the latest 15.2.4.
The existing cluster was deployed using ceph-ansible, and we're still looking for the right steps to do the upgrade.
Shall we do the upgrade with steps like the following?
- Upgrade RPMs on all nodes (mon/mgr first, then osd, then rgw)
- Convert the cluster to use cephadm as mentioned here https://docs.ceph.com/docs/master/cephadm/adoption/
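For the RPM upgrade step, I assume we would use the ceph-ansible rolling update playbook, something like (the inventory file name is just a placeholder):

  # upgrades daemons in order (mons/mgrs first, then osds, then rgws)
  ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml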
Apology for the newbie question.
Thanks a lot.
Regards,
/ST Wong
I just upgraded from Luminous to Nautilus. The cluster was originally hammer or kraken (can't recall), then jewel, on to luminous, and now nautilus. The message I get from ceph -s is:
"health: HEALTH_WARN
crush map has legacy tunables (require firefly, min is hammer)"
Not sure how to get rid of it. I tried "ceph osd set-require-min-compat-client firefly" per the instructions, but I still have the warning. When I query the OSD map:
ceph osd dump | grep min_compat_client
require_min_compat_client firefly
min_compat_client firefly
The instructions say I can ignore the warning, but that seems like a bad idea.
Windows VMs are running on qemu/kvm via RBD.
Ceph and KVM are running on CentOS 7. I just upgraded all clients to "centos-release-7-8.2003.0.el7.centos.x86_64".
I have rebooted all nodes since the CentOS upgrade and again after the Nautilus upgrade.
I also tried setting "ceph osd set-require-min-compat-client jewel", but the HEALTH_WARN was still present and:
ceph osd dump | grep min_compat_client
require_min_compat_client jewel
min_compat_client firefly
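I am starting to wonder whether the warning is actually about the crush tunables profile rather than the min-compat-client setting, i.e. whether something like this is what's needed (I have not tried it yet, and I understand changing tunables can trigger data movement):

  # check the current tunables profile
  ceph osd crush show-tunables
  # raise the crush tunables to the hammer profile
  ceph osd crush tunables hammer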
Any help is appreciated.
Hi all.
I see this sentence on many sites. Does anyone know why?
> Then turn off print continue. If you have it set to true, you may encounter problems with PUT operations
I use nginx in front of my RGW and proxy-pass the Expect header in it.
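For reference, the setting in question appears to be the rgw_print_continue option, e.g. in ceph.conf (the section name depends on your rgw instance):

  [client.rgw.gateway1]
      rgw print continue = false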
Thanks.
This is a Nautilus install. The documentation seems a bit ambiguous to me; this is for a spinner + SSD, using ceph-volume.
If I put the block.db on the SSD with
"ceph-volume lvm create --bluestore --data /dev/sdd --block.db
/dev/sdc1"
does the WAL exist on the SSD (/dev/sdc1) as well, or does it remain on the HDD (/dev/sdd)?
Conversely, what happens with the block.db if I only place the WAL with --block.wal?
Or do I have to set up separate partitions for the block.db and WAL?
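For completeness, the fully explicit variant I am comparing against would presumably be (the partition layout is just an example):

  # DB and WAL each on their own SSD partition
  ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/sdc1 --block.wal /dev/sdc2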
--
Lindsay
Hi all.
There are high IOPS on my bucket index pool when there are about 1K PUT requests/s.
Is there any way I can debug why there are so many IOPS on the bucket index pool?
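For reference, this is how I am watching the pool at the moment (the pool name is an assumption; use whatever your index pool is called):

  # per-pool client I/O rates
  ceph osd pool stats default.rgw.buckets.index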
Thanks.