Hello
The mgr module diskprediction_local fails under Ubuntu 20.04 (Focal) with
python3-sklearn version 0.22.2.
The Ceph version is 15.2.3.
When the module is enabled, I get the following error:
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 112, in
serve
self.predict_all_devices()
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 279, in
predict_all_devices
result = self._predict_life_expentancy(devInfo['devid'])
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 222, in
_predict_life_expentancy
predicted_result = obj_predictor.predict(predict_datas)
File "/usr/share/ceph/mgr/diskprediction_local/predictor.py", line 457,
in predict
pred = clf.predict(ordered_data)
File "/usr/lib/python3/dist-packages/sklearn/svm/_base.py", line 585, in
predict
if self.break_ties and self.decision_function_shape == 'ovo':
AttributeError: 'SVC' object has no attribute 'break_ties'
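The error looks like a model pickled with an older scikit-learn being loaded under 0.22, which expects the newer break_ties attribute. As a temporary workaround, backfilling the missing attribute onto the unpickled classifier seems to avoid this particular crash. A minimal sketch only (the loader function and path are hypothetical, not the actual diskprediction_local code):

    import pickle
    from sklearn.svm import SVC

    def load_svc_compat(path):
        """Load an SVC pickled with an older scikit-learn and backfill
        attributes introduced later (break_ties was added in 0.22)."""
        with open(path, 'rb') as f:
            clf = pickle.load(f)
        if isinstance(clf, SVC) and not hasattr(clf, 'break_ties'):
            clf.break_ties = False   # the pre-0.22 behaviour
        return clf

The proper fix is presumably to ship models trained with a scikit-learn version matching the one on the system.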
Best Regards
Eric
We're happy to announce the fourth bugfix release in the Octopus series.
In addition to a security fix in RGW, this release brings a range of fixes
across all components. We recommend that all Octopus users upgrade to this
release. For detailed release notes with links and the full changelog, please
refer to the official blog entry at https://ceph.io/releases/v15-2-4-octopus-released
Notable Changes
---------------
* CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration's ExposeHeader
(William Bowling, Adam Mohammed, Casey Bodley)
* Cephadm: There were a lot of small usability improvements and bug fixes:
* Grafana when deployed by Cephadm now binds to all network interfaces.
* `cephadm check-host` now prints all detected problems at once.
* Cephadm now calls `ceph dashboard set-grafana-api-ssl-verify false`
when generating an SSL certificate for Grafana.
* The Alertmanager is now correctly pointed to the Ceph Dashboard.
* `cephadm adopt` now supports adopting an Alertmanager.
* `ceph orch ps` now supports filtering by service name.
* `ceph orch host ls` now marks hosts as offline, if they are not
accessible.
* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
a service id of mynfs, that will use the RADOS pool nfs-ganesha and namespace
nfs-ns::
ceph orch apply nfs mynfs nfs-ganesha nfs-ns
* Cephadm: `ceph orch ls --export` now returns all service specifications in
yaml representation that is consumable by `ceph orch apply`. In addition,
the commands `orch ps` and `orch ls` now support `--format yaml` and
`--format json-pretty`.
* Cephadm: `ceph orch apply osd` supports a `--preview` flag that prints a preview of
the OSD specification before deploying OSDs. This makes it possible to
verify that the specification is correct, before applying it.
* RGW: The `radosgw-admin` sub-commands dealing with orphans --
`radosgw-admin orphans find`, `radosgw-admin orphans finish`, and
`radosgw-admin orphans list-jobs` -- have been deprecated. They have
not been actively maintained and they store intermediate results on
the cluster, which could fill a nearly-full cluster. They have been
replaced by a tool, currently considered experimental,
`rgw-orphan-list`.
* RBD: The name of the rbd pool object that is used to store
rbd trash purge schedule is changed from "rbd_trash_trash_purge_schedule"
to "rbd_trash_purge_schedule". Users that have already started using
`rbd trash purge schedule` functionality and have per pool or namespace
schedules configured should copy "rbd_trash_trash_purge_schedule"
object to "rbd_trash_purge_schedule" before the upgrade and remove
"rbd_trash_purge_schedule" using the following commands in every RBD
pool and namespace where a trash purge schedule was previously
configured::
rados -p <pool-name> [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
rados -p <pool-name> [-N namespace] rm rbd_trash_trash_purge_schedule
or use any other convenient way to restore the schedule after the
upgrade.
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.4.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 7447c15c6ff58d7fce91843b705a268a1917325c
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway
Hello,
Over the last week I have tried optimising the performance of our MDS
nodes for the large number of files and concurrent clients we have. It
turns out that despite various stability fixes in recent releases, the
default configuration still doesn't appear to be optimal for keeping the
cache size under control and avoiding intermittent I/O blocks.
Unfortunately, it is very hard to tweak the configuration to something
that works, because the necessary tuning parameters are largely
undocumented or only described in very technical terms in the source
code, making them quite unapproachable for administrators not familiar
with all the CephFS internals. I would therefore like to ask whether it
would be possible to document the "advanced" MDS settings more clearly:
what they do and in which direction they have to be tuned for more or
less aggressive cap recall, for instance (sometimes it is not clear
whether a threshold is a minimum or a maximum).
I am in the very (un)fortunate situation of having folders with several
hundred thousand direct subfolders or files (and one extreme case with
almost 7 million dentries), which makes for a pretty good benchmark for
measuring cap growth while performing operations on them. For the time
being, I have come up with the following configuration, which seems to
work for me, but is still far from optimal:
mds basic mds_cache_memory_limit 10737418240
mds advanced mds_cache_trim_threshold 131072
mds advanced mds_max_caps_per_client 500000
mds advanced mds_recall_max_caps 17408
mds advanced mds_recall_max_decay_rate 2.000000
The parameters I am least sure about, because I understand the least
about how they actually work, are mds_cache_trim_threshold and
mds_recall_max_decay_rate. Despite reading the descriptions in
src/common/options.cc, I understand only half of what they do, and
I am also not quite sure in which direction to tune them for optimal
results.
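For what it's worth, my rough mental model of these decay-rate based throttles is an exponentially decaying counter with a half-life, along the lines of this simplified sketch (an illustration only, not the actual MDS code):

    import math
    import time

    class DecayCounter:
        """Exponentially decaying counter, parameterised by a half-life in seconds."""
        def __init__(self, half_life):
            self.half_life = half_life
            self.value = 0.0
            self.last = time.monotonic()

        def _decay(self):
            now = time.monotonic()
            self.value *= math.exp(-math.log(2) * (now - self.last) / self.half_life)
            self.last = now

        def hit(self, amount=1.0):
            self._decay()
            self.value += amount

        def get(self):
            self._decay()
            return self.value

    # e.g. with mds_recall_max_decay_rate = 2.0 (half-life in seconds) and
    # mds_recall_max_caps = 17408: further recall from a session would be
    # throttled while the decayed count of recently recalled caps exceeds the limit
    recalled = DecayCounter(half_life=2.0)
    recalled.hit(17408)
    time.sleep(2.0)
    print(round(recalled.get()))   # roughly half of 17408 after one half-life

If that mental model is right, a larger mds_recall_max_decay_rate keeps recently recalled caps counting against the limit for longer (gentler recall), while a smaller one allows more aggressive recall; but this is exactly the kind of thing I would like to see confirmed in the documentation.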
Another point where I am struggling is the correct configuration of
mds_recall_max_caps. The default of 5K doesn't work too well for me, but
values above 20K also don't seem to be a good choice. While high values
result in fewer blocked ops and better performance without destabilising
the MDS, they also lead to slow but unbounded cache growth, which seems
counter-intuitive. 17K was the maximum I could go to. Higher values work
for most use cases, but when listing very large folders with millions of
dentries, the MDS cache size slowly starts to exceed the limit after a
few hours, since the MDSs fail to keep clients below
mds_max_caps_per_client despite not showing any "failing to respond to
cache pressure" warnings.
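To keep an eye on per-client cap counts while experimenting, a small helper along these lines can be handy (a sketch; it assumes `ceph tell mds.0 session ls` emits JSON containing num_caps and client_metadata fields, and "mds.0" is just a placeholder for the rank or name in question):

    import json
    import subprocess

    # list the ten client sessions holding the most caps on one MDS
    out = subprocess.check_output(['ceph', 'tell', 'mds.0', 'session', 'ls'])
    sessions = json.loads(out)
    top = sorted(sessions, key=lambda s: s.get('num_caps', 0), reverse=True)[:10]
    for s in top:
        host = s.get('client_metadata', {}).get('hostname', 'unknown')
        print(s.get('num_caps', 0), host)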
With the configuration above, I do not have cache size issues any more,
but it comes at the cost of performance and slow/blocked ops. A few
hints as to how I could optimise my settings for better client
performance would be much appreciated, as would additional
documentation for all the "advanced" MDS settings.
Thanks a lot
Janek
Dear Cephers,
we are currently mounting CephFS with relatime, using the FUSE client (version 13.2.6):
ceph-fuse on /cephfs type fuse.ceph-fuse (rw,relatime,user_id=0,group_id=0,allow_other)
For the first time, I wanted to use atime to identify old unused data. My expectation with "relatime" was that the access time stamp would be updated less often, for example,
only if the last file access was >24 hours ago. However, that does not seem to be the case:
----------------------------------------------
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
$ cat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root > /dev/null
$ sync
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
----------------------------------------------
I also tried this via an nfs-ganesha mount, and via a ceph-fuse mount with admin caps,
but atime never changes.
Is atime really never updated with CephFS, or is this configurable?
Something as coarse as "update at maximum once per day only" would be perfectly fine for the use case.
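For the actual use case of finding old data, I could of course fall back to the newer of atime and mtime per file; a rough sketch (the subtree and one-year cutoff are just examples):

    import time
    from pathlib import Path

    CUTOFF = time.time() - 365 * 86400   # "not touched for a year"

    for p in Path('/cephfs/grid/atlas').rglob('*'):   # example subtree
        if p.is_file():
            st = p.stat()
            # use whichever of atime/mtime is newer, in case atime is never updated
            if max(st.st_atime, st.st_mtime) < CUTOFF:
                print(p)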
Cheers,
Oliver
Hi,
On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing
OSDs randomly drop out of the cluster.
Usually it's 2 to 4 OSDs spread out over different nodes. Each node has
16 OSDs and not all the failing OSDs are on the same node.
The OSDs are marked as down, and all they keep printing in their logs is:
monclient: _check_auth_rotating possible clock skew, rotating keys
expired way too early (before 2020-06-04T07:57:17.706529-0400)
Looking at their status through the admin socket:
{
    "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
    "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
    "whoami": 206,
    "state": "active",
    "oldest_map": 73697,
    "newest_map": 75795,
    "num_pgs": 19
}
The message brought me to my own ticket I created 2 years ago:
https://tracker.ceph.com/issues/23460
The first thing I checked was NTP/time. I double- and triple-checked this.
All the times on the cluster are in sync. Nothing wrong there.
Again, it's not all the OSDs on a node failing. Just 1 or 2 dropping out.
Restarting them brings them back right away and then within 24h some
other OSDs will drop out.
Has anybody seen this behavior with Octopus as well?
Wido
Hello,
I'm running KVM virtualization with RBD storage; some images in the RBD pool become effectively unusable after a VM restart.
All I/O to a problematic RBD image blocks indefinitely.
I checked that it is not a permission or locking problem.
The bug was silent until we performed a planned restart of a few VMs and some of them failed to start (the kvm process timed out).
It could be related to recent upgrades, Luminous to Nautilus or Proxmox 5 to 6.
The Ceph backend is clean, no observable problems, all mons/mgrs/OSDs up and running. The network is OK.
Nothing relevant to the problem in the logs.
ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
kernel 5.3.13-2-pve #1 SMP PVE 5.3.13-2 (Fri, 24 Jan 2020 09:49:36 +0100) x86_64 GNU/Linux
HEALTH_OK
No locks:
# rbd status rbd-technet/vm-402-disk-0
Watchers: none
# rbd status rbd-technet/vm-402-disk-1
Watchers: none
Normal image vs problematic:
# rbd object-map check rbd-technet/vm-402-disk-0
Object Map Check: 100% complete…done.
# rbd object-map check rbd-technet/vm-402-disk-1
^C
disk-0 is good, while disk-1 is effectively lost. The command hangs for many minutes with no visible activity; I interrupted it.
rbd export runs without problems; however, some data is lost after importing it back (ext4 errors).
rbd deep copy worked for me. The copy looks good, no errors.
# rbd info rbd-technet/vm-402-disk-1
rbd image 'vm-402-disk-1':
size 16 GiB in 4096 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: c600d06b8b4567
block_name_prefix: rbd_data.c600d06b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
op_features:
flags:
create_timestamp: Fri Jan 31 17:50:50 2020
access_timestamp: Sat Mar 7 00:30:53 2020
modify_timestamp: Sat Mar 7 00:33:35 2020
journal: c600d06b8b4567
mirroring state: disabled
What can be done to debug this problem?
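One thing I could try, to narrow down where the I/O blocks independently of qemu/krbd, is reading the image directly through the librbd Python bindings (a rough sketch, assuming python3-rados and python3-rbd are installed; pool and image names as above):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd-technet')

    with rbd.Image(ioctx, 'vm-402-disk-1', read_only=True) as image:
        # read the first 4 MiB object; if this hangs, the blockage is in
        # librbd/OSD access itself rather than in the hypervisor stack
        data = image.read(0, 4 * 1024 * 1024)
        print(len(data), 'bytes read')

    ioctx.close()
    cluster.shutdown()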
Thanks,
Ilia.
Hi,
I am running a nice Ceph (Proxmox 4 / Debian 8 / Ceph 0.94.3) cluster on
3 nodes (Supermicro X8DTT-HIBQF), 2 OSDs each (2TB SATA hard disks),
interconnected via 40Gb InfiniBand.
The problem is that the Ceph performance is quite bad (approx. 30MiB/s
reading, 3-4 MiB/s writing), so I thought about plugging a PCIe-to-NVMe/M.2
adapter into each node and installing SSDs. The idea is to
have faster Ceph storage and also some storage extension.
The question is now which SSDs I should use. If I understand it correctly,
not every SSD is suitable for Ceph, as noted at the links below:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-i…
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for ceph. As the 950 is not available anymore, I ordered a
Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
Before equipping all nodes with these SSDs, I did some tests with "fio"
as recommended, e.g. like this:
fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
The results are as the following:
-----------------------
1) Samsung 970 EVO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
-----------------------
So the read speed is impressive, but the write speed is really bad.
Therefore I ordered the Samsung 970 PRO (1TB), as it has faster NAND
chips (MLC instead of TLC). The results for writing are, however, even worse:
-----------------------
2) Samsung 970 PRO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
-----------------------
I did some research and found out that the "--sync" flag sets the
"O_DSYNC" flag, which seems to disable use of the SSD's write cache and
leads to these horrid write speeds.
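For illustration, this is roughly what fio's --sync=1 amounts to: every write is issued with O_DSYNC and only returns once the device reports it durable. A simplified sketch, not fio itself (the device path is just an example, and writing to it is destructive):

    import os
    import time

    DEV = '/dev/nvme0n1'   # example device -- this sketch WRITES to it!
    BS = 4096
    COUNT = 1000

    # O_DSYNC: each write must be durable on the device before returning
    fd = os.open(DEV, os.O_WRONLY | os.O_DSYNC)
    buf = os.urandom(BS)
    start = time.monotonic()
    for i in range(COUNT):
        os.pwrite(fd, buf, i * BS)
    elapsed = time.monotonic() - start
    os.close(fd)
    print(f'{COUNT / elapsed:.0f} synchronous 4k writes/s')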
It seems that this relates to the fact that the write cache is only left
enabled for SSDs which implement some kind of battery buffer that
guarantees a flush of the data to flash in case of a power loss.
However, it seems impossible to find out which SSDs have this
power-loss protection; moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above, and it's unclear whether
power-loss protection is even available in the NVMe form factor. So
building a 1 or 2 TB cluster does not seem really affordable/viable.
So, can anyone please give me hints on what to do? Is it possible to ensure
that the write cache is not disabled in some way (my server is situated
in a data center, so there will probably never be a loss of power)?
Or is the link above already outdated because newer Ceph releases somehow
deal with this problem? Or will a later Debian release (10) perhaps handle
the O_DSYNC flag differently?
Perhaps I should simply invest in faster (and bigger) hard disks and
forget the SSD cluster idea?
Thank you in advance for any help,
Best Regards,
Hermann
--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
I'm trying to deploy a Ceph cluster with the cephadm tool. I've already successfully done all steps except adding OSDs. My testing equipment consists of three hosts. Each host has SSD storage, where the OS is installed. On that storage I created a partition which could be used as a Ceph block.db. The hosts also have 2 additional HDDs (spinning drives) for OSD data. In the docs I couldn't find how to deploy such a configuration. Do you have any hints on how to do that?
Thanks for help!
Hello everybody,
Can somebody add support for Debian buster to ceph-deploy:
https://tracker.ceph.com/issues/42870
Highly appreciated,
Regards,
Jelle de Jong
Dear all,
After enabling "allow_standby_replay" on our cluster, we are getting
lots of identical errors in the client's /var/log/messages, like:
Apr 29 14:21:26 hal kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)
We are using the ml kernel 5.6.4-1.el7 on Scientific Linux 7.8
Cluster and client are running Ceph v14.2.9
Setting was enabled with:
# ceph fs set cephfs allow_standby_replay true
[root@ceph-s1 ~]# ceph mds stat
cephfs:1 {0=ceph-s3=up:active} 1 up:standby-replay 2 up:standby
Is this something to worry about, or should we just disable
allow_standby_replay?
any advice appreciated,
many thanks
Jake
Note: I am working from home until further notice.
For help, contact unixadmin(a)mrc-lmb.cam.ac.uk
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539