Hi Cephers,
To better understand how our current users utilize Ceph, we are conducting
a public community survey. The results will guide how the community spends
its contribution efforts on future development, and they will appear only
anonymized and aggregated in future Ceph Foundation publications to the
community.
I'm pleased to announce that, after much discussion on the Ceph dev
mailing list [0], the community has put together the Ceph Survey for 2019.
Because the survey is going out later than we'd like, the deadline will be
January 31st, 2020 at 11:59 PT.
https://ceph.io/user-survey/
Please let me know of any mistakes on the survey that need to be
corrected. Thanks!
For future surveys, we have discussed using the Ceph telemetry module to
collect this data automatically and save our users time.
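As a sketch of what opting in to telemetry looks like today (command names
are the mgr telemetry module's; verify with "ceph telemetry status" on your
version):

    ceph mgr module enable telemetry   # load the telemetry mgr module
    ceph telemetry show                # preview the report that would be sent
    ceph telemetry on                  # opt in to periodic reporting

The report it sends is the anonymized, aggregated one previewed by
"ceph telemetry show".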
[0] -
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/WU374ZJP5N3NKY22X2…
--
Mike Perez
he/him
Ceph Community Manager
M: +1-951-572-2633
494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee <https://twitter.com/thingee>
This is the seventh bugfix release of the Mimic v13.2.x long term stable
release series. We recommend all Mimic users upgrade.
For the full release notes, see
https://ceph.io/releases/v13-2-7-mimic-released/
Notable Changes
MDS:
- Cache trimming is now throttled. Dropping the MDS cache via the “ceph
tell mds.<foo> cache drop” command or large reductions in the cache size
will no longer cause service unavailability.
- Cap-recall behavior has been significantly improved: the MDS no longer
attempts to recall too many caps at once, which previously led to
instability. An MDS with a large cache (64GB+) should be more stable.
- The MDS now provides a config option “mds_max_caps_per_client” (default:
1M) to limit the number of caps a client session may hold. Long-running
client sessions with a large number of caps have been a source of
instability in the MDS when all of these caps need to be processed during
certain session events. It is recommended not to increase this value
unnecessarily.
- The “mds_recall_state_timeout” config parameter has been removed. Late
client recall warnings are now generated based on the number of caps the
MDS has recalled which have not been released. The new config parameters
“mds_recall_warning_threshold” (default: 32K) and
“mds_recall_warning_decay_rate” (default: 60s) set the threshold for this
warning.
- The “cache drop” admin socket command has been removed. The “ceph tell
mds.X cache drop” command remains (see the sketch after this list).
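As a rough sketch of how these MDS changes are exercised (the MDS name “a”
is a placeholder, and the values shown are just the defaults quoted above):

    # Drop the MDS cache; the admin socket variant is gone, only tell remains:
    ceph tell mds.a cache drop

    # Limit the caps a single client session may hold (default 1M = 1048576):
    ceph config set mds mds_max_caps_per_client 1048576

    # Thresholds for the new late-recall warnings (defaults: 32K, 60s):
    ceph config set mds mds_recall_warning_threshold 32768
    ceph config set mds mds_recall_warning_decay_rate 60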
OSD:
- A health warning is now generated if the average osd heartbeat ping
time exceeds a configurable threshold for any of the intervals computed.
The OSD computes 1 minute, 5 minute and 15 minute intervals with average,
minimum and maximum values. New configuration option
“mon_warn_on_slow_ping_ratio” specifies a percentage of
“osd_heartbeat_grace” to determine the threshold. A value of zero disables
the warning. A new configuration option “mon_warn_on_slow_ping_time”,
specified in milliseconds, overrides the computed value, causing a warning
when OSD heartbeat pings take longer than the specified amount. A new
admin command “ceph daemon mgr.# dump_osd_network [threshold]” lists all
connections whose average ping time, over any of the 3 intervals, exceeds
the specified threshold or the value determined by the config options. A
new admin command “ceph daemon osd.# dump_osd_network [threshold]” does
the same, but includes only heartbeats initiated by the specified OSD.
Example invocations are sketched after this list.
- The default value of the
“osd_deep_scrub_large_omap_object_key_threshold” parameter has been
lowered to detect an object with a large number of omap keys more easily.
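A sketch of how the new OSD options and commands fit together (the daemon
names “mgr.x” and “osd.0” and the 1000 ms threshold are illustrative, and
whether the mon_warn_* options should be set globally or in the mon
section is worth verifying for your release):

    # Warn when average pings exceed this fraction of osd_heartbeat_grace
    # (a value of zero disables the warning):
    ceph config set global mon_warn_on_slow_ping_ratio 0.05

    # ...or override with an absolute threshold in milliseconds:
    ceph config set global mon_warn_on_slow_ping_time 1000

    # List all connections averaging above 1000 ms over any of the 3 intervals:
    ceph daemon mgr.x dump_osd_network 1000

    # The same, but only heartbeats initiated by osd.0:
    ceph daemon osd.0 dump_osd_network 1000

    # Inspect the (now lower) large-omap detection threshold:
    ceph config get osd.0 osd_deep_scrub_large_omap_object_key_threshold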
RGW:
- radosgw-admin introduces two subcommands for managing expire-stale
objects that might be left behind after a bucket reshard in earlier
versions of RGW. One subcommand lists such objects and the other deletes
them; a sketch follows below. Read the troubleshooting section of the
dynamic resharding docs for details.
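If memory serves, the two subcommands are “objects expire-stale list” and
“objects expire-stale rm”; the bucket name below is a placeholder, so
confirm the exact syntax against the docs for your version:

    # List stale-expiration objects left behind by an earlier reshard:
    radosgw-admin objects expire-stale list --bucket mybucket

    # Delete them once the list has been reviewed:
    radosgw-admin objects expire-stale rm --bucket mybucket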
Hi everyone,
We're pleased to announce that the next Cephalocon will be March 3-5 in
Seoul, South Korea!
https://ceph.com/cephalocon/seoul-2020/
The CFP for the conference is now open:
https://linuxfoundation.smapply.io/prog/cephalocon_2020
Main conference: March 4-5
Developer summit: March 3
Mark your calendars, and get your talk proposals in! The CFP will close
in early December in order to get a final schedule published in early
January.
In addition to the two day conference, we will also have a developer
summit on March 3 to take advantage of having so many developers in the
same place at the same time. The developer sessions will include video
conferencing so that remote developers will also be able to participate.
A sponsorship prospectus will be available Real Soon Now.
We hope you can join us!
Hi everyone,
We've identified a data corruption bug[1], first introduced[2] (by yours
truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption
appears as an assertion that looks like
os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated)
or, in some cases, a rocksdb checksum error. It only affects BlueStore OSDs
that have a separate 'db' or 'wal' device.
We have a fix[3] that is working its way through testing, and will
expedite the next Nautilus point release (14.2.5) once it is ready.
If you are running 14.2.2 or 14.2.1 and use BlueStore OSDs with
separate 'db' volumes, you should consider waiting to upgrade
until 14.2.5 is released.
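A quick way to check your exposure (osd.0 is a placeholder, and the
bluefs_dedicated_* metadata keys are what I recall the OSD reporting, so
verify against your own cluster's output):

    # Which release is each daemon actually running?
    ceph versions

    # Does this OSD have a separate 'db' or 'wal' device?
    ceph osd metadata 0 | grep -E '"bluefs_dedicated_(db|wal)"'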
A big thank you to Igor Fedotov and several *extremely* helpful users who
managed to reproduce and track down this problem!
sage
[1] https://tracker.ceph.com/issues/42223
[2] https://github.com/ceph/ceph/commit/096033b9d931312c0688c2eea7e14626bfde0ad…
[3] https://github.com/ceph/ceph/pull/31621