There is a regression in librbd in v0.94.4 that can cause VMs to crash.
For now, please refrain from upgrading hypervisor nodes or other librbd
users to v0.94.4.
http://tracker.ceph.com/issues/13559
The problem does not affect server-side daemons (ceph-mon, ceph-osd,
etc.).
Jason's identified the bug and has a fix prepared, but it'll probably take
a few days before we have v0.94.5 out.
https://github.com/ceph/ceph/commit/4692c330bd992a06b97b5b8975ab71952b22477a
Thanks!
sage
This Hammer point release fixes several important bugs in Hammer, as well as
fixing interoperability issues that must be resolved before an upgrade to
Infernalis. That is, all users of earlier versions of Hammer or any
version of Firefly will first need to upgrade to Hammer v0.94.4 or
later before upgrading to Infernalis (or future releases).
All v0.94.x Hammer users are strongly encouraged to upgrade.
Changes
-------
* build/ops: ceph.spec.in: 50-rbd.rules conditional is wrong (#12166, Nathan Cutler)
* build/ops: ceph.spec.in: ceph-common needs python-argparse on older distros, but doesn't require it (#12034, Nathan Cutler)
* build/ops: ceph.spec.in: radosgw requires apache for SUSE only -- makes no sense (#12358, Nathan Cutler)
* build/ops: ceph.spec.in: rpm: cephfs_java not fully conditionalized (#11991, Nathan Cutler)
* build/ops: ceph.spec.in: rpm: not possible to turn off Java (#11992, Owen Synge)
* build/ops: ceph.spec.in: running fdupes unnecessarily (#12301, Nathan Cutler)
* build/ops: ceph.spec.in: snappy-devel for all supported distros (#12361, Nathan Cutler)
* build/ops: ceph.spec.in: SUSE/openSUSE builds need libbz2-devel (#11629, Nathan Cutler)
* build/ops: ceph.spec.in: useless %py_requires breaks SLE11-SP3 build (#12351, Nathan Cutler)
* build/ops: error in ext_mime_map_init() when /etc/mime.types is missing (#11864, Ken Dreyer)
* build/ops: upstart: limit respawn to 3 in 30 mins (instead of 5 in 30s) (#11798, Sage Weil)
* build/ops: With root as default user, unable to have multiple RGW instances running (#10927, Sage Weil)
* build/ops: With root as default user, unable to have multiple RGW instances running (#11140, Sage Weil)
* build/ops: With root as default user, unable to have multiple RGW instances running (#11686, Sage Weil)
* build/ops: With root as default user, unable to have multiple RGW instances running (#12407, Sage Weil)
* cli: ceph: cli throws exception on unrecognized errno (#11354, Kefu Chai)
* cli: ceph tell: broken error message / misleading hinting (#11101, Kefu Chai)
* common: arm: all programs that link to librados2 hang forever on startup (#12505, Boris Ranto)
* common: buffer: critical bufferlist::zero bug (#12252, Haomai Wang)
* common: ceph-object-corpus: add 0.94.2-207-g88e7ee7 hammer objects (#13070, Sage Weil)
* common: do not insert empty ptr when rebuild empty bufferlist (#12775, Xinze Chi)
* common: [ FAILED ] TestLibRBD.BlockingAIO (#12479, Jason Dillaman)
* common: LibCephFS.GetPoolId failure (#12598, Yan, Zheng)
* common: Memory leak in Mutex.cc, pthread_mutexattr_init without pthread_mutexattr_destroy (#11762, Ketor Meng)
* common: object_map_update fails with -EINVAL return code (#12611, Jason Dillaman)
* common: Pipe: Drop connect_seq increase line (#13093, Haomai Wang)
* common: recursive lock of md_config_t (0) (#12614, Josh Durgin)
* crush: ceph osd crush reweight-subtree does not reweight parent node (#11855, Sage Weil)
* doc: update docs to point to download.ceph.com (#13162, Alfredo Deza)
* fs: ceph-fuse 0.94.2-1trusty segfaults / aborts (#12297, Greg Farnum)
* fs: segfault launching ceph-fuse with bad --name (#12417, John Spray)
* librados: Change radosgw pools default crush ruleset (#11640, Yuan Zhou)
* librbd: correct issues discovered via lockdep / helgrind (#12345, Jason Dillaman)
* librbd: Crash during TestInternal.MultipleResize (#12664, Jason Dillaman)
* librbd: deadlock during cooperative exclusive lock transition (#11537, Jason Dillaman)
* librbd: Possible crash while concurrently writing and shrinking an image (#11743, Jason Dillaman)
* mon: add a cache layer over MonitorDBStore (#12638, Kefu Chai)
* mon: fix crush testing for new pools (#13400, Sage Weil)
* mon: get pools health'info have error (#12402, renhwztetecs)
* mon: implicit erasure code crush ruleset is not validated (#11814, Loic Dachary)
* mon: PaxosService: call post_refresh() instead of post_paxos_update() (#11470, Joao Eduardo Luis)
* mon: pgmonitor: wrong at/near target max reporting (#12401, huangjun)
* mon: register_new_pgs() should check ruleno instead of its index (#12210, Xinze Chi)
* mon: Show osd as NONE in ceph osd map <pool> <object> output (#11820, Shylesh Kumar)
* mon: the output is wrong when running ceph osd reweight (#12251, Joao Eduardo Luis)
* osd: allow peek_map_epoch to return an error (#13060, Sage Weil)
* osd: cache agent is idle although one object is left in the cache (#12673, Loic Dachary)
* osd: copy-from doesn't preserve truncate_{seq,size} (#12551, Samuel Just)
* osd: crash creating/deleting pools (#12429, John Spray)
* osd: fix repair when recorded digest is wrong (#12577, Sage Weil)
* osd: include/ceph_features: define HAMMER_0_94_4 feature (#13026, Sage Weil)
* osd: is_new_interval() fixes (#10399, Jason Dillaman)
* osd: is_new_interval() fixes (#11771, Jason Dillaman)
* osd: long standing slow requests: connection->session->waiting_for_map->connection ref cycle (#12338, Samuel Just)
* osd: Mutex Assert from PipeConnection::try_get_pipe (#12437, David Zafman)
* osd: pg_interval_t::check_new_interval - for ec pool, should not rely on min_size to determine if the PG was active at the interval (#12162, Guang G Yang)
* osd: PGLog.cc: 732: FAILED assert(log.log.size() == log_keys_debug.size()) (#12652, Sage Weil)
* osd: PGLog::proc_replica_log: correctly handle case where entries between olog.head and log.tail were split out (#11358, Samuel Just)
* osd: read on chunk-aligned xattr not handled (#12309, Sage Weil)
* osd: suicide timeout during peering - search for missing objects (#12523, Guang G Yang)
* osd: WBThrottle::clear_object: signal on cond when we reduce throttle values (#12223, Samuel Just)
* rbd: crash during shutdown after writeback blocked by IO errors (#12597, Jianpeng Ma)
* rgw: add delimiter to prefix only when path is specified (#12960, Sylvain Baubeau)
* rgw: create a tool for orphaned objects cleanup (#9604, Yehuda Sadeh)
* rgw: don't preserve acls when copying object (#11563, Yehuda Sadeh)
* rgw: don't preserve acls when copying object (#12370, Yehuda Sadeh)
* rgw: don't preserve acls when copying object (#13015, Yehuda Sadeh)
* rgw: Ensure that swift keys don't include backslashes (#7647, Yehuda Sadeh)
* rgw: GWWatcher::handle_error -> common/Mutex.cc: 95: FAILED assert(r == 0) (#12208, Yehuda Sadeh)
* rgw: HTTP return code is not being logged by CivetWeb (#12432, Yehuda Sadeh)
* rgw: init_rados failed leads to repeated delete (#12978, Xiaowei Chen)
* rgw: init some manifest fields when handling explicit objs (#11455, Yehuda Sadeh)
* rgw: Keystone Fernet tokens break auth (#12761, Abhishek Lekshmanan)
* rgw: region data still exist in region-map after region-map update (#12964, dwj192)
* rgw: remove trailing :port from host for purposes of subdomain matching (#12353, Yehuda Sadeh)
* rgw: rest-bench common/WorkQueue.cc: 54: FAILED assert(_threads.empty()) (#3896, huangjun)
* rgw: returns requested bucket name raw in Bucket response header (#12537, Yehuda Sadeh)
* rgw: segmentation fault when rgw_gc_max_objs > HASH_PRIME (#12630, Ruifeng Yang)
* rgw: segments are read during HEAD on Swift DLO (#12780, Yehuda Sadeh)
* rgw: setting max number of buckets for user via ceph.conf option (#12714, Vikhyat Umrao)
* rgw: Swift API: X-Trans-Id header is wrongly formatted (#12108, Radoslaw Zarzynski)
* rgw: testGetContentType and testHead failed (#11091, Radoslaw Zarzynski)
* rgw: testGetContentType and testHead failed (#11438, Radoslaw Zarzynski)
* rgw: testGetContentType and testHead failed (#12157, Radoslaw Zarzynski)
* rgw: testGetContentType and testHead failed (#12158, Radoslaw Zarzynski)
* rgw: testGetContentType and testHead failed (#12363, Radoslaw Zarzynski)
* rgw: the arguments 'domain' should not be assigned when return false (#12629, Ruifeng Yang)
* tests: qa/workunits/cephtool/test.sh: don't assume crash_replay_interval=45 (#13406, Sage Weil)
* tests: TEST_crush_rule_create_erasure consistently fails on i386 builder (#12419, Loic Dachary)
* tools: ceph-disk zap should ensure block device (#11272, Loic Dachary)
For more detailed information, see the complete changelog at
http://docs.ceph.com/docs/master/_downloads/v0.94.4.txt
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-0.94.4.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
This is the first Infernalis release candidate. There have been some
major changes since Hammer, and the upgrade process is non-trivial.
Please read carefully.
Getting the release candidate
-----------------------------
The v9.1.0 packages are pushed to the development release repositories::
http://download.ceph.com/rpm-testing
http://download.ceph.com/debian-testing
For more info, see::
http://docs.ceph.com/docs/master/install/get-packages/
Or install with ceph-deploy via::
ceph-deploy install --testing HOST
Known issues
------------
* librbd and librados ABI compatibility is broken. Be careful
installing this RC on client machines (e.g., those running qemu).
It will be fixed in the final v9.2.0 release.
Major Changes from Hammer
-------------------------
* *General*:
* Ceph daemons are now managed via systemd (with the exception of
Ubuntu Trusty, which still uses upstart).
* Ceph daemons run as the 'ceph' user instead of root.
* On Red Hat distros, there is also an SELinux policy.
* *RADOS*:
* The RADOS cache tier can now proxy write operations to the base
tier, allowing writes to be handled without forcing migration of
an object into the cache.
* The SHEC erasure coding support is no longer flagged as
experimental. SHEC trades some additional storage space for faster
repair.
* There is now a unified queue (and thus prioritization) of client
IO, recovery, scrubbing, and snapshot trimming.
* There have been many improvements to low-level repair tooling
(ceph-objectstore-tool).
* The internal ObjectStore API has been significantly cleaned up in order
to facilitate new storage backends like NewStore.
* *RGW*:
* The Swift API now supports object expiration.
* There are many Swift API compatibility improvements.
* *RBD*:
* The ``rbd du`` command shows actual usage (quickly, when
object-map is enabled).
* The object-map feature has seen many stability improvements.
* Object-map and exclusive-lock features can be enabled or disabled
dynamically.
* You can now store user metadata and set persistent librbd options
associated with individual images.
* The new deep-flatten feature allows flattening of a clone and all
of its snapshots. (Previously snapshots could not be flattened.)
* The export-diff command is now faster (it uses aio). There is also
a new fast-diff feature.
* The --size argument can be specified with a suffix for units
(e.g., ``--size 64G``).
* There is a new ``rbd status`` command that, for now, shows who has
the image open/mapped.
* *CephFS*:
* You can now rename snapshots.
* There have been ongoing improvements around administration, diagnostics,
and the check and repair tools.
* The caching and revocation of client cache state due to unused
inodes has been dramatically improved.
* The ceph-fuse client behaves better on 32-bit hosts.
Distro compatibility
--------------------
We have decided to drop support for many older distributions so that we can
move to a newer compiler toolchain (e.g., C++11). Although it is still possible
to build Ceph on older distributions by installing backported development tools,
we are not building and publishing release packages for them on ceph.com.
In particular, the minimum supported releases are:
* CentOS 7 or later; we have dropped support for CentOS 6 (and other
RHEL 6 derivatives, like Scientific Linux 6).
* Debian Jessie 8.x or later; Debian Wheezy 7.x's g++ has incomplete
support for C++11 (and no systemd).
* Ubuntu Trusty 14.04 or later; Ubuntu Precise 12.04 is no longer
supported.
* Fedora 22 or later.
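The minimums listed above can be expressed as a small shell check. This is only a sketch: the function name ``has_infernalis_packages`` is hypothetical, and the lowercase distro names are an assumption chosen to resemble the ``ID`` field of ``/etc/os-release``. Only the major version number is compared.

```shell
# Sketch: report whether release packages are expected for a distro,
# based on the minimum versions listed above.
# has_infernalis_packages is a hypothetical helper, not a Ceph tool.
has_infernalis_packages() {
    distro=$1
    major=${2%%.*}                  # "14.04" -> "14", "7" -> "7"

    # Refuse to guess if the version does not start with a plain number.
    case "$major" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac

    case "$distro" in
        centos) min=7 ;;            # CentOS 6 and RHEL 6 derivatives dropped
        debian) min=8 ;;            # Jessie 8.x or later
        ubuntu) min=14 ;;           # Trusty 14.04 or later
        fedora) min=22 ;;
        *) echo "unknown"; return 0 ;;
    esac

    if [ "$major" -ge "$min" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

On hosts that ship ``/etc/os-release``, one possible use is to source that file and call ``has_infernalis_packages "$ID" "$VERSION_ID"``. Older distributions may not provide the file at all, which is itself a hint that they fall below the minimums.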
Upgrading from Firefly
----------------------
Upgrading directly from Firefly v0.80.z is not possible. All clusters
must first upgrade to Hammer v0.94.4 or a later v0.94.z release; only
then is it possible to upgrade to Infernalis 9.2.z.
Note that v0.94.4 isn't released yet, but you can upgrade to a test build
from gitbuilder with::
ceph-deploy install --dev hammer HOST
The v0.94.4 Hammer point release will be out before v9.2.0 Infernalis
is.
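As a rough illustration of the rule above, the following sketch classifies a Firefly or Hammer version string. The function name ``infernalis_upgrade_path`` and its two output words are hypothetical. It is only meant for v0.80.z and v0.94.z style version strings and treats anything else as needing the Hammer step.

```shell
# Sketch: decide whether a cluster at the given version may upgrade
# straight to Infernalis (Hammer v0.94.4 or a later v0.94.z), or must
# go to Hammer first. infernalis_upgrade_path is a hypothetical helper.
infernalis_upgrade_path() {
    version=$1
    case "$version" in
        0.94.*)
            point=${version#0.94.}      # "0.94.4" -> "4"
            point=${point%%[!0-9]*}     # drop suffixes such as "-1trusty"
            if [ -n "$point" ] && [ "$point" -ge 4 ]; then
                echo "direct"
                return 0
            fi
            ;;
    esac
    # Firefly, older Hammer, or an unrecognized version string.
    echo "hammer-first"
}
```

The version string could be taken from the third field of ``ceph --version`` output, assuming it has the usual ``ceph version X.Y.Z (hash)`` form. Every daemon in the cluster needs to meet the minimum, not just the host where the check is run.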
Upgrading from Hammer
---------------------
* For all distributions that support systemd (CentOS 7, Fedora, Debian
Jessie 8.x, OpenSUSE), ceph daemons are now managed using native systemd
files instead of the legacy sysvinit scripts. For example,::
systemctl start ceph.target # start all daemons
systemctl status ceph-osd@12 # check status of osd.12
The main notable distro that is *not* yet using systemd is Ubuntu trusty
14.04. (The next Ubuntu LTS, 16.04, will use systemd instead of upstart.)
* Ceph daemons now run as user and group ``ceph`` by default. The
ceph user has a static UID assigned by Fedora and Debian (also used
by derivative distributions like RHEL/CentOS and Ubuntu). On SUSE
the ceph user will currently get a dynamically assigned UID when the
user is created.
If your systems already have a ceph user, upgrading the package will cause
problems. We suggest you first remove or rename the existing 'ceph' user
before upgrading.
When upgrading, administrators have two options:
#. Add the following line to ``ceph.conf`` on all hosts::
setuser match path = /var/lib/ceph/$type/$cluster-$id
This will make the Ceph daemons run as root (i.e., not drop
privileges and switch to user ceph) if the daemon's data
directory is still owned by root. Newly deployed daemons will
be created with data owned by user ceph and will run with
reduced privileges, but upgraded daemons will continue to run as
root.
#. Fix the data ownership during the upgrade. This is the preferred option,
but is more work. The process for each host would be to:
#. Upgrade the ceph package. This creates the ceph user and group. For
example::
ceph-deploy install --stable infernalis HOST
#. Stop the daemon(s).::
service ceph stop # fedora, centos, rhel, debian
stop ceph-all # ubuntu
#. Fix the ownership::
chown -R ceph:ceph /var/lib/ceph
#. Restart the daemon(s).::
start ceph-all # ubuntu
systemctl start ceph.target # debian, centos, fedora, rhel
* The on-disk format for the experimental KeyValueStore OSD backend has
changed. You will need to remove any OSDs using that backend before you
upgrade any test clusters that use it.
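For the ownership fix described above, it may be useful to confirm that nothing under the data directory was missed before restarting the daemons. The following is a minimal sketch that assumes a POSIX ``find``. The helper name ``list_foreign_owned`` is hypothetical and not part of Ceph.

```shell
# Sketch: print every path under a directory that is NOT owned by the
# expected user. No output means the ownership fix is complete.
# list_foreign_owned is a hypothetical helper, not a Ceph tool.
list_foreign_owned() {
    dir=${1:-/var/lib/ceph}     # data directory to scan
    owner=${2:-ceph}            # user name or numeric UID expected to own it
    find "$dir" ! -user "$owner" -print
}
```

Running ``list_foreign_owned /var/lib/ceph ceph`` after the ``chown -R`` step should print nothing. The ``ceph`` user must already exist, which is the case once the package has been upgraded; otherwise ``find`` reports an unknown user.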
Upgrade notes
-------------
* When a pool quota is reached, librados operations now block indefinitely,
the same way they do when the cluster fills up. (Previously they would return
-ENOSPC.) By default, a full cluster or pool will now block. If your
librados application can handle ENOSPC or EDQUOT errors gracefully, you can
get error returns instead by using the new librados OPERATION_FULL_TRY flag.
Notable changes
---------------
NOTE: These notes are somewhat abbreviated while we find a less
time-consuming process for generating them.
* build: C++11 now supported
* build: many cmake improvements
* build: OSX build fixes (Yan, Zheng)
* build: remove rest-bench
* ceph-disk: many fixes (Loic Dachary)
* ceph-disk: support for multipath devices (Loic Dachary)
* ceph-fuse: mostly behave on 32-bit hosts (Yan, Zheng)
* ceph-objectstore-tool: many improvements (David Zafman)
* common: bufferlist performance tuning (Piotr Dalek, Sage Weil)
* common: make mutex more efficient
* common: some async compression infrastructure (Haomai Wang)
* librados: add FULL_TRY and FULL_FORCE flags for dealing with full clusters or pools (Sage Weil)
* librados: fix notify completion race (#13114 Sage Weil)
* librados, libcephfs: randomize client nonces (Josh Durgin)
* librados: pybind: fix binary omap values (Robin H. Johnson)
* librbd: fix reads larger than the cache size (Lu Shi)
* librbd: metadata filter fixes (Haomai Wang)
* librbd: use write_full when possible (Zhiqiang Wang)
* mds: avoid emitting cap warnings before evicting session (John Spray)
* mds: fix expected holes in journal objects (#13167 Yan, Zheng)
* mds: fix SnapServer crash on deleted pool (John Spray)
* mds: many fixes (Yan, Zheng, John Spray, Greg Farnum)
* mon: add cache over MonitorDBStore (Kefu Chai)
* mon: 'ceph osd metadata' can dump all osds (Haomai Wang)
* mon: detect kv backend failures (Sage Weil)
* mon: fix CRUSH map test for new pools (Sage Weil)
* mon: fix min_last_epoch_clean tracking (Kefu Chai)
* mon: misc scaling fixes (Sage Weil)
* mon: streamline session handling, fix memory leaks (Sage Weil)
* mon: upgrades must pass through hammer (Sage Weil)
* msg/async: many fixes (Haomai Wang)
* osd: cache proxy-write support (Zhiqiang Wang, Samuel Just)
* osd: configure promotion based on write recency (Zhiqiang Wang)
* osd: don't send dup MMonGetOSDMap requests (Sage Weil, Kefu Chai)
* osd: erasure-code: fix SHEC floating point bug (#12936 Loic Dachary)
* osd: erasure-code: update to ISA-L 2.14 (Yuan Zhou)
* osd: fix hitset object naming to use GMT (Kefu Chai)
* osd: fix misc memory leaks (Sage Weil)
* osd: fix peek_queue locking in FileStore (Xinze Chi)
* osd: fix promotion vs full cache tier (Samuel Just)
* osd: fix replay requeue when pg is still activating (#13116 Samuel Just)
* osd: fix scrub stat bugs (Sage Weil, Samuel Just)
* osd: force promotion for ops EC can't handle (Zhiqiang Wang)
* osd: improve behavior on machines with large memory pages (Steve Capper)
* osd: merge multiple setattr calls into a setattrs call (Xinxin Shu)
* osd: newstore prototype (Sage Weil)
* osd: ObjectStore internal API refactor (Sage Weil)
* osd: SHEC no longer experimental
* osd: throttle evict ops (Yunchuan Wen)
* osd: upgrades must pass through hammer (Sage Weil)
* osd: use SEEK_HOLE / SEEK_DATA for sparse copy (Xinxin Shu)
* rbd: rbd-replay-prep and rbd-replay improvements (Jason Dillaman)
* rgw: expose the number of unhealthy workers through admin socket (Guang Yang)
* rgw: fix casing of Content-Type header (Robin H. Johnson)
* rgw: fix decoding of X-Object-Manifest from GET on Swift DLO (Radoslaw Zarzynski)
* rgw: fix sysvinit script
* rgw: fix sysvinit script w/ multiple instances (Sage Weil, Pavan Rallabhandi)
* rgw: improve handling of already removed buckets in expirer (Radoslaw Zarzynski)
* rgw: log to /var/log/ceph instead of /var/log/radosgw
* rgw: rework X-Trans-Id header to conform with Swift API (Radoslaw Zarzynski)
* rgw: s3 encoding-type for get bucket (Jeff Weber)
* rgw: set max buckets per user in ceph.conf (Vikhyat Umrao)
* rgw: support for Swift expiration API (Radoslaw Zarzynski, Yehuda Sadeh)
* rgw: user rm is idempotent (Orit Wasserman)
* selinux policy (Boris Ranto, Milan Broz)
* systemd: many fixes (Sage Weil, Owen Synge, Boris Ranto, Dan van der Ster)
* systemd: run daemons as user ceph
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-9.1.0.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy