This major release of Ceph will be the foundation for the next
long-term stable release. There have been many major changes since
the Infernalis (9.2.x) and Hammer (0.94.x) releases, and the upgrade
process is non-trivial. Please read these release notes carefully.
For the complete release notes, please see
http://ceph.com/releases/v10-2-0-jewel-released/
Major Changes from Infernalis
-----------------------------
- *CephFS*:
* This is the first release in which CephFS is declared stable and
production ready! Several features are disabled by default, including
snapshots and multiple active MDS servers.
* The repair and disaster recovery tools are now feature-complete.
* A new cephfs-volume-manager module is included that provides a
high-level interface for creating "shares" for OpenStack Manila
and similar projects.
* There is now experimental support for multiple CephFS file systems
within a single cluster.
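Both gates can be lifted explicitly on a test cluster; a minimal
sketch (Jewel-era syntax, which may change in later releases)::

    ceph fs flag set enable_multiple true --yes-i-really-mean-it  # multiple filesystems
    ceph mds set allow_new_snaps true --yes-i-really-mean-it      # snapshots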
- *RGW*:
* The multisite feature has been almost completely rearchitected and
rewritten to support any number of clusters/sites, bidirectional
fail-over, and active/active configurations.
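As an illustrative sketch of the new realm/zonegroup/zone model (the
names here are hypothetical, and a real deployment also needs
endpoints and system user keys)::

    radosgw-admin realm create --rgw-realm=gold --default
    radosgw-admin zonegroup create --rgw-zonegroup=us --master --default
    radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --master --default
    radosgw-admin period update --commit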
* You can now access radosgw buckets via NFS (experimental).
* The AWS4 authentication protocol is now supported.
* There is now support for S3 request payer buckets.
* The new multitenancy infrastructure improves compatibility with
Swift, which provides a separate container namespace for each
user/tenant.
* The OpenStack Keystone v3 API is now supported. There are a range
of other small Swift API features and compatibility improvements
as well, including bulk delete and SLO (static large objects).
- *RBD*:
* There is new support for mirroring (asynchronous replication) of
RBD images across clusters. This is implemented as a per-RBD
image journal that can be streamed across a WAN to another site,
and a new rbd-mirror daemon that performs the cross-cluster
replication.
* The exclusive-lock, object-map, fast-diff, and journaling features
can be enabled or disabled dynamically. The deep-flatten feature
can be disabled dynamically but not re-enabled.
* The RBD CLI has been rewritten to provide command-specific help
and full bash completion support.
* RBD snapshots can now be renamed.
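For example, enabling one side of mirroring and renaming a snapshot
might look like this (pool and image names are hypothetical, and an
rbd-mirror daemon must also be running against each cluster)::

    rbd feature enable rbd/myimage journaling          # per-image journal, required for mirroring
    rbd mirror pool enable rbd pool                    # mirror all journaled images in the pool
    rbd mirror pool peer add rbd client.mirror@remote  # register the peer cluster
    rbd snap rename rbd/myimage@old rbd/myimage@new    # snapshot rename is new in Jewel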
- *RADOS*:
* BlueStore, a new OSD backend, is included as an experimental
feature. The plan is for it to become the default backend in the
K or L release.
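Because it is experimental, BlueStore must be explicitly enabled
before OSDs will use it; a sketch for a throwaway test cluster (the
device name is hypothetical, and the option name is deliberately
scary)::

    # ceph.conf, [osd] section:
    #   enable experimental unrecoverable data corrupting features = bluestore rocksdb
    ceph-disk prepare --bluestore /dev/sdb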
* The OSD now persists scrub results and provides a librados API to
query results in detail.
* We have revised our documentation to recommend *against* using
ext4 as the underlying filesystem for Ceph OSD daemons due to
problems supporting our long object name handling.
Major Changes from Hammer
-------------------------
- *General*:
* Ceph daemons are now managed via systemd (with the exception of
Ubuntu Trusty, which still uses upstart).
* Ceph daemons run as 'ceph' user instead of 'root'.
* On Red Hat distros, there is also an SELinux policy.
- *RADOS*:
* The RADOS cache tier can now proxy write operations to the base
tier, allowing writes to be handled without forcing migration of
an object into the cache.
* The SHEC erasure coding support is no longer flagged as
experimental. SHEC trades some additional storage space for faster
repair.
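For example, a SHEC profile and a pool that uses it can be created as
follows (the profile/pool names and the k/m/c values are arbitrary
illustrations)::

    ceph osd erasure-code-profile set shecprofile plugin=shec k=4 m=3 c=2
    ceph osd pool create ecpool 64 64 erasure shecprofile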
* There is now a unified queue (and thus prioritization) of client
IO, recovery, scrubbing, and snapshot trimming.
* There have been many improvements to low-level repair tooling
(ceph-objectstore-tool).
* The internal ObjectStore API has been significantly cleaned up in order
to facilitate new storage backends like BlueStore.
- *RGW*:
* The Swift API now supports object expiration.
* There are many Swift API compatibility improvements.
- *RBD*:
* The ``rbd du`` command shows actual usage (quickly, when
object-map is enabled).
* The object-map feature has seen many stability improvements.
* The object-map and exclusive-lock features can be enabled or disabled
dynamically.
* You can now store user metadata and set persistent librbd options
associated with individual images.
* The new deep-flatten feature allows flattening of a clone and all
of its snapshots. (Previously snapshots could not be flattened.)
* The export-diff command is now faster (it uses aio). There is also
a new fast-diff feature.
* The --size argument can be specified with a suffix for units
(e.g., ``--size 64G``).
* There is a new ``rbd status`` command that, for now, shows who has
the image open/mapped.
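Several of these surface directly in the CLI; for example (the image
name is hypothetical)::

    rbd du rbd/myimage                                   # actual usage, fast with object-map
    rbd status rbd/myimage                               # who has the image open/mapped
    rbd image-meta set rbd/myimage conf_rbd_cache false  # persistent per-image librbd option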
- *CephFS*:
* You can now rename snapshots.
* There have been ongoing improvements around administration, diagnostics,
and the check and repair tools.
* The caching and revocation of client cache state due to unused
inodes has been dramatically improved.
* The ceph-fuse client behaves better on 32-bit hosts.
Distro compatibility
--------------------
Starting with Infernalis, we have dropped support for many older
distributions so that we can move to a newer compiler toolchain (e.g.,
C++11). Although it is still possible to build Ceph on older
distributions by installing backported development tools, we are not
building and publishing release packages for ceph.com.
We now build packages for the following distributions and architectures:
- x86_64:
* CentOS 7.x. We have dropped support for CentOS 6 (and other RHEL 6
derivatives, like Scientific Linux 6).
* Debian Jessie 8.x. Debian Wheezy 7.x's g++ has incomplete support
for C++11 (and no systemd).
* Ubuntu Xenial 16.04 and Trusty 14.04. Ubuntu Precise 12.04 is no
longer supported.
* Fedora 22 or later.
- aarch64 / arm64:
* Ubuntu Xenial 16.04.
Upgrading from Infernalis or Hammer
-----------------------------------
* We now recommend against using ``ext4`` as the underlying file
system for Ceph OSDs, especially when RGW or other users of long
RADOS object names are used. For more information about why, please
see `Filesystem Recommendations`_.
If you have an existing cluster that uses ext4 for the OSDs but uses only
RBD and/or CephFS, then the ext4 limitations will not affect you. Before
upgrading, be sure to add the following to ``ceph.conf`` to allow the OSDs to
start::
osd max object name len = 256
osd max object namespace len = 64
Keep in mind that if you set these lower object name limits and
later decide to use RGW on this cluster, it will have problems
storing S3/Swift objects with long names. This startup check can also be
disabled via the following option, although this is not recommended::
osd check max object name len on startup = false
.. _Filesystem Recommendations: ../configuration/filesystem-recommendations
* There are no major compatibility changes since Infernalis. Simply
upgrading the daemons on each host and restarting all daemons is
sufficient.
* The rbd CLI no longer accepts the deprecated '--image-features' option
during create, import, and clone operations. The '--image-feature'
option should be used instead.
* The rbd legacy image format (version 1) is deprecated with the Jewel release.
Attempting to create a new version 1 RBD image will result in a warning.
Future releases of Ceph will remove support for version 1 RBD images.
* The 'send_pg_creates' and 'map_pg_creates' mon CLI commands are
obsolete and no longer supported.
* A new config option, 'mon_election_timeout', has been added to
  specifically limit the maximum waiting time of the monitor election
  process, which was previously restricted by 'mon_lease'.
* CephFS filesystems created using versions older than Firefly (0.80) must
use the new 'cephfs-data-scan tmap_upgrade' command after upgrading to
Jewel. See 'Upgrading' in the CephFS documentation for more information.
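For example (the metadata pool name is a placeholder)::

    cephfs-data-scan tmap_upgrade <metadata pool name>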
* The 'ceph mds setmap' command has been removed.
* The default RBD image features for new images have been updated to
enable the following: exclusive lock, object map, fast-diff, and
deep-flatten. These features are not currently supported by the RBD
kernel driver nor older RBD clients. They can be disabled on a per-image
basis via the RBD CLI, or the default features can be updated to the
pre-Jewel setting by adding the following to the client section of the Ceph
configuration file::
rbd default features = 1
* After upgrading, users should set the 'sortbitwise' flag to enable the new
internal object sort order::
ceph osd set sortbitwise
This flag is important for the new object enumeration API and for
new backends like BlueStore.
* The rbd CLI no longer permits creating images and snapshots with potentially
ambiguous names (e.g. the '/' and '@' characters are disallowed). The
validation can be temporarily disabled by adding "--rbd-validate-names=false"
to the rbd CLI when creating an image or snapshot. It can also be disabled
by adding the following to the client section of the Ceph configuration file::
rbd validate names = false
Upgrading from Hammer
---------------------
* All cluster nodes must first upgrade to Hammer v0.94.4 or a later
v0.94.z release; only then is it possible to upgrade to Jewel
10.2.z.
* For all distributions that support systemd (CentOS 7, Fedora, Debian
Jessie 8.x, OpenSUSE), ceph daemons are now managed using native systemd
files instead of the legacy sysvinit scripts. For example::
systemctl start ceph.target # start all daemons
systemctl status ceph-osd@12 # check status of osd.12
The main notable distro that is *not* yet using systemd is Ubuntu trusty
14.04. (The next Ubuntu LTS, 16.04, will use systemd instead of upstart.)
* Ceph daemons now run as user and group ``ceph`` by default. The
ceph user has a static UID assigned by Fedora and Debian (also used by
derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the same
UID/GID as in Fedora and Debian will be used, *provided it is not already
assigned*. In the unlikely event the preferred UID or GID is assigned to a
different user/group, ceph will get a dynamically assigned UID/GID.
If your systems already have a ceph user, upgrading the package will cause
problems. We suggest you first remove or rename the existing 'ceph' user
and 'ceph' group before upgrading.
When upgrading, administrators have two options:
#. Add the following line to ``ceph.conf`` on all hosts::
setuser match path = /var/lib/ceph/$type/$cluster-$id
This will make the Ceph daemons run as root (i.e., not drop
privileges and switch to user ceph) if the daemon's data
directory is still owned by root. Newly deployed daemons will
be created with data owned by user ceph and will run with
reduced privileges, but upgraded daemons will continue to run as
root.
#. Fix the data ownership during the upgrade. This is the
preferred option, but it is more work and can be very time
consuming. The process for each host is to:
#. Upgrade the ceph package. This creates the ceph user and group. For
example::
ceph-deploy install --stable jewel HOST
#. Stop the daemon(s)::
service ceph stop # fedora, centos, rhel, debian
stop ceph-all # ubuntu
#. Fix the ownership::
chown -R ceph:ceph /var/lib/ceph
#. Restart the daemon(s)::
start ceph-all # ubuntu
systemctl start ceph.target # debian, centos, fedora, rhel
Alternatively, the same process can be done with a single daemon
type, for example by stopping only monitors and chowning only
``/var/lib/ceph/mon``.
* The on-disk format for the experimental KeyValueStore OSD backend has
changed. You will need to remove any OSDs using that backend before you
upgrade any test clusters that use it.
* When a pool quota is reached, librados operations now block indefinitely,
the same way they do when the cluster fills up. (Previously they would return
-ENOSPC.) By default, a full cluster or pool will now block. If your
librados application can handle ENOSPC or EDQUOT errors gracefully, you can
get error returns instead by using the new librados OPERATION_FULL_TRY flag.
* The return code for librbd's rbd_aio_read and Image::aio_read API methods no
longer returns the number of bytes read upon success. Instead, it returns 0
upon success and a negative value upon failure.
* 'ceph scrub', 'ceph compact' and 'ceph sync force' are now DEPRECATED. Users
should instead use 'ceph mon scrub', 'ceph mon compact' and
'ceph mon sync force'.
* 'ceph mon_metadata' should now be used as 'ceph mon metadata'. The old
  form does not need a deprecation period because it was first introduced
  in this same major release.
* The `--dump-json` option of "osdmaptool" is replaced by `--dump json`.
* The "pg ls-by-{pool,primary,osd}" and "pg ls" commands now take
  "recovering" instead of "recovery" in order to include recovering PGs in
  the listing.
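For example, to list PGs that are currently recovering::

    ceph pg ls recovering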
Upgrading from Firefly
----------------------
Upgrading directly from Firefly v0.80.z is not recommended. It is
possible to do a direct upgrade, but not without downtime, as all OSDs
must be stopped, upgraded, and then restarted. We recommend that
clusters be first upgraded to Hammer v0.94.6 or a later v0.94.z
release; only then is it possible to upgrade to Jewel 10.2.z for an
online upgrade (see below).
To do an offline upgrade directly from Firefly, all Firefly OSDs must
be stopped and marked down before any Jewel OSDs will be allowed
to start up. This fencing is enforced by the Jewel monitor, so
you should use an upgrade procedure like:
#. Upgrade Ceph on monitor hosts
#. Restart all ceph-mon daemons
#. Set noout::
ceph osd set noout
#. Upgrade Ceph on all OSD hosts
#. Stop all ceph-osd daemons
#. Mark all OSDs down with something like::
ceph osd down `seq 0 1000`
#. Start all ceph-osd daemons
#. Let the cluster settle and then unset noout::
ceph osd unset noout
#. Upgrade and restart any remaining daemons (ceph-mds, radosgw)
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.0.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
Hi everyone,
The third (and likely final) Jewel release candidate is out. We have a
very small number of remaining blocker issues and a bit of final polish
before we publish Jewel 10.2.0, probably next week.
There are no known issues with this release that are serious enough to
warn about here. Greg is adding some CephFS checks so that admins don't
accidentally start using less-stable features, there is a pending OSD
startup check to ensure that the OSD's underlying fs can handle the
configured max object name length, and there are a few bugs to squash with
the new rgw replication configuration process. Otherwise, it's looking
pretty good!
This is your last chance to do some deployment testing before the final
release.
Draft release notes are here:
http://docs.ceph.com/docs/master/release-notes/
Thanks!
sage
Hi,
ext4 has never been recommended, but we did test it. After Jewel is out,
we would like to explicitly recommend *against* ext4 and stop testing it.
Why:
Recently we discovered an issue with the long object name handling that is
not fixable without rewriting a significant chunk of FileStore's filename
handling. (There is a limit in the amount of xattr data ext4 can store in
the inode, which causes problems in LFNIndex.)
We *could* invest a ton of time rewriting this to fix it, but it only affects
ext4, which we never recommended, and we plan to deprecate FileStore once
BlueStore is stable anyway, so it seems like a waste of time that would be
better spent elsewhere.
Also, by dropping ext4 test coverage in ceph-qa-suite, we can
significantly improve time/coverage for FileStore on XFS and on BlueStore.
The long file name handling is problematic anytime someone is storing
rados objects with long names. The primary user that does this is RGW,
which means any RGW cluster using ext4 should recreate their OSDs to use
XFS. Other librados users could be affected too, though, like users
with very long rbd image names (e.g., > 100 characters), or custom
librados users.
How:
To make this change as visible as possible, the plan is to make ceph-osd
refuse to start if the backend is unable to support the configured max
object name (osd_max_object_name_len). The OSD will complain that ext4
cannot store such an object and refuse to start. A user who is only using
RBD might decide they don't need long file names to work and can adjust
the osd_max_object_name_len setting to something small (say, 64) and run
successfully. They would be taking a risk, though, because we would like
to stop testing on ext4.
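In ceph.conf terms, that workaround would look something like this (64
being just the example value from the previous paragraph):

    osd max object name len = 64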
Is this reasonable? If there are significant ext4 users who are unwilling to
recreate their OSDs, now would be the time to speak up.
Thanks!
sage
Hi all,
We've pushed 10.1.1, a second release candidate for Jewel. This fixes
another round of bugs, and we are getting pretty close to a final release.
There are a few known issues to watch out for:
- Old CephFS clusters will mangle the layouts with this release; the
fix was committed just after it was cut. Wait for the next RC or the
release if you're upgrading a cluster that has a CephFS data pool as pool
0.
- The upstart ceph-mds-all.conf file is missing.
Also, this is the first release build that includes arm64/aarch64 packages
for Ubuntu Xenial 16.04. Yay! The CentOS 7 builds are waiting on
the EPEL repos; hopefully that will happen soon.
Please review the release notes before trying:
http://docs.ceph.com/docs/master/release-notes/
Thanks!
sage