This major release of Ceph will be the foundation for the next
long-term stable release. There have been many major changes since
the Infernalis (9.2.x) and Hammer (0.94.x) releases, and the upgrade
process is non-trivial. Please read these release notes carefully.
For the complete release notes, please see
http://ceph.com/releases/v10-2-0-jewel-released/
Major Changes from Infernalis
-----------------------------
- *CephFS*:
* This is the first release in which CephFS is declared stable and
production ready! Several features are disabled by default, including
snapshots and multiple active MDS servers.
* The repair and disaster recovery tools are now feature-complete.
* A new cephfs-volume-manager module is included that provides a
high-level interface for creating "shares" for OpenStack Manila
and similar projects.
* There is now experimental support for multiple CephFS file systems
within a single cluster.
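Both gates can be lifted explicitly on a test cluster; a minimal
sketch (Jewel-era syntax, which may change in later releases)::

    ceph fs flag set enable_multiple true --yes-i-really-mean-it  # multiple filesystems
    ceph mds set allow_new_snaps true --yes-i-really-mean-it      # snapshots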
- *RGW*:
* The multisite feature has been almost completely rearchitected and
rewritten to support any number of clusters/sites, bidirectional
fail-over, and active/active configurations.
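As an illustrative sketch of the new realm/zonegroup/zone model (the
names here are hypothetical, and a real deployment also needs
endpoints and system user keys)::

    radosgw-admin realm create --rgw-realm=gold --default
    radosgw-admin zonegroup create --rgw-zonegroup=us --master --default
    radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --master --default
    radosgw-admin period update --commit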
* You can now access radosgw buckets via NFS (experimental).
* The AWS4 authentication protocol is now supported.
* There is now support for S3 request payer buckets.
* The new multitenancy infrastructure improves compatibility with
Swift, which provides a separate container namespace for each
user/tenant.
* The OpenStack Keystone v3 API is now supported. There are a range
of other small Swift API features and compatibility improvements
as well, including bulk delete and SLO (static large objects).
- *RBD*:
* There is new support for mirroring (asynchronous replication) of
RBD images across clusters. This is implemented as a per-RBD
image journal that can be streamed across a WAN to another site,
and a new rbd-mirror daemon that performs the cross-cluster
replication.
* The exclusive-lock, object-map, fast-diff, and journaling features
can be enabled or disabled dynamically. The deep-flatten feature
can be disabled dynamically but not re-enabled.
* The RBD CLI has been rewritten to provide command-specific help
and full bash completion support.
* RBD snapshots can now be renamed.
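For example, enabling one side of mirroring and renaming a snapshot
might look like this (pool and image names are hypothetical, and an
rbd-mirror daemon must also be running against each cluster)::

    rbd feature enable rbd/myimage journaling          # per-image journal, required for mirroring
    rbd mirror pool enable rbd pool                    # mirror all journaled images in the pool
    rbd mirror pool peer add rbd client.mirror@remote  # register the peer cluster
    rbd snap rename rbd/myimage@old rbd/myimage@new    # snapshot rename is new in Jewel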
- *RADOS*:
* BlueStore, a new OSD backend, is included as an experimental
feature. The plan is for it to become the default backend in the
K or L release.
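Because it is experimental, BlueStore must be explicitly enabled
before OSDs will use it; a sketch for a throwaway test cluster (the
device name is hypothetical, and the option name is deliberately
scary)::

    # ceph.conf, [osd] section:
    #   enable experimental unrecoverable data corrupting features = bluestore rocksdb
    ceph-disk prepare --bluestore /dev/sdb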
* The OSD now persists scrub results and provides a librados API to
query results in detail.
* We have revised our documentation to recommend *against* using
ext4 as the underlying filesystem for Ceph OSD daemons due to
problems supporting our long object name handling.
Major Changes from Hammer
-------------------------
- *General*:
* Ceph daemons are now managed via systemd (with the exception of
Ubuntu Trusty, which still uses upstart).
* Ceph daemons run as 'ceph' user instead of 'root'.
* On Red Hat distros, there is also an SELinux policy.
- *RADOS*:
* The RADOS cache tier can now proxy write operations to the base
tier, allowing writes to be handled without forcing migration of
an object into the cache.
* The SHEC erasure coding support is no longer flagged as
experimental. SHEC trades some additional storage space for faster
repair.
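For example, a SHEC profile and a pool that uses it can be created as
follows (the profile/pool names and the k/m/c values are arbitrary
illustrations)::

    ceph osd erasure-code-profile set shecprofile plugin=shec k=4 m=3 c=2
    ceph osd pool create ecpool 64 64 erasure shecprofile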
* There is now a unified queue (and thus prioritization) of client
IO, recovery, scrubbing, and snapshot trimming.
* There have been many improvements to low-level repair tooling
(ceph-objectstore-tool).
* The internal ObjectStore API has been significantly cleaned up in order
to facilitate new storage backends like BlueStore.
- *RGW*:
* The Swift API now supports object expiration.
* There are many Swift API compatibility improvements.
- *RBD*:
* The ``rbd du`` command shows actual usage (quickly, when
object-map is enabled).
* The object-map feature has seen many stability improvements.
* The object-map and exclusive-lock features can be enabled or disabled
dynamically.
* You can now store user metadata and set persistent librbd options
associated with individual images.
* The new deep-flatten feature allows flattening of a clone and all
of its snapshots. (Previously snapshots could not be flattened.)
* The export-diff command is now faster (it uses aio). There is also
a new fast-diff feature.
* The --size argument can be specified with a suffix for units
(e.g., ``--size 64G``).
* There is a new ``rbd status`` command that, for now, shows who has
the image open/mapped.
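Several of these surface directly in the CLI; for example (the image
name is hypothetical)::

    rbd du rbd/myimage                                   # actual usage, fast with object-map
    rbd status rbd/myimage                               # who has the image open/mapped
    rbd image-meta set rbd/myimage conf_rbd_cache false  # persistent per-image librbd option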
- *CephFS*:
* You can now rename snapshots.
* There have been ongoing improvements around administration, diagnostics,
and the check and repair tools.
* The caching and revocation of client cache state due to unused
inodes has been dramatically improved.
* The ceph-fuse client behaves better on 32-bit hosts.
Distro compatibility
--------------------
Starting with Infernalis, we have dropped support for many older
distributions so that we can move to a newer compiler toolchain (e.g.,
C++11). Although it is still possible to build Ceph on older
distributions by installing backported development tools, we are not
building and publishing release packages for ceph.com.
We now build packages for the following distributions and architectures:
- x86_64:
* CentOS 7.x. We have dropped support for CentOS 6 (and other RHEL 6
derivatives, like Scientific Linux 6).
* Debian Jessie 8.x. Debian Wheezy 7.x's g++ has incomplete support
for C++11 (and no systemd).
* Ubuntu Xenial 16.04 and Trusty 14.04. Ubuntu Precise 12.04 is no
longer supported.
* Fedora 22 or later.
- aarch64 / arm64:
* Ubuntu Xenial 16.04.
Upgrading from Infernalis or Hammer
-----------------------------------
* We now recommend against using ``ext4`` as the underlying file
system for Ceph OSDs, especially when RGW or other users of long
RADOS object names are used. For more information about why, please
see `Filesystem Recommendations`_.
If you have an existing cluster that uses ext4 for the OSDs but uses only
RBD and/or CephFS, then the ext4 limitations will not affect you. Before
upgrading, be sure to add the following to ``ceph.conf`` to allow the OSDs to
start::
osd max object name len = 256
osd max object namespace len = 64
Keep in mind that if you set these lower object name limits and
later decide to use RGW on this cluster, it will have problems
storing S3/Swift objects with long names. This startup check can also be
disabled via the following option, although this is not recommended::
osd check max object name len on startup = false
.. _Filesystem Recommendations: ../configuration/filesystem-recommendations
* There are no major compatibility changes since Infernalis. Simply
upgrading the daemons on each host and restarting all daemons is
sufficient.
* The rbd CLI no longer accepts the deprecated '--image-features' option
during create, import, and clone operations. The '--image-feature'
option should be used instead.
* The rbd legacy image format (version 1) is deprecated with the Jewel release.
Attempting to create a new version 1 RBD image will result in a warning.
Future releases of Ceph will remove support for version 1 RBD images.
* The 'send_pg_creates' and 'map_pg_creates' mon CLI commands are
obsolete and no longer supported.
* A new config option, 'mon_election_timeout', has been added to
  specifically limit the maximum waiting time of the monitor election
  process, which was previously restricted by 'mon_lease'.
* CephFS filesystems created using versions older than Firefly (0.80) must
use the new 'cephfs-data-scan tmap_upgrade' command after upgrading to
Jewel. See 'Upgrading' in the CephFS documentation for more information.
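For example (the metadata pool name is a placeholder)::

    cephfs-data-scan tmap_upgrade <metadata pool name>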
* The 'ceph mds setmap' command has been removed.
* The default RBD image features for new images have been updated to
enable the following: exclusive lock, object map, fast-diff, and
deep-flatten. These features are not currently supported by the RBD
kernel driver nor older RBD clients. They can be disabled on a per-image
basis via the RBD CLI, or the default features can be updated to the
pre-Jewel setting by adding the following to the client section of the Ceph
configuration file::
rbd default features = 1
* After upgrading, users should set the 'sortbitwise' flag to enable the new
internal object sort order::
ceph osd set sortbitwise
This flag is important for the new object enumeration API and for
new backends like BlueStore.
* The rbd CLI no longer permits creating images and snapshots with potentially
ambiguous names (e.g. the '/' and '@' characters are disallowed). The
validation can be temporarily disabled by adding "--rbd-validate-names=false"
to the rbd CLI when creating an image or snapshot. It can also be disabled
by adding the following to the client section of the Ceph configuration file::
rbd validate names = false
Upgrading from Hammer
---------------------
* All cluster nodes must first upgrade to Hammer v0.94.4 or a later
v0.94.z release; only then is it possible to upgrade to Jewel
10.2.z.
* For all distributions that support systemd (CentOS 7, Fedora, Debian
Jessie 8.x, OpenSUSE), ceph daemons are now managed using native systemd
files instead of the legacy sysvinit scripts. For example::
systemctl start ceph.target # start all daemons
systemctl status ceph-osd@12 # check status of osd.12
The main notable distro that is *not* yet using systemd is Ubuntu trusty
14.04. (The next Ubuntu LTS, 16.04, will use systemd instead of upstart.)
* Ceph daemons now run as user and group ``ceph`` by default. The
ceph user has a static UID assigned by Fedora and Debian (also used by
derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the same
UID/GID as in Fedora and Debian will be used, *provided it is not already
assigned*. In the unlikely event the preferred UID or GID is assigned to a
different user/group, ceph will get a dynamically assigned UID/GID.
If your systems already have a ceph user, upgrading the package will cause
problems. We suggest you first remove or rename the existing 'ceph' user
and 'ceph' group before upgrading.
When upgrading, administrators have two options:
#. Add the following line to ``ceph.conf`` on all hosts::
setuser match path = /var/lib/ceph/$type/$cluster-$id
This will make the Ceph daemons run as root (i.e., not drop
privileges and switch to user ceph) if the daemon's data
directory is still owned by root. Newly deployed daemons will
be created with data owned by user ceph and will run with
reduced privileges, but upgraded daemons will continue to run as
root.
#. Fix the data ownership during the upgrade. This is the
preferred option, but it is more work and can be very time
consuming. The process for each host is to:
#. Upgrade the ceph package. This creates the ceph user and group. For
example::
ceph-deploy install --stable jewel HOST
#. Stop the daemon(s)::
service ceph stop # fedora, centos, rhel, debian
stop ceph-all # ubuntu
#. Fix the ownership::
chown -R ceph:ceph /var/lib/ceph
#. Restart the daemon(s)::
start ceph-all # ubuntu
systemctl start ceph.target # debian, centos, fedora, rhel
Alternatively, the same process can be done with a single daemon
type, for example by stopping only monitors and chowning only
``/var/lib/ceph/mon``.
* The on-disk format for the experimental KeyValueStore OSD backend has
changed. You will need to remove any OSDs using that backend before you
upgrade any test clusters that use it.
* When a pool quota is reached, librados operations now block indefinitely,
the same way they do when the cluster fills up. (Previously they would return
-ENOSPC.) By default, a full cluster or pool will now block. If your
librados application can handle ENOSPC or EDQUOT errors gracefully, you can
get error returns instead by using the new librados OPERATION_FULL_TRY flag.
* The return code for librbd's rbd_aio_read and Image::aio_read API methods no
longer returns the number of bytes read upon success. Instead, it returns 0
upon success and a negative value upon failure.
* 'ceph scrub', 'ceph compact' and 'ceph sync force' are now DEPRECATED. Users
should instead use 'ceph mon scrub', 'ceph mon compact' and
'ceph mon sync force'.
* 'ceph mon_metadata' should now be used as 'ceph mon metadata'. The old
  form does not need a deprecation period because it was first introduced
  in this same major release.
* The `--dump-json` option of "osdmaptool" is replaced by `--dump json`.
* The "pg ls-by-{pool,primary,osd}" and "pg ls" commands now take
  "recovering" instead of "recovery" in order to include recovering PGs in
  the listing.
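For example, to list PGs that are currently recovering::

    ceph pg ls recovering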
Upgrading from Firefly
----------------------
Upgrading directly from Firefly v0.80.z is not recommended. It is
possible to do a direct upgrade, but not without downtime, as all OSDs
must be stopped, upgraded, and then restarted. We recommend that
clusters be first upgraded to Hammer v0.94.6 or a later v0.94.z
release; only then is it possible to upgrade to Jewel 10.2.z for an
online upgrade (see below).
To do an offline upgrade directly from Firefly, all Firefly OSDs must
be stopped and marked down before any Jewel OSDs will be allowed
to start up. This fencing is enforced by the Jewel monitor, so
you should use an upgrade procedure like:
#. Upgrade Ceph on monitor hosts
#. Restart all ceph-mon daemons
#. Set noout::
ceph osd set noout
#. Upgrade Ceph on all OSD hosts
#. Stop all ceph-osd daemons
#. Mark all OSDs down with something like::
ceph osd down `seq 0 1000`
#. Start all ceph-osd daemons
#. Let the cluster settle and then unset noout::
ceph osd unset noout
#. Upgrade and restart any remaining daemons (ceph-mds, radosgw)
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.0.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
Hi everyone,
The third (and likely final) Jewel release candidate is out. We have a
very small number of remaining blocker issues and a bit of final polish
before we publish Jewel 10.2.0, probably next week.
There are no known issues with this release that are serious enough to
warn about here. Greg is adding some CephFS checks so that admins don't
accidentally start using less-stable features, there is a pending OSD
startup check to ensure that the OSD's underlying fs can handle the
configured max object name length, and there are a few bugs to squash with
the new rgw replication configuration process. Otherwise, it's looking
pretty good!
This is your last chance to do some deployment testing before the final
release.
Draft release notes are here:
http://docs.ceph.com/docs/master/release-notes/
Thanks!
sage
Hi,
ext4 has never been recommended, but we did test it. After Jewel is out,
we would like to explicitly recommend *against* ext4 and stop testing it.
Why:
Recently we discovered an issue with the long object name handling that is
not fixable without rewriting a significant chunk of FileStore's filename
handling. (There is a limit in the amount of xattr data ext4 can store in
the inode, which causes problems in LFNIndex.)
We *could* invest a ton of time rewriting this to fix it, but it only affects
ext4, which we never recommended, and we plan to deprecate FileStore once
BlueStore is stable anyway, so it seems like a waste of time that would be
better spent elsewhere.
Also, by dropping ext4 test coverage in ceph-qa-suite, we can
significantly improve time/coverage for FileStore on XFS and on BlueStore.
The long file name handling is problematic anytime someone is storing
rados objects with long names. The primary user that does this is RGW,
which means any RGW cluster using ext4 should recreate their OSDs to use
XFS. Other librados users could be affected too, though, like users
with very long rbd image names (e.g., > 100 characters), or custom
librados users.
How:
To make this change as visible as possible, the plan is to make ceph-osd
refuse to start if the backend is unable to support the configured max
object name (osd_max_object_name_len). The OSD will complain that ext4
cannot store such an object and refuse to start. A user who is only using
RBD might decide they don't need long file names to work and can adjust
the osd_max_object_name_len setting to something small (say, 64) and run
successfully. They would be taking a risk, though, because we would like
to stop testing on ext4.
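In ceph.conf terms, that workaround would look something like this (64
being just the example value from the previous paragraph):

    osd max object name len = 64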
Is this reasonable? If there are significant ext4 users who are unwilling to
recreate their OSDs, now would be the time to speak up.
Thanks!
sage
Hi all,
We've pushed 10.1.1, a second release candidate for Jewel. This fixes
another round of bugs, and we are getting pretty close to a final release.
There are a few known issues to watch out for:
- Old CephFS clusters will mangle the layouts with this release; the
fix was committed just after it was cut. Wait for the next RC or the
release if you're upgrading a cluster that has a CephFS data pool as pool
0.
- The upstart ceph-mds-all.conf file is missing.
Also, this is the first release build that includes arm64/aarch64 packages
for Ubuntu Xenial 16.04. Yay! The CentOS 7 builds are waiting on
the EPEL repos; hopefully that will happen soon.
Please review the release notes before trying:
http://docs.ceph.com/docs/master/release-notes/
Thanks!
sage