Sage Weil <sage@newdream.net> wrote on 18.07.2019 17:00:18:
> On Thu, 18 Jul 2019, Ulrich Weigand wrote:
> > That's what we've tried to do, and it does indeed make work Ceph a lot
> > better on IBM Z (still some issues, but probably unrelated).  Once we've
> > gotten a bit farther with testing, we can certainly send patches to
> > that effect.
>
> Great, thanks!

I've now made significant progress.  In fact, with my current patch
set I can build Ceph on Z and run the "ctest" suite with zero fails:

  100% tests passed, 0 tests failed out of 169

In addition to the endian-related changes, which I'll describe in
detail below, I needed three more patches fixing minor testing
issues, as well as a critical fix for crashes in the Boost "lockfree"
library on Z.

I'll be preparing patches for submission via issues and pull requests,
but I've never contributed to Ceph before, so I may have to first get
more familar with how the project operates ...   I'd appreciate any
comments or tips, e.g. on how to (possibly) split up the patch etc.

(See attached file: ceph-endian.diff)

This patch fixes many problems with Ceph on big-endian systems due to
missed endian conversions in various places. The primary change is:

- Remove all instances of __le16/__le32/__le64 and replace them with
  ceph_le16/ceph_le32/ceph_le64 instead; the latter will perform
  automatic byte swapping on access if required. (Note that in select
  places I believe using a byte-swapping type is unnecessary, e.g.
  for local variables or parameters; in those cases I've changed
  existing __le type uses to plain unsigned integer types instead.)

- The exception is three header files (ceph_fs.h, msgr.h, rados.h),
  which are shared with the Linux kernel. These files continue to
  use the __le types, but redefine them to ceph_le types when
  compiled for user-space. This is now done in those files themselves
  instead of in types.h, so that also direct includers are covered.

In addition, I noticed inconsistencies with how __le and/or ceph_le
types were initialized. Given that ceph_le currently has a public
member, it can be initialized as an aggregate, which means the
responsibility for byte swapping is on each user. Some places did
indeed perform byte swapping (either via init_le16/32/64 or directly
via mswab), but not all.

To ensure consistency, I'm suggesting to change ceph_le to make its
member private, which will ensure any value must be set by the
assignment operator (which always byte-swaps). Ideally, we'd also
have a constructor; but that collides with the way the __le types
are used in the shared headers as part of anonymous structs/unions.
GCC does not allow any type with constructor in such places.

For convenience, I've therefore changed the init_le functions to
return ceph_le types, so that you can at least initialize ceph_le
variables to the return values of init_le. All places where a
ceph_le (or formerly __le) variable needs to be initialized are
changed to use this new method; all direct uses of mswab were
eliminated.

Also, in a very small number of places existing code actually
tried to manually byte-swap accesses to __le types using the
leXX_to_cpu routines (but that was a no-op anyway ...). With
the ceph_le types, that now works correctly automatically, so I've
removed that code.


In addition, the patch contains a number of changes to work around
follow-on problems from the decisions made above. Specifically:

- Now that ceph_fs.h always uses C++ types (in user space), it cannot
  be included in C files any more. There was exactly one of those,
  src/mds/locks.c, and it only used a few CEPH_CAP_... constants from
  the header. To fix this, I've simply duplicated those definitions;
  as those values appear to be unchangeable ABI constants, I'd not
  expect that duplication to cause any problems later on. (Of course,
  any alternative suggestions are appreciated!)

- Even in the absence of constructors, the new ceph_le types are now
  no longer aggregate types; this causes a few issues when interacting
  with compiler extensions, in particular anonymous structs/unions.
  GCC no longer allows (without warning) the case where a packed struct
  contains *non-packed* structs that contain ceph_le. In practice,
  all those structs and substructs were always intended to be completely
  packed (and actually are, those substructs where the attribute was
  missing happen to be naturally packed), so I've simply added the
  missing attributes.

- Similarly, GCC no longer allows a std::array to be an element of a
  packed struct that also contains a non-aggregate type (like the new
  ceph_le). This happens in one file, msg/async/frames_v2.h. I've
  replaced the std::array with a plain C array with the same layout.

- Also, the following code now gives warnings due to overwriting a
  private member via memcpy:
    copy_from_legacy_head(struct ceph_mds_request_head *head,
              struct ceph_mds_request_head_legacy *legacy) {
      memcpy(&(head->oldest_client_tid), legacy, sizeof(*legacy));
  I'm now simply using a struct assignment instead.

- Several places in os/Transaction.h use operators like += or ++
  on (what is now) ceph_le members, but the class doesn't implement
  those. I guess it would be possible to add them at the class level,
  but since it's just the one file, and in general it seems preferable
  to not perform too many operations on byte-swapped types anyway,
  I've simply expanded those operations in place.

- Because ceph_le has no constructor, it is impossible to construct
  a constexpr variable of any type containing ceph_le.  This only
  occurs in one test case, where I simply removed the constexpr.


Finally, I've run into three places where endian handling looked simply
wrong, completely independently of the __le vs. ceph_le issue:

- librbd/operation/ResizeRequest.cc:ResizeRequest<I>::send_update_header
  has the comment:
    // NOTE: format 1 image headers are not stored in fixed endian format
  This seems just wrong, both from looking at test failures, and from
  looking at code handling those format 1 image headers in the Linux
  kernel driver (which uses le_to_cpu for those fields).

- struct PGTempMap in osd/OSDMap.h tracks a number of int32_t pointers
  pointing into a buffer list. But that list was generated via encode,
  which means int members are bytes-swapped. Fixed by using ceph_le32
  pointers instead.

- Similarly, struct HeaderHelper in tools/immutable_object_cache/Types.h
  is used to overlay buffer list data, so it should use ceph_le32
  instead of uint32_t.


Bye,
Ulrich