Hello,
I have set up two separate Ceph clusters, each with its own RGW instance, and I am trying
to achieve multisite data synchronization. The primary runs 13.2.5, the slave runs 14.2.2
(I upgraded the slave side from 14.2.1 because of known data corruption during transfer
caused by curl errors). I emptied the slave zone and let the sync run from beginning to
end. Then I recalculated MD5 hashes over the original data and over the data in the slave
zone and found that in some cases they do not match. Data corruption is evident.
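For illustration, the comparison can be done with something like this (a minimal sketch;
the endpoints, credentials, and bucket name are placeholders for my actual setup):

# Sketch: download each object from both zones and compare MD5 hashes.
# Endpoints, credentials, and the bucket name are placeholders.
import hashlib
import boto3

def md5_of_object(client, bucket, key):
    """Stream the object body and return its hex MD5 digest."""
    body = client.get_object(Bucket=bucket, Key=key)["Body"]
    h = hashlib.md5()
    for chunk in iter(lambda: body.read(1024 * 1024), b""):
        h.update(chunk)
    return h.hexdigest()

primary = boto3.client("s3", endpoint_url="http://primary-rgw:8080",
                       aws_access_key_id="KEY", aws_secret_access_key="SECRET")
slave = boto3.client("s3", endpoint_url="http://slave-rgw:8080",
                     aws_access_key_id="KEY", aws_secret_access_key="SECRET")

bucket = "test-bucket"
for obj in primary.list_objects_v2(Bucket=bucket).get("Contents", []):
    key = obj["Key"]
    src, dst = md5_of_object(primary, bucket, key), md5_of_object(slave, bucket, key)
    if src != dst:
        print(f"MISMATCH {key}: primary={src} slave={dst}")

Hashing the downloaded bytes on both sides avoids relying on the ETag, which is not a
plain MD5 for multipart uploads.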
A byte-for-byte comparison shows that some parts of the data have simply moved around
(for example, a file has the correct bytes from 0 to 513k and then exactly the same bytes
repeated from 513k up to 1026k, which feels like some sort of buffer issue). The file
size is correct.
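The pattern is easy to detect with a small script, for example (a rough sketch; the 513k
boundary matches what I observed in my files):

def first_divergence(path_a, path_b, chunk=64 * 1024):
    # Return the offset of the first differing byte, or None if identical.
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            ca, cb = a.read(chunk), b.read(chunk)
            if ca != cb:
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i
                return offset + min(len(ca), len(cb))
            if not ca:
                return None
            offset += len(ca)

def repeats_first_block(path, boundary=513 * 1024):
    # In the corrupted copies, bytes [boundary, 2*boundary) were an exact
    # repeat of bytes [0, boundary); this checks for that pattern.
    with open(path, "rb") as f:
        first = f.read(boundary)
        second = f.read(boundary)
    return first == second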
I have read the RGW sources and could not find anything that would cause this behavior;
however, I also could not find the one piece of code that I would consider critical:
during FetchRemote, RGW obtains the object's etag from the remote RGW, but apparently
nothing recalculates the MD5 from the actual data and compares it against the received
etag to ensure that the data was transferred correctly. Even broken data is then stored
to the local cluster and into the bucket index, and after that there is nothing to
prevent the wrong data from reaching an end user downloading from the slave zone.
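What I mean is roughly the following (a conceptual sketch in Python with invented names,
just to illustrate the logic; the real check would of course live in the C++ FetchRemote
path of RGW):

# Conceptual sketch of the missing check (names are invented for illustration).
import hashlib

class SyncError(Exception):
    pass

def fetch_and_verify(fetch_chunks, remote_etag):
    """Stream the object from the remote zone, recompute MD5 on the fly,
    and refuse to store anything whose digest does not match the etag.

    fetch_chunks: iterable of byte chunks received from the remote RGW.
    remote_etag:  etag reported by the remote zone (MD5 hex digest for
                  non-multipart objects).
    """
    h = hashlib.md5()
    data = []
    for chunk in fetch_chunks:
        h.update(chunk)
        data.append(chunk)
    if h.hexdigest() != remote_etag.strip('"'):
        # Do not write the object or touch the bucket index; let the
        # sync retry later instead of serving corrupted data.
        raise SyncError("etag mismatch, aborting object sync")
    return b"".join(data)

For multipart uploads the etag is not a plain MD5 of the body, so a real check would need
to handle that case separately, but the principle stays the same.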
Such a check should happen regardless of the sync backend in use: objects with broken
data should not be stored in the bucket index at all. No one needs broken data; it is
better to fail the object sync so it can be retried later than to store faulty data and
then let end users download it.
Is it just me who cannot see such a check, or is it actually missing?! If it is not there
at all, I think it should be quite high on the TODO list.
Regards,
Vladimir