hey Gal and Eric,
in today's standup, we discussed the version of our apache arrow
submodule. it's currently pinned at 6.0.1, which was tagged in nov.
2021. the centos9 builds are using the system package
libarrow-devel-9.0.0. arrow's upstream recently tagged an 11.0.0
release.
as far as i know, there still aren't any system packages for ubuntu,
so we're likely to be stuck with the submodule for quite a while. how
do you guys want to handle these updates? is it worth trying to update
before the reef release?
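if we do bump it, the mechanics would be roughly this (just a sketch;
the submodule path src/arrow and upstream's apache-arrow-11.0.0 tag
name are assumptions on my part):

    # bump the arrow submodule to upstream's 11.0.0 tag
    cd src/arrow
    git fetch --tags origin
    git checkout apache-arrow-11.0.0
    cd ../..
    git add src/arrow
    git commit -m "arrow: bump submodule to 11.0.0"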
Even though pacific's EOL passed on 6/1/23, I'd like to gather dev
leads' feedback on what open PRs should be considered for the last
point release.
This will help determine whether 16.2.14 is necessary.
TIA
YuriW
Hi,
We have a customer with an abnormally large number of "osd_snap /
purged_snap_{pool}_{snapid}" keys in monstore db: almost 40
million. Among other problems, it causes a very long mon
synchronization on startup.
Our understanding is that the cause is mirroring snapshot creation
being frequently interrupted in their environment, most likely due to
connectivity issues between the sites. This assumption is based on the
fact that they have a lot of rbd "trash" snapshots, which may appear
when an rbd snapshot removal is interrupted. (Mirroring snapshot
creation usually includes removing an older snapshot to keep the total
number of the image's mirroring snapshots under the limit.)
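(For reference, trash snapshots can be spotted with something like the
following; the pool/image names are just examples:)

    # list all snapshots, including those in the trash namespace
    rbd snap ls --all mypool/myimage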
We removed all "trash" snapshots manually, so currently they have a
limited number of "expected" snapshots, but the number of purged_snap
keys is still just as large.
So, our understanding is that if an rbd snapshot creation is
frequently interrupted, there is a chance it will be interrupted in or
just after SnapshotCreateRequest::send_allocate_snap_id [1], when
it requests a new snap id from the mon. As a result this id is never
tracked by rbd and thus never removed, and snap id holes like this
prevent "purged_snap_{pool}_{snapid}" ranges from ever merging.
To confirm that this scenario is likely, I ran the following simple
test, which interrupted rbd mirror snapshot creation at a random time:
for i in `seq 500`; do
    # start a mirror snapshot creation in the background...
    rbd mirror image snapshot test &
    PID=$!
    # ...and kill it after a random 0.0-4.9 second delay
    sleep $((RANDOM % 5)).$((RANDOM % 10))
    kill $PID && sleep 30
done
Running this with debug_rbd=30, I see from the rbd client logs that it
was interrupted in send_allocate_snap_id 74 times, which is
surprisingly high.
And after the experiment, and after removing the rbd image with all
tracked snapshots (i.e. leaving the pool with no known rbd snapshots),
I see "purged_snap_{pool}_{snapid}" keys for ranges that I believe will
never be merged.
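(For anyone who wants to reproduce the observation, the remaining keys
can be listed with something like this, with the mon stopped; the
store path is just an example:)

    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db \
        list osd_snap | grep purged_snap_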
So the questions are:
1) Is there a way we could improve this to keep the monstore from
growing this large?
2) How can we fix the current situation in the cluster? Would it be safe
enough to just run `ceph-kvstore-tool rocksdb store.db rm-prefix osd_snap`
to remove all osd_snap keys (including purged_epoch keys)? Due to the
large db size I don't think it would be feasible to selectively remove
keys with the `ceph-kvstore-tool rocksdb store.db rm {prefix} {key}`
command, so we could only use the `rm-prefix` command. Looking at the
code, and actually trying it in a test environment, it seems like it
could work, but am I missing something dangerous here?
If (1) is not possible, then maybe we could provide a tool/command
for users to clean the keys if they observe this issue?
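For completeness, what we have in mind looks roughly like this (a
sketch only; the mon id and store path are examples, and we would of
course do one mon at a time after taking a backup):

    # stop the mon, back up its store, then drop the osd_snap prefix
    systemctl stop ceph-mon@a
    cp -a /var/lib/ceph/mon/ceph-a/store.db /root/store.db.bak
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db \
        rm-prefix osd_snap
    systemctl start ceph-mon@a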
[1] https://github.com/ceph/ceph/blob/e45272df047af71825445aeb6503073ba06123b0/…
Thanks,
--
Mykola Golub
Details of this release are summarized here:
https://tracker.ceph.com/issues/62231#note-1
Seeking approvals/reviews for:
smoke - Laura, Radek
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade-clients:client-upgrade* - in progress
powercycle - Brad
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
bookworm distro support is an outstanding issue.
TIA
YuriW
Hi folks,
I noticed that the encode/decode functions enforce versions in order to
achieve backward compatibility and provide an upgrade path forward.
However, I'd like to confirm the standard practice around the use of
versions in this case. If a decode() function declares compatv to be 9,
there should be no code inside that handles the case of struct_v < 9,
since this condition should never be satisfied. Is this the right
understanding? I saw this block of code in RGWUserInfo::decode():
void decode(bufferlist::const_iterator& bl) {
  DECODE_START_LEGACY_COMPAT_LEN_32(22, 9, 9, bl);
  if (struct_v >= 2) {
    uint64_t old_auid;
    decode(old_auid, bl);
  }
  std::string access_key;
  std::string secret_key;
  decode(access_key, bl);
  decode(secret_key, bl);
  if (struct_v < 6) {
    RGWAccessKey k;
    k.id = access_key;
    k.key = secret_key;
    access_keys[access_key] = k;
  }
I don't see why we need to handle the case of struct_v < 6 when compatv
is 9. Is it safe to assume that this if statement is dead code? If so,
could we also assume that the following if block in its encode()
function should be removed, too?
void encode(bufferlist& bl) const {
  ENCODE_START(22, 9, bl);
  encode((uint64_t)0, bl);  // old auid
  std::string access_key;
  std::string secret_key;
  if (!access_keys.empty()) {
    std::map<std::string, RGWAccessKey>::const_iterator iter = access_keys.begin();
    const RGWAccessKey& k = iter->second;
    access_key = k.id;
    secret_key = k.key;
  }
Thanks,
Yixin
Hi,
I'm using the Pacific v16.2.10 container image, deployed by cephadm.
I used to manually build the config file for rgw, deploy rgw, put the
config file in place, and restart rgw. That works fine.
Now, I'd like to put the rgw config into the config db. I tried with
client.rgw, but the config is not picked up by rgw. Also, "config show"
doesn't work; it always says "no config state".
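For reference, this is how the option was set (reconstructed from the
`config get` output below):
```
# ceph config set client.rgw rgw_frontends "beast port=8086"
```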
```
# ceph orch ps | grep rgw
rgw.qa.ceph-1.hzfrwq ceph-1 10.250.80.100:80 running (10m) 10m ago 53m 51.4M - 16.2.10 32214388de9d 13169a213bc5
# ceph config get client.rgw | grep frontends
client.rgw basic rgw_frontends beast port=8086 *
# ceph config show rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon rgw.qa.ceph-1.hzfrwq
# ceph config show client.rgw.qa.ceph-1.hzfrwq
Error ENOENT: no config state for daemon client.rgw.qa.ceph-1.hzfrwq
# radosgw-admin --show-config -n client.rgw.qa.ceph-1.hzfrwq | grep frontends
rgw_frontends = beast port=7480
```
Any clues what I am missing here?
Thanks!
Tony
This is the third and possibly last release candidate for Reef.
The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our
internal testing doesn't show any side effects from the new version,
but we are very eager to hear community feedback on it. This is the
first release with the ability to tune RocksDB settings per column
family [1], which allows more granular tuning to be applied to
different kinds of data stored in RocksDB. A new set of settings is
used in Reef to optimize performance for most kinds of workloads, with
a slight penalty in some cases that is outweighed by large improvements
in use cases such as RGW, in terms of compactions and write
amplification. We would highly encourage community members to give
these a try against their performance benchmarks and use cases. The
detailed list of changes in terms of RocksDB and BlueStore can be found
in https://pad.ceph.com/p/reef-rc-relnotes.
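For those experimenting: assuming I have the option name right, the
column family layout is driven by bluestore_rocksdb_cfs (note it is
read at OSD mkfs/reshard time rather than live). You can inspect the
description and current value like this:

    ceph config help bluestore_rocksdb_cfs
    ceph config get osd bluestore_rocksdb_cfs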
If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate,
please feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active
discussions, please use the #ceph-at-scale slack channel in
ceph-storage.slack.com.
This RC has gone through only partial testing due to issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!
Thanks,
YuriW
Get the release from
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.1.3.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: f594a0802c34733bb06e5993bc4bdb085c9a5f3f