Hello everyone,
Casey had raised a question recently regarding the design of the Cache
Driver <https://github.com/ceph/ceph/pull/52019>. He was under the
impression that the D4N directory should be storing object metadata (and
attributes) while the Cache Driver should be storing data only.
Currently, the D4N Directory only stores metadata specific to
directory-related operations, which is why S3 attributes like bucket size,
mtime, and others are not added to the object's directory entry. Dan has
also mentioned the prospect of moving SAL attributes into the directory
rather than the cache backend, so this is worth discussing in more detail.
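To make the trade-off concrete, here is a rough sketch, assuming the
Redis-backed directory and using hypothetical key and field names, of
what an object's directory entry might look like if SAL attributes
were stored there alongside the existing directory metadata:

# hypothetical entry holding the existing directory-operation metadata
redis-cli HSET testbucket_testobj objName testobj bucketName testbucket hosts 127.0.0.1:6379
# hypothetical extension: SAL attributes kept in the same entry
redis-cli HSET testbucket_testobj attr.mtime 1692000000 attr.size 4096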
If you have any thoughts or additional comments on this topic, please let
me know.
Sincerely,
Samarah Uriarte
Hi Folks,
Looks like this email didn't go out earlier this morning, so re-sending.
Today, Arun Raghunath is presenting his work on refactoring OSD
control/data separation for NVMe-oF. Should be a very interesting talk!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Meeting URL:
https://meet.jit.si/ceph-performance
Mark
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson(a)clyso.com
We are hiring: https://www.clyso.com/jobs/
Today we discussed:
- Delegating more privileges for internal hardware to allow on-call
folks to fix issues.
- Maybe using CephFS for the teuthology VM /home directory (it became
full on Friday night)
- Preparation for Open Source Day: we are seeking "low-hanging-fruit"
tickets for new developers to try fixing.
- Reef is released! Time for blog posts. We are gathering options from PTLs.
- Migration of the Ceph GitHub organization from the legacy "bronze"
plan to the FOSS "free" plan. There is some uncertainty about
surprise drawbacks; Ernesto is continuing his investigation.
- Casey is updating the contributor lists to generate accurate credits
for the new Reef release: https://github.com/ceph/ceph/pull/52868
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Even though Pacific reached EOL on 6/1/23, I'd like to gather dev
leads' feedback on which open PRs should be considered for the last
point release.
This will help determine whether 16.2.14 is necessary.
TIA
YuriW
Hi,
We have a customer with an abnormally large number of "osd_snap /
purged_snap_{pool}_{snapid}" keys in the monstore db: almost 40
million. Among other problems, this causes a very long mon
synchronization on startup.
Our understanding is that the cause is that mirror snapshot creation
is very frequently interrupted in their environment, most likely due
to connectivity issues between the sites. This assumption is based on
the fact that they have a lot of rbd "trash" snapshots, which can
appear when an rbd snapshot removal is interrupted. (A mirror
snapshot creation usually includes removing an older snapshot to keep
the total number of the image's mirror snapshots under the limit.)
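For reference, such leftover snapshots show up when listing all
snapshot namespaces for an image; the pool and image names here are
illustrative:

rbd snap ls --all rbdpool/img1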
We removed all "trash" snapshots manually, so currently they have a
limited number of "expected" snapshots, but the number of purged_snap
keys remains just as large.
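For reference, the keys can be listed and counted on a stopped mon
with ceph-kvstore-tool (the store path here is illustrative):

ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db list osd_snap | grep purged_snap_ | wc -l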
So, our understanding is that if rbd snapshot creation is frequently
interrupted, there is a chance it will be interrupted in or just
after SnapshotCreateRequest::send_allocate_snap_id [1], where it
requests a new snap id from the mon. As a result, this id is never
tracked by rbd and never removed, and snap id holes like this prevent
the "purged_snap_{pool}_{snapid}" ranges from ever merging.
To confirm that this scenario is likely, I ran the following simple
test, which interrupts rbd mirror snapshot creation at a random time:
for i in $(seq 500); do
  # start creating a mirror snapshot of image "test" in the background
  rbd mirror image snapshot test &
  PID=$!
  # let it run for a random 0.0-4.9 seconds, then interrupt it
  sleep $((RANDOM % 5)).$((RANDOM % 10))
  # if the kill succeeded (creation was still in flight),
  # give the cluster time to settle before the next iteration
  kill $PID && sleep 30
done
Running this with debug_rbd=30, I can see from the rbd client logs
that creation was interrupted in send_allocate_snap_id 74 times,
which is surprisingly high.
And after the experiment, and after removing the rbd image with all
tracked snapshots (i.e. leaving the pool with no known rbd
snapshots), I still see "purged_snap_{pool}_{snapid}" keys for ranges
that I believe will never be merged.
So the questions are:
1) Is there a way we could improve this to avoid the monstore growing
so large?
2) How can we fix the current situation in the cluster? Would it be
safe enough to just run `ceph-kvstore-tool rocksdb store.db rm-prefix
osd_snap` to remove all osd_snap keys (including the purged_epoch
keys)? Due to the large db size, I don't think it would be possible
to selectively remove keys with the `ceph-kvstore-tool rocksdb
store.db rm {prefix} {key}` command, so we can only use the
`rm-prefix` command. Looking at the code and actually trying it in a
test environment, it seems like it could work, but am I missing
something dangerous here?
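For concreteness, what I tried in the test environment looked roughly
like this (the mon id and store path are illustrative; the mon is
stopped and the store is backed up first):

systemctl stop ceph-mon@a
cp -a /var/lib/ceph/mon/ceph-a/store.db /root/mon-store.db.bak
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db rm-prefix osd_snap
systemctl start ceph-mon@a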
If (1) is not possible, then maybe we could provide a tool/command
for users to clean up the keys if they observe this issue?
[1] https://github.com/ceph/ceph/blob/e45272df047af71825445aeb6503073ba06123b0/…
Thanks,
--
Mykola Golub
Hi Ceph developers,
Is there a part of the codebase you want cleaned up, but can never find
time for? Is there a page in the documentation that needs updating? Perhaps
you've noticed a need for additional logging in one of the manager modules.
If so, please create a tracker issue and mark it with the "low-hanging-fruit
<https://tracker.ceph.com/projects/ceph/issues?fields%5B%5D=issue_tags&opera…>"
tag [1]!
"Low-hanging-fruit" trackers benefit the community by providing a foothold
for beginner developers, while also reducing the accumulation of smaller
issues that experienced developers have to deal with as the project grows
and develops.
On *September 22nd, 2023*, Ceph will participate in Grace Hopper
Celebration Open Source Day
<https://ghc.anitab.org/programs-and-awards/open-source-day/> [2], an
all-day hack-a-thon where participants of all levels contribute and learn
about open source. You can help by triaging existing issues as
"low-hanging-fruit", or by creating new "low-hanging-fruit" issues. These
issues will benefit Open Source Day participants, as well as future
beginner developers in the Ceph project.
Contributors from any Ceph project component are encouraged to
participate. Feel free to reach
out to me with any questions!
Thanks,
- Laura Flores
1. "low-hanging-fruit" tag:
https://tracker.ceph.com/projects/ceph/issues?fields%5B%5D=issue_tags&opera…
2. Grace Hopper Celebration Open Source Day:
https://ghc.anitab.org/programs-and-awards/open-source-day/
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com>
M: +17087388804
Hi,
We know a snapshot is a point in time. Is this point in time tracked
internally by some sort of sequence number, by the timestamp shown by
"snap ls", or by something else?
I noticed that with "deep cp", the timestamps of all snapshots are
changed to the copy time. Say I create a snapshot at 1PM and make a
copy at 3PM; the timestamp of the snapshot in the copy is 3PM. If I
roll the copy back to this snapshot, I'd assume it will actually
bring me back to the state of 1PM. Is that correct?
If the above is true, I won't be able to rely on timestamps to track
snapshots. Say I create a snapshot every hour and make a backup by
copy at the end of the day. Then the original image is damaged and
the backup is used to restore the work. On this backup image, how do
I know which snapshot was at 1PM, which was at 2PM, etc.? Any advice
on tracking snapshots properly in such a case?
I can definitely build something else to help with this, but I'd like
to know how much Ceph can support it.
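For example, one thing I could build, assuming snapshot names are
preserved by "deep cp" (pool/image names here are illustrative), is
encoding the time in the snapshot name and using the name instead of
the timestamp:

rbd snap create rbdpool/img1@hourly-$(date +%Y%m%dT%H%M)
rbd deep cp rbdpool/img1 backuppool/img1-backup
rbd snap rollback backuppool/img1-backup@hourly-20230814T1300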
Thanks!
Tony
Hi,
There is a snap ID for each snapshot. How is this ID allocated?
Sequentially?
I did some tests, and it seems this ID is per pool, starting from 4
and always going up. Is that correct?
What's the max of this ID?
And what happens when the ID reaches the max? Does it wrap around and
start from 4 again?
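For context, here is how I checked the IDs in my tests; the pool and
image names are illustrative:

rbd snap create rbdpool/img1@s1
rbd snap ls --format json --pretty-format rbdpool/img1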
Thanks!
Tony