Thank you, Igor. I will try to see how to collect the perf values. I'm
not sure about restarting all OSDs, as it's a production cluster. Is
there a less invasive way?
/Z
On Tue, 9 May 2023 at 23:58, Igor Fedotov <igor.fedotov(a)croit.io> wrote:
Hi Zakhar,
Let's leave questions regarding cache usage/tuning to a different
topic for now and concentrate on the performance drop.
Could you please run the same experiment I asked Nikola to do once
your cluster reaches the "bad performance" state (Nikola, could you
please use this improved scenario as well?):
- collect perf counters for every OSD
- reset perf counters for every OSD
- leave the cluster running for 10 mins and collect perf counters
again.
- then restart OSDs one by one, starting with the worst OSD (in
terms of subop_w_lat from the previous step). Wouldn't it be
sufficient to restart just a few OSDs before the cluster is back to
normal?
- if a partial OSD restart is sufficient, please leave the
remaining OSDs running as-is without restarting them.
- after the restart (no matter whether partial or complete; the key
thing is that it should be successful), reset all the perf counters,
leave the cluster running for 30 mins and collect perf counters again.
- wait 24 hours and collect the counters one more time
- share all four counter snapshots.
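
For the "worst OSD" step above, a minimal sketch in Python. It assumes each OSD's perf dump has been saved as JSON (for example from `ceph daemon osd.N perf dump`) and that the write sub-op latency sits under the "osd" section as `subop_w_latency` with `avgcount` and `sum` fields. The helper names `avg_latency` and `rank_osds` are hypothetical, and the OSD names and numbers are made up for illustration.

```python
def avg_latency(perf_dump, counter="subop_w_latency"):
    """Average latency (sum / avgcount) for one counter, or None if no samples."""
    entry = perf_dump.get("osd", {}).get(counter, {})
    count = entry.get("avgcount", 0)
    if not count:
        return None
    return entry["sum"] / count


def rank_osds(snapshot, counter="subop_w_latency"):
    """Return (osd, average latency) pairs, worst (highest latency) first.

    `snapshot` maps an OSD name to its parsed perf dump.
    """
    rows = []
    for osd, dump in snapshot.items():
        latency = avg_latency(dump, counter)
        if latency is not None:
            rows.append((osd, latency))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample data, not taken from any real cluster.
snapshot = {
    "osd.0": {"osd": {"subop_w_latency": {"avgcount": 1000, "sum": 0.4}}},
    "osd.1": {"osd": {"subop_w_latency": {"avgcount": 2000, "sum": 2.0}}},
    "osd.2": {"osd": {"subop_w_latency": {"avgcount": 0, "sum": 0.0}}},
}

ranking = rank_osds(snapshot)
for osd, latency in ranking:
    print(f"{osd}: {latency:.6f} s")
```

OSDs with no samples in the interval are skipped rather than ranked as zero, so an idle OSD does not hide at the bottom of the list looking healthy.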
Thanks,
Igor
On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:
Don't mean to hijack the thread, but I may be observing something
similar with 16.2.12: OSD performance noticeably peaks after an OSD
restart and then gradually degrades over 10-14 days, while commit
and apply latencies increase across the board.
Non-default settings are:
"bluestore_cache_size_hdd": {
"default": "1073741824",
"mon": "4294967296",
"final": "4294967296"
},
"bluestore_cache_size_ssd": {
"default": "3221225472",
"mon": "4294967296",
"final": "4294967296"
},
...
"osd_memory_cache_min": {
"default": "134217728",
"mon": "2147483648",
"final": "2147483648"
},
"osd_memory_target": {
"default": "4294967296",
"mon": "17179869184",
"final": "17179869184"
},
"osd_scrub_sleep": {
"default": 0,
"mon": 0.10000000000000001,
"final": 0.10000000000000001
},
"rbd_balance_parent_reads": {
"default": false,
"mon": true,
"final": true
},
All other settings are default; the usage is rather simple
OpenStack / RBD.
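
For readability, the byte values in the settings above can be converted to binary units. This is only a small sketch: the `human` helper is a hypothetical name, and the dictionary just repeats the "default" and "final" values quoted above.

```python
def human(n_bytes):
    """Format a byte count using binary units (GiB, MiB, KiB)."""
    for unit, size in (("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} B"


# (default, final) pairs copied from the settings listed above.
settings = {
    "bluestore_cache_size_hdd": (1073741824, 4294967296),
    "bluestore_cache_size_ssd": (3221225472, 4294967296),
    "osd_memory_cache_min": (134217728, 2147483648),
    "osd_memory_target": (4294967296, 17179869184),
}

for name, (default, final) in settings.items():
    print(f"{name}: {human(default)} -> {human(final)}")
```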
I also noticed that OSD cache usage doesn't increase over time
(see my message "Ceph 16.2.12, bluestore cache doesn't seem to be
used much" dated 26 April 2023, which received no comments),
even though the OSDs are being used rather heavily and there's plenty
of host and OSD cache / target memory available. It may be worth
checking whether the available memory is being used effectively.
/Z
On Mon, 8 May 2023 at 22:35, Igor Fedotov <igor.fedotov(a)croit.io> wrote:
Hey Nikola,
On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
OK, starting to collect those for all OSDs..
I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can
easily examine, sort, whatever..
You didn't reset the counters every hour, did you? So having the
average subop_w_latency growing that way means the current values
were much higher than before.
Curious whether subop latencies were growing for every OSD or just a
subset (maybe even just a single one) of them?
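
To illustrate the point about counters that are never reset: the reported average is cumulative, so the latency over the most recent interval has to be derived from the difference between two samples. A minimal sketch, assuming each sample exposes the `avgcount` and `sum` fields of a latency counter; `interval_avg` is a hypothetical helper and the numbers are made up.

```python
def interval_avg(earlier, later):
    """Average latency between two cumulative samples of one counter.

    Returns None if the counter was reset or saw no operations in between.
    """
    delta_count = later["avgcount"] - earlier["avgcount"]
    delta_sum = later["sum"] - earlier["sum"]
    if delta_count <= 0:
        return None
    return delta_sum / delta_count


# Made-up samples taken an hour apart, without a reset in between.
earlier = {"avgcount": 1_000_000, "sum": 300.0}
later = {"avgcount": 1_100_000, "sum": 400.0}

cumulative_before = earlier["sum"] / earlier["avgcount"]
cumulative_after = later["sum"] / later["avgcount"]
recent = interval_avg(earlier, later)

print(f"cumulative average before: {cumulative_before:.6f} s")
print(f"cumulative average after:  {cumulative_after:.6f} s")
print(f"average over the interval: {recent:.6f} s")
```

In this made-up case the cumulative average only moves from 0.0003 to about 0.00036, while the latency during the last interval is 0.001, which is why a modest rise in a never-reset average can hide a much larger recent increase.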
Next time you reach the bad state, please do the following if
possible:
- reset perf counters for every OSD
- leave the cluster running for 10 mins and collect perf counters
again.
- then start restarting OSDs one by one, starting with the worst OSD
(in terms of subop_w_lat from the previous step). Wouldn't it be
sufficient to restart just a few OSDs before the cluster is back to
normal?
> currently values for avgtime are around 0.0003 for subop_w_lat and
> 0.001-0.002 for op_w_lat
OK, so there is no visible trend on op_w_lat, still between 0.001 and
0.002. subop_w_lat seems to have increased since yesterday though! I
see values from 0.0004 to as high as 0.001.
If some other perf data might be interesting, please let me know..
During OSD restarts, I noticed a strange thing: restarts on the first
6 machines went smoothly, but then on another 3, I saw rocksdb log
recovery on all SSD OSDs. At first I didn't see any mention of a
daemon crash in ceph -s; later, crash info appeared, but only about 3
daemons (in total, at least 20 of them crashed though).
The crash report was similar for all three OSDs:
[root@nrbphav4a ~]# ceph crash info 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
{
    "backtrace": [
        "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
        "(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x413) [0x55a1c9d07c43]",
        "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
        "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x8ad) [0x55a1c9bbcfdd]",
        "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
        "(PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x57) [0x55a1c99d6777]",
        "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb73) [0x55a1c99da883]",
        "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
        "(CommonSafeTimer<std::mutex>::timer_thread()+0x11a) [0x55a1c9e226aa]",
        "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
        "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
        "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
    "entity_name": "osd.98",
    "os_id": "almalinux",
    "os_name": "AlmaLinux",
    "os_version": "9.0 (Emerald Puma)",
    "os_version_id": "9.0",
    "process_name": "ceph-osd",
    "stack_sig": "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
    "timestamp": "2023-05-08T17:45:47.056675Z",
    "utsname_hostname": "nrbphav4h",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.90lb9.01",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
}
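
One way to check whether several crash reports are really the same crash is to group them by their `stack_sig` field. A minimal sketch in Python, assuming the reports have already been parsed from JSON like the one above; `group_by_stack_sig` is a hypothetical helper, and every entry except osd.98 is made up for illustration.

```python
from collections import defaultdict


def group_by_stack_sig(reports):
    """Map each stack signature to the list of daemons that reported it."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["stack_sig"]].append(report["entity_name"])
    return dict(groups)


# Signature copied from the report above.
SIG = "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66"

reports = [
    {"entity_name": "osd.98", "stack_sig": SIG},
    # The two entries below are invented placeholders, not real reports.
    {"entity_name": "osd.example-a", "stack_sig": SIG},
    {"entity_name": "osd.example-b", "stack_sig": "some-other-signature"},
]

groups = group_by_stack_sig(reports)
for sig, daemons in groups.items():
    print(f"{sig[:12]}: {', '.join(daemons)}")
```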
I was trying to figure out why these particular 3 nodes could behave
differently and found out from colleagues that those 3 nodes were
added to the cluster recently with a direct install of 17.2.5 (the
others were installed with 15.2.16 and later upgraded).
Not sure whether this is related to our problem though..
I see a very similar crash reported here:
https://tracker.ceph.com/issues/56346
so I'm not reporting it..
Do you think this might somehow be the cause of the problem? Anything
else I should check in perf dumps or elsewhere?
Hmm... don't know yet. Could you please share the last 20K log lines
prior to the crash from, e.g., two sample OSDs?
And the crash isn't permanent; OSDs are able to start after the
second(?) attempt, aren't they?
with best regards
nik
--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> |
Twitter <https://twitter.com/croit_io>
Meet us at the SC22 Conference! Learn more
<https://croit.io/croit-sc22>
Technology Fast50 Award Winner by Deloitte
<https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html>!
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io