Hi,
for several days I have been trying to debug a problem with snap trimming
under Nautilus.
I have a cluster running Nautilus (v14.2.10): 44 nodes with 24 OSDs each, 14 TB per OSD.
I create a snapshot every day and keep each one for 7 days.
Every time an old snapshot is deleted, I get bad IO performance and blocked requests for several seconds until the snaptrim is done.
Settings like snaptrim_sleep and osd_pg_max_concurrent_snap_trims don't affect this behavior.
In the debug_osd 10/10 log I see the following:
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 prio 196 cost 0 latency 0.019545 osd_repop_reply(client.22731418.0:615257 3.636 e22457/22372) v2 pg pg[3.636( v 22457'100855 (21737'97756,22457'100855] local-lis/les=22372/22374 n=27762 ec=2842/2839 lis/c 22372/22372 les/c/f 22374/22374/0 22372/22372/22343) [411,36,956,763] r=0 lpr=22372 luod=22457'100854 crt=22457'100855 lcod 22457'100853 mlcod 22457'100853 active+clean+snaptrim_wait trimq=[1d~1]]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 finish
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 prio 127 cost 0 latency 0.043165 MOSDScrubReserve(2.2645 RELEASE e22457) v1 pg pg[2.2645( empty local-lis/les=22359/22364 n=0 ec=2403/2403 lis/c 22359/22359 les/c/f 22364/22367/0 22359/22359/22359) [379,411,884,975] r=1 lpr=22359 crt=0'0 active mbc={}]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 finish
2020-07-27 11:45:50.039 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99491 (21594'96426,22457'99491] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 crt=22457'99491 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer posting
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99493 (21594'96426,22457'99493] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 luod=22457'99491 crt=22457'99493 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer complete
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 prio 127 cost 663 latency 7.761823 osd_repop(osd.217.0:3025 3.1ca5 e22457/22378) v2 pg pg[3.1ca5( v 22457'100370 (21716'97357,22457'100370] local-lis/les=22378/22379 n=27532 ec=2855/2839 lis/c 22378/22378 les/c/f 22379/22379/0 22378/22378/22378) [217,411,551,1055] r=1 lpr=22378 luod=0'0 lua=22294'100006 crt=22457'100370 lcod 22457'100369 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 finish
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 prio 127 cost 0 latency 7.494296 MOSDScrubReserve(2.37e2 REQUEST e22457) v1 pg pg[2.37e2( empty local-lis/les=22355/22356 n=0 ec=2412/2412 lis/c 22355/22355 les/c/f 22356/22356/0 22355/22355/22355) [245,411,834,768] r=1 lpr=22355 crt=0'0 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 finish
The dequeueing of ops works without pauses until the "snap_trimmer posting" and "snap_trimmer complete" log lines. In this example, that task takes about 7 seconds. The operations dequeued right after it then show a latency of about the same length.
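To find such stalls without reading the log by eye, a small script can pull out the slow dequeue_op entries and the time between "snap_trimmer posting" and "snap_trimmer complete". This is only a minimal sketch: it assumes the line layout of the excerpt above (date, time, thread id, then the message), and the function names and the 1-second threshold are made up for illustration.

```python
import re
from datetime import datetime

# Patterns are written against the debug_osd 10/10 lines quoted above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
DEQUEUE_RE = re.compile(r"dequeue_op 0x[0-9a-f]+ prio \d+ cost \d+ latency ([0-9.]+)")
PG_RE = re.compile(r"pg\[([0-9a-f]+\.[0-9a-f]+)\(")
TRIM_RE = re.compile(r"snap_trimmer (posting|complete)")


def parse_prefix(line):
    """Return (timestamp, thread_id) from the first three fields of a log line."""
    date, clock, thread = line.split(None, 3)[:3]
    return datetime.strptime(f"{date} {clock}", TS_FORMAT), thread


def slow_dequeues(lines, threshold=1.0):
    """Yield (timestamp, latency) for dequeue_op lines at or above threshold seconds."""
    for line in lines:
        match = DEQUEUE_RE.search(line)
        if match and float(match.group(1)) >= threshold:
            timestamp, _ = parse_prefix(line)
            yield timestamp, float(match.group(1))


def trim_durations(lines):
    """Yield (pg_id, seconds) for each posting/complete pair on the same thread."""
    started = {}
    for line in lines:
        trim = TRIM_RE.search(line)
        pg = PG_RE.search(line)
        if not (trim and pg):
            continue
        timestamp, thread = parse_prefix(line)
        key = (thread, pg.group(1))
        if trim.group(1) == "posting":
            started[key] = timestamp
        elif key in started:
            yield pg.group(1), (timestamp - started.pop(key)).total_seconds()
```

Feeding the lines of an OSD log file to both functions should show whether every slow dequeue_op sits directly behind a long trim on the same thread.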
I tried to drill down into this in the code. (This part is a question for the developers.)
It seems that the PG is locked for every operation.
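The effect on the following ops can be reproduced with a toy model of one worker that handles its queue strictly in order. This is a sketch under assumptions, not a description of the real OSD scheduler: all quoted log lines carry the same thread id (7fd8b8404700), so one FIFO worker is assumed; the arrival offsets are back-computed from the logged timestamps and latencies; and the service time of the two small ops is assumed to be zero.

```python
def queue_latencies(ops):
    """Simulate one FIFO worker.

    ops is a list of (arrival_time, service_time) in seconds, sorted by arrival.
    Returns the time each op waited before the worker picked it up.
    """
    clock = 0.0
    latencies = []
    for arrival, service in ops:
        start = max(clock, arrival)      # wait until the worker is free
        latencies.append(start - arrival)
        clock = start + service          # worker is busy until then
    return latencies


# Times are relative to "snap_trimmer posting" at 11:45:50.039.
ops = [
    (0.0, 7.762),       # the snap trim itself, 50.039 -> 57.801
    (0.000177, 0.0),    # osd_repop, logged latency 7.761823
    (0.267704, 0.0),    # MOSDScrubReserve, logged latency 7.494296
]
for wait in queue_latencies(ops):
    print(f"{wait:.6f}")
```

In this model the two ops queued behind the trim wait almost exactly as long as the log reports, which fits the picture that they are simply stuck behind the trim on that thread.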
The "snap_trimmer posting" and "snap_trimmer complete" messages come from "osd/PrimaryLogPG.cc", line 4700. This suggests to me that deleting a snapshot object sometimes takes a long time.
After further poking around, I see that the method "SnapMapper::get_next_objects_to_trim" in "osd/SnapMapper.cc" takes several seconds to finish. I followed this further into "common/map_cacher.hpp", line 94: "int r = driver->get_next(key, &store);"
From there I lost the trail.
The slowness does not hit all OSDs at the same time. Sometimes a few OSDs are affected, sometimes others. Restarting an OSD does not help.
With Luminous and FileStore, snapshot deletion was not an issue at all.
With Nautilus and BlueStore this is not acceptable for my use case.
So far I don't know whether this is a BlueStore-specific problem or a more general issue.
I also wonder why nobody else seems to have this problem.
Regards
Manuel
Yes, no problem
--
Martin Verges
Managing director
Mobile: +49 174 9335695
E-Mail: martin.verges(a)croit.io
Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx
On Thu, 6 Aug 2020 at 12:13, Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
>
> I can just add 4Kn drives to my existing setup, can't I? Since this
> technology only affects how the OSD daemon talks to the
> disk?
Hi,
Any idea what happens when "bluestore_cache_autotune" is true and "bluestore_cache_size" is 0, but the server has only NVMe devices?
As I understand it, bluestore_cache_size 0 means the SSD or HDD value is picked, but what happens with autotune enabled when there is neither an SSD nor an HDD?
How should I size this value? Any help is appreciated.
Also, is it a good idea to use NUMA or not? There is conflicting information on the internet about this topic.
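For reference, the fallback described above can be sketched as a small function. It is a rough model, not BlueStore's actual code, and several points are assumptions: the choice is taken to depend only on whether the device reports itself as rotational (an NVMe device would then fall under the "ssd" value), the default sizes of 1 GiB and 3 GiB are assumed Nautilus defaults that should be checked on the cluster, and the function name is made up. With autotune enabled, the cache is additionally resized at runtime toward osd_memory_target, which this sketch does not model.

```python
GIB = 1024 ** 3

# Assumed defaults for illustration only; verify them on your own cluster.
ASSUMED_DEFAULTS = {
    "bluestore_cache_size": 0,
    "bluestore_cache_size_hdd": 1 * GIB,
    "bluestore_cache_size_ssd": 3 * GIB,
}


def effective_cache_size(conf, rotational):
    """Return the cache size in bytes that the fallback rule would select."""
    explicit = conf["bluestore_cache_size"]
    if explicit:
        # A non-zero bluestore_cache_size overrides the per-device-class values.
        return explicit
    key = "bluestore_cache_size_hdd" if rotational else "bluestore_cache_size_ssd"
    return conf[key]


# NVMe is non-rotational, so under these assumptions the "ssd" value applies.
print(effective_cache_size(ASSUMED_DEFAULTS, rotational=False) // GIB, "GiB")
```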
Thank you