On 25/05/2023 01.40, 胡 玮文 wrote:
Hi Hector,
Not related to fragmentation. But I see you mentioned CephFS, and your OSDs are at high
utilization. Is your pool NEAR FULL? CephFS write performance is severely degraded when the
pool is NEAR FULL: buffered writes are disabled, and every single write() system call
must wait for a reply from the OSD.
If this is the case, use "ceph osd set-nearfull-ratio" to raise the threshold and restore normal performance.
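As a rough sketch of how one might check for and clear the nearfull condition (the 0.90 ratio below is an illustrative value, not a recommendation, and these commands must run against a live Ceph cluster):

```shell
# Show per-pool and per-OSD utilization to see how close to full you are
ceph df
ceph osd df

# Check whether any OSD has tripped the nearfull warning
ceph health detail

# Raise the nearfull ratio (default 0.85) so buffered writes resume;
# 0.90 is an example value only
ceph osd set-nearfull-ratio 0.90
```

Note that raising the ratio only buys headroom; the underlying fix is still to add capacity or rebalance.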
I learned about this after the issue; the OSDs did become nearfull at one
point and I changed the threshold, but I don't think this explains the
behavior I was seeing, because I was trying to do bulk writes (which
should use very large write sizes even without buffering). What usually
happened was that a single OSD would immediately go to 100% utilization
while the rest did not, which is what I'd expect if that one OSD had the
most fragmented free space and was pathologically slowing down writes.
- Hector