I need to ask here: where exactly do you observe the hundreds of GB written per day? Are
the mon logs huge? Is it the mon store? Is your cluster unhealthy?
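To narrow that down, something like this might help (paths assume a
package-based installation; adjust them for cephadm/containers):

  du -sh /var/log/ceph/ceph-mon.*.log    # size of the mon log files
  du -sh /var/lib/ceph/mon/*/store.db    # size of the mon store (rocksdb)
  iostat -xm 5                           # sustained write rate on the device holding the store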
We have an Octopus cluster with 1282 OSDs, 1650 CephFS clients and about 800 librbd
clients. Per week our mon logs are about 70M, the cluster logs about 120M, the audit
logs about 70M, and I see between 100-200 Kb/s of writes to the mon store. That's in the
low single-digit GB range per day. Hundreds of GB per day sounds completely over the top on a
healthy cluster, unless you have MGR modules changing the OSD/cluster map continuously.
Is autoscaler running and doing stuff?
Is balancer running and doing stuff?
Is backfill going on?
Is recovery going on?
Is your Ceph version affected by the "excessive logging to MON store" issue that
was present starting with Pacific but should have been addressed by now?
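Most of the above can be checked quickly with standard commands, for example:

  ceph -s                          # shows ongoing recovery/backfill activity
  ceph osd pool autoscale-status   # is the pg_autoscaler planning PG changes?
  ceph balancer status             # is the balancer active and moving PGs?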
@Eugen: Was there not an option to limit logging to the MON store?
For the information of other readers: we followed old recommendations from a Dell white paper for
building a Ceph cluster and have a 1 TB RAID10 array on 6 write-intensive SSDs for the MON
stores. After 5 years we are below 10% wear. The average size of the MON store on a healthy
cluster is 500M-1G, but we have seen it balloon to 100+ GB in degraded conditions.
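If the store balloons like that, it can usually be shrunk again with a manual
compaction once the cluster is healthy again, for example with

  ceph tell mon.<id> compact

(replace <id> with the mon in question), or by setting mon_compact_on_start
and restarting the mon.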
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Zakhar Kirpichenko <zakhar(a)gmail.com>
Sent: Wednesday, October 11, 2023 12:00 PM
To: Eugen Block
Cc: ceph-users(a)ceph.io
Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
Thank you, Eugen.
I'm specifically interested in finding out whether the huge amount of data
written by the monitors is expected. It is eating through the endurance of our
system drives, which were not specced for high DWPD/TBW because that is not a
documented requirement, yet the monitors produce hundreds of gigabytes of
writes per day. I am looking for ways to reduce the amount of writes, if
possible.
/Z
On Wed, 11 Oct 2023 at 12:41, Eugen Block <eblock(a)nde.ag> wrote:
Hi,
what you report is the expected behaviour; at least I see the same on
all clusters. I can't answer why the compaction is required that
often, but you can control the log level of the rocksdb output:
ceph config set mon debug_rocksdb 1/5 (default is 4/5)
This reduces the log entries, and you wouldn't see the manual
compaction messages anymore. There are a couple more rocksdb options, but I
probably wouldn't change too much unless you know what you're doing.
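For example, to check the current value, change it at runtime and revert later:

  ceph config get mon debug_rocksdb      # current value, default is 4/5
  ceph config set mon debug_rocksdb 1/5
  ceph config rm mon debug_rocksdb       # back to the default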
Maybe Igor can comment if some other tuning makes sense here.
Regards,
Eugen
Quoting Zakhar Kirpichenko <zakhar(a)gmail.com>:
Any input from anyone, please?
On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko <zakhar(a)gmail.com> wrote:
> Any input from anyone, please?
>
> It's another thing that seems to be rather poorly documented: it's unclear
> what to expect, what 'normal' behavior should be, and what can be done
> about the huge amount of writes by monitors.
>
> /Z
>
> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko <zakhar(a)gmail.com> wrote:
>
>> Hi,
>>
>> Monitors in our 16.2.14 cluster appear to quite often run "manual
>> compaction" tasks:
>>
>> debug 2023-10-09T09:30:53.888+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:30:53.904+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.956+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:32:08.972+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.028+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696844009033151, "job": 64231, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 1430, "num_deletes": 251,
>> "total_data_size": 8975535, "memory_usage": 9035920, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:33:29.044+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:33:29.049585) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: (Original Log
>> Time 2023/10/09-09:33:29.050355) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
>> will stop at (end)
>>
>> I have removed a lot of interim log messages to save space.
>>
>> During each compaction the monitor process writes approximately 500-600
>> MB of data to disk over a short period of time. These writes add up to
>> tens of gigabytes per hour and hundreds of gigabytes per day.
>>
>> Monitor rocksdb and compaction options are default:
>>
>> "mon_compact_on_bootstrap": "false",
>> "mon_compact_on_start": "false",
>> "mon_compact_on_trim": "true",
>> "mon_rocksdb_options":
>>
"write_buffer_size=33554432,compression=kNoCompression,level_compaction_dynamic_level_bytes=true",
Is this expected behavior? Is this something I can adjust in order to
extend the system storage life?
Best regards,
Zakhar
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io