Background: In nautilus, bluestore started maintaining usage stats on a
per-pool basis. BlueStore OSDs created before nautilus lack these stats.
Running a ceph-bluestore-tool repair calculates the per-pool usage so that
the OSD can maintain and report it going forward.
There are two options:
- bluestore_warn_on_legacy_statfs (bool, default: true), which makes the
cluster issue a health warning when there are OSDs that have legacy stats.
- bluestore_no_per_pool_stats_tolerance (enum: enforce, until_fsck,
until_repair; default: until_repair), where:
    'until_fsck' will tolerate the legacy but fsck will fail
    'until_repair' will tolerate the legacy but fsck will pass
    'enforce' will tolerate the legacy but disable the warning
The octopus addition of per-pool omap usage tracking presents an identical
problem: a new tracking ability in bluestore that requires a conversion to
enable after upgrade.
I think that we can simplify these settings and make them less confusing,
still with two options:
- bluestore_fsck_error_on_no_per_pool_omap (bool, default: false). During
fsck, we can either generate a 'warning' about non-per-pool omap, or an
error. Generate a warning by default, which means that the fsck return
code can indicate success.
- bluestore_warn_on_no_per_pool_omap (bool, default: true). At runtime, we
can generate a health warning if the OSD is using the legacy non-per-pool
omap.
The overall default behavior is the same as we have with the
legacy_statfs: OSDs still work, fsck passes, and we generate a health
warning.
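
To make the proposed behavior concrete, here is a rough sketch of how the
two booleans would combine for an OSD that is still on legacy
(non-per-pool) omap. This is only a model, not BlueStore code: the class
and function names are made up for illustration, and it assumes fsck finds
no other problems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyOmapOutcome:
    osd_works: bool       # does the OSD keep running?
    fsck_passes: bool     # does the fsck return code indicate success?
    health_warning: bool  # is a runtime health warning raised?


def legacy_omap_outcome(fsck_error_on_no_per_pool_omap: bool = False,
                        warn_on_no_per_pool_omap: bool = True) -> LegacyOmapOutcome:
    """Hypothetical model of an OSD that still uses legacy omap.

    The parameter names mirror the two proposed options; the defaults are
    the proposed defaults.
    """
    return LegacyOmapOutcome(
        # The OSD keeps working regardless of either option.
        osd_works=True,
        # Legacy omap is only a warning during fsck unless the error option is set.
        fsck_passes=not fsck_error_on_no_per_pool_omap,
        # The runtime health warning follows the warn option directly.
        health_warning=warn_on_no_per_pool_omap,
    )


if __name__ == "__main__":
    # Proposed defaults: OSD works, fsck passes, health warning is raised.
    print(legacy_omap_outcome())
```

The two options are independent in this model: one only affects the fsck
result, the other only affects the runtime health warning.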
Setting bluestore_warn_on_no_per_pool_omap=false is the same, AFAICS, as
setting bluestore_no_per_pool_stats_tolerance=enforce. (Except maybe
repair won't do the conversion? I don't see why we'd ever not want to
do the conversion, though.)
Setting bluestore_fsck_error_on_no_per_pool_omap=true is the same, AFAICS,
as bluestore_no_per_pool_stats_tolerance=until_fsck.
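
The correspondence I have in mind can be written down as a small table.
This is a sketch of my reading only (hence the AFAICS above), not anything
that exists in the tree: the dict and function names are hypothetical, and
where I have not said what the other boolean would be, the sketch assumes
it stays at its proposed default.

```python
# Hypothetical mapping from the existing tolerance enum values to the two
# proposed booleans. Keys of the inner dicts are the proposed option names.
TOLERANCE_TO_PROPOSED = {
    # Current default: tolerate legacy, fsck passes, health warning raised.
    "until_repair": {
        "bluestore_fsck_error_on_no_per_pool_omap": False,
        "bluestore_warn_on_no_per_pool_omap": True,
    },
    # fsck fails on legacy; the warn option is assumed to stay at its default.
    "until_fsck": {
        "bluestore_fsck_error_on_no_per_pool_omap": True,
        "bluestore_warn_on_no_per_pool_omap": True,
    },
    # Warning disabled; the fsck option is assumed to stay at its default.
    "enforce": {
        "bluestore_fsck_error_on_no_per_pool_omap": False,
        "bluestore_warn_on_no_per_pool_omap": False,
    },
}


def proposed_settings(tolerance: str) -> dict:
    """Return the proposed boolean settings for a tolerance enum value."""
    try:
        # Return a copy so callers cannot modify the table.
        return dict(TOLERANCE_TO_PROPOSED[tolerance])
    except KeyError:
        raise ValueError(f"unknown tolerance value: {tolerance!r}") from None


if __name__ == "__main__":
    for value in ("until_repair", "until_fsck", "enforce"):
        print(value, "->", proposed_settings(value))
```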
Overall, this seems simpler and easier for a user to understand.
Realistically, the only option I expect a user will ever change is
bluestore_warn_on_no_per_pool_omap=false to make the health warning go
away after an upgrade.
What do you think? Should I convert the legacy_statfs to behave the same
way?
sage