[ceph-users] Re: Wrong %USED and MAX AVAIL stats for pool

9 Nov 2019

I can confirm that this is a bug in the Ceph 14.2.4 dashboard; I am also seeing it
when I check the free space under "Pools".

On 8 October 2019 at 07:54:58 CEST, "Yordan Yordanov (Innologica)"
<Yordan.Yordanov(a)innologica.com> wrote:
...
 Hi Igor,

Thank you for responding. In this case this looks like a breaking
change. I know of two applications that are now incorrectly displaying
the pool usage and capacity. It looks like they both rely on the USED
field being already divided by the number of replicas. One of those
applications is actually the Ceph Dashboard. The other is OpenNebula
https://docs.opennebula.org/5.6/deployment/open_cloud_storage_setup/ceph_ds….
See the screenshot from Ceph Dashboard - https://imgur.com/vFFxsti. It
is stating that we have used 88% of the available space, because it
wrongly assumes that the pool capacity is 47.7TB + 6.7TB = 54.4TB,
while it should be more like (47.7TB/3) + 6.7TB = 22.6TB. It's
absolutely the same story with our OpenNebula instance -
https://imgur.com/MOLbo4g. I'm not sure exactly which update broke
this, but it was definitely working correctly before.
I looked at OpenNebula's code for ceph datastore monitoring and found
that it's parsing the XML output of ceph df --format xml, so it looks
like this changed too.

From file: /var/lib/one/remotes/tm/ceph/monitor:
# ------------ Compute datastore usage -------------

MONITOR_SCRIPT=$(cat <<EOF
$CEPH df --format xml
EOF
)

MONITOR_DATA=$(ssh_monitor_and_log $HOST "$MONITOR_SCRIPT" 2>&1)
MONITOR_STATUS=$?

if [ "$MONITOR_STATUS" = "0" ]; then
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"
    echo -e "$(rbd_df_monitor ${MONITOR_DATA} ${POOL_NAME})"
else
    echo "$MONITOR_DATA"
    exit $MONITOR_STATUS
fi

From file: /var/lib/one/remotes/datastore/ceph/ceph_utils.sh

#--------------------------------------------------------------------------------
# Parse the output of rbd df in xml format and generates a monitor string for a
# Ceph pool. You **MUST** define XPATH util before using this function
#   @param $1 the xml output of the command
#   @param $2 the pool name
#--------------------------------------------------------------------------------
rbd_df_monitor() {

    local monitor_data i j xpath_elements pool_name bytes_used free

    monitor_data=$1
    pool_name=$2

    while IFS= read -r -d '' element; do
        xpath_elements[i++]="$element"
    done < <(echo $monitor_data | $XPATH \
        "/stats/pools/pool[name = \"${pool_name}\"]/stats/bytes_used" \
        "/stats/pools/pool[name = \"${pool_name}\"]/stats/max_avail")

    bytes_used="${xpath_elements[j++]:-0}"
    free="${xpath_elements[j++]:-0}"

    cat << EOF | tr -d '[:blank:][:space:]'
        USED_MB=$(($bytes_used / 1024**2))\n
        TOTAL_MB=$((($bytes_used + $free) / 1024**2))\n
        FREE_MB=$(($free / 1024**2))\n
EOF
}

I believe Ceph Dashboard is doing the same, because the results are the
same.
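
A minimal sketch of how the same monitor values could be derived from STORED rather than USED, so that TOTAL_MB is not inflated by replication. The function name monitor_from_stored and its two byte-count arguments are hypothetical and not part of OpenNebula. It assumes the caller has already extracted the pool's stored bytes and max_avail from the ceph df output; the XML element that carries STORED in Nautilus is not shown in this thread, so the extraction step is left out.

```shell
#!/bin/bash
# Hypothetical helper, not OpenNebula code: build the monitor values from
# STORED (net user data) and MAX AVAIL, both given in bytes.
monitor_from_stored() {
    local stored=${1:-0}
    local free=${2:-0}
    local mib=1048576   # bytes per MiB (1024 * 1024)

    echo "USED_MB=$(( stored / mib ))"
    echo "TOTAL_MB=$(( (stored + free) / mib ))"
    echo "FREE_MB=$(( free / mib ))"
}

# Example: 15 TiB stored and 7 TiB available, expressed in bytes.
tib=$(( 1024 * 1024 * 1024 * 1024 ))
monitor_from_stored $(( 15 * tib )) $(( 7 * tib ))
```

With these inputs TOTAL_MB corresponds to 22 TiB, that is, net data plus what can still be written, rather than raw usage plus MAX AVAIL.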

Best Regards,

On 7 Oct 2019, at 19:03, Igor Fedotov <ifedotov@suse.de> wrote:

Hi Yordan,

This is the Mimic documentation, and these snippets are no longer valid
for Nautilus. They are still present in the Nautilus pages, though.

Going to create a corresponding ticket to fix that.

Relevant Nautilus changes for 'ceph df [detail]' command can be found
in Nautilus release notes:
https://docs.ceph.com/docs/nautilus/releases/nautilus/

In short, the USED field now accounts for all overhead data, including
replicas. It is the STORED field that now represents the pure data users
put into a pool.

Thanks,

Igor

On 10/2/2019 8:33 AM, Yordan Yordanov (Innologica) wrote:
The documentation states:
https://docs.ceph.com/docs/mimic/rados/operations/monitoring/

The POOLS section of the output provides a list of pools and the
notional usage of each pool. The output from this section DOES NOT
reflect replicas, clones or snapshots. For example, if you store an
object with 1MB of data, the notional usage will be 1MB, but the actual
usage may be 2MB or more depending on the number of replicas, clones
and snapshots.

However, in our case we are clearly seeing the USED field multiply
the total object size by the number of replicas.

[root@blackmirror ~]# ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    hdd       80 TiB     34 TiB     46 TiB       46 TiB         58.10
    TOTAL     80 TiB     34 TiB     46 TiB       46 TiB         58.10

POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    one        2      15 TiB       4.05M      46 TiB     68.32       7.2 TiB
    bench      5     250 MiB          67     250 MiB         0        22 TiB

[root@blackmirror ~]# rbd du -p one
NAME           PROVISIONED USED
...
<TOTAL>             20 TiB  15 TiB

This is causing several apps (including ceph dashboard) to display
inaccurate percentages, because they calculate the total pool capacity
as USED + MAX AVAIL, which in this case yields 53.2TB, which is way
off. 7.2TB is about 13% of that, so we receive alarms, and this has been
bugging us for quite some time now.
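
The arithmetic behind the discrepancy can be sketched with the rounded figures from the ceph df output above for pool "one" (STORED 15 TiB, USED 46 TiB, MAX AVAIL 7.2 TiB). The function name pct_used is hypothetical, and the formula used / (used + max_avail) is an assumption based on how the affected apps are described as deriving capacity.

```shell
#!/bin/bash
# Hypothetical helper: percentage used, given "used" and "max avail" in the
# same unit. awk is used because shell arithmetic is integer-only.
pct_used() {
    awk -v used="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", 100 * used / (used + avail) }'
}

# Capacity taken as USED + MAX AVAIL (replicas counted in the numerator
# and in the total): prints 86.5
pct_used 46 7.2

# Capacity taken as STORED + MAX AVAIL (net user data only): prints 67.6
pct_used 15 7.2
```

The first result leaves roughly 13% free, which matches the alarm condition described. The second is close to the 68.32 that ceph df itself reports under %USED; the small difference is presumably due to the rounding of the displayed values.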

