CephFS ghost usage/inodes

List overview All Threads
Download

newer

older

Radosgw/Objecter behaviour for...

Re: CephFS with cache-tier...

Florian Pritz

14 Jan 2020 14 Jan '20

2:15 a.m.

Hi, When we tried putting some load on our test cephfs setup by restoring a backup in artifactory, we eventually ran out of space (around 95% used in `df` = 3.5TB) which caused artifactory to abort the restore and clean up. However, while a simple `find` no longer shows the files, `df` still claims that we have around 2.1TB of data on the cephfs. `df -i` also shows 2.4M used inodes. When using `du -sh` on a top-level mountpoint, I get 31G used, which is data that is still really here and which is expected to be here. Consequently, we also get the following warning:

...

MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (38711) is more than 231.802 times cluster average (167)

We are running ceph 14.2.5. We have snapshots enabled on cephfs, but there are currently no active snapshots listed by `ceph daemon mds.$hostname dump snaps --server` (see below). I can't say for sure if we created snapshots during the backup restore.

...

{ "last_snap": 39, "last_created": 38, "last_destroyed": 39, "pending_noop": [], "snaps": [], "need_to_purge": {}, "pending_update": [], "pending_destroy": [] }

We only have a single CephFS. We use the pool_namespace xattr for our various directory trees on the cephfs. `ceph df` shows:

...

POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 24.97 3.1 TiB

`ceph daemon mds.$hostname perf dump | grep stray` shows:

...

"num_strays": 0, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 5097138, "strays_enqueued": 5097138, "strays_reintegrated": 0, "strays_migrated": 0,

`rados -p cephfs_data df` shows:

...

POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR cephfs_data 2.1 TiB 2477540 0 4955080 0 0 0 10699626 6.9 TiB 86911076 35 TiB 0 B 0 B total_objects 29718 total_used 329 GiB total_avail 7.5 TiB total_space 7.8 TiB

When I combine the usage and the free space shown by `df` we would exceed our cluster size. Our test cluster currently has 7.8TB total space with a replication size of 2 for all pools. With 2.1TB "used" on the cephfs according to `df` + 3.1TB being shows as "free" I get 5.2TB total size. This would mean >10TB of data when accounted for replication. Clearly this can't fit on a cluster with only 7.8TB of capacity. Do you have any ideas why we see so many objects and so much reported usage? Is there any way to fix this without recreating the cephfs? Florian -- Florian Pritz Research Industrial Systems Engineering (RISE) Forschungs-, Entwicklungs- und Großprojektberatung GmbH Concorde Business Park F 2320 Schwechat Austria E-Mail: florian.pritz(a)rise-world.com Web: www.rise-world.com Firmenbuch: FN 280353i Landesgericht Korneuburg UID: ATU62886416

Attachments:

signature.asc (application/pgp-signature — 833 bytes)

Show replies by thread

Patrick Donnelly

14 Jan 14 Jan

8:32 a.m.

On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz <florian.pritz(a)rise-world.com> wrote:

...

`ceph daemon mds.$hostname perf dump | grep stray` shows: > "num_strays": 0, > "num_strays_delayed": 0, > "num_strays_enqueuing": 0, > "strays_created": 5097138, > "strays_enqueued": 5097138, > "strays_reintegrated": 0, > "strays_migrated": 0,

Can you also paste the purge queue ("pq") perf dump? It's possible the MDS has hit an ENOSPC condition that caused the MDS to go read-only. This would prevent the MDS PurgeQueue from cleaning up. Do you see a health warning that the MDS is in this state? Is so, please try restarting the MDS. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Oskar Malnowicz

9:29 a.m.

Hello Patrick, "purge_queue": { "pq_executing_ops": 0, "pq_executing": 0, "pq_executed": 5097138 }, We already restarted the MDS daemons, but no change. There are no other health warnings than that one what Florian already mentioned. cheers Oskar Am 14.01.20 um 17:32 schrieb Patrick Donnelly:

...

On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz <florian.pritz(a)rise-world.com> wrote:

Patrick Donnelly

9:48 a.m.

Please try flushing the journal: ceph daemon mds.foo flush journal The problem may be caused by this bug: https://tracker.ceph.com/issues/43598 As for what to do next, you would likely need to recover the deleted inodes from the data pool so you can retry deleting the files: https://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#recover… On Tue, Jan 14, 2020 at 9:30 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

...

On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz <florian.pritz(a)rise-world.com> wrote:

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

-- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Oskar Malnowicz

11:24 a.m.

$ ceph daemon mds.who flush journal { "message": "", "return_code": 0 } $ cephfs-table-tool 0 reset session { "0": { "data": {}, "result": 0 } } $ cephfs-table-tool 0 reset snap { "result": 0 } $ cephfs-table-tool 0 reset inode { "0": { "data": {}, "result": 0 } } $ cephfs-journal-tool --rank=cephfs_test1:0 journal reset old journal was 98282151365~92872 new journal start will be 98285125632 (2881395 bytes past old end) writing journal head writing EResetJournal entry done $ cephfs-data-scan init Inode 0x0x1 already exists, skipping create. Use --force-init to overwrite the existing object. Inode 0x0x100 already exists, skipping create. Use --force-init to overwrite the existing object. Should i run with --force-init flag ? Am 14.01.20 um 18:48 schrieb Patrick Donnelly:

...

On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz <florian.pritz(a)rise-world.com> wrote:

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

-- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Patrick Donnelly

11:32 a.m.

On Tue, Jan 14, 2020 at 11:24 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

...

No, that shouldn't be necessary. -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Oskar Malnowicz

11:40 a.m.

i run this commands, but still the same problems $ cephfs-data-scan scan_extents cephfs_data $ cephfs-data-scan scan_inodes cephfs_data $ cephfs-data-scan scan_links 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap updating last_snap 1 -> 27 $ cephfs-data-scan cleanup cephfs_data do you have other ideas ? Am 14.01.20 um 20:32 schrieb Patrick Donnelly:

...

On Tue, Jan 14, 2020 at 11:24 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

No, that shouldn't be necessary.

Patrick Donnelly

11:44 a.m.

On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

...

i run this commands, but still the same problems

Which problems?

...

$ cephfs-data-scan scan_extents cephfs_data $ cephfs-data-scan scan_inodes cephfs_data $ cephfs-data-scan scan_links 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap updating last_snap 1 -> 27 $ cephfs-data-scan cleanup cephfs_data do you have other ideas ?

After you complete this, you should see the deleted files in your file system tree (if this is indeed the issue). What's the output of `du -hc`? -- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Oskar Malnowicz

11:58 a.m.

as florian already wrote, `du -hc` shows a total usage of 31G, but `ceph df` show us an usage of 2.1 </ mds># du -hs 31G # ceph df cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 25.00 3.1 TiB Am 14.01.20 um 20:44 schrieb Patrick Donnelly:

...

On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

i run this commands, but still the same problems

Which problems?

After you complete this, you should see the deleted files in your file system tree (if this is indeed the issue). What's the output of `du -hc`?

Patrick Donnelly

12:06 p.m.

...

On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

i run this commands, but still the same problems

Which problems?

After you complete this, you should see the deleted files in your file system tree (if this is indeed the issue). What's the output of `du -hc`?

-- Patrick Donnelly, Ph.D. He / Him / His Senior Software Engineer Red Hat Sunnyvale, CA GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Oskar Malnowicz

12:19 p.m.

this was the new state. the results are equal to florians $ time cephfs-data-scan scan_extents cephfs_data cephfs-data-scan scan_extents cephfs_data 1.86s user 1.47s system 21% cpu 15.397 total $ time cephfs-data-scan scan_inodes cephfs_data cephfs-data-scan scan_inodes cephfs_data 2.76s user 2.05s system 26% cpu 17.912 total $ time cephfs-data-scan scan_links cephfs-data-scan scan_links 0.13s user 0.11s system 31% cpu 0.747 total $ time cephfs-data-scan scan_links cephfs-data-scan scan_links 0.13s user 0.12s system 33% cpu 0.735 total $ time cephfs-data-scan cleanup cephfs_data cephfs-data-scan cleanup cephfs_data 1.64s user 1.37s system 12% cpu 23.922 total mds / $ du -sh 31G $ df -h ip1,ip2,ip3:/ 5.2T 2.1T 3.1T 41% /storage/cephfs_test1 $ ceph df detail RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.5 TiB 312 GiB 329 GiB 4.14 TOTAL 7.8 TiB 7.5 TiB 312 GiB 329 GiB 4.14 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 25.00 3.1 TiB N/A N/A 2.48M 0 B 0 B cephfs_metadata 7 7.3 MiB 379 7.3 MiB 0 3.1 TiB N/A N/A 379 0 B 0 B Am 14.01.20 um 21:06 schrieb Patrick Donnelly:

...

On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote:

i run this commands, but still the same problems

Which problems?

After you complete this, you should see the deleted files in your file system tree (if this is indeed the issue). What's the output of `du -hc`?

Oskar Malnowicz

1:36 p.m.

i just restartet the mds daemons and now they crash during the boot. -36> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: opening inotable -35> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: opening sessionmap -34> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: opening mds log -33> 2020-01-14 22:33:17.880 7fc9bbeaa700 5 mds.0.log open discovering log bounds -32> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: opening purge queue (async) -31> 2020-01-14 22:33:17.880 7fc9bbeaa700 4 mds.0.purge_queue open: opening -30> 2020-01-14 22:33:17.880 7fc9bbeaa700 1 mds.0.journaler.pq(ro) recover start -29> 2020-01-14 22:33:17.880 7fc9bb6a9700 4 mds.0.journalpointer Reading journal pointer '400.00000000' -28> 2020-01-14 22:33:17.880 7fc9bbeaa700 1 mds.0.journaler.pq(ro) read_head -27> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: loading open file table (async) -26> 2020-01-14 22:33:17.880 7fc9c58a5700 10 monclient: get_auth_request con 0x55aa83436d80 auth_method 0 -25> 2020-01-14 22:33:17.880 7fc9bbeaa700 2 mds.0.13470 Booting: 0: opening snap table -24> 2020-01-14 22:33:17.884 7fc9c58a5700 10 monclient: get_auth_request con 0x55aa83437680 auth_method 0 -23> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient: get_auth_request con 0x55aa83437200 auth_method 0 -22> 2020-01-14 22:33:17.884 7fc9bceac700 1 mds.0.journaler.pq(ro) _finish_read_head loghead(trim 805306368, expire 807199928, write 807199928, stream_format 1). probing for end of log (from 807199928)... -21> 2020-01-14 22:33:17.884 7fc9bceac700 1 mds.0.journaler.pq(ro) probing for end of the log -20> 2020-01-14 22:33:17.884 7fc9bb6a9700 1 mds.0.journaler.mdlog(ro) recover start -19> 2020-01-14 22:33:17.884 7fc9bb6a9700 1 mds.0.journaler.mdlog(ro) read_head -18> 2020-01-14 22:33:17.884 7fc9bb6a9700 4 mds.0.log Waiting for journal 0x200 to recover... -17> 2020-01-14 22:33:17.884 7fc9c68a7700 10 monclient: get_auth_request con 0x55aa83437f80 auth_method 0 -16> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient: get_auth_request con 0x55aa83438400 auth_method 0 -15> 2020-01-14 22:33:17.892 7fc9bbeaa700 1 mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 98280931328, expire 98282151365, write 98282247624, stream_format 1). probing for end of log (from 98282247624)... -14> 2020-01-14 22:33:17.892 7fc9bbeaa700 1 mds.0.journaler.mdlog(ro) probing for end of the log -13> 2020-01-14 22:33:17.892 7fc9bceac700 1 mds.0.journaler.pq(ro) _finish_probe_end write_pos = 807199928 (header had 807199928). recovered. -12> 2020-01-14 22:33:17.892 7fc9bceac700 4 mds.0.purge_queue operator(): open complete -11> 2020-01-14 22:33:17.892 7fc9bceac700 1 mds.0.journaler.pq(ro) set_writeable -10> 2020-01-14 22:33:17.892 7fc9bbeaa700 1 mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 98283021535 (header had 98282247624). recovered. -9> 2020-01-14 22:33:17.892 7fc9bb6a9700 4 mds.0.log Journal 0x200 recovered. -8> 2020-01-14 22:33:17.892 7fc9bb6a9700 4 mds.0.log Recovered journal 0x200 in format 1 -7> 2020-01-14 22:33:17.892 7fc9bb6a9700 2 mds.0.13470 Booting: 1: loading/discovering base inodes -6> 2020-01-14 22:33:17.892 7fc9bb6a9700 0 mds.0.cache creating system inode with ino:0x100 -5> 2020-01-14 22:33:17.892 7fc9bb6a9700 0 mds.0.cache creating system inode with ino:0x1 -4> 2020-01-14 22:33:17.896 7fc9bbeaa700 2 mds.0.13470 Booting: 2: replaying mds log -3> 2020-01-14 22:33:17.896 7fc9bbeaa700 2 mds.0.13470 Booting: 2: waiting for purge queue recovered -2> 2020-01-14 22:33:17.908 7fc9ba6a7700 -1 log_channel(cluster) log [ERR] : ESession.replay sessionmap v 7561128 - 1 > table 0 -1> 2020-01-14 22:33:17.912 7fc9ba6a7700 -1 /build/ceph-14.2.5/src/mds/journal.cc: In function 'virtual void ESession::replay(MDSRank*)' thread 7fc9ba6a7700 time 2020-01-14 22:33:17.912135 /build/ceph-14.2.5/src/mds/journal.cc: 1655: FAILED ceph_assert(g_conf()->mds_wipe_sessions) Am 14.01.20 um 21:19 schrieb Oskar Malnowicz:

...

I'm asking that you get the new state of the file system tree after recovering from the data pool. Florian wrote that before I asked you to do this... How long did it take to run the cephfs-data-scan commands? On Tue, Jan 14, 2020 at 11:58 AM Oskar Malnowicz <oskar.malnowicz(a)rise-world.com> wrote: > as florian already wrote, `du -hc` shows a total usage of 31G, but `ceph > df` show us an usage of 2.1 > > </ mds># du -hs > 31G > > # ceph df > cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 25.00 3.1 TiB > > Am 14.01.20 um 20:44 schrieb Patrick Donnelly: >> On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz >> <oskar.malnowicz(a)rise-world.com> wrote: >>> i run this commands, but still the same problems >> Which problems? >> >>> $ cephfs-data-scan scan_extents cephfs_data >>> >>> $ cephfs-data-scan scan_inodes cephfs_data >>> >>> $ cephfs-data-scan scan_links >>> 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap updating last_snap 1 >>> -> 27 >>> >>> $ cephfs-data-scan cleanup cephfs_data >>> >>> do you have other ideas ? >> After you complete this, you should see the deleted files in your file >> system tree (if this is indeed the issue). What's the output of `du >> -hc`? >>

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Oskar Malnowicz

2 p.m.

i executed the commands from above again ("Recovery from missing metadata objects") and now the mds daemons start. still the same situation like before :( Am 14.01.20 um 22:36 schrieb Oskar Malnowicz:

...

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Oskar Malnowicz

15 Jan 15 Jan

6:50 a.m.

the situation is: health: HEALTH_WARN 1 pools have many more objects per pg than average $ ceph health detail MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (315399) is more than 1227.23 times cluster average (257) $ ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.4 TiB 326 GiB 343 GiB 4.30 TOTAL 7.8 TiB 7.4 TiB 326 GiB 343 GiB 4.30 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.2 TiB 2.52M 2.2 TiB 26.44 3.0 TiB cephfs_metadata 7 9.7 MiB 379 9.7 MiB 0 3.0 TiB the stored value of the "cephfs_data" pool is 2.2TiB. This must be wrong. When i execute "du -sh" from the MDS root "/" i get an usage: $ du -sh 31G . "df -h" shows: $ df -h Filesystem Size Used Avail Use% Mounted on ip1,ip2,ip3:/ 5.2T 2.2T 3.0T 43% /storage/cephfs It says that "Used" ist 2.2T but "du" shows 31G the pg_num from the "cephfs_data" pool is now 8. Autoscale suggest me to set this parameter to 512 $ ceph osd pool autoscale-status POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE cephfs_metadata 9994k 2.0 7959G 0.0000 1.0 8 off cephfs_data 2221G 2.0 7959G 0.5582 1.0 8 512 off after setting pg_num to 512 the situation is: $ ceph health detail HEALTH_WARN 1 pools have many more objects per pg than average MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (4928) is more than 100.571 times cluster average (49) $ ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.4 TiB 329 GiB 346 GiB 4.34 TOTAL 7.8 TiB 7.4 TiB 329 GiB 346 GiB 4.34 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 30 GiB 2.52M 61 GiB 0.99 3.0 TiB cephfs_metadata 7 9.8 MiB 379 20 MiB 0 3.0 TiB The "stored" value changed from 2.2TiB to 30GiB !!! This should be the correct usage/size. When i execute "du -sh" from the MDS root "/" i get again an usage: $ du -sh 31G and "df -h" shows again $ df -h Filesystem Size Used Avail Use% Mounted on ip1,ip2,ip3:/ 5.2T 2.2T 3.0T 43% /storage/cephfs It says that "Used" ist 2.2T but "du" shows 31G Can anybody explain me whats the problem ? Am 14.01.20 um 11:15 schrieb Florian Pritz:

...

MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (38711) is more than 231.802 times cluster average (167)

{ "last_snap": 39, "last_created": 38, "last_destroyed": 39, "pending_noop": [], "snaps": [], "need_to_purge": {}, "pending_update": [], "pending_destroy": [] }

We only have a single CephFS. We use the pool_namespace xattr for our various directory trees on the cephfs. `ceph df` shows:

POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 24.97 3.1 TiB

`ceph daemon mds.$hostname perf dump | grep stray` shows:

"num_strays": 0, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 5097138, "strays_enqueued": 5097138, "strays_reintegrated": 0, "strays_migrated": 0,

`rados -p cephfs_data df` shows:

When I combine the usage and the free space shown by `df` we would exceed our cluster size. Our test cluster currently has 7.8TB total space with a replication size of 2 for all pools. With 2.1TB "used" on the cephfs according to `df` + 3.1TB being shows as "free" I get 5.2TB total size. This would mean >10TB of data when accounted for replication. Clearly this can't fit on a cluster with only 7.8TB of capacity. Do you have any ideas why we see so many objects and so much reported usage? Is there any way to fix this without recreating the cephfs? Florian _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Oskar Malnowicz

8:07 a.m.

i think there is something wrong with the cephfs_data pool. i created a new pool "cephfs_data2" and copied data from the "cephfs_data" to the "cephfs_data2" pool by using this command: $ rados cppool cephfs_data cephfs_data2 $ ceph df detail RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.4 TiB 390 GiB 407 GiB 5.11 TOTAL 7.8 TiB 7.4 TiB 390 GiB 407 GiB 5.11 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR cephfs_data 6 30 GiB 2.52M 61 GiB 1.02 2.9 TiB N/A N/A 2.52M 0 B 0 B cephfs_data2 20 30 GiB 11.06k 61 GiB 1.02 2.9 TiB N/A N/A 11.06k 0 B 0 B cephfs_metadata 7 9.8 MiB 379 20 MiB 0 2.9 TiB N/A N/A 379 0 B 0 B in the new pool the stored amount is also 30GiB, but object count and dirty count are significant smaller. i think that in the "cephfs_data" pool are something like "orpahned" objects. but how can i cleanup that pool ? Am 14.01.20 um 11:15 schrieb Florian Pritz:

...

MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (38711) is more than 231.802 times cluster average (167)

{ "last_snap": 39, "last_created": 38, "last_destroyed": 39, "pending_noop": [], "snaps": [], "need_to_purge": {}, "pending_update": [], "pending_destroy": [] }

We only have a single CephFS. We use the pool_namespace xattr for our various directory trees on the cephfs. `ceph df` shows:

POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 24.97 3.1 TiB

`ceph daemon mds.$hostname perf dump | grep stray` shows:

"num_strays": 0, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 5097138, "strays_enqueued": 5097138, "strays_reintegrated": 0, "strays_migrated": 0,

`rados -p cephfs_data df` shows:

When I combine the usage and the free space shown by `df` we would exceed our cluster size. Our test cluster currently has 7.8TB total space with a replication size of 2 for all pools. With 2.1TB "used" on the cephfs according to `df` + 3.1TB being shows as "free" I get 5.2TB total size. This would mean >10TB of data when accounted for replication. Clearly this can't fit on a cluster with only 7.8TB of capacity. Do you have any ideas why we see so many objects and so much reported usage? Is there any way to fix this without recreating the cephfs? Florian _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Oskar Malnowicz

23 Jan 23 Jan

7:08 a.m.

Any other ideas ?

...

Am 15.01.2020 um 15:50 schrieb Oskar Malnowicz <oskar.malnowicz(a)rise-world.com>om>: the situation is: health: HEALTH_WARN 1 pools have many more objects per pg than average $ ceph health detail MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (315399) is more than 1227.23 times cluster average (257) $ ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.4 TiB 326 GiB 343 GiB 4.30 TOTAL 7.8 TiB 7.4 TiB 326 GiB 343 GiB 4.30 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.2 TiB 2.52M 2.2 TiB 26.44 3.0 TiB cephfs_metadata 7 9.7 MiB 379 9.7 MiB 0 3.0 TiB the stored value of the "cephfs_data" pool is 2.2TiB. This must be wrong. When i execute "du -sh" from the MDS root "/" i get an usage: $ du -sh 31G . "df -h" shows: $ df -h Filesystem Size Used Avail Use% Mounted on ip1,ip2,ip3:/ 5.2T 2.2T 3.0T 43% /storage/cephfs It says that "Used" ist 2.2T but "du" shows 31G the pg_num from the "cephfs_data" pool is now 8. Autoscale suggest me to set this parameter to 512 $ ceph osd pool autoscale-status POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE cephfs_metadata 9994k 2.0 7959G 0.0000 1.0 8 off cephfs_data 2221G 2.0 7959G 0.5582 1.0 8 512 off after setting pg_num to 512 the situation is: $ ceph health detail HEALTH_WARN 1 pools have many more objects per pg than average MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (4928) is more than 100.571 times cluster average (49) $ ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 7.8 TiB 7.4 TiB 329 GiB 346 GiB 4.34 TOTAL 7.8 TiB 7.4 TiB 329 GiB 346 GiB 4.34 POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 30 GiB 2.52M 61 GiB 0.99 3.0 TiB cephfs_metadata 7 9.8 MiB 379 20 MiB 0 3.0 TiB The "stored" value changed from 2.2TiB to 30GiB !!! This should be the correct usage/size. When i execute "du -sh" from the MDS root "/" i get again an usage: $ du -sh 31G and "df -h" shows again $ df -h Filesystem Size Used Avail Use% Mounted on ip1,ip2,ip3:/ 5.2T 2.2T 3.0T 43% /storage/cephfs It says that "Used" ist 2.2T but "du" shows 31G Can anybody explain me whats the problem ? Am 14.01.20 um 11:15 schrieb Florian Pritz:

MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool cephfs_data objects per pg (38711) is more than 231.802 times cluster average (167)

{ "last_snap": 39, "last_created": 38, "last_destroyed": 39, "pending_noop": [], "snaps": [], "need_to_purge": {}, "pending_update": [], "pending_destroy": [] }

We only have a single CephFS. We use the pool_namespace xattr for our various directory trees on the cephfs. `ceph df` shows:

POOL ID STORED OBJECTS USED %USED MAX AVAIL cephfs_data 6 2.1 TiB 2.48M 2.1 TiB 24.97 3.1 TiB

`ceph daemon mds.$hostname perf dump | grep stray` shows:

"num_strays": 0, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 5097138, "strays_enqueued": 5097138, "strays_reintegrated": 0, "strays_migrated": 0,

`rados -p cephfs_data df` shows:

When I combine the usage and the free space shown by `df` we would exceed our cluster size. Our test cluster currently has 7.8TB total space with a replication size of 2 for all pools. With 2.1TB "used" on the cephfs according to `df` + 3.1TB being shows as "free" I get 5.2TB total size. This would mean >10TB of data when accounted for replication. Clearly this can't fit on a cluster with only 7.8TB of capacity. Do you have any ideas why we see so many objects and so much reported usage? Is there any way to fix this without recreating the cephfs? Florian _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

1568

days inactive

1577

days old

ceph-users@ceph.io

Manage subscription

15 comments

3 participants

tags (0)

participants (3)

Florian Pritz
Oskar Malnowicz
Patrick Donnelly